
Do Log Factors Matter? On Optimal Wavelet Approximation and the Foundations of Compressed Sensing


Abstract

A signature result in compressed sensing is that Gaussian random sampling achieves stable and robust recovery of sparse vectors under optimal conditions on the number of measurements. However, in the context of image reconstruction, it has been extensively documented that sampling strategies based on Fourier measurements outperform this purportedly optimal approach. Motivated by this seeming paradox, we investigate the problem of optimal sampling for compressed sensing. Rigorously combining the theories of wavelet approximation and infinite-dimensional compressed sensing, our analysis leads to new error bounds in terms of the total number of measurements m for the approximation of piecewise \(\alpha \)-Hölder functions. Our theoretical findings suggest that Fourier sampling outperforms random Gaussian sampling when the Hölder exponent \(\alpha \) is large enough. Moreover, we establish a provably optimal sampling strategy. This work is an important first step towards the resolution of the claimed paradox and provides a clear theoretical justification for the practical success of compressed sensing techniques in imaging problems.


Notes

  1. This also provides some explanation as to why attempts to modify devices such as MR scanners to produce Gaussian-like measurements (see, for example, [42, 58, 59]) have not been widely adopted.

  2. In particular, by ‘Fourier measurements’ we mean samples of the continuous Fourier transform of f, not its discrete Fourier transform. Not only is this more convenient for the analysis, it is also more relevant in practice, since modalities such as MRI are based on the continuous Fourier transform [10].

  3. There are some theoretical results for QCBP in the presence of unknown noise [19, 32, 37, 70]. However, except in specific cases, these involve additional factors (so-called quotients) which are difficult to estimate.

  4. The entries of the cross-Gramian matrix U (8.1) used in this sampling strategy are computed by applying the inverse discrete wavelet and Fourier transforms to the first N elements of the canonical basis of the augmented space \({\mathbb {R}}^{16N}\). Then, only the N entries corresponding to the frequencies of interest are kept. This augmentation makes the computation of U more accurate.

  5. In order to avoid discretization effects related to the wavelet crime, the vector d of wavelet coefficients is computed by sampling the function f on a uniform grid of 16N points, applying the discrete wavelet transform and then keeping the first N entries of the resulting vector.

References

  1. B. Adcock, V. Antun, and A. C. Hansen. Uniform recovery in infinite-dimensional compressed sensing and applications to structured binary sampling. arXiv:1905.00126, 2019.

  2. B. Adcock, A. Bao, and S. Brugiapaglia. Correcting for unknown errors in sparse high-dimensional function approximation. Numer. Math. (to appear), 2019.

  3. B. Adcock, C. Boyer, and S. Brugiapaglia. On the gap between local recovery guarantees in compressed sensing and oracle estimates. arXiv:1806.03789, 2018.

  4. B. Adcock and A. C. Hansen. Generalized sampling and infinite-dimensional compressed sensing. Found. Comput. Math., 16(5):1263–1323, 2016.

  5. B. Adcock and A. C. Hansen. Compressive Imaging: Structure, Sampling, Learning. Cambridge University Press (in press), 2021.

  6. B. Adcock, A. C. Hansen, G. Kutyniok, and J. Ma. Linear stable sampling rate: Optimality of 2D wavelet reconstructions from Fourier measurements. SIAM J. Math. Anal., 47(2):1196–1233, 2015.

  7. B. Adcock, A. C. Hansen, and C. Poon. On optimal wavelet reconstructions from Fourier samples: linearity and universality of the stable sampling rate. Appl. Comput. Harmon. Anal., 36(3):387–415, 2014.

  8. B. Adcock, A. C. Hansen, C. Poon, and B. Roman. Breaking the coherence barrier: A new theory for compressed sensing. Forum Math. Sigma, 5, 2017.

  9. B. Adcock, A. C. Hansen, and B. Roman. The quest for optimal sampling: computationally efficient, structure-exploiting measurements for compressed sensing. In Compressed Sensing and Its Applications. Springer, 2015.

  10. B. Adcock, A. C. Hansen, B. Roman, and G. Teschke. Generalized sampling: stable reconstructions, inverse problems and compressed sensing over the continuum. Advances in Imaging and Electron Physics, 182:187–279, 2014.

  11. G. R. Arce, D. J. Brady, L. Carin, H. Arguello, and D. Kittle. Compressive coded aperture spectral imaging: An introduction. IEEE Signal Process. Mag., 31(1):105–115, 2014.

  12. R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hedge. Model-based compressive sensing. IEEE Trans. Inform. Theory, 56(4):1982–2001, 2010.

  13. A. Bastounis, B. Adcock, and A. C. Hansen. From global to local: Getting more from compressed sensing. SIAM News, 2017.

  14. A. Bastounis and A. C. Hansen. On the absence of the RIP in real-world applications of compressed sensing and the RIP in levels. SIAM J. Imaging Sci., 2017 (to appear).

  15. A. Belloni, V. Chernozhukov, and L. Wang. Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4):791–806, 2011.

  16. V. Boominathan, J. K. Adams, M. S. Asif, B. W. Avants, J. T. Robinson, R. G. Baraniuk, A. C. Sankaranarayanan, and A. Veeraraghavan. Lensless imaging: A computational renaissance. IEEE Signal Process. Mag., 33(5):23–35, 2016.

  17. C. Boyer, J. Bigot, and P. Weiss. Compressed sensing with structured sparsity and structured acquisition. Appl. Comput. Harm. Anal., 46(2):312–350, 2017.

  18. D. J. Brady, K. Choi, D. L. Marks, R. Horisaki, and S. Lim. Compressive holography. Opt. Express, 17:13040–13049, 2009.

  19. S. Brugiapaglia and B. Adcock. Robustness to unknown error in sparse regularization. IEEE Trans. Inform. Theory, 64(10):6638–6661, 2018.

  20. T. Cai and A. Zhang. Sparse representation of a polytope and recovery of sparse signals and low-rank matrices. IEEE Trans. Inform. Theory, 60(1):122–132, 2014.

  21. E. Candès. The restricted isometry property and its implications for compressed sensing. C. R. Math. Acad. Sci. Paris, 346(9-10):589–592, 2008.

  22. E. J. Candès and D. L. Donoho. New tight frames of curvelets and optimal representations of objects with piecewise c2 singularities. Comm. Pure Appl. Math, 57(2):219–266, 2004.

  23. E. J. Candès and Y. Plan. A probabilistic and RIPless theory of compressed sensing. IEEE Trans. Inform. Theory, 57(11):7235–7254, 2011.

  24. E. J. Candès and J. Romberg. Sparsity and incoherence in compressive sampling. Inverse Problems, 23(3):969–985, 2007.

  25. E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.

  26. A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision, 40(1):120–145, 2011.

  27. N. Chauffert, P. Ciuciu, J. Kahn, and P. Weiss. Variable density sampling with continuous trajectories. SIAM J. Imaging Sci., 7(4):1962–1992, 2014.

  28. A. Chkifa, N. Dexter, H. Tran, and C. G. Webster. Polynomial approximation via compressed sensing of high-dimensional functions on lower sets. Math. Comp., 87:1415–1450, 2018.

  29. A. Cohen, W. Dahmen, and R. A. DeVore. Compressed sensing and best \(k\)-term approximation. J. Amer. Math. Soc., 22(1):211–231, 2009.

  30. I. Daubechies. Ten Lectures on Wavelets, volume 61 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992.

  31. M. A. Davenport, M. F. Duarte, Y. C. Eldar, and G. Kutyniok. Introduction to compressed sensing. In Compressed Sensing: Theory and Applications. Cambridge University Press, 2011.

  32. R. DeVore, G. Petrova, and P. Wojtaszczyk. Instance-optimality in probability with an \(\ell _1\)-minimization decoder. Appl. Comput. Harmon. Anal., 27(3):275–288, 2009.

  33. R. A. DeVore. Nonlinear approximation. Acta Numer., 7:51–150, 1998.

  34. M. F. Duarte, M. A. Davenport, D. Takhar, J. Laska, K. Kelly, and R. G. Baraniuk. Single-pixel imaging via compressive sampling. IEEE Signal Process. Mag., 25(2):83–91, 2008.

  35. M. F. Duarte and Y. C. Eldar. Structured compressed sensing: from theory to applications. IEEE Trans. Signal Process., 59(9):4053–4085, 2011.

  36. J. A. Fessler. Optimization methods for MR image reconstruction. arXiv:1903.03510, 2019.

  37. S. Foucart. Stability and robustness of \(\ell _1\)-minimizations with Weibull matrices and redundant dictionaries. Linear Algebra Appl., 441:4–21, 2014.

  38. S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing. Birkhauser, 2013.

  39. M. Gataric and C. Poon. A practical guide to the recovery of wavelet coefficients from Fourier measurements. SIAM J. Sci. Comput., 38(2):A1075–A1099, 2016.

  40. M. E. Gehm and D. J. Brady. Compressive sensing in the EO/IR. Applied Optics, 54(8):C14–C22, 2015.

  41. C. G. Graff and E. Y. Sidky. Compressive sensing in medical imaging. Appl. Opt., 54:C23–C44, 2015.

  42. J. Haldar, D. Hernando, and Z. Liang. Compressed-sensing MRI with random encoding. IEEE Trans. Med. Imaging, 30(4):893–903, 2011.

  43. D. J. Holland, M. J. Bostock, L. F. Gladden, and D. Nietlispach. Fast multidimensional NMR spectroscopy using compressed sensing. Angew. Chem. Int. Ed., 50(29), 2011.

  44. G. Huang, H. Jiang, K. Matthews, and P. Wilford. Lensless imaging by compressive sensing. In 20th IEEE International Conference on Image Processing, 2013.

  45. http://www3.gehealthcare.in/~/media/images/product/product-categories/magnetic-resonance-imaging/optima-mr450w-1-5t-with-gem-suite/1-clinical/optima_mr450w_with_gem_suite_brainpropt2_clinical.jpg.

  46. O. Katz, Y. Bromberg, and Y. Silberberg. Compressive ghost imaging. Appl. Phys. Lett., 95:131110, 2009.

  47. K. Kazimierczuk and V. Y. Orekhov. Accelerated NMR spectroscopy by using compressed sensing. Angew. Chem. Int. Ed., 50(24), 2011.

  48. F. Krahmer and R. Ward. Stable and robust recovery from variable density frequency samples. IEEE Trans. Image Proc., 23(2):612–622, 2013.

  49. G. Kutyniok and W.-Q. Lim. Optimal compressive imaging of Fourier data. SIAM J. Imaging Sci., 11(1):507–546, 2018.

  50. C. Li and B. Adcock. Compressed sensing with local structure: uniform recovery guarantees for the sparsity in levels class. Appl. Comput. Harmon. Anal., 46(3):453–477, 2019.

  51. M. Lustig, D. L. Donoho, and J. M. Pauly. Sparse MRI: the application of compressed sensing for rapid MRI imaging. Magn. Reson. Med., 58(6):1182–1195, 2007.

  52. M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly. Compressed Sensing MRI. IEEE Signal Process. Mag., 25(2):72–82, March 2008.

  53. S. G. Mallat. A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, 3 edition, 2009.

  54. R. F. Marcia, R. M. Willett, and Z. T. Harmany. Compressive optical imaging: Architectures and algorithms. In G. Cristobal, P. Schelken, and H. Thienpont, editors, Optical and Digital Image Processing: Fundamentals and Applications, pages 485–505. Wiley New York, 2011.

  55. K. Marwah, G. Wetzstein, Y. Bando, and R. Raskar. Compressive light field photography using overcomplete dictionaries and optimized projections. ACM Trans. Graph., 32(46), 2013.

  56. C. Poon. On the role of total variation in compressed sensing. SIAM J. Imaging Sci., 8(1):682–720, 2015.

  57. C. Poon. Structure dependent sampling in compressed sensing: theoretical guarantees for tight frames. Appl. Comput. Harm. Anal., 42(3):402–451, 2017.

  58. G. Puy, J. P. Marques, R. Gruetter, J. Thiran, D. Van De Ville, P. Vandergheynst, and Y. Wiaux. Spread spectrum Magnetic Resonance Imaging. IEEE Trans. Med. Imaging, 31(3):586–598, 2012.

  59. X. Qu, Y. Chen, X. Zhuang, Z. Yan, D. Guo, and Z. Chen. Spread spectrum compressed sensing MRI using chirp radio frequency pulses. arXiv:1301.5451, 2013.

  60. B. Roman, A. Bastounis, B. Adcock, and A. C. Hansen. On fundamentals of models and sampling in compressed sensing. Preprint, 2015.

  61. B. Roman, A. C. Hansen, and B. Adcock. On asymptotic structure in compressed sensing. arXiv:1406.4178, 2014.

  62. J. Romberg. Imaging via compressive sampling. IEEE Signal Process. Mag., 25(2):14–20, 2008.

  63. V. Studer, J. Bobin, M. Chahid, H. Moussavi, E. Candès, and M. Dahan. Compressive fluorescence microscopy for biological and hyperspectral imaging. Proc. Natl Acad. Sci. USA, 109(26):1679—1687, 2011.

  64. Y. Traonmilin and R. Gribonval. Stable recovery of low-dimensional cones in Hilbert spaces: One RIP to rule them all. Appl. Comput. Harm. Anal., 45(1):170–205, 2018.

  65. Y. Tsaig and D. L. Donoho. Extensions of compressed sensing. Signal Process., 86(3):549–571, 2006.

  66. E. van den Berg and M. P. Friedlander. SPGL1: A solver for large-scale sparse reconstruction, June 2007. http://www.cs.ubc.ca/labs/scl/spgl1.

  67. E. van den Berg and M. P. Friedlander. Probing the pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput., 31(2):890–912, 2008.

  68. Z. Wang and G. R. Arce. Variable density compressed image sampling. IEEE Trans. Image Proc., 19(1):264–270, 2010.

  69. Y. Wiaux, L. Jacques, G. Puy, A. M. M. Scaife, and P. Vandergheynst. Compressed sensing imaging techniques for radio interferometry. Mon. Not. R. Astron. Soc., 395(3):1733–1742, 2009.

  70. P. Wojtaszczyk. Stability and instance optimality for Gaussian measurements in compressed sensing. Found. Comput. Math., 10(1):1–13, 2010.

  71. L. Zhu, W. Zhang, D. Elnatan, and B. Huang. Faster STORM using compressed sensing. Nature Methods, 9:721—723, 2012.

Acknowledgements

The authors extend their thanks to Vegard Antun (University of Oslo), who performed the experiment in Fig. 1. They would also like to thank Anders C. Hansen, Bradley J. Lucier and Clarice Poon. S.B. acknowledges the support of the PIMS Postdoctoral Training Centre in Stochastics, the Department of Mathematics of Simon Fraser University, NSERC, the Faculty of Arts and Science of Concordia University, and the CRM Applied Math Lab. This work was supported by the PIMS CRG in “High-dimensional Data Analysis” and by NSERC through Grant R611675.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Simone Brugiapaglia.

Additional information

Communicated by Albert Cohen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Fourier Transform and Series

Given \(f \in L^1({\mathbb {R}}) \cap L^2({\mathbb {R}})\), we define the Fourier transform as

$$\begin{aligned} {\hat{f}}(\omega ) = \int ^{\infty }_{-\infty }f(t) {\mathrm {e}}^{-{\mathrm {i}}\omega t} \,{\mathrm {d}}t. \end{aligned}$$

If \(f \in L^2([0,1])\), then we can write f as its Fourier series

$$\begin{aligned} f = \sum _{n \in {\mathbb {Z}}} \langle f, \gamma _n \rangle _{L^2} \gamma _n, \end{aligned}$$

where

$$\begin{aligned} \gamma _n(t) = {\mathrm {e}}^{2 \pi {\mathrm {i}}n t},\quad n \in {\mathbb {Z}}, \end{aligned}$$
(A.1)

is the Fourier basis for \(L^2([0,1])\). If we consider f as a function in \(L^2({\mathbb {R}})\) that is zero outside [0, 1], then \(\langle f, \gamma _n \rangle _{L^2} = {\hat{f}}(2 \pi n)\). For convenience, we also re-index this basis over \({\mathbb {N}}\) as follows:

$$\begin{aligned} \gamma _{2n-1} = {\mathrm {e}}^{-2 \pi {\mathrm {i}}(n-1) t},\qquad \gamma _{2n} = {\mathrm {e}}^{2 \pi {\mathrm {i}}n t},\quad n \in {\mathbb {N}}. \end{aligned}$$
(A.2)
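For instance, under this re-indexing the first few elements are ordered by increasing absolute frequency,

$$\begin{aligned} \gamma _1 = 1, \quad \gamma _2 = {\mathrm {e}}^{2 \pi {\mathrm {i}}t}, \quad \gamma _3 = {\mathrm {e}}^{-2 \pi {\mathrm {i}}t}, \quad \gamma _4 = {\mathrm {e}}^{4 \pi {\mathrm {i}}t}, \quad \gamma _5 = {\mathrm {e}}^{-4 \pi {\mathrm {i}}t}, \quad \ldots , \end{aligned}$$

so that \(\langle f, \gamma _{2n} \rangle _{L^2} = {\hat{f}}(2 \pi n)\) and \(\langle f, \gamma _{2n-1} \rangle _{L^2} = {\hat{f}}(-2 \pi (n-1))\) for \(n \in {\mathbb {N}}\).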

Orthogonal Wavelet Bases of \(L^2([0,1])\)

Let \(\varphi \) and \(\psi \) be the scaling function and mother wavelet, respectively, of the Daubechies’ wavelet with \(p \ge 1\) vanishing moments. Write

$$\begin{aligned} \varphi _{j,k}(x) = 2^{j/2} \varphi (2^j x - k),\ \psi _{j,k}(x) = 2^{j/2} \psi (2^j x - k),\quad j , k \in {\mathbb {Z}}. \end{aligned}$$

Since we work with functions on the interval [0, 1], we need an orthonormal wavelet basis of \(L^2([0,1])\). We construct this via periodization (see (5.1) and [53, Sect. 7.5.1] for more details). Define the coarsest scale

$$\begin{aligned} j_0 = \left\{ \begin{array}{ll} 0 &{}\quad p =1 \\ \lceil \log _2(2p) \rceil &{}\quad p \ge 2 \end{array} \right. . \end{aligned}$$
(B.1)

(In general, one could allow any fixed \(j_0\) greater than or equal to the right-hand side. However, this does not affect any of the results in the paper; hence, we simply specify \(j_0\) exactly.) We recall that Daubechies’ wavelets with p vanishing moments have the smallest possible support, of length \(2p-1\). We assume the scaling function \(\varphi \) and the mother wavelet \(\psi \) to be supported on \([0, 2p-1]\) and \([-p+1,p]\), respectively. Then, the set of functions

$$\begin{aligned} \{ \varphi ^{{\mathrm {per}}}_{j_0,k} : k = 0,\ldots ,2^{j_0}-1 \} \cup \{ \psi ^{{\mathrm {per}}}_{j,k} : k = 0,\ldots ,2^{j}-1, \ j \ge j_0 \}, \end{aligned}$$
(B.2)

is an orthonormal basis of \(L^2([0,1])\), referred to as the periodized Daubechies wavelet basis. We note in passing that

$$\begin{aligned} \psi ^{{\mathrm {per}}}_{j,k} = \psi _{j,k},\quad \varphi ^{{\mathrm {per}}}_{j,k} = \varphi _{j,k},\quad k = p-1,\ldots ,2^j-p, \end{aligned}$$

that is, wavelets that are fully supported in [0, 1] are unchanged, and

$$\begin{aligned} \varphi ^{{\mathrm {per}}}_{j,k}&= \varphi _{j,k} + \varphi _{j,2^j+k},\quad \psi ^{{\mathrm {per}}}_{j,k} = \psi _{j,k} + \psi _{j,2^j+k},\qquad k = 0,\ldots ,p-2, \\ \varphi ^{{\mathrm {per}}}_{j,k}&= \varphi _{j,k} + \varphi _{j,2^j-p-k},\quad \psi ^{{\mathrm {per}}}_{j,k} = \psi _{j,k} + \psi _{j,2^j-p-k},\qquad k = 2^j-p+1,\ldots ,2^j-1, \end{aligned}$$

where the functions in the right-hand sides are implicitly restricted to [0, 1]. As needed, we order the basis (B.2) in the usual way, rewriting it as \(\{ \phi _n \}_{n \in {\mathbb {N}}}\), where

$$\begin{aligned} \begin{aligned} \phi _{n+1}&= \varphi ^{{\mathrm {per}}}_{j_0,n}, \quad n = 0,\ldots ,2^{j_0}-1 \\ \phi _{2^{j} + n + 1}&= \psi ^{{\mathrm {per}}}_{j,n}, \quad n = 0,\ldots ,2^j-1,\ j \ge j_0. \end{aligned} \end{aligned}$$
(B.3)
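For example, for Haar wavelets (\(p = 1\), so \(j_0 = 0\) by (B.1)) this ordering reads

$$\begin{aligned} \phi _1 = \varphi ^{{\mathrm {per}}}_{0,0}, \quad \phi _2 = \psi ^{{\mathrm {per}}}_{0,0}, \quad \phi _3 = \psi ^{{\mathrm {per}}}_{1,0}, \quad \phi _4 = \psi ^{{\mathrm {per}}}_{1,1}, \quad \phi _5 = \psi ^{{\mathrm {per}}}_{2,0}, \quad \ldots , \end{aligned}$$

so that, for every \(j \ge j_0\), the first \(2^j\) basis functions span \(V^{{\mathrm {per}}}_j\). For \(p = 2\) (the db4 wavelet used in Appendix E), (B.1) gives \(j_0 = \lceil \log _2 4 \rceil = 2\) and the ordering starts with the four scaling functions \(\varphi ^{{\mathrm {per}}}_{2,0},\ldots ,\varphi ^{{\mathrm {per}}}_{2,3}\).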

Proof of Theorem 9.1

The technical tools we need to prove this theorem were introduced in [1], where a similar result was proven for the weighted quadratically constrained basis pursuit decoder.

We require several concepts from [1]. First, we introduce several additional pieces of notation. Given sparsity levels \({\mathbf {M}} = (M_1,\ldots ,M_r)\) and local sparsities \({\mathbf {s}} = (s_1,\ldots ,s_r)\), let

$$\begin{aligned} D_{{\mathbf {s}},{\mathbf {M}}} = \left\{ {\varDelta }\subseteq \{1,\ldots ,M\} : | {\varDelta }\cap \{ M_{k-1}+1,\ldots ,M_k \} | \le s_k,\ k = 1,\ldots ,r \right\} , \end{aligned}$$

be the set of all possible supports of an \(({\mathbf {s}},{\mathbf {M}})\)-sparse vector. Given positive weights \(w = (w_i)^{M}_{i=1} \in {\mathbb {C}}^{M}\) and a set \({\varDelta }\subseteq \{1,\ldots ,M\}\), we define its weighted cardinality as follows:

$$\begin{aligned} |{\varDelta }|_w = \sum _{i \in {\varDelta }} (w_i)^2. \end{aligned}$$

The conventional tool in compressed sensing for establishing recovery guarantees is the so-called Restricted Isometry Property (RIP). In our case, we require a generalized version of the RIP, which takes into account the sparsity in levels structure and the fact that the measurement matrix A satisfies (9.3), rather than the more standard condition \({\mathbb {E}}(A^*A) = I\).

Definition 9

(G-adjusted RIP in Levels) Let \({\mathbf {M}} = (M_1,\ldots ,M_r)\) be sparsity levels, \({\mathbf {s}} = (s_1,\ldots ,s_r)\) be local sparsities and \(G \in {\mathbb {C}}^{M \times M}\) be invertible, where \(M = M_r\) is the sparsity bandwidth. The \(({\mathbf {s}},{\mathbf {M}})\)th G-adjusted Restricted Isometry Constant in Levels (G-RICL) \(\delta _{{\mathbf {s}},{\mathbf {M}},G}\) of a matrix \(A \in {\mathbb {C}}^{m \times M}\) is the smallest \(\delta \ge 0\) such that

$$\begin{aligned} (1-\delta ) {\left\| G x\right\| }^2_{\ell ^2} \le {\left\| A x\right\| }^2_{\ell ^2} \le (1+\delta ) {\left\| G x\right\| }^2_{\ell ^2},\quad \forall x \in {\varSigma }_{{\mathbf {s}},{\mathbf {M}}}. \end{aligned}$$

If \(0< \delta _{{\mathbf {s}},{\mathbf {M}},G} < 1\), then the matrix is said to have the G-adjusted Restricted Isometry Property in levels (G-RIPL) of order \(({\mathbf {s}},{\mathbf {M}})\).

In our setting, if N, M are such that \(P_N UP_M\) is full rank (in particular, if the balancing property holds), then G will be taken as the unique positive definite square-root of the positive definite matrix \(P_M U^* P_N U P_M\). We write \(G = \sqrt{P_M U^* P_N U P_M}\) in this case.
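In computations, G and the quantities \({\left\| G^{-1}\right\| }_{\ell ^2}\) and \(\kappa (G)\) used below can be formed directly from the finite section of U. The following is a minimal MATLAB sketch; the matrix Usec, standing for \(P_N U P_M\), is assumed to be available (for example, assembled as described in Footnote 4) and is not defined in the text.

```matlab
% Minimal sketch (assumes Usec = P_N U P_M is available as an N-by-M matrix).
Gsq     = Usec' * Usec;                   % P_M U^* P_N U P_M
G       = sqrtm((Gsq + Gsq') / 2);        % unique positive definite square root (symmetrized)
sv      = svd(Usec);                      % singular values of the finite section
nrmGinv = 1 / min(sv);                    % ||G^{-1}||; finite iff P_N U P_M has full rank, cf. (C.3)
kappaG  = max(sv) / min(sv);              % condition number kappa(G)
```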

The following result [1, Thm. 3.6] gives conditions under which the matrix A satisfies the G-RIPL:

Theorem C.1

Let \(0< \delta ,\varepsilon <1\), \(M \ge 2\), \(1 \le {\tilde{r}} \le r \le N\) and \({\mathbf {M}} = (M_1,\ldots ,M_r)\) and \({\mathbf {s}} = (s_1,\ldots ,s_r)\) be sparsity levels and local sparsities, respectively, where \(s = s_1+\cdots +s_r \ge 2\) and \(M_r = M\). Let \({\varOmega }\) be an \(({\mathbf {N}},{\mathbf {m}})\)-multilevel random subsampling pattern with r levels and saturation \({\tilde{r}}\), and \(N = N_r\). Suppose that N, M are such that \(P_N U P_M\) is full rank, where U is as in (8.1) and consider the matrix A given by (8.4). If

$$\begin{aligned} m_k \gtrsim \delta ^{-2} \cdot {\left\| G^{-1}\right\| }^2_{\ell ^2} \cdot \left( \sum ^{r}_{l=1} s_l \mu \left( U^{(k,l)} \right) \right) \cdot L,\qquad k = {\tilde{r}}+1,\ldots ,r, \end{aligned}$$

where

$$\begin{aligned} L = r \cdot \log (m)\cdot \log ^2(s) \cdot \log (N) + \log (\varepsilon ^{-1}), \end{aligned}$$

then, with probability at least \(1-\varepsilon \), A satisfies the G-RIPL of order \(({\mathbf {s}},{\mathbf {M}})\) with constant \(\delta _{{\mathbf {s}},{\mathbf {M}},G} \le \delta \) and G given by \(G = \sqrt{P_M U^* P_N U P_M}\).

In order to establish Theorem 9.1, we next show that the G-RIPL implies stable and robust recovery. To do so, we first introduce the following generalization of the so-called robust Null Space Property (rNSP):

Definition 10

Let \({\mathbf {M}} = (M_1,\ldots ,M_r)\) be sparsity levels, \({\mathbf {s}} = (s_1,\ldots ,s_r)\) be local sparsities and \(w \in {\mathbb {C}}^{M}\) be positive weights, where \(M = M_r\). A matrix \(A \in {\mathbb {C}}^{m \times M}\) has the weighted robust null space property in levels (weighted rNSPL) of order \(({\mathbf {s}},{\mathbf {M}})\) with constants \(0< \rho < 1\) and \(\gamma > 0\) if

$$\begin{aligned} {\left\| P_{{\varDelta }} x\right\| }_{\ell ^2} \le \frac{\rho {\left\| P^{\perp }_{{\varDelta }} x\right\| }_{\ell ^1_w}}{\sqrt{|{\varDelta }|_w}} + \gamma {\left\| A x\right\| }_{\ell ^2}, \end{aligned}$$

for all \(x \in {\mathbb {C}}^M\) and \({\varDelta }\in D_{{\mathbf {s}},{\mathbf {M}}}\).

Suppose the weights \(w = (w_i)^{M}_{i=1}\) are of the form (9.8), i.e. constant on the sparsity levels, and define

$$\begin{aligned} \xi = \xi ({\mathbf {s}},{\mathbf {w}}) = \sum ^{r}_{k=1} (w^{(k)})^2 s_k,\qquad \zeta = \zeta ({\mathbf {s}},{\mathbf {w}}) = \min _{k=1,\ldots ,r} \left\{ (w^{(k)})^2 s_k \right\} . \end{aligned}$$
(C.1)
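For instance, if the weights are constant across all levels, \(w^{(k)} = 1\) for every k, then (C.1) reduces to

$$\begin{aligned} \xi = s_1 + \cdots + s_r = s, \qquad \zeta = \min _{k=1,\ldots ,r} s_k , \end{aligned}$$

so the ratio \(\xi /\zeta \) appearing in Lemma 6 below measures how unevenly the sparsity is spread across the levels.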

The following combines Lemmas 5.2 and 5.3 of [1]:

Lemma 6

Suppose that A has the weighted rNSPL of order \(({\mathbf {s}},{\mathbf {M}})\) with constants \(0< \rho < 1\) and \(\gamma > 0\). Let \(x,z \in {\mathbb {C}}^M\). Then,

$$\begin{aligned} {\left\| z - x\right\| }_{\ell ^1_w} \le \frac{1+\rho }{1-\rho } \left( 2 \sigma _{{\mathbf {s}},{\mathbf {M}}}(x)_{\ell ^1_w} + {\left\| z\right\| }_{\ell ^1_w} - {\left\| x\right\| }_{\ell ^1_w} \right) + \frac{2 \gamma }{1-\rho } \sqrt{\xi } {\left\| A (z-x)\right\| }_{\ell ^2}, \end{aligned}$$

and

$$\begin{aligned} {\left\| z - x\right\| }_{\ell ^2}\le & {} \left( \rho + (1+\rho ) (\xi / \zeta )^{1/4} / 2 \right) \frac{{\left\| z-x\right\| }_{\ell ^1_w}}{\sqrt{\xi }} \\&+ \left( 1 + (\xi / \zeta )^{1/4} / 2 \right) \gamma {\left\| A(z-x)\right\| }_{\ell ^2}. \end{aligned}$$

The G-RIPL implies the weighted rNSPL (see [1, Thm. 5.5]):

Theorem C.2

Let \(A \in {\mathbb {C}}^{m \times M}\) and \(G \in {\mathbb {C}}^{M \times M}\) be invertible. Let \({\mathbf {M}} = (M_1,\ldots ,M_r)\) and \({\mathbf {s}} = (s_1,\ldots ,s_r)\) be sparsity levels and local sparsities, respectively, and \({\mathbf {w}}\) be positive weights of the form (9.8). Let \(0< \rho < 1\), and suppose that A has the G-RIPL of order \(({\mathbf {t}},{\mathbf {M}})\) and constant 1/2, where \({\mathbf {t}} = (t_1,\ldots ,t_r)\) satisfies

$$\begin{aligned} t_l = \min \left\{ 2 \left\lceil 3 \frac{\kappa (G)^2}{\rho ^2} \frac{\xi ({\mathbf {s}},{\mathbf {w}})}{(w^{(l)})^2} \right\rceil , M_l - M_{l-1} \right\} ,\quad l =1,\ldots ,r, \end{aligned}$$
(C.2)

and \(\kappa (G)=\Vert G\Vert _{\ell ^2}\Vert G^{-1}\Vert _{\ell ^2}\) is the condition number of G with respect to the \(\ell ^2\)-norm. Then, there exists \(0 < \gamma \le \sqrt{2} {\left\| G^{-1}\right\| }_{\ell ^2}\) such that A has the weighted rNSPL of order \(({\mathbf {s}},{\mathbf {M}})\) with constants \(\rho \) and \(\gamma \).

Finally, we are now ready to prove Theorem 9.1:

Proof of Theorem 9.1

Recall that \(G^2 = P_M U^* P_N U P_M\). Hence, G is invertible since U has the balancing property (9.4), and moreover, we have

$$\begin{aligned} {\left\| G^{-1}\right\| }_{\ell ^2} \le 1/ \sqrt{\theta }. \end{aligned}$$
(C.3)

We also have \({\left\| G\right\| }_{\ell ^2} \le 1\) since U is unitary, and therefore, \(\kappa (G) \le 1/\sqrt{\theta }\).

Let \(t_l\) be given by (C.2) with \(\rho = 1/2\). Recalling (9.9) and (C.1), observe that

$$\begin{aligned} t_l \le 48 \frac{c_2^2 r s_l}{c_1^2 \theta }. \end{aligned}$$

Therefore,

$$\begin{aligned} t = t_1+\ldots +t_r \le 48 \frac{c_2^2 r }{c_1^2 \theta } s, \end{aligned}$$

and

$$\begin{aligned}&{\Vert G^{-1}\Vert }^2_{\ell ^2} \cdot \left( \sum ^{r}_{l = 1} t_l \mu \left( U^{(k,l)} \right) \right) \cdot \left( r \cdot \log (m) \cdot \log ^2(t) \cdot \log (M) + \log (\varepsilon ^{-1}) \right) \\&\quad \lesssim \theta ^{-2} \frac{c_2^2 r}{c_1^2} \cdot \left( \sum ^{r}_{l = 1} s_l \mu \left( U^{(k,l)} \right) \right) \\&\qquad \cdot \left( r \cdot \log (m) \cdot \log ^2(c_2^2 r s / (c_1^2 \theta )) \cdot \log (M) + \log (\varepsilon ^{-1}) \right) . \end{aligned}$$

Hence, condition (C) and Theorem C.1 imply that the matrix A has the G-RIPL of order \(({\mathbf {t}},{\mathbf {M}})\) with constant \(\delta _{{\mathbf {t}},{\mathbf {M}},G} \le 1/2\). It now follows from Theorem C.2 that A has the weighted rNSPL of order \(({\mathbf {s}},{\mathbf {M}})\) with constants \(\rho = 1/2\) and \(\gamma \le \sqrt{2} {\Vert G^{-1}\Vert }_{\ell ^2} \le \sqrt{2/\theta }\).

To complete the proof, we use Lemma 6 with \(z = {\hat{x}}\). Using this, (C.3) and the bounds

$$\begin{aligned} c_1^2 r s \le \xi \le c_2^2 r s,\qquad c_1^2 s \le \zeta \le c_2^2 s, \end{aligned}$$
(C.4)

we see that

$$\begin{aligned} {\left\| {\hat{x}} - x\right\| }_{\ell ^2} \le&\left( 1/2 + 3/4 (c_2^2 r / c_1^2)^{1/4} \right) \frac{{\left\| {\hat{x}} - x\right\| }_{\ell ^1_w}}{c_1\sqrt{r s}} \\&+ (1 + (c_2^2 r/c_1^2)^{1/4} /2) \sqrt{2/\theta } {\left\| A ({\hat{x}} - x)\right\| }_{\ell ^2} \\ \le&\left( 1+ (c_2^2 r / c_1^2)^{1/4} \right) \left[ \frac{{\left\| {\hat{x}} - x\right\| }_{\ell ^1_w}}{c_1\sqrt{r s}} + \sqrt{2/\theta } {\left\| A ({\hat{x}} - x)\right\| }_{\ell ^2} \right] \\ \le&\left( 1+ (c_2^2 r / c_1^2)^{1/4} \right) \left[ \frac{3}{c_1\sqrt{r s}} \left( 2 \sigma _{{\mathbf {s}},{\mathbf {M}}}(x)_{\ell ^1_w} + {\left\| {\hat{x}}\right\| }_{\ell ^1_w} - {\left\| x\right\| }_{\ell ^1_w} \right) \right. \\&\left. + 5 \sqrt{2/\theta } (c_2/c_1) {\left\| A ({\hat{x}} - x) \right\| }_{\ell ^2} \right] . \end{aligned}$$

We now use the fact that \({\hat{x}}\) is a minimizer, and therefore,

$$\begin{aligned} {\left\| {\hat{x}}\right\| }_{\ell ^1_w} - {\left\| x\right\| }_{\ell ^1_w} \le \frac{1}{\lambda } \left( {\left\| A x - y\right\| }_{\ell ^2} - {\left\| A {\hat{x}} - y\right\| }_{\ell ^2} \right) . \end{aligned}$$

Writing \({\left\| A ({\hat{x}} - x) \right\| }_{\ell ^2} \le {\left\| A {\hat{x}} - y\right\| }_{\ell ^2} + {\left\| A x - y\right\| }_{\ell ^2}\) and combining with the previous inequality now yields

$$\begin{aligned} {\left\| {\hat{x}} - x\right\| }_{\ell ^2} \le&\left( 1+ (c_2^2 r / c_1^2)^{1/4} \right) \\&\left[ \frac{6 \sigma _{{\mathbf {s}},{\mathbf {M}}}(x)_{\ell ^1_w} }{c_1 \sqrt{ r s}} + \left( 5 \sqrt{2/\theta } (c_2/c_1) +\frac{3}{c_1\sqrt{r s} \lambda } \right) {\left\| A x - y\right\| }_{\ell ^2} \right. \\&\quad \left. + \left( 5 \sqrt{2/\theta } (c_2/c_1) - \frac{3}{c_1\sqrt{r s} \lambda } \right) {\left\| A {\hat{x}} -y \right\| }_{\ell ^2} \right] \end{aligned}$$

The result now follows from the bound (D) on \(\lambda \) and the fact that \(e = y - A x\). \(\square \)

Proofs of Lemmas 2, 3 and 4

Proof of Lemma 2

We first observe that \(\theta = \inf _{|\omega | \le \pi } | {\hat{\varphi }}(\omega ) |^2 > 0\) for the Daubechies wavelet basis [7, Remark 7.1]. Now let \(x = (x_n)^{N}_{n=1} \in {\mathbb {C}}^N\) with \({\left\| x\right\| }_{\ell ^2} = 1\) and write \(g = \sum ^{N}_{n=1} x_n \phi _n\) for the corresponding finite wavelet expansion. Observe that \( {\left\| g\right\| }^2_{L^2([0,1])} = {\left\| x\right\| }^2_{\ell ^2} = 1\). Let \(V^{{\mathrm {per}}}_{j} = {\mathrm {span}}\{ \varphi _{j,n} : n = 0,\ldots ,2^j-1 \}\) and \(W^{{\mathrm {per}}}_{j} = {\mathrm {span}}\{ \psi _{j,n} : n = 0,\ldots ,2^j-1 \}\). Then,

$$\begin{aligned} g \in V^{{\mathrm {per}}}_{j_0} \oplus W^{{\mathrm {per}}}_{j_0} \oplus \cdots \oplus W^{{\mathrm {per}}}_{j_0+r-1} = V^{{\mathrm {per}}}_{j_0+r}, \end{aligned}$$

and conversely every \(g \in V^{{\mathrm {per}}}_{j_0+r}\) with \( {\left\| g\right\| }^2_{L^2([0,1])} = 1\) is equivalent to a vector of coefficients \(x \in {\mathbb {C}}^N\) with \({\left\| x\right\| }_{\ell ^2} = 1\). Note also that

$$\begin{aligned} {\left\| P_N U P_N x\right\| }^2_{2} = \sum ^{N}_{n=1} | \langle g, \gamma _n \rangle |^2. \end{aligned}$$

Hence,

$$\begin{aligned} \begin{aligned} \inf _{\begin{array}{c} x \in {\mathbb {C}}^N \\ {\left\| x\right\| }_{\ell ^2} = 1 \end{array}}&{\left\| P_N U P_N x\right\| }^2_{\ell ^2} = \inf \left\{ \sum ^{N}_{n=1} | \langle g, \gamma _n \rangle |^2 : g \in V^{{\mathrm {per}}}_{j_0+r},\ {\left\| g\right\| }_{L^2([0,1])} = 1 \right\} . \end{aligned} \end{aligned}$$
(D.1)

Fix a \(g \in V^{{\mathrm {per}}}_{j_0+r}\) with \({\left\| g\right\| }_{L^2([0,1])} = 1\) and write

$$\begin{aligned} g = \sum ^{N-1}_{k = 0} z_k \varphi ^{{\mathrm {per}}}_{r+j_0,k}, \end{aligned}$$

where \( {\left\| z\right\| }_{\ell ^2} = {\left\| g\right\| }_{L^2(0,1)} = 1\) and \(z = (z_k)^{N-1}_{k=0}\). Then, for any integer n,

$$\begin{aligned} {\hat{g}}(2 \pi n)&= N^{-1/2} {\hat{\varphi }} (2 \pi n / N) \sum ^{N-1}_{k=0} z_k {\mathrm {e}}^{-2 \pi {\mathrm {i}}n k /N} = N^{-1/2} {\hat{\varphi }} (2 \pi n / N) G( n/N), \end{aligned}$$

where \(G(x) = \sum ^{N-1}_{k=0} z_k {\mathrm {e}}^{-2 \pi {\mathrm {i}}k x}\) is a 1-periodic function. In the first equality, we have used that

$$\begin{aligned} \widehat{\varphi ^{{\mathrm {per}}}_{j,k}}(\omega ) = \widehat{\varphi _{j,k}}(\omega ) = 2^{-j/2}{\hat{\varphi }}(\omega /2^j) {\mathrm {e}}^{-{\mathrm {i}}\omega k/2^j}, \quad \forall j, k \in {\mathbb {Z}}, \; \forall \omega \in 2\pi {\mathbb {Z}}, \end{aligned}$$
(D.2)

and that \(N = 2^{j_0+r}\). Hence,

$$\begin{aligned} \sum ^{N}_{n=1} | \langle g, \gamma _n \rangle |^2= & {} \sum ^{N/2}_{n=-N/2+1} | {\hat{g}}(2 \pi n) |^2 \nonumber \\= & {} N^{-1} \sum ^{N/2}_{n=-N/2+1} \left| {\hat{\varphi }}(2 \pi n/N) \right| ^2 \left| G(n / N) \right| ^2. \end{aligned}$$
(D.3)

Using the fact that G is 1-periodic, we deduce that

$$\begin{aligned} \sum ^{N}_{n=1} | \langle g, \gamma _n \rangle |^2 \ge \inf _{|\omega | \le \pi } \left| {\hat{\varphi }}(\omega ) \right| ^2 N^{-1} \sum ^{N-1}_{n=0} |G(n/N)|^2. \end{aligned}$$

Now, since G is a trigonometric polynomial, it follows that

$$\begin{aligned} N^{-1} \sum ^{N-1}_{n=0} |G(n/N)|^2 = {\left\| G\right\| }^2_{L^2([0,1])} = {\left\| z\right\| }^2_{\ell ^2} = {\left\| g\right\| }^2_{L^2([0,1])} = 1. \end{aligned}$$

Therefore,

$$\begin{aligned} \sum ^{N}_{n=1} | \langle g, \gamma _n \rangle |^2 \ge \inf _{|\omega | \le \pi } \left| {\hat{\varphi }}(\omega ) \right| ^2 = \theta > 0. \end{aligned}$$

Since g was arbitrary, we deduce that

$$\begin{aligned} \inf _{\begin{array}{c} x \in {\mathbb {C}}^N \\ {\left\| x\right\| }_{\ell ^2} = 1 \end{array}} {\left\| P_N U P_N x\right\| }^2_{\ell ^2} \ge \theta . \end{aligned}$$

To complete the proof, we first recall that \(P_N - P_N U^* P_N U P_N\) is positive semidefinite (since U is unitary), and therefore,

$$\begin{aligned} {\left\| P_N - P_N U^* P_N U P_N\right\| }_{\ell ^2}&= \sup _{\begin{array}{c} x \in {\mathbb {C}}^N \\ {\left\| x\right\| }_{\ell ^2} = 1 \end{array}} \langle (P_N - P_N U^* P_N U P_N )x, x \rangle \\&= 1 - \inf _{\begin{array}{c} x \in {\mathbb {C}}^N \\ {\left\| x\right\| }_{\ell ^2} = 1 \end{array}} {\left\| P_N U P_N x\right\| }^2_{\ell ^2} \\&\le 1 - \theta , \end{aligned}$$

as required. \(\square \)
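As a numerical sanity check (not part of the proof), the constant \(\theta = \inf _{|\omega | \le \pi } | {\hat{\varphi }}(\omega ) |^2\) can be estimated by approximating \({\hat{\varphi }}\) by quadrature from cascade-algorithm samples of the scaling function. The following MATLAB sketch does this; the grid sizes and quadrature rule are illustrative choices, not taken from the paper. For Haar wavelets (\(p = 1\)) the output should be close to \((2/\pi )^2 \approx 0.405\).

```matlab
% Hedged sketch: estimate theta = inf_{|omega| <= pi} |phihat(omega)|^2.
wname = 'haar';                              % MATLAB's 'db2' has p = 2 vanishing moments (the text's db4)
[phi, ~, x] = wavefun(wname, 12);            % cascade-algorithm samples of the scaling function
omega = linspace(-pi, pi, 2001);             % frequency grid on [-pi, pi]
phihat = zeros(size(omega));
for j = 1:numel(omega)
    phihat(j) = trapz(x, phi .* exp(-1i * omega(j) * x));   % hat(phi)(omega) by quadrature
end
theta_est = min(abs(phihat).^2);             % approximates the infimum over |omega| <= pi
fprintf('estimated theta for %s: %.4f\n', wname, theta_est);   % ~ (2/pi)^2 = 0.405 for Haar
```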

For Lemma 3, we first require the following:

Lemma 7

The (k, l)th local coherence satisfies

$$\begin{aligned} \mu \left( U^{(k,l)} \right) \le 2^{1+ k-l} \max _{\omega \in B_k } \left| {\widehat{\psi }}(2\pi \omega /2^{l+j_0-1}) \right| ^2,\quad l > 1, \end{aligned}$$

and

$$\begin{aligned} \mu \left( U^{(k,1)} \right) \le 2^{k} \max \left\{ \max _{\omega \in B_k} \left| {\widehat{\psi }}(2\pi \omega /2^{j_0}) \right| ^2 , \max _{\omega \in B_k} \left| {\widehat{\varphi }}(2\pi \omega /2^{j_0}) \right| ^2 \right\} . \end{aligned}$$

Proof

By definition,

$$\begin{aligned} \mu \left( U^{(k,l)} \right) =|B_k| \max _{\omega \in B_k} \max _{0 \le n < 2^{j_0+l-1}} \left| \widehat{\psi ^{{\mathrm {per}}}_{j_0+l-1,n}}(2\pi \omega ) \right| ^2,\quad l > 1, \end{aligned}$$

and

$$\begin{aligned} \mu \left( U^{(k,1)} \right) =|B_k| \max \left\{ \max _{\omega \in B_k} \max _{0 \le n< 2^{j_0}} \left| \widehat{\psi ^{{\mathrm {per}}}_{j_0,n}}(2\pi \omega ) \right| ^2 , \max _{\omega \in B_k} \max _{0 \le n < 2^{j_0}} \left| \widehat{\varphi ^{{\mathrm {per}}}_{j_0,n}}(2\pi \omega ) \right| ^2 \right\} . \end{aligned}$$

Recall that \(|B_k| \le 2^{j_0+k}\). Moreover, recall relation (D.2) and note that an analogous formula holds for \(\widehat{\psi _{j,k}^{{\mathrm {per}}}}\). Since \(B_k\) is a set of integers, the result now follows immediately. \(\square \)

Proof of Lemma 3

By the previous lemma, it suffices to estimate the Fourier transform of the wavelet and scaling function in different regions of frequency space. First, suppose that \(k \ge l \ge 1\). Then, \(| \omega | \ge 2^{j_0+k-1}\) for \(\omega \in B_k\), and the smoothness conditions (2.1) give

$$\begin{aligned} | {\hat{\psi }}(2\pi \omega /2^{l+j_0-1}) | \lesssim 2^{-(q+1)(k-l)},\qquad | {\hat{\varphi }}(2\pi \omega /2^{l+j_0-1}) | \lesssim 2^{-(q+1)(k-l)}. \end{aligned}$$

The first estimate now follows from Lemma 7.

For the second estimate, we need to bound \(| {\hat{\psi }}(2 \pi \omega ) |\) for \(| \omega | \ll 1\). For this, we recall that \({\hat{\psi }}(z) = (-{\mathrm {i}}z)^{p} \chi _p(z)\) for some bounded function \(\chi _p(z)\) [53, Thm. 7.4]. Hence,

$$\begin{aligned} | {\hat{\psi }}(2 \pi \omega ) |^2 \le c_p |\omega |^{2p}. \end{aligned}$$

If \(l > k \ge 1\), then this and the previous lemma give

$$\begin{aligned} \mu \left( U^{(k,l)} \right) \le 2^{1+k-l} \max _{|\omega | \le 2^{j_0+k}} | {\hat{\psi }}(2\pi \omega /2^{l+j_0-1}) |^2 \lesssim c_p 2^{k-l} 2^{2p(k-l)}. \end{aligned}$$

The result now follows immediately. \(\square \)
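The decay established in Lemma 3 can also be observed numerically; again, this is a sanity check rather than part of the argument. The sketch below assumes that the finite section U of (8.1) has been assembled (for instance, as in Footnote 4) and that Nvec and Mvec are vectors of length r+1 holding the cumulative band and level boundaries \(0 = N_0< N_1< \cdots < N_r\) and \(0 = M_0< M_1< \cdots < M_r\); these variable names, and the plotting choice, are ours.

```matlab
% Hedged sketch: empirical local coherences mu(U^{(k,l)}) = |B_k| * max |u_ij|^2.
mu = zeros(r, r);
for k = 1:r
    rows = Nvec(k)+1 : Nvec(k+1);            % frequency band B_k (row indices of U)
    for l = 1:r
        cols = Mvec(l)+1 : Mvec(l+1);        % wavelet level l (column indices of U)
        mu(k,l) = numel(rows) * max(max(abs(U(rows, cols)).^2));
    end
end
semilogy(0:r-1, mu(r, r:-1:1), 'o-');        % decay of mu(U^{(r,l)}) as r-l grows, cf. 2^{-(2p+1)(r-l)}
```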

Proof of Lemma 4

By direct calculation

$$\begin{aligned} {\left\| P_{{\varOmega }} D U P^{\perp }_M d\right\| }^2_{\ell ^2} \le \sum ^{r}_{k=1} \frac{N_k - N_{k-1}}{m_k} m_k \max _{N_{k-1} < i \le N_k} | \langle u_i, P^{\perp }_M d \rangle |^2, \end{aligned}$$

where \(u_i = U^* e_i\) is the ith row of U. Observe that

$$\begin{aligned} | \langle u_i, P^{\perp }_M d \rangle |^2 = \left| \sum _{j> M} u_{ij} d_j \right| ^2 \le \max _{j > M} |u_{ij} |^2 {\left\| P^{\perp }_M d\right\| }^2_{\ell ^1}. \end{aligned}$$

Hence,

$$\begin{aligned} {\left\| P_{{\varOmega }} D U P^{\perp }_M d\right\| }^2_{\ell ^2}&\le \sum ^{r}_{k=1} (N_k - N_{k-1}) \max _{\begin{array}{c} N_{k-1} < i \le N_k \\ j > M \end{array}} |u_{ij} |^2 {\left\| P^{\perp }_M d\right\| }^2_{\ell ^1} \\&= \sum ^{r}_{k=1} \mu \left( P^{N_{k-1}}_{N_k} U P^{\perp }_M \right) {\left\| P^{\perp }_M d\right\| }^2_{\ell ^1}, \end{aligned}$$

which gives

$$\begin{aligned} {\Vert P_{{\varOmega }} D U P^{\perp }_M d\Vert }_{\ell ^2} \le \left( \sum ^{r}_{k=1} \mu \left( P^{N_{k-1}}_{N_k} U P^{\perp }_M \right) \right) ^{1/2} {\Vert P^{\perp }_M d\Vert }_{\ell ^1}. \end{aligned}$$

Since \(M = M_r\), we now apply Lemma 3 to get

$$\begin{aligned} \mu \left( P^{N_{k-1}}_{N_k} U P^{\perp }_M \right) = \sup _{l > r} \mu \left( U^{(k,l)} \right) \le c_p 2^{-(2p+1)(r-k)}. \end{aligned}$$

Hence,

$$\begin{aligned} \sum ^{r}_{k=1} \mu \left( P^{N_{k-1}}_{N_k} U P^{\perp }_M \right) \le c_p \sum ^{r}_{k=1} 2^{-(2p+1)(r-k)} \le c_p . \end{aligned}$$

The result now follows. \(\square \)

Numerical Experiments

In this section, we discuss some technical details behind Fig. 2. Moreover, we provide further numerical evidence to support the comparison shown therein. We consider the function

$$\begin{aligned} f_K(x) = \sum _{i = 1}^{K} (-1)^{\text {mod}(i, 5)} \; x^{\text {mod}(i,3)} \; \text {sign}(x-(1.3)^{i-9}), \quad 0 \le x \le 1 . \end{aligned}$$
(E.1)

This function has K discontinuities in (0, 1) and its plot is shown in Fig. 4. We approximate \(f_K\) for \(K = 1,10,20\) using the encoder–decoder pairs described below.
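For concreteness, a literal MATLAB transcription of (E.1) is given below; it simply evaluates the formula as written (the grid size and the choice K = 10 are arbitrary).

```matlab
% Hedged sketch: evaluate f_K of (E.1) on a grid (a literal transcription of the formula).
K = 10;
x = linspace(0, 1, 2^14).';
fK = zeros(size(x));
for i = 1:K
    fK = fK + (-1)^mod(i,5) .* x.^mod(i,3) .* sign(x - 1.3^(i-9));
end
plot(x, fK);   % reproduces the kind of plot shown in Fig. 4
```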

Fig. 4: The function \(f_K\) defined as in (E.1) for \(K=1,10,20\)

Fig. 5: Comparison of different encoder–decoder pairs for the approximation of the function \(f_K\) defined in (E.1), using Haar (left) and db4 wavelets (right) and for \(K = 1\) (top), \(K=10\) (centre) and \(K = 20\) (bottom)

(Fourier, \(\ell ^1\)): This strategy corresponds to the setting of Theorems 3.3 and 4.6 and to the error bound (1.5), up to a few minor technical modifications. The Fourier sampling strategy is as follows. We divide the frequency space into dyadic bands and consider a sampling scheme analogous to the \(({\mathbf {N}},{\mathbf {m}})\)-multilevel random subsampling strategy with saturation \({\tilde{r}}\) described in Definition 5, where symmetry of the samples is enforced in every frequency band. In particular, \({\mathbf {N}}\) is defined as in (8.5), the saturation level is \({\tilde{r}} = \text {round}(\log _2(m/2))\), and the local numbers of measurements are

$$ m_k = 2\left\lfloor \frac{m}{4(r-{\tilde{r}})}\right\rfloor , \quad k = {\tilde{r}} +1, \ldots , r-1, $$

where, in the last frequency band, we let \(m_r = m-(m_1 +\cdots + m_{r-1})\) in order to reach a total budget of m measurements exactly. The samples are then computed as follows. The first \({\tilde{r}}\) dyadic bands are saturated. For every \(k > {\tilde{r}}\), we pick \(m_k/2\) samples uniformly at random from the k-th frequency semiband (corresponding to positive frequencies) and we choose frequencies in the opposite semiband (corresponding to negative frequencies) in a symmetric way. The wavelet coefficients of f are recovered via basis pursuit (1.2). Numerically, (1.2) is solved using the MATLAB package SPGL1 (see [66, 67]) with parameters \(\texttt {bpTol}\) = 1e-6, \(\texttt {optTol}\)= 1e-6, and a maximum of 10000 iterations (see Footnote 4).
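The following MATLAB sketch illustrates the two main ingredients of this strategy: the assembly of the cross-Gramian U described in Footnote 4, and a simplified dyadic-band sampling pattern followed by the SPGL1 call. The normalizations, the frequency ordering, the decomposition depth and the band bookkeeping are our own choices; this is a rough outline, not the exact code used to produce Fig. 5 (in particular, symmetry of the samples is not enforced here).

```matlab
% Hedged sketch of the (Fourier, l1) strategy (Wavelet Toolbox and SPGL1 assumed).
dwtmode('per');                              % periodized wavelets, as in Appendix B
N = 2^10; m = 2^7; wname = 'db2';            % illustrative sizes; MATLAB 'db2' has p = 2 (the text's db4)
Naug = 16*N;                                 % augmented space R^{16N} (Footnote 4)
nlev = log2(Naug) - 2;                       % decomposition depth (our choice)
[~, L] = wavedec(zeros(Naug,1), nlev, wname);
freq = zeros(N,1);                           % frequencies ordered as in (A.2): 0, 1, -1, 2, -2, ...
freq(2:2:N) = 1:N/2; freq(1:2:N-1) = -(0:N/2-1);
U = zeros(N, N);
for n = 1:N                                  % column n: Fourier data of the n-th wavelet
    c = zeros(Naug, 1); c(n) = 1;            % canonical basis vector (Footnote 4)
    g = waverec(c, L, wname); g = g(:);      % inverse DWT: coefficients -> samples
    gh = fft(g) / Naug;                      % approximate Fourier-series coefficients of g
    U(:,n) = gh(mod(freq, Naug) + 1);        % keep the N frequencies of interest
end
% Simplified dyadic-band sampling with saturation rt = round(log2(m/2)):
rt = round(log2(m/2)); r = log2(N);
Omega = 1:2^rt;                              % saturate the first rt bands
for k = rt+1:r
    band = 2^(k-1)+1 : 2^k;                  % k-th dyadic band in the ordering (A.2)
    mk = min(round(m / (2*(r - rt))), numel(band));
    Omega = [Omega, band(randperm(numel(band), mk))];
end
% Recover wavelet coefficients from y = U(Omega,:)*d via basis pursuit (1.2):
d = randn(N,1) .* (rand(N,1) < 0.05);        % stand-in sparse vector (the real d comes from f, Footnote 5)
y = U(Omega, :) * d;
opts = spgSetParms('verbosity', 0, 'bpTol', 1e-6, 'optTol', 1e-6, 'iterations', 10000);
dhat = spg_bp(U(Omega, :), y, opts);
```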

(Fourier, \(\ell ^1_w\)): This strategy is almost identical to (Fourier, \(\ell ^1\)). The only difference is that wavelet coefficients are recovered via weighted (as opposed to unweighted) basis pursuit, i.e. by solving (1.2) where the \(\ell ^1\)-norm is replaced with the weighted \(\ell ^1_w\)-norm. The weights w are set according to the recipe described in Sect. 10.2 with \(\delta = 10^{-5}\). Weighted basis pursuit is numerically solved using the MATLAB package SPGL1 as in the previous case.

(Gauss, \(\ell ^1\)): This is the standard encoder–decoder pair of compressed sensing with random Gaussian measurements, corresponding to the setting of Theorems 3.1 and 4.3 and to the error bound (1.3). The vector \(d \in {\mathbb {R}}^N\) of wavelet coefficients of f is explicitly computed and then encoded as \(y = A d\), where \(A \in {\mathbb {R}}^{m \times N}\) has i.i.d. entries drawn from the normal distribution with mean zero and variance 1/m. The function is recovered by means of the basis pursuit decoder (1.2), numerically solved via SPGL1 as in the previous cases (see Footnote 5).
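A corresponding sketch for this pair, including the computation of the coefficient vector d described in Footnote 5, might look as follows; the wavelet name, grid size, decomposition depth and normalization are again illustrative choices, and f is an assumed function handle on [0, 1] (for instance, built from the f_K loop above).

```matlab
% Hedged sketch of the (Gauss, l1) strategy (f is an assumed function handle on [0,1]).
dwtmode('per');
N = 2^10; m = 2^7; wname = 'db2';
Naug = 16*N;
tgrid = ((0:Naug-1).') / Naug;               % uniform grid of 16N points (Footnote 5)
fvals = f(tgrid);                            % samples of f
[C, Lbk] = wavedec(fvals, log2(Naug) - 2, wname);
d = C(1:N); d = d(:) / sqrt(Naug);           % first N wavelet coefficients (our normalization)
A = randn(m, N) / sqrt(m);                   % Gaussian encoder with variance 1/m
y = A * d;
opts = spgSetParms('verbosity', 0, 'bpTol', 1e-6, 'optTol', 1e-6, 'iterations', 10000);
dhat = spg_bp(A, y, opts);
relerr = norm(dhat - d) / norm(d);           % relative error on the coefficients
```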

(Optimal, \(\ell ^1\)): This strategy corresponds to the setting of Theorems 3.1 and 4.3 and to the optimal error bound (1.3). As in the previous case, we compute the vector \(d\in {\mathbb {R}}^N\) of wavelet coefficients of f. Then, the first \(m_1 = \text {round}(m/2)\) entries of d are directly encoded into \(y^{(1)} \in {\mathbb {R}}^{m_1}\). The remaining \(m_2 = m - m_1\) measurements are computed as \(y^{(2)} = A (d_n)_{n = m_1+1}^{N}\), where \(A \in {\mathbb {R}}^{m_2 \times (N-m_1)}\) has i.i.d. entries drawn from the normal distribution with mean zero and variance \(1/m_2\). We consider the basis pursuit decoder (1.2), numerically solved using SPGL1 as in the previous cases.
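For completeness, here is a sketch of the corresponding encoder, reusing d, N, m and opts from the previous sketch.

```matlab
% Hedged sketch of the (Optimal, l1) encoder; d, N, m, opts as in the previous sketch.
m1 = round(m/2); m2 = m - m1;
A2 = randn(m2, N - m1) / sqrt(m2);                       % Gaussian block on the tail coefficients
A  = [eye(m1), zeros(m1, N - m1); zeros(m2, m1), A2];    % the first m1 coefficients are kept exactly
y  = A * d;                                              % y(1:m1) = d(1:m1), y(m1+1:end) = A2*d(m1+1:end)
dhat = spg_bp(A, y, opts);                               % same basis pursuit decoder as before
```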

(Gauss, Tree): This encoder–decoder pair corresponds to the model-based compressive sensing strategy proposed in [12]. The encoder is identical to (Gauss, \(\ell ^1\)), and the decoder explicitly promotes tree-structured sparsity in the recovered function using the model-based CoSaMP algorithm [12]. This strategy requires tuning a parameter c, which links m to the desired tree-sparsity level s as \(m = c s\). In the numerical tests, we consider \(c = 3,4,5,6,7\). We employ the Model-based Compressive Sensing Toolbox v1.1 provided by the authors of [12]. The maximum number of iterations for the outer loop of CoSaMP is set to 100.

These encoder–decoder pairs are compared with \(N = 2^{15} = 32768\) and values of m ranging from \(2^3 = 8\) to \(2^{11} = 2048\). We employ Haar and db4 wavelets, having \(p= 1\) and \(p = 2\) vanishing moments, respectively. In this setting, the weights used in (Fourier, \(\ell ^1_w\)) are constant for all \(m \le 256\). The relative \(L^2\) error is computed using the wavelet coefficients of f, approximated as in the strategies (Gauss, \(\ell ^1\)), (Optimal, \(\ell ^1\)) and (Gauss, Tree).

In Fig. 5, the encoder–decoder pairs (Fourier, \(\ell ^1\)) and (Fourier, \(\ell ^1_w\)) have almost identical performance and consistently outperform all the other strategies, with only a few exceptions. Moreover, this behaviour is independent of the number of discontinuities K. It is remarkable that (Fourier, \(\ell ^1\)) and (Fourier, \(\ell ^1_w\)) are able to numerically outperform even the theoretically optimal pair (Optimal, \(\ell ^1\)). Although our theory prescribes the use of the weighted square-root LASSO decoder in the Fourier case, the numerics show that employing (weighted or unweighted) basis pursuit (1.2) is enough to outperform the other strategies numerically.

About this article

Cite this article

Adcock, B., Brugiapaglia, S. & King–Roskamp, M. Do Log Factors Matter? On Optimal Wavelet Approximation and the Foundations of Compressed Sensing. Found Comput Math 22, 99–159 (2022). https://doi.org/10.1007/s10208-021-09501-3

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10208-021-09501-3

Keywords

Mathematics Subject Classification
