
On the Global-Local Dichotomy in Sparsity Modeling

Compressed Sensing and its Applications

Part of the book series: Applied and Numerical Harmonic Analysis ((ANHA))

Abstract

The traditional sparse modeling approach, when applied to inverse problems with large data such as images, essentially assumes a sparse model for small overlapping data patches and processes these patches as if they were independent from each other. While producing state-of-the-art results, this methodology is suboptimal, as it does not attempt to model the entire global signal in any meaningful way—a nontrivial task by itself.

In this paper we propose a way to bridge this theoretical gap by constructing a global model from the bottom-up. Given local sparsity assumptions in a dictionary, we show that the global signal representation must satisfy a constrained underdetermined system of linear equations, which forces the patches to agree on the overlaps. Furthermore, we show that the corresponding global pursuit can be solved via local operations. We investigate conditions for unique and stable recovery and provide numerical evidence corroborating the theory.


Notes

  1.

    Notice that while \(R_{i}\) extracts the i-th patch from the signal x, the operator \(\tilde{R}_{i}\) extracts the representation \(\alpha_{i}\) of \(R_{i}x\) from Γ.

  2.

    Notice that \(\alpha_{i}\) might be a minimal representation but not a unique one with minimal sparsity. For a discussion of uniqueness, see Subsection 2.3.

  3.

    In general \(\min \left \{ s:\;\mu _{1}^{*}\left (s-1\right )\geqslant 1\right \} \neq \max \left \{ s:\;\mu _{1}^{*}\left (s\right )<1\right \} \) because the function \(\mu _{1}^{*}\) need not be monotonic.

References

  1. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: large-scale machine learning on heterogeneous systems (2015). http://tensorflow.org/. Software available from tensorflow.org

  2. R. Aceska, J.L. Bouchot, S. Li, Local sparsity and recovery of fusion frames structured signals. preprint (2015). http://www.mathc.rwth-aachen.de/~bouchot/files/pubs/FusionCSfinal.pdf

  3. M. Aharon, M. Elad, Sparse and redundant modeling of image content using an image-signature-dictionary. SIAM J. Imag. Sci. 1(3), 228–247 (2008)


  4. U. Ayaz, S. Dirksen, H. Rauhut, Uniform recovery of fusion frame structured sparse signals. Appl. Comput. Harmon. Anal. 41(2), 341–361 (2016). https://doi.org/10.1016/j.acha.2016.03.006. http://www.sciencedirect.com/science/article/pii/S1063520316000294

  5. S. Basu, R. Pollack, M.F. Roy, Algorithms in Real Algebraic Geometry. Algorithms and Computation in Mathematics, 2nd edn., vol. 10 (Springer, Berlin, 2006)


  6. T. Blumensath, M. Davies, Sparse and shift-invariant representations of music. IEEE Trans. Audio Speech Lang. Process. 14(1), 50–57 (2006). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1561263


  7. T. Blumensath, M.E. Davies, Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Trans. Inf. Theory 55(4), 1872–1882 (2009). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4802322


  8. P. Boufounos, G. Kutyniok, H. Rauhut, Sparse recovery from combined fusion frame measurements. IEEE Trans. Inf. Theory 57(6), 3864–3876 (2011). https://doi.org/10.1109/TIT.2011.2143890


  9. S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011). http://dx.doi.org/10.1561/2200000016


  10. H. Bristow, A. Eriksson, S. Lucey, Fast convolutional sparse coding, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013), pp. 391–398


  11. A.M. Bruckstein, D.L. Donoho, M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009). http://epubs.siam.org/doi/abs/10.1137/060657704


  12. E.J. Candes, Modern statistical estimation via oracle inequalities. Acta Numer. 15, 257–325 (2006). http://journals.cambridge.org/abstract_S0962492906230010


  13. S. Chen, S.A. Billings, W. Luo, Orthogonal least squares methods and their application to non-linear system identification. Int. J. Control. 50(5), 1873–1896 (1989)


  14. W. Dong, L. Zhang, G. Shi, X. Li, Nonlocally centralized sparse representation for image restoration. IEEE Trans. Image Process. 22(4), 1620–1630 (2013)


  15. D.L. Donoho, M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proc. Natl. Acad. Sci. 100(5), 2197–2202 (2003). doi: https://doi.org/10.1073/pnas.0437847100. http://www.pnas.org/content/100/5/2197

  16. C. Ekanadham, D. Tranchina, E.P. Simoncelli, A unified framework and method for automatic neural spike identification. J. Neurosci. Methods 222, 47–55 (2014). doi: 10.1016/j.jneumeth.2013.10.001. http://www.sciencedirect.com/science/article/pii/S0165027013003415

  17. M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing (Springer, New York, 2010)


  18. M. Elad, M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15(12), 3736–3745 (2006)


  19. Y.C. Eldar, M. Mishali, Block sparsity and sampling over a union of subspaces, in 2009 16th International Conference on Digital Signal Processing (IEEE, New York, 2009), pp. 1–8. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5201211


  20. Y.C. Eldar, M. Mishali, Robust recovery of signals from a structured union of subspaces. IEEE Trans. Inf. Theory 55(11), 5302–5316 (2009)


  21. P.G. Casazza, G. Kutyniok (eds.), Finite Frames: Theory and Applications. Applied and Numerical Harmonic Analysis (Birkhäuser, Boston, 2013). http://www.springer.com/birkhauser/mathematics/book/978-0-8176-8372-6

  22. S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing (Springer, New York, 2013). http://link.springer.com/content/pdf/10.1007/978-0-8176-4948-7.pdf


  23. D. Gabay, B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)


  24. R. Glowinski, On alternating direction methods of multipliers: a historical perspective, in Modeling, Simulation and Optimization for Science and Technology (Springer, Dordrecht, 2014), pp. 59–82


  25. R. Grosse, R. Raina, H. Kwong, A.Y. Ng, Shift-invariance sparse coding for audio classification (2012). arXiv preprint arXiv: 1206.5241


  26. R. Grosse, R. Raina, H. Kwong, A.Y. Ng, Shift-invariance sparse coding for audio classification. arXiv: 1206.5241 [cs, stat] (2012). http://arxiv.org/abs/1206.5241. arXiv: 1206.5241

  27. S. Gu, W. Zuo, Q. Xie, D. Meng, X. Feng, L. Zhang, Convolutional sparse coding for image super-resolution, in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1823–1831


  28. F. Heide, W. Heidrich, G. Wetzstein, Fast and flexible convolutional sparse coding, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, New York, 2015), pp. 5135–5143


  29. J. Huang, T. Zhang, D. Metaxas, Learning with structured sparsity. J. Mach. Learn. Res. 12, 3371–3412 (2011)


  30. J. Huang, T. Zhang, et al., The benefit of group sparsity. Ann. Stat. 38(4), 1978–2004 (2010)


  31. K. Kavukcuoglu, P. Sermanet, Y.l. Boureau, K. Gregor, M. Mathieu, Y.L. Cun, Learning convolutional feature hierarchies for visual recognition, in Advances in Neural Information Processing Systems, ed. by J.D. Lafferty, C.K.I. Williams, J. Shawe-Taylor, R.S. Zemel, A. Culotta, vol. 23 (Curran Associates, Red Hook, 2010), pp. 1090–1098. http://papers.nips.cc/paper/4133-learning-convolutional-feature-hierarchies-for-visual-recognition.pdf

  32. A. Kyrillidis, L. Baldassarre, M.E. Halabi, Q. Tran-Dinh, V. Cevher, Structured sparsity: discrete and convex approaches, in Compressed Sensing and Its Applications. Applied and Numerical Harmonic Analysis, ed. by H. Boche, R. Calderbank, G. Kutyniok, J. Vybíral (Springer, Cham, 2015), pp. 341–387. http://link.springer.com/chapter/10.1007/978-3-319-16042-9_12. https://doi.org/10.1007/978-3-319-16042-9_12

  33. P.L. Lions, B. Mercier, Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)


  34. M.A. Little, N.S. Jones, Generalized methods and solvers for noise removal from piecewise constant signals. II. New methods. Proc. R. Soc. Lond. A: Math. Phys. Eng. Sci. rspa20100674 (2011). https://doi.org/10.1098/rspa.2010.0674. http://rspa.royalsocietypublishing.org/content/early/2011/06/07/rspa.2010.0674

  35. Y.M. Lu, M.N. Do, A theory for sampling signals from a union of subspaces. IEEE Trans. Signal Process. 56, 2334–2345 (2007)


  36. J. Mairal, G. Sapiro, M. Elad, Learning multiscale sparse representations for image and video restoration. Multiscale Model. Simul. 7(1), 214–241 (2008)


  37. J. Mairal, F. Bach, J. Ponce, G. Sapiro, A. Zisserman, Non-local sparse models for image restoration, in 2009 IEEE 12th International Conference on Computer Vision (IEEE, New York, 2009), pp. 2272–2279


  38. J. Mairal, F. Bach, J. Ponce, Sparse modeling for image and vision processing. Found. Trends Comput. Graph. Vis. 8(2–3), 85–283 (2014). https://doi.org/10.1561/0600000058. http://www.nowpublishers.com/article/Details/CGV-058

  39. Maplesoft, a division of Waterloo Maple Inc. http://www.maplesoft.com

  40. V. Papyan, M. Elad, Multi-scale patch-based image restoration. IEEE Trans. Image Process. 25(1), 249–261 (2016). https://doi.org/10.1109/TIP.2015.2499698


  41. V. Papyan, Y. Romano, M. Elad, Convolutional neural networks analyzed via convolutional sparse coding. J. Mach. Learn. Res. 18(83), 1–52 (2017)


  42. V. Papyan, J. Sulam, M. Elad, Working locally thinking globally: theoretical guarantees for convolutional sparse coding. IEEE Trans. Signal Process. 65(21), 5687–5701 (2017)


  43. Y.C. Pati, R. Rezaiifar, P. Krishnaprasad, Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition, in Asilomar Conference on Signals, Systems and Computers (IEEE, New York, 1993), pp. 40–44


  44. R. Quiroga, Spike sorting. Scholarpedia 2(12), 3583 (2007). https://doi.org/10.4249/scholarpedia.3583

  45. Y. Romano, M. Elad, Boosting of image denoising algorithms. SIAM J. Imag. Sci. 8(2), 1187–1219 (2015)


  46. Y. Romano, M. Elad, Patch-disagreement as away to improve K-SVD denoising, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, New York, 2015), pp. 1280–1284


  47. Y. Romano, M. Protter, M. Elad, Single image interpolation via adaptive nonlocal sparsity-based modeling. IEEE Trans. Image Process. 23(7), 3085–3098 (2014)


  48. L.I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms. Physica D 60(1), 259–268 (1992). http://www.sciencedirect.com/science/article/pii/016727899290242F


  49. C. Rusu, B. Dumitrescu, S. Tsaftaris, Explicit shift-invariant dictionary learning. IEEE Signal Process. Lett. 21, 6–9 (2014). http://www.schur.pub.ro/Idei2011/Articole/SPL_2014_shifts.pdf


  50. E. Smith, M.S. Lewicki, Efficient coding of time-relative structure using spikes. Neural Comput. 17(1), 19–45 (2005). http://dl.acm.org/citation.cfm?id=1119614


  51. A.M. Snijders, N. Nowak, R. Segraves, S. Blackwood, N. Brown, J. Conroy, G. Hamilton, A.K. Hindle, B. Huey, K. Kimura, S. Law, K. Myambo, J. Palmer, B. Ylstra, J.P. Yue, J.W. Gray, A.N. Jain, D. Pinkel, D.G. Albertson, Assembly of microarrays for genome-wide measurement of DNA copy number. Nat. Genet. 29(3), 263–264 (2001). https://doi.org/10.1038/ng754. https://www.nature.com/ng/journal/v29/n3/full/ng754.html

  52. J. Sulam, M. Elad, Expected patch log likelihood with a sparse prior, in International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (Springer, New York, 2015), pp. 99–111


  53. J. Sulam, B. Ophir, M. Elad, Image denoising through multi-scale learnt dictionaries, in 2014 IEEE International Conference on Image Processing (ICIP) (IEEE, New York, 2014), pp. 808–812


  54. J.J. Thiagarajan, K.N. Ramamurthy, A. Spanias, Shift-invariant sparse representation of images using learned dictionaries, in IEEE Workshop on Machine Learning for Signal Processing, 2008, MLSP 2008 (2008), pp. 145–150. https://doi.org/10.1109/MLSP.2008.4685470

  55. J.A. Tropp, A.C. Gilbert, M.J. Strauss, Algorithms for simultaneous sparse approximation. Part i: greedy pursuit. Signal Process. 86(3), 572–588 (2006)


  56. J. Yang, J. Wright, T.S. Huang, Y. Ma, Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)


  57. G. Yu, G. Sapiro, S. Mallat, Solving inverse problems with piecewise linear estimators: from Gaussian mixture models to structured sparsity. IEEE Trans. Image Process. 21(5), 2481–2499 (2012). https://doi.org/10.1109/TIP.2011.2176743


  58. M.D. Zeiler, D. Krishnan, G.W. Taylor, R. Fergus, Deconvolutional networks, in 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (IEEE, New York, 2010), pp. 2528–2535. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5539957


  59. M. Zeiler, G. Taylor, R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in 2011 IEEE International Conference on Computer Vision (ICCV), pp. 2018–2025 (2011). doi: https://doi.org/10.1109/ICCV.2011.6126474

  60. D. Zoran, Y. Weiss, From learning models of natural image patches to whole image restoration, in 2011 IEEE International Conference on Computer Vision (ICCV) (IEEE, New York, 2011), pp. 479–486. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6126278



Acknowledgments

The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme, ERC Grant agreement no. 320649. The authors would also like to thank Jeremias Sulam, Vardan Papyan, Raja Giryes, and Gitta Kutyniok for inspiring discussions.

Author information


Correspondence to Dmitry Batenkov.


1 Appendix A: Proof of Lemma 1

Proof

Denote \(Z:=\ker M\) and consider the linear map \(A:Z\to \mathbb {R}^{N}\) given by the restriction of the “averaging map” \(D_{G}:\mathbb {R}^{mP}\to \mathbb {R}^{N}\) to Z.

  1.

    Let us see first that \(im\left (A\right )=\mathbb {R}^{N}\). Indeed, for every \(x\in \mathbb {R}^{N}\), consider its patches \(x_{i}=R_{i}x\). Since D is full rank, there exist \(\left \{ \alpha _{i}\right \} \) for which \(D\alpha _{i}=x_{i}\). Then setting \({\varGamma }:=\left (\alpha _{1},\dots ,\alpha _{P}\right )\), we have both \(D_{G}{\varGamma }=x\) and \(M{\varGamma }=0\) (by construction, see Section 2), i.e., Γ ∈ Z, and the claim follows.

  2.

    Define

    $$\displaystyle \begin{aligned} J:=\ker D\times\ker D\times\dots\times\ker D\subset\mathbb{R}^{mP}. \end{aligned}$$

    We claim that \(J=\ker A\).

    a.

      In one direction, let \({\varGamma }=\left (\alpha _{1},\dots ,\alpha _{P}\right )\in \ker A\), i.e., \(M{\varGamma }=0\) and \(D_{G}{\varGamma }=0\). Immediately we see that \(\frac {1}{n}D\alpha _{i}=0\) for all i, and therefore \(\alpha _{i}\in \ker D\) for all i, thus Γ ∈ J.

    b.

      In the other direction, let \({\varGamma }=\left (\alpha _{1},\dots ,\alpha _{P}\right )\in J\), i.e., \(D\alpha _{i}=0\) for all i. Then the local representations agree, i.e., \(M{\varGamma }=0\), thus Γ ∈ Z. Furthermore, \(D_{G}{\varGamma }=0\) and therefore \({\varGamma }\in \ker A\).

  3.

    By the fundamental theorem of linear algebra, we conclude

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \dim Z & = & \dim im\left(A\right)+\dim\ker A=N+\dim J\\ & = & N+\left(m-n\right)N=N\left(m-n+1\right). \end{array} \end{aligned} $$
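The dimension count above is easy to confirm numerically. Below is a minimal sanity check (ours, not part of the original text): it assumes the cyclic model with P = N patches of length n and encodes the kernel of M through the equivalent overlap-agreement constraints \(S_{B}D\alpha _{i}=S_{T}D\alpha _{i+1}\) of Lemma 2, then compares the computed kernel dimension with \(N\left(m-n+1\right)\).

```python
import numpy as np

# Sanity check of Lemma 1: dim ker M = N(m - n + 1), assuming the cyclic
# model with P = N patches of length n; ker M is encoded via the
# overlap-agreement constraints S_B D alpha_i = S_T D alpha_{i+1} (Lemma 2).
rng = np.random.default_rng(0)
N, n, m = 12, 4, 7                               # signal length, patch length, number of atoms
D = rng.standard_normal((n, m))                  # a generic (full-rank) local dictionary
S_T, S_B = np.eye(n)[:-1, :], np.eye(n)[1:, :]   # keep top / bottom n-1 entries

M = np.zeros((N * (n - 1), N * m))               # one block row per pair of adjacent patches
for i in range(N):
    rows = slice(i * (n - 1), (i + 1) * (n - 1))
    M[rows, i * m:(i + 1) * m] = S_B @ D
    j = (i + 1) % N
    M[rows, j * m:(j + 1) * m] = -S_T @ D

null_dim = N * m - np.linalg.matrix_rank(M)
print(null_dim, N * (m - n + 1))                 # both equal 48 for these sizes
```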

2 Appendix B: Proof of Lemma 2

We start with an easy observation.

Proposition 9

For any vector \(\rho \in \mathbb {R}^{N}\) , we have

$$\displaystyle \begin{aligned} \|\rho\|{}_{2}^{2}=\frac{1}{n}\sum_{j=1}^{N}\|R_{j}\rho\|{}_{2}^{2}. \end{aligned}$$

Proof

Since

$$\displaystyle \begin{aligned} \|\rho\|{}_{2}^{2}=\sum_{j=1}^{N}\rho_{j}^{2}=\frac{1}{n}\sum_{j=1}^{N}n\rho_{j}^{2}=\frac{1}{n}\sum_{j=1}^{N}\sum_{k=1}^{n}\rho_{j}^{2}, \end{aligned}$$
we can rearrange the sum and get
$$\displaystyle \begin{aligned} \begin{array}{rcl} \|\rho\|{}_{2}^{2} & = & \frac{1}{n}\sum_{k=1}^{n}\sum_{j=1}^{N}\rho_{j}^{2}=\frac{1}{n}\sum_{k=1}^{n}\sum_{j=1}^{N}\rho_{\left(j+k\right)\mod N}^{2}=\frac{1}{n}\sum_{j=1}^{N}\sum_{k=1}^{n}\rho_{\left(j+k\right)\mod N}^{2}\\ & = & \frac{1}{n}\sum_{j=1}^{N}\|R_{j}\rho\|{}_{2}^{2}. \end{array} \end{aligned} $$
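A quick numerical check of Proposition 9 (an illustration only), assuming cyclic patch extractors \(R_{j}\) of length n:

```python
import numpy as np

# Check that ||rho||^2 equals (1/n) * sum_j ||R_j rho||^2 for cyclic patches.
rng = np.random.default_rng(2)
N, n = 20, 5
rho = rng.standard_normal(N)
R = [np.eye(N)[[(j + k) % N for k in range(n)], :] for j in range(N)]
assert np.isclose(np.sum(rho**2), sum(np.sum((Rj @ rho)**2) for Rj in R) / n)
```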

Corollary 3

Given MΓ = 0, we have

$$\displaystyle \begin{aligned} \|y-D_{G}{\varGamma}\|{}_{2}^{2}=\frac{1}{n}\sum_{j=1}^{N}\|R_{j}y-D\alpha_{j}\|{}_{2}^{2}. \end{aligned}$$

Proof

Using Proposition 9, we get

$$\displaystyle \begin{aligned} \begin{array}{rcl} \|y-D_{G}{\varGamma}\|{}_{2}^{2} & = & \frac{1}{n}\sum_{j=1}^{N}\|R_{j}y-R_{j}D_{G}{\varGamma}\|{}_{2}^{2}=\frac{1}{n}\sum_{j=1}^{N}\|R_{j}y-\varOmega_{j}{\varGamma}\|{}_{2}^{2}. \end{array} \end{aligned} $$
Now since \(M{\varGamma }=0\), by the definition of M we have \(\varOmega _{j}{\varGamma }=D\alpha _{j}\) (see (6)), and this completes the proof. □

Recall Definition 6. Multiplying the corresponding matrices gives

Proposition 10

We have the following equality for all i = 1, …, P:

$$\displaystyle \begin{aligned} S_{B}R_{i}=S_{T}R_{i+1}. \end{aligned} $$
(22)

To facilitate the proof, we introduce an extension of Definition 6 to multiple shifts as follows.

Definition 16

Let n be fixed. For k = 0, …, n − 1 let

  1.

    \(S_{T}^{\left (k\right )}:=\begin {bmatrix}I_{n-k} & \boldsymbol {0}\end {bmatrix}\) and \(S_{B}^{\left (k\right )}:=\begin {bmatrix}\boldsymbol {0} & I_{n-k}\end {bmatrix}\) denote the operators extracting the top (resp. bottom) n − k entries from a vector of length n; the matrices have dimension \(\left (n-k\right )\times n\).

  2.

    \(Z_{B}^{\left (k\right )}:=\begin {bmatrix}S_{B}^{\left (k\right )}\\ \boldsymbol {0}_{k\times n} \end {bmatrix}\) and \(Z_{T}^{\left (k\right )}:=\begin {bmatrix}\boldsymbol {0}_{k\times n}\\ S_{T}^{\left (k\right )} \end {bmatrix}\) .

  3.

    \(W_{B}^{\left (k\right )}:=\begin {bmatrix}\boldsymbol {0}_{k\times n}\\ S_{B}^{\left (k\right )} \end {bmatrix}\) and \(W_{T}^{\left (k\right )}:=\begin {bmatrix}S_{T}^{\left (k\right )}\\ \boldsymbol {0}_{k\times n} \end {bmatrix}\) .

Note that \(S_{B}=S_{B}^{\left (1\right )}\) and \(S_{T}=S_{T}^{\left (1\right )}\). We have several useful consequences of the above definitions. The proofs are carried out via elementary matrix identities and are left to the reader.

Proposition 11

For any \(n\in \mathbb {N}\) , the following hold:

  1.

    \(Z_{T}^{\left (k\right )}=\left (Z_{T}^{\left (1\right )}\right )^{k}\) and \(Z_{B}^{\left (k\right )}=\left (Z_{B}^{\left (1\right )}\right )^{k}\) for k = 0, …, n − 1;

  2.

    \(W_{T}^{\left (k\right )}W_{T}^{\left (k\right )}=W_{T}^{\left (k\right )}\) and \(W_{B}^{\left (k\right )}W_{B}^{\left (k\right )}=W_{B}^{\left (k\right )}\) for k = 0, …, n − 1;

  3.

    \(W_{T}^{\left (k\right )}W_{B}^{\left (j\right )}=W_{B}^{\left (j\right )}W_{T}^{\left (k\right )}\) for j, k = 0, …, n − 1;

  4.

    \(Z_{B}^{\left (k\right )}=Z_{B}^{\left (k\right )}W_{B}^{\left (k\right )}\) and \(Z_{T}^{\left (k\right )}=Z_{T}^{\left (k\right )}W_{T}^{\left (k\right )}\) for k = 0, …, n − 1;

  5.

    \(W_{B}^{\left (k\right )}=Z_{T}^{\left (1\right )}W_{B}^{\left (k-1\right )}Z_{B}^{\left (1\right )}\) and \(W_{T}^{\left (k\right )}=Z_{B}^{\left (1\right )}W_{T}^{\left (k-1\right )}Z_{T}^{\left (1\right )}\) for k = 1, …, n − 1;

  6.

    \(Z_{B}^{\left (k\right )}Z_{T}^{\left (k\right )}=W_{T}^{\left (k\right )}\) and \(Z_{T}^{\left (k\right )}Z_{B}^{\left (k\right )}=W_{B}^{\left (k\right )}\) for k = 0, …, n − 1;

  7.

    \(\left (n-1\right )I_{n\times n}=\sum _{k=1}^{n-1}\left (W_{B}^{\left (k\right )}+W_{T}^{\left (k\right )}\right ).\)
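Since the proofs are left to the reader, a short numerical spot check of a few of these identities may be helpful. The sketch below (ours, not part of the original text) builds the operators of Definition 16 explicitly for n = 5 and verifies items 1, 6, and 7.

```python
import numpy as np

n = 5
I = np.eye(n)

def S_T(k): return I[:n - k, :]                  # [I_{n-k}  0]
def S_B(k): return I[k:, :]                      # [0  I_{n-k}]
def Z_T(k): return np.vstack([np.zeros((k, n)), S_T(k)])
def Z_B(k): return np.vstack([S_B(k), np.zeros((k, n))])
def W_T(k): return np.vstack([S_T(k), np.zeros((k, n))])
def W_B(k): return np.vstack([np.zeros((k, n)), S_B(k)])

# item 1: Z_T^{(k)} = (Z_T^{(1)})^k
assert np.allclose(Z_T(3), np.linalg.matrix_power(Z_T(1), 3))
# item 6: Z_B^{(k)} Z_T^{(k)} = W_T^{(k)} and Z_T^{(k)} Z_B^{(k)} = W_B^{(k)}
assert np.allclose(Z_B(2) @ Z_T(2), W_T(2))
assert np.allclose(Z_T(2) @ Z_B(2), W_B(2))
# item 7: (n - 1) I = sum_{k=1}^{n-1} (W_B^{(k)} + W_T^{(k)})
assert np.allclose((n - 1) * I, sum(W_B(k) + W_T(k) for k in range(1, n)))
```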

Proposition 12

If the vectors \(u_{1},\dots ,u_{N}\in \mathbb {R}^{n}\) satisfy pairwise

$$\displaystyle \begin{aligned} S_{B}u_{i}=S_{T}u_{i+1}, \end{aligned}$$

then they also satisfy for each k = 0, …, n − 1 the following:

$$\displaystyle \begin{aligned} W_{B}^{\left(k\right)}u_{i} & =Z_{T}^{\left(k\right)}u_{i+k}, \end{aligned} $$
(23)

$$\displaystyle \begin{aligned} Z_{B}^{\left(k\right)}u_{i} & =W_{T}^{\left(k\right)}u_{i+k}. \end{aligned} $$
(24)

Proof

It is easy to see that the condition \(S_{B}u_{i}=S_{T}u_{i+1}\) directly implies

$$\displaystyle \begin{aligned} Z_{B}^{\left(1\right)}u_{i} & =W_{T}^{\left(1\right)}u_{i+1},\quad W_{B}^{\left(1\right)}u_{i}=Z_{T}^{\left(1\right)}u_{i+1}\quad\forall i. \end{aligned} $$
(25)

Let us first prove (23) by induction on k. The base case k = 1 is precisely (25). Assuming validity for k − 1 and ∀i, we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} \begin{aligned}{}W_{B}^{\left(k\right)}u_{i}= & Z_{T}^{\left(1\right)}W_{B}^{\left(k-1\right)}Z_{B}^{\left(1\right)}u_{i} & \left(\text{by Proposition 11, item 5}\right)\\ {} = & Z_{T}^{\left(1\right)}W_{B}^{\left(k-1\right)}W_{T}^{\left(1\right)}u_{i+1} & \left(\text{by (25)}\right)\\ {} = & Z_{T}^{\left(1\right)}W_{T}^{\left(1\right)}W_{B}^{\left(k-1\right)}u_{i+1} & \left(\text{by Proposition 11, item 3}\right)\\ {} = & Z_{T}^{\left(1\right)}W_{T}^{\left(1\right)}Z_{T}^{\left(k-1\right)}u_{i+k} & \left(\text{by the induction hypothesis}\right)\\ {} = & Z_{T}^{\left(1\right)}Z_{T}^{\left(k-1\right)}u_{i+k} & \left(\text{by Proposition 11, item 4}\right)\\ {} = & Z_{T}^{\left(k\right)}u_{i+k}. & \left(\text{by Proposition 11, item 1}\right) \end{aligned} \end{array} \end{aligned} $$

To prove (24) we proceed as follows:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \begin{aligned}{}Z_{B}^{\left(k\right)}u_{i} & =Z_{B}^{\left(k\right)}W_{B}^{\left(k\right)}u_{i} & \left(\text{by Proposition 11, item 4}\right)\\ & =Z_{B}^{\left(k\right)}Z_{T}^{\left(k\right)}u_{i+k} & \left(\text{by (23), which is already proved}\right)\\ & =W_{T}^{\left(k\right)}u_{i+k}. & \left(\text{by Proposition 11, item 6}\right) \end{aligned} \end{array} \end{aligned} $$
This finishes the proof of Proposition 12. □

Example 1

Fig. 11

Illustration to the proof of Proposition 12. The green pair is equal, as is the red pair. It follows that the blue elements are equal as well.

Let us now present the proof of Lemma 2.

Proof

We show equivalence in two directions.

  • \(\left (1\right )\Longrightarrow \left (2\right )\): Let \(M{\varGamma }=0\). Define \(x:=D_{G}{\varGamma }\), and then further denote \(x_{i}:=R_{i}x\). Then on the one hand:

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \begin{aligned}{}x_{i} & =R_{i}D_{G}{\varGamma}\\ & =\varOmega_{i}{\varGamma} & \text{(definition of }\varOmega_{i})\\ & =D\alpha_{i}. & \left(M{\varGamma}=0\right) \end{aligned} \end{array} \end{aligned} $$

    On the other hand, because of (22) we have \(S_{B}R_{i}x=S_{T}R_{i+1}x\), and by combining the two, we conclude that \(S_{B}D\alpha _{i}=S_{T}D\alpha _{i+1}\).

  • \(\left (2\right )\Longrightarrow \left (1\right )\): In the other direction, suppose that \(S_{B}D\alpha _{i}=S_{T}D\alpha _{i+1}\). Denote \(u_{i}:=D\alpha _{i}\). Now consider the product \(\varOmega _{i}{\varGamma }\), where \(\varOmega _{i}=R_{i}D_{G}\). One can easily verify that in fact

    $$\displaystyle \begin{aligned} \varOmega_{i}{\varGamma}=\frac{1}{n}\left(\sum_{k=1}^{n-1}\left(Z_{B}^{\left(k\right)}u_{i-k}+Z_{T}^{\left(k\right)}u_{i+k}\right)+u_{i}\right). \end{aligned}$$

    Therefore

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \left(\varOmega_{i}-Q_{i}\right){\varGamma} & =&\frac{1}{n}\left(u_{i}+\sum_{k=1}^{n-1}\left(Z_{B}^{\left(k\right)}u_{i-k}+Z_{T}^{\left(k\right)}u_{i+k}\right)\right)-u_{i}\\ & =&\frac{1}{n}\left(\sum_{k=1}^{n-1}\left(W_{T}^{\left(k\right)}u_{i}+W_{B}^{\left(k\right)}u_{i}\right)-\left(n-1\right)u_{i}\right) \qquad \left(\text{by Proposition 12}\right)\\ & =&0.\qquad \left(\text{by Proposition 11, item 7}\right) \end{array} \end{aligned} $$

    Since this holds for all i, we have shown that \(M{\varGamma }=0\). □

3 Appendix C: Proof of Theorem 6

Recall that \(M_{A}=\frac {1}{n}\sum _{i}R_{i}^{T}P_{s_{i}}R_{i}\). We first show that \(M_{A}\) is a contraction.

Proposition 13

\(\left \Vert M_{A}\right \Vert _{2}\leqslant 1\).

Proof

Closely following a similar proof in [45], divide the index set \(\left \{ 1,\dots ,N\right \} \) into n groups representing non-overlapping patches: for i = 1, …, n let

$$\displaystyle \begin{aligned} K\left(i\right):=\left\{ i,i+n,\dots,i+\left(\left\lfloor \frac{N}{n}\right\rfloor -1\right)n\right\} \;\mod N. \end{aligned}$$
Now
$$\displaystyle \begin{aligned} \left\Vert M_{A}x\right\Vert _{2} & =\frac{1}{n}\left\Vert \sum_{i=1}^{N}R_{i}^{T}P_{s_{i}}R_{i}x\right\Vert _{2}\\ & =\frac{1}{n}\left\Vert \sum_{i=1}^{n}\sum_{j\in K\left(i\right)}R_{j}^{T}P_{s_{j}}R_{j}x\right\Vert _{2}\\ & \leqslant\frac{1}{n}\sum_{i=1}^{n}\left\Vert \sum_{j\in K\left(i\right)}R_{j}^{T}P_{s_{j}}R_{j}x\right\Vert _{2}. \end{aligned} $$
By construction, \(R_{j}R_{k}^{T}=\boldsymbol {0}_{n\times n}\) for \(j,k\in K\left (i\right )\) and j ≠ k. Therefore for all i = 1, …, n we have
$$\displaystyle \begin{aligned} \left\Vert \sum_{j\in K\left(i\right)}R_{j}^{T}P_{s_{j}}R_{j}x\right\Vert _{2}^{2} & =\sum_{j\in K\left(i\right)}\left\Vert R_{j}^{T}P_{s_{j}}R_{j}x\right\Vert _{2}^{2}\\ & \leqslant\sum_{j\in K\left(i\right)}\left\Vert R_{j}x\right\Vert _{2}^{2}\leqslant\left\Vert x\right\Vert _{2}^{2}. \end{aligned} $$
Substituting back into the preceding inequality finally gives
$$\displaystyle \begin{aligned} \left\Vert M_{A}x\right\Vert _{2}\leqslant\frac{1}{n}\sum_{i=1}^{n}\left\Vert x\right\Vert _{2}=\left\Vert x\right\Vert _{2}. \end{aligned}$$

Now let us move on to prove Theorem 6.

Proof

Define

$$\displaystyle \begin{aligned} \hat{P}_{i}:=\left(I-P_{s_{i}}\right)R_{i}. \end{aligned}$$
It is easy to see that
$$\displaystyle \begin{aligned} \sum_{i}\hat{P}_{i}^{T}\hat{P}_{i}=A_{\mathcal{S}}^{T}A_{\mathcal{S}}. \end{aligned}$$
Let the SVD of \(A_{\mathcal {S}}\) be
$$\displaystyle \begin{aligned} A_{\mathcal{S}}=U\varSigma V^{T}. \end{aligned}$$
Now
$$\displaystyle \begin{aligned} \begin{array}{rcl} V\varSigma^{2}V^{T}=A_{\mathcal{S}}^{T}A_{\mathcal{S}}=\sum_{i}\hat{P}_{i}^{T}\hat{P}_{i} & = & \sum_{i}R_{i}^{T}R_{i}-\underbrace{\sum_{i}R_{i}^{T}P_{s_{i}}R_{i}}_{:=T}\\ & = & nI-T. \end{array} \end{aligned} $$
Therefore \(T=nI-V\varSigma ^{2}V^{T}\), and
$$\displaystyle \begin{aligned} \begin{array}{rcl} M_{A} & = & \frac{1}{n}T=I-\frac{1}{n}V\varSigma^{2}V^{T}=V\left(I-\frac{\varSigma^{2}}{n}\right)V^{T}. \end{array} \end{aligned} $$
This shows that the eigenvalues of \(M_{A}\) are \(\tau _{i}=1-\frac {\sigma _{i}^{2}}{n}\) where \(\left \{ \sigma _{i}\right \} \) are the singular values of \(A_{\mathcal {S}}\). Thus we obtain
$$\displaystyle \begin{aligned} \begin{array}{rcl} M_{A}^{k} & = & V{\mathrm{diag}}\left\{ \tau_{i}^{k}\right\} V^{T}. \end{array} \end{aligned} $$
If \(\sigma _{i}=0\) then \(\tau _{i}=1\), and in any case, by Proposition 13, we have \(\left |\tau _{i}\right |\leqslant 1\). Let the columns of the matrix W consist of the singular vectors of \(A_{\mathcal {S}}\) corresponding to \(\sigma _{i}=0\) (and so \({\mathrm {span}} W=\mathcal {N}\left ({A_{\mathcal {S}}}\right )\)), then
$$\displaystyle \begin{aligned} \lim_{k\to\infty}M_{A}^{k}=WW^{T}. \end{aligned}$$
Thus, as \(k\to \infty \), \(M_{A}^{k}\) tends to the orthogonal projector onto \(\mathcal {N}\left ({A_{\mathcal {S}}}\right )\). The convergence is evidently linear, the constant being dependent upon \(\left \{ \tau _{i}\right \} \). □
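The following sketch (an illustration only, assuming the cyclic patch model with a random dictionary and randomly drawn supports \(s_{i}\)) verifies the two facts used above: the contraction bound of Proposition 13 and the relation \(\tau _{i}=1-\sigma _{i}^{2}/n\) between the eigenvalues of \(M_{A}\) and the singular values of \(A_{\mathcal {S}}\).

```python
import numpy as np

# Illustration of the facts used in the proof of Theorem 6, assuming the
# cyclic patch model with random dictionary D and random supports s_i of size s.
rng = np.random.default_rng(1)
N, n, m, s = 16, 4, 6, 2
D = rng.standard_normal((n, m))
R = [np.eye(N)[[(i + k) % N for k in range(n)], :] for i in range(N)]

P, A_rows = [], []
for i in range(N):
    Ds = D[:, rng.choice(m, size=s, replace=False)]
    Pi = Ds @ np.linalg.pinv(Ds)                  # orthogonal projector onto span(D_{s_i})
    P.append(Pi)
    A_rows.append((np.eye(n) - Pi) @ R[i])
A_S = np.vstack(A_rows)                           # the matrix A_S of Theorem 6
M_A = sum(R[i].T @ P[i] @ R[i] for i in range(N)) / n

assert np.linalg.norm(M_A, 2) <= 1 + 1e-9         # Proposition 13: ||M_A||_2 <= 1
tau = np.sort(np.linalg.eigvalsh(M_A))            # eigenvalues of M_A ...
sig = np.linalg.svd(A_S, compute_uv=False)        # ... equal 1 - sigma_i^2 / n
assert np.allclose(np.sort(1 - sig**2 / n), tau)
# hence M_A^k converges to the orthogonal projector onto ker(A_S)
```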

4 Appendix D: Proof of Theorem 8

Recall that the signal consists of s constant segments of corresponding lengths \(\ell _{1},\dots ,\ell _{s}\). We would like to compute the MSE for every pixel within every such segment of length \(\alpha :=\ell _{r}\). For each patch, the oracle provides the locations of the jump points within the patch.

Fig. 12

The oracle estimator for the pixel O in the segment (black). Each orange line is a patch \(j=1,\dots ,n\), and the relevant pixels lie between \(a_{j}\) and \(b_{j}\). The signal itself is shown to extend beyond the segment (blue line).

Now, the oracle error for the pixel is

$$\displaystyle \begin{aligned} \hat{x}_{A}^{r,k}-v & =\frac{1}{n}\sum_{j=1}^{n}\frac{1}{b_{j}-a_{j}+1}\sum_{i=a_{j}}^{b_{j}}z_{i}\\ & =\sum_{i=-k}^{\alpha-k-1}c_{i,\alpha,n,k}z_{i}, \end{aligned} $$
where the coefficients c i,α,n,k are some positive rational numbers depending only on i, α, n and k. It is easy to check by rearranging the above expression that
$$\displaystyle \begin{aligned} \sum_{i=-k}^{\alpha-k-1}c_{i,\alpha,n,k}=1, \end{aligned} $$
(27)
and furthermore, denoting \(d_{i}:=c_{i,\alpha ,n,k}\) for fixed α, n, k, we also have that
$$\displaystyle \begin{aligned} d_{-k}<d_{-k+1}<\dots<d_{0}>d_{1}>\dots>d_{\alpha-k-1}. \end{aligned} $$
(28)

Example 2

n = 4, α = 3

  • For k = 1:

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \hat{x}_{A}^{r,k}-v & = & \frac{1}{4}\left(\frac{1}{2}+\frac{1}{3}+\frac{1}{3}+\frac{1}{2}\right)z_{0}+\frac{1}{4}\left(\frac{1}{2}+\frac{1}{3}+\frac{1}{3}\right)z_{-1}+\frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}+\frac{1}{2}\right)z_{1}\\ & = & \underbrace{\frac{7}{24}}_{d_{-1}}z_{-1}+\underbrace{\frac{5}{12}}_{d_{0}}z_{0}+\underbrace{\frac{7}{24}}_{d_{1}}z_{1} \end{array} \end{aligned} $$
  • For k = 2:

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \hat{x}_{A}^{r,k}-v & = & \frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}+\frac{1}{2}+1\right)z_{0}+\frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}+\frac{1}{2}\right)z_{-1}+\frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}\right)z_{-2}\\ & = & \frac{13}{24}z_{0}+\frac{7}{24}z_{-1}+\frac{1}{6}z_{-2} \end{array} \end{aligned} $$

Now consider the optimization problem

$$\displaystyle \begin{aligned} \min_{c\in\mathbb{R}^{\alpha}}c^{T}c\quad\text{s.t}\;\mathbf{1}^{T}c=1. \end{aligned}$$
It can be easily verified that it has the optimal value \(\frac {1}{\alpha }\), attained at \(c^{*}=\frac {1}{\alpha }\mathbf {1}\). From this, (27) and (28), it follows that
$$\displaystyle \begin{aligned} \sum_{i=-k}^{\alpha-k-1}c_{i,\alpha,n,k}^{2}>\frac{1}{\alpha}. \end{aligned}$$

Since the z i are i.i.d., we have

$$\displaystyle \begin{aligned} \mathbb{E}\left(\hat{x}_{A}^{r,k}-v\right)^{2}=\sigma^{2}\sum_{i=-k}^{\alpha-k-1}c_{i,\alpha,n,k}^{2}, \end{aligned}$$
while for the entire nonzero segment of length \(\alpha =\ell _{r}\)
$$\displaystyle \begin{aligned} E_{r}:=\mathbb{E}\left(\sum_{k=0}^{\alpha-1}\left(\hat{x}_{A}^{r,k}-v\right)^{2}\right)=\sum_{k=0}^{\alpha-1}\mathbb{E}\left(\hat{x}_{A}^{r,k}-v\right)^{2}=\sigma^{2}\sum_{k=0}^{\alpha-1}\sum_{i=-k}^{\alpha-k-1}c_{i,\alpha,n,k}^{2}. \end{aligned}$$
Defining
$$\displaystyle \begin{aligned} R\left(n,\alpha\right):=\sum_{k=0}^{\alpha-1}\sum_{i=-k}^{\alpha-k-1}c_{i,\alpha,n,k}^{2}, \end{aligned}$$
we obtain that \(R\left (n,\alpha \right )>1\) and furthermore
$$\displaystyle \begin{aligned} \mathbb{E}\left\Vert \hat{x}_{A}-x\right\Vert ^{2}=\sum_{r=1}^{s}E_{r}=\sigma^{2}\sum_{r=1}^{s}R\left(n,\ell_{r}\right)>s\sigma^{2}. \end{aligned}$$
This proves item \(\left (1\right )\) of Theorem 8. The explicit formulas for \(R\left (n,\alpha \right )\) in item \(\left (2\right )\) were obtained using the automatic symbolic simplification software Maple [39].

By construction (26), it is not difficult to see that if \(n\geqslant \alpha \) then

$$\displaystyle \begin{aligned} R(n,\alpha) & =\frac{1}{n^{2}}\sum_{k=0}^{\alpha-1}\Big(\sum_{j=0}^{k}\big(2H_{\alpha-1}-H_{k}+\frac{n-\alpha+1}{\alpha}-H_{\alpha-1-j}\big)^{2}\\ & +\sum_{j=k+1}^{\alpha-1}\big(2H_{\alpha-1}-H_{\alpha-k-1}+\frac{n-\alpha+1}{\alpha}-H_{j}\big)^{2}\Big), \end{aligned} $$
where \(H_{k}:=\sum _{i=1}^{k}\frac {1}{i}\) is the k-th harmonic number. This simplifies to
$$\displaystyle \begin{aligned} R(n,\alpha)=1+\frac{\alpha(2\alpha H_{\alpha}^{(2)}+2-3\alpha)-1}{n^{2}}, \end{aligned}$$
where \(H_{k}^{\left (2\right )}=\sum _{i=1}^{k}\frac {1}{i^{2}}\) is the k-th harmonic number of the second kind.

On the other hand, for \(n\leqslant \frac {\alpha }{2}\) we have

$$\displaystyle \begin{aligned} R(n,\alpha)=\sum_{k=0}^{n-2}c_{n,k}^{(1)}+\sum_{k=n-1}^{\alpha-n}c_{n,k}^{(2)}+\sum_{k=\alpha-n+1}^{\alpha-1}c_{n,\alpha-1-k}^{(1)}, \end{aligned}$$
where
$$\displaystyle \begin{aligned} c_{n,k}^{(1)}=\frac{1}{n^{2}}\Biggl(\sum_{j=k}^{n-1}\big(H_{n-1}-H_{j}+\frac{k+1}{n}\big)^{2}+\sum_{i=n-k}^{n-1}\big(\frac{n-i}{n}\big)^{2}+\sum_{i=0}^{k-1}\big(H_{n-1}-H_{k}+\frac{k-i}{n}\big)^{2}\Biggr) \end{aligned}$$
and
$$\displaystyle \begin{aligned} c_{n,k}^{(2)}=\frac{1}{n^{2}}\Biggl(\sum_{j=k-n+1}^{k}\biggl(\frac{j-k+n}{n}\biggr)^{2}+\sum_{j=k+1}^{k+n-1}\biggl(\frac{k+n-j}{n}\biggr)^{2}\Biggr). \end{aligned}$$
Automatic symbolic simplification of the above gives
$$\displaystyle \begin{aligned} R\left(n,\alpha\right)=\frac{11}{18}+\frac{2\alpha}{3n}-\frac{5}{18n^{2}}+\frac{\alpha-1}{3n^{3}}. \end{aligned}$$
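Both closed forms can be cross-checked against a direct computation of the coefficients \(c_{i,\alpha ,n,k}\) in exact rational arithmetic. The sketch below (ours, not part of the original text) rebuilds the coefficients as in Example 2, i.e., each of the n patches covering a pixel averages the segment samples it contains and the oracle averages these n patch estimates; the helper names are ours.

```python
from fractions import Fraction as F

def R_direct(n, alpha):
    # Exact value of R(n, alpha): rebuild the coefficients c_{i,alpha,n,k}
    # as in Example 2 (each of the n patches covering a pixel averages the
    # segment samples it contains; the oracle averages the n patch estimates).
    total = F(0)
    for k in range(alpha):                        # pixel offset inside the segment
        c = [F(0)] * alpha
        for p in range(k - n + 1, k + 1):         # starts of the patches containing the pixel
            a, b = max(p, 0), min(p + n - 1, alpha - 1)
            for i in range(a, b + 1):
                c[i] += F(1, n * (b - a + 1))
        total += sum(ci * ci for ci in c)
    return total

def R_long_patch(n, alpha):                       # closed form for n >= alpha
    H2 = sum(F(1, i * i) for i in range(1, alpha + 1))
    return 1 + (alpha * (2 * alpha * H2 + 2 - 3 * alpha) - 1) / F(n * n)

def R_short_patch(n, alpha):                      # closed form for n <= alpha / 2
    return F(11, 18) + F(2 * alpha, 3 * n) - F(5, 18 * n * n) + F(alpha - 1, 3 * n**3)

assert R_direct(4, 3) == R_long_patch(4, 3) == F(37, 32)
assert R_direct(5, 3) == R_long_patch(5, 3)
assert R_direct(2, 4) == R_short_patch(2, 4) == 2
assert R_direct(2, 5) == R_short_patch(2, 5)
```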

5 Appendix E: Generative Models for Patch-Sparse Signals

In this section we propose a general framework aimed at generating signals from the patch-sparse model. Our approach is to construct a graph-based model for the dictionary and subsequently use this model to generate dictionaries and signals which turn out to be much richer than those considered in Section 4.

Local Support Dependencies

We start by highlighting the importance of the local connections (recall Lemma 2) between the neighboring patches of the signal, and therefore between the corresponding subspaces containing those patches. This in turn allows us to characterize \(\varSigma _{\mathcal {M}}\) as the set of all “realizable” paths in a certain dependency graph derived from the dictionary D. This point of view allows us to describe the model \(\mathcal {M}\) using only the intrinsic properties of the dictionary, in contrast to Theorem 2.

Proposition 14

Let \(0\neq x\in \mathcal {M}\) and \({\varGamma }\in \rho \left (x\right )\) with \({\mathrm {supp}}{\varGamma }=\left (S_{1},\dots ,S_{P}\right )\). Then for i = 1, …, P

$$\displaystyle \begin{aligned} {\mathrm{rank}}\left[S_{B}D_{S_{i}}\;-S_{T}D_{S_{i+1}}\right]<\left|S_{i}\right|+\left|S_{i+1}\right|\leqslant2s, \end{aligned} $$
(29)

where by convention rank∅ = −∞.

Proof

\(x\in \mathcal {M}\) implies by Lemma 2 that for every i = 1, …, P

$$\displaystyle \begin{aligned} \left[S_{B}D\quad-S_{T}D\right]\begin{bmatrix}\alpha_{i}\\ \alpha_{i+1} \end{bmatrix}=0. \end{aligned}$$
But
$$\displaystyle \begin{aligned} \left[S_{B}D\quad-S_{T}D\right]\begin{bmatrix}\alpha_{i}\\ \alpha_{i+1} \end{bmatrix}=\left[S_{B}D_{S_{i}}\quad-S_{T}D_{S_{i+1}}\right]\begin{bmatrix}\alpha_{i}|S_{i}\\ \alpha_{i+1}|S_{i+1} \end{bmatrix}=0, \end{aligned}$$
and therefore the matrix \(\left [S_{B}D_{S_{i}}\quad -S_{T}D_{S_{i+1}}\right ]\) must be rank-deficient. Note in particular that the conclusion still holds if one (or both) of the supports \(\left \{ S_{i},S_{i+1}\right \} \) is empty. □

The preceding result suggests a way to describe all the supports in \(\varSigma _{\mathcal {M}}\) .

Definition 17

Given a dictionary D, we define an abstract directed graph \(\mathcal {G}_{D,s}=\left (V,E\right )\), with the vertex set

$$\displaystyle \begin{aligned} V=\left\{ \left(i_{1},\dots,i_{k}\right)\subset\left\{ 1,\dots,m\right\} :\quad{\mathrm{rank}} D_{i_{1},\dots,i_{k}}=k<n\right\} , \end{aligned}$$
and the edge set
$$\displaystyle \begin{aligned} E=\biggl\{\left(S_{1},S_{2}\right)\in V\times V:\quad{\mathrm{rank}}\left[S_{B}D_{S_{1}}\quad-S_{T}D_{S_{2}}\right]<\min\left\{ n-1,\left|S_{1}\right|+\left|S_{2}\right|\right\} \biggr\}. \end{aligned}$$
In particular, ∅ ∈ V and \(\left (\emptyset ,\emptyset \right )\in E\) with \({\mathrm {rank}}\left [\emptyset \right ]:=-\infty \).

Remark 4

It might be impossible to compute \(\mathcal {G}_{D,s}\) in practice. However, we set this issue aside for now and only explore the theoretical ramifications of its properties.
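For small n, m, and s, however, \(\mathcal {G}_{D,s}\) can be enumerated by brute force. The sketch below (ours, not part of the chapter) follows Definition 17 literally, except that the empty support is omitted and vertices are capped at size s, which is our reading of the subscript s in \(\mathcal {G}_{D,s}\).

```python
import numpy as np
from itertools import combinations

def dependency_graph(D, s):
    # Brute-force construction of G_{D,s} following Definition 17; the empty
    # support is omitted and |S| <= s caps the vertex size (our reading).
    n, m = D.shape
    S_T, S_B = np.eye(n)[:-1, :], np.eye(n)[1:, :]
    V = [S for k in range(1, min(s, n - 1) + 1) for S in combinations(range(m), k)
         if np.linalg.matrix_rank(D[:, list(S)]) == k]
    E = []
    for S1 in V:
        for S2 in V:
            B = np.hstack([S_B @ D[:, list(S1)], -S_T @ D[:, list(S2)]])
            if np.linalg.matrix_rank(B) < min(n - 1, len(S1) + len(S2)):
                E.append((S1, S2))
    return V, E

# A random dictionary typically yields no edges at all (cf. Theorem 10),
# whereas structured dictionaries such as the one in Figure 14 do.
rng = np.random.default_rng(0)
V, E = dependency_graph(rng.standard_normal((4, 6)), s=2)
print(len(V), len(E))
```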

Definition 18

The set of all directed paths of length P in \(\mathcal {G}_{D,s}\), not including the self-loop \(\underbrace {\left (\emptyset ,\emptyset ,\dots \emptyset \right )}_{\times P}\), is denoted by \(\mathcal {C_{G}}\left (P\right )\).

Definition 19

A path \(\mathcal {S}\in \mathcal {C_{G}}\left (P\right )\) is called realizable if \(\dim \ker A_{\mathcal {S}}>0\). The set of all realizable paths in \(\mathcal {C_{G}}\left (P\right )\) is denoted by \(\mathcal {R_{G}}\left (P\right )\).

Thus we have the following result.

Theorem 9

Suppose \(0\neq x\in \mathcal {M}\) . Then

  1.

    Every representation \({\varGamma }=\left (\alpha _{i}\right )_{i=1}^{P}\in \rho \left (x\right )\) satisfies \({\mathrm {supp}}{\varGamma }\in \mathcal {C_{G}}\left (P\right )\) , and therefore

    $$\displaystyle \begin{aligned} \varSigma_{\mathcal{M}}\subseteq\mathcal{R_{G}}\left(P\right). \end{aligned} $$
    (30)
  2.

    The model \(\mathcal {M}\) can be characterized “intrinsically” by the dictionary as follows:

    $$\displaystyle \begin{aligned} \mathcal{M}=\bigcup_{\mathcal{S}\in\mathcal{R_{G}}\left(P\right)}\ker A_{\mathcal{S}}. \end{aligned} $$
    (31)

Proof

Let \({\mathrm {supp}}{\varGamma }=\left (S_{1},\dots ,S_{P}\right )\) with \(S_{i}={\mathrm {supp}}\alpha _{i}\) if \(\alpha _{i}\neq 0\), and \(S_{i}=\emptyset \) if \(\alpha _{i}=0\). Then by Proposition 14, we must have that

$$\displaystyle \begin{aligned} {\mathrm{rank}}\left[S_{B}D_{S_{i}}\;-S_{T}D_{S_{i+1}}\right]<\left|S_{i}\right|+\left|S_{i+1}\right|\leqslant2s. \end{aligned}$$
Furthermore, since \({\varGamma }\in \rho \left (x\right )\) we must have that \(D_{S_{i}}\) is full rank for each i = 1, …, P. Thus \(\left (S_{i},S_{i+1}\right )\in \mathcal {G}_{D,s}\), and so \({\mathrm {supp}}{\varGamma }\in \mathcal {R_{G}}\left (P\right )\). Since by assumption \({\mathrm {supp}}{\varGamma }\in \varSigma _{\mathcal {M}}\), this proves (30).

To show (31), notice that if \({\mathrm {supp}}{\varGamma }\in \mathcal {R_{G}}\left (P\right )\), then for every \(x\in \ker A_{{\mathrm {supp}}{\varGamma }}\), we have \(R_{i}x=P_{S_{i}}R_{i}x\), i.e., \(R_{i}x=D\alpha _{i}\) for some \(\alpha _{i}\) with \({\mathrm {supp}}\alpha _{i}\subseteq S_{i}\). Clearly in this case \(\left |{\mathrm {supp}}\alpha _{i}\right |\leqslant s\) and therefore \(x\in \mathcal {M}\). The other direction of (31) follows immediately from the definitions. □

Definition 20

The dictionary D is called “\(\left (s,P\right )\)-good” if

$$\displaystyle \begin{aligned} \left|\mathcal{R_{G}}\left(P\right)\right|>0. \end{aligned}$$

Theorem 10

The set of “ \(\left (s,P\right )\) -good” dictionaries has measure zero in the space of all n × m matrices.

Proof

Every low-rank condition defines a finite number of algebraic equations on the entries of D (given by the vanishing of all the 2s × 2s minors of \(\begin {bmatrix}S_{B}D_{S_{i}} & S_{T}D_{S_{j}}\end {bmatrix}\)). Since the number of possible graphs is finite (given fixed n, m and s), the resulting solution set is a finite union of semi-algebraic sets of low dimension and hence has measure zero. □

Constructing “Good” Dictionaries

The above considerations suggest that good dictionaries are hard to come by; here we provide an example of an explicit construction.

We start by defining an abstract graph \(\mathcal {G}\) with some desirable properties, and subsequently look for a nontrivial realization D of the graph, so that in addition \(\mathcal {R}_{\mathcal {G}}\neq \emptyset \).

Fig. 13

A possible dependency graph \(\mathcal {G}\) with m = 10. In this example, \( \left |\mathcal {C_{G}} \left (70 \right ) \right |=37614\).

Every edge in \(\mathcal {G}\) corresponds to a condition of the form (29) imposed on the entries of D. As discussed in Theorem 10, this in turn translates to a set of algebraic equations. So the natural idea would be to write out the large system of such equations and look for a solution over the field \(\mathbb {R}\) by well-known algorithms in numerical algebraic geometry [5]. However, this approach is highly impractical because these algorithms have (single or double) exponential running time. We consequently propose a simplified, more direct approach to the problem.

In detail, we replace the low-rank conditions (29) with more explicit and restrictive ones below.

Assumptions(*):

For each \(\left (S_{i},S_{j}\right )\in \mathcal {G}\) we have \(\left |S_{i}\right |=\left |S_{j}\right |=k\). We require that \({\mathrm {span}} S_{B}D_{S_{i}}={\mathrm {span}} S_{T}D_{S_{j}}=\varLambda _{i,j}\) with \(\dim \varLambda _{i,j}=k\). Thus there exists a non-singular transfer matrix \(C_{i,j}\in \mathbb {R}^{k\times k}\) such that

$$\displaystyle \begin{aligned} S_{B}D_{S_{i}}=C_{i,j}S_{T}D_{S_{j}}. \end{aligned} $$
(32)

In other words, every column in \(S_{B}D_{S_{i}}\) must be a specific linear combination of the columns in \(S_{T}D_{S_{j}}\). This is much more restrictive than the low-rank condition, but on the other hand, given the matrix \(C_{i,j}\), it defines a set of linear constraints on D. To summarize, the final algorithm is presented in Algorithm 5. In general, nothing guarantees that for a particular choice of \(\mathcal {G}\) and the transfer matrices there is a nontrivial solution D; however, in practice we do find such solutions. For example, taking the graph from Figure 13 and augmenting it with the matrices \(C_{i,j}\) (scalars in this case), we obtain a solution \(D\in \mathbb {R}^{6\times 10}\), which is shown in Figure 14. Notice that while the resulting dictionary has a Hankel-type structure similar to what we have seen previously, the additional dependencies between the atoms produce a rich signal space structure, as we shall demonstrate in the following section.

Fig. 14

A realization \(D\in \mathbb {R}^{6\times 10}\) of \(\mathcal {G}\) from Figure 13.

Algorithm 5 Finding a realization D of the graph G
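Algorithm 5 itself is not reproduced here; the following sketch (our reading of Assumptions(*), with hypothetical helper names, restricted to single-atom supports with scalar transfer coefficients, which is the case actually used for Figures 13 and 14) illustrates the main step: each edge contributes the linear constraints (32) on the entries of D, and a realization is any nonzero vector in the null space of the stacked constraint matrix.

```python
import numpy as np

def realize_dictionary(n, m, edges):
    # Sketch of the realization step under Assumptions(*), restricted to
    # single-atom supports with scalar transfer coefficients c_ij.
    # `edges` is a list of ((i, j), c_ij); each edge imposes the linear
    # constraints S_B d_i = c_ij * S_T d_j on the columns d_i, d_j of D.
    S_T, S_B = np.eye(n)[:-1, :], np.eye(n)[1:, :]
    blocks = []
    for (i, j), c in edges:
        Ei, Ej = np.eye(m)[:, [i]], np.eye(m)[:, [j]]       # column selectors
        # vec(S_B D Ei - c S_T D Ej) = 0, via vec(A X B) = (B^T kron A) vec(X)
        blocks.append(np.kron(Ei.T, S_B) - c * np.kron(Ej.T, S_T))
    A = np.vstack(blocks)
    _, sig, Vt = np.linalg.svd(A)
    rank = int(np.sum(sig > 1e-10))
    if rank == Vt.shape[0]:
        return None                        # only the trivial realization D = 0
    return Vt[-1].reshape((n, m), order="F")

# Toy usage: a two-edge cycle between atoms 0 and 1 with made-up coefficients.
D = realize_dictionary(6, 10, [((0, 1), 2.0), ((1, 0), 0.5)])
```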

Generating Signals

Now suppose the graph \(\mathcal {G}\) is known (or can be easily constructed). Then this gives a simple procedure to generate signals from \(\mathcal {M}\), presented in Algorithm 6.

Algorithm 6 Constructing a signal from M via G
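Again the algorithm itself is not reproduced here; a minimal sketch of the generation step (ours, assuming the cyclic model with one patch per sample) follows: given a candidate path \(\mathcal {S}=\left (S_{1},\dots ,S_{P}\right )\), build \(A_{\mathcal {S}}\) from the projectors onto \({\mathrm {span}}\,D_{S_{i}}\) and return any nonzero element of its kernel, which lies in \(\mathcal {M}\) by (31); an empty kernel means the path is not realizable (Definition 19).

```python
import numpy as np

def signal_from_path(D, supports):
    # Given a candidate cyclic path of local supports, return a signal in
    # ker A_S, or None if the path is not realizable; one patch per sample,
    # patch length n = D.shape[0].
    n = D.shape[0]
    N = len(supports)
    rows = []
    for i, S in enumerate(supports):
        R_i = np.eye(N)[[(i + k) % N for k in range(n)], :]   # cyclic extractor
        Ds = D[:, list(S)]
        proj = Ds @ np.linalg.pinv(Ds)                        # projector onto span(D_S)
        rows.append((np.eye(n) - proj) @ R_i)
    A_S = np.vstack(rows)
    _, sig, Vt = np.linalg.svd(A_S)
    if np.sum(sig > 1e-10 * max(sig[0], 1)) == N:
        return None                                           # trivial kernel: not realizable
    return Vt[-1]                                             # a signal in ker(A_S), hence in M

# Example: a dictionary with a constant atom admits the all-{0} path,
# producing a constant signal (both the dictionary and the path are made up).
rng = np.random.default_rng(3)
D = rng.standard_normal((4, 6))
D[:, 0] = 1.0
print(signal_from_path(D, [(0,)] * 12))
```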

Fig. 15

Examples of signals from \(\mathcal {M}\) and the corresponding supports \(\mathcal {S}\).

An interesting question arises: given \(\mathcal {S}\in \mathcal {C_{G}}\left (P\right )\), can we say something about \(\dim \ker A_{\mathcal {S}}\)? In particular, when is it strictly positive (i.e., when is \(\mathcal {S}\in \mathcal {R_{G}}\left (P\right )\))? While in general the question seems to be difficult, in some special cases this number can be estimated using only the properties of the local connections \(\left (S_{i},S_{i+1}\right )\), essentially by counting the additional “degrees of freedom” when moving from patch i to patch i + 1. To this effect, we prove two results.

Proposition 15

For every \(\mathcal {S}\in \mathcal {R_{G}}\left (P\right )\) , we have

$$\displaystyle \begin{aligned} \dim\ker A_{\mathcal{S}}=\dim\ker M_{*}^{\left(\mathcal{S}\right)}. \end{aligned}$$

Proof

Notice that

$$\displaystyle \begin{aligned} \ker A_{\mathcal{S}}=\left\{ D_{G}^{\left(\mathcal{S}\right)}{\varGamma}_{\mathcal{S}},\;M_{*}^{\left(\mathcal{S}\right)}{\varGamma}_{\mathcal{ S}}=0\right\} =im\left(D_{G}^{\left(\mathcal{S}\right)}|{}_{\ker M_{*}^{\left(\mathcal{S}\right)}}\right), \end{aligned}$$
and therefore \(\dim \ker A_{\mathcal {S}}\leqslant \dim \ker M_{*}^{\left (\mathcal {S}\right )}\). Furthermore, the map \(D_{G}^{\left (\mathcal {S}\right )}|{ }_{\ker M_{*}^{\left (\mathcal {S}\right )}}\) is injective, because if \(D_{G}^{\left (\mathcal {S}\right )}{\varGamma }_{\mathcal {S}}=0\) and \(M_{*}^{\left (\mathcal {S}\right )}{\varGamma }_{\mathcal {S}}=0\), we must have that \(D_{S_{i}}\alpha _{i}|{ }_{S_{i}}=0\) and, since \(D_{S_{i}}\) has full rank, also α i  = 0. The conclusion follows. □

Proposition 16

Assume that the model satisfies Assumptions(*) above. Then for every \(\mathcal {S}\in \mathcal {R_{G}}\left (P\right )\)

$$\displaystyle \begin{aligned} \dim\ker A_{\mathcal{S}}\leqslant k. \end{aligned}$$

Proof

The idea is to construct a spanning set for \(\ker M_{*}^{\left (\mathcal {S}\right )}\) and invoke Proposition 15. Let us relabel the nodes along \(\mathcal {S}\) to be 1, 2, …, P. Starting from an arbitrary \(\alpha _{1}\) with support \(S_{1}\) of size \(\left |S_{1}\right |=k\), we use (32) to obtain, for i = 1, 2, …, P − 1, a formula for the next portion of the global representation vector Γ

$$\displaystyle \begin{aligned} \alpha_{i+1}=C_{i,i+1}^{-1}\alpha_{i}. \end{aligned} $$
(33)
This gives a set Δ consisting of overall k linearly independent vectors \({\varGamma }_{i}\) with \({\mathrm {supp}}{\varGamma }_{i}=\mathcal {S}\). It may happen that equation (33) is not satisfied for i = P. However, every Γ with \({\mathrm {supp}}{\varGamma }=\mathcal {S}\) and \(M_{*}^{\left (\mathcal {S}\right )}{\varGamma }_{\mathcal {S}}=0\) must belong to spanΔ, and therefore
$$\displaystyle \begin{aligned} \dim\ker M_{*}^{\left(\mathcal{S}\right)}\leqslant\dim{\mathrm{span}}\varDelta=k. \end{aligned}$$

We believe that Proposition 16 can be extended to more general graphs, not necessarily satisfying Assumptions(*). In particular, the following estimate appears to hold for a general model \(\mathcal {M}\) and \(\mathcal {S}\in \mathcal {R_{G}}\left (P\right )\):

$$\displaystyle \begin{aligned} \dim\ker A_{\mathcal{ S}}\leqslant\left|S_{1}\right|+\sum_{i}\left(\left|S_{i+1}\right|-{\mathrm{rank}}\left[S_{B}D_{S_{i}}\;S_{T}D_{S_{i+1}}\right]\right). \end{aligned}$$
We leave the rigorous proof of this result to a future work.

Further Remarks

While the model presented in this section is the hardest to analyze theoretically, even in the restricted case of Assumptions(*) (when does a nontrivial realization of a given \(\mathcal {G}\) exist? how does the answer depend on n? when is \(\mathcal {R_{G}}\left (P\right )\neq \emptyset \)? etc.), we hope that this construction will be most useful in applications such as denoising of natural signals.


Copyright information

© 2017 Springer International Publishing AG

About this chapter


Cite this chapter

Batenkov, D., Romano, Y., Elad, M. (2017). On the Global-Local Dichotomy in Sparsity Modeling. In: Boche, H., Caire, G., Calderbank, R., März, M., Kutyniok, G., Mathar, R. (eds) Compressed Sensing and its Applications. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-69802-1_1
