Abstract
The traditional sparse modeling approach, when applied to inverse problems with large data such as images, essentially assumes a sparse model for small overlapping data patches and processes these patches as if they were independent of each other. While this methodology produces state-of-the-art results, it is suboptimal, as it does not attempt to model the entire global signal in any meaningful way—a nontrivial task by itself.
In this paper we propose a way to bridge this theoretical gap by constructing a global model from the bottom up. Given local sparsity assumptions in a dictionary, we show that the global signal representation must satisfy a constrained underdetermined system of linear equations, which forces the patches to agree on the overlaps. Furthermore, we show that the corresponding global pursuit can be solved via local operations. We investigate conditions for unique and stable recovery and provide numerical evidence corroborating the theory.
Notes
1. Notice that while \(R_{i}\) extracts the i-th patch from the signal x, the operator \(\tilde {R_{i}}\) extracts the representation \(\alpha _{i}\) of \(R_{i}x\) from Γ.
2. Notice that \(\alpha _{i}\) might be a minimal representation but not a unique one with minimal sparsity. For a discussion of uniqueness, see Subsection 2.3.
3. In general \(\min \left \{ s:\;\mu _{1}^{*}\left (s-1\right )\geqslant 1\right \} \neq \max \left \{ s:\;\mu _{1}^{*}\left (s\right )<1\right \} \), because the function \(\mu _{1}^{*}\) need not be monotonic.
References
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: large-scale machine learning on heterogeneous systems (2015). http://tensorflow.org/. Software available from tensorflow.org
R. Aceska, J.L. Bouchot, S. Li, Local sparsity and recovery of fusion frames structured signals. preprint (2015). http://www.mathc.rwth-aachen.de/~bouchot/files/pubs/FusionCSfinal.pdf
M. Aharon, M. Elad, Sparse and redundant modeling of image content using an image-signature-dictionary. SIAM J. Imag. Sci. 1(3), 228–247 (2008)
U. Ayaz, S. Dirksen, H. Rauhut, Uniform recovery of fusion frame structured sparse signals. Appl. Comput. Harmon. Anal. 41(2), 341–361 (2016). https://doi.org/10.1016/j.acha.2016.03.006. http://www.sciencedirect.com/science/article/pii/S1063520316000294
S. Basu, R. Pollack, M.F. Roy, Algorithms in Real Algebraic Geometry. Algorithms and Computation in Mathematics, 2nd edn., vol. 10 (Springer, Berlin, 2006)
T. Blumensath, M. Davies, Sparse and shift-invariant representations of music. IEEE Trans. Audio Speech Lang. Process. 14(1), 50–57 (2006). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1561263
T. Blumensath, M.E. Davies, Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Trans. Inf. Theory 55(4), 1872–1882 (2009). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4802322
P. Boufounos, G. Kutyniok, H. Rauhut, Sparse recovery from combined fusion frame measurements. IEEE Trans. Inf. Theory 57(6), 3864–3876 (2011). https://doi.org/10.1109/TIT.2011.2143890
S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011). http://dx.doi.org/10.1561/2200000016
H. Bristow, A. Eriksson, S. Lucey, Fast convolutional sparse coding, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013), pp. 391–398
A.M. Bruckstein, D.L. Donoho, M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009). http://epubs.siam.org/doi/abs/10.1137/060657704
E.J. Candes, Modern statistical estimation via oracle inequalities. Acta Numer. 15, 257–325 (2006). http://journals.cambridge.org/abstract_S0962492906230010
S. Chen, S.A. Billings, W. Luo, Orthogonal least squares methods and their application to non-linear system identification. Int. J. Control. 50(5), 1873–1896 (1989)
W. Dong, L. Zhang, G. Shi, X. Li, Nonlocally centralized sparse representation for image restoration. IEEE Trans. Image Process. 22(4), 1620–1630 (2013)
D.L. Donoho, M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proc. Natl. Acad. Sci. 100(5), 2197–2202 (2003). https://doi.org/10.1073/pnas.0437847100. http://www.pnas.org/content/100/5/2197
C. Ekanadham, D. Tranchina, E.P. Simoncelli, A unified framework and method for automatic neural spike identification. J. Neurosci. Methods 222, 47–55 (2014). https://doi.org/10.1016/j.jneumeth.2013.10.001. http://www.sciencedirect.com/science/article/pii/S0165027013003415
M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing (Springer, New York, 2010)
M. Elad, M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15(12), 3736–3745 (2006)
Y.C. Eldar, M. Mishali, Block sparsity and sampling over a union of subspaces, in 2009 16th International Conference on Digital Signal Processing (IEEE, New York, 2009), pp. 1–8. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5201211
Y.C. Eldar, M. Mishali, Robust recovery of signals from a structured union of subspaces. IEEE Trans. Inf. Theory 55(11), 5302–5316 (2009)
P.G. Casazza, G. Kutyniok (eds.), Finite Frames: Theory and Applications. Applied and Numerical Harmonic Analysis (Birkhäuser, Boston, 2013). http://www.springer.com/birkhauser/mathematics/book/978-0-8176-8372-6
S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing (Springer, New York, 2013). http://link.springer.com/content/pdf/10.1007/978-0-8176-4948-7.pdf
D. Gabay, B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
R. Glowinski, On alternating direction methods of multipliers: a historical perspective, in Modeling, Simulation and Optimization for Science and Technology (Springer, Dordrecht, 2014), pp. 59–82
R. Grosse, R. Raina, H. Kwong, A.Y. Ng, Shift-invariance sparse coding for audio classification (2012). arXiv preprint arXiv:1206.5241
R. Grosse, R. Raina, H. Kwong, A.Y. Ng, Shift-invariance sparse coding for audio classification. arXiv:1206.5241 [cs, stat] (2012). http://arxiv.org/abs/1206.5241
S. Gu, W. Zuo, Q. Xie, D. Meng, X. Feng, L. Zhang, Convolutional sparse coding for image super-resolution, in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1823–1831
F. Heide, W. Heidrich, G. Wetzstein, Fast and flexible convolutional sparse coding, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, New York, 2015), pp. 5135–5143
J. Huang, T. Zhang, D. Metaxas, Learning with structured sparsity. J. Mach. Learn. Res. 12, 3371–3412 (2011)
J. Huang, T. Zhang, et al., The benefit of group sparsity. Ann. Stat. 38(4), 1978–2004 (2010)
K. Kavukcuoglu, P. Sermanet, Y.L. Boureau, K. Gregor, M. Mathieu, Y. LeCun, Learning convolutional feature hierarchies for visual recognition, in Advances in Neural Information Processing Systems, ed. by J.D. Lafferty, C.K.I. Williams, J. Shawe-Taylor, R.S. Zemel, A. Culotta, vol. 23 (Curran Associates, Red Hook, 2010), pp. 1090–1098. http://papers.nips.cc/paper/4133-learning-convolutional-feature-hierarchies-for-visual-recognition.pdf
A. Kyrillidis, L. Baldassarre, M.E. Halabi, Q. Tran-Dinh, V. Cevher, Structured sparsity: discrete and convex approaches, in Compressed Sensing and Its Applications. Applied and Numerical Harmonic Analysis, ed. by H. Boche, R. Calderbank, G. Kutyniok, J. Vybíral (Springer, Cham, 2015), pp. 341–387. http://link.springer.com/chapter/10.1007/978-3-319-16042-9_12. https://doi.org/10.1007/978-3-319-16042-9_12
P.L. Lions, B. Mercier, Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
M.A. Little, N.S. Jones, Generalized methods and solvers for noise removal from piecewise constant signals. II. New methods. Proc. R. Soc. Lond. A: Math. Phys. Eng. Sci. (2011). https://doi.org/10.1098/rspa.2010.0674. http://rspa.royalsocietypublishing.org/content/early/2011/06/07/rspa.2010.0674
Y.M. Lu, M.N. Do, A theory for sampling signals from a union of subspaces. IEEE Trans. Signal Process. 56, 2334–2345 (2007)
J. Mairal, G. Sapiro, M. Elad, Learning multiscale sparse representations for image and video restoration. Multiscale Model. Simul. 7(1), 214–241 (2008)
J. Mairal, F. Bach, J. Ponce, G. Sapiro, A. Zisserman, Non-local sparse models for image restoration, in 2009 IEEE 12th International Conference on Computer Vision (IEEE, New York, 2009), pp. 2272–2279
J. Mairal, F. Bach, J. Ponce, Sparse modeling for image and vision processing. Found. Trends Comput. Graph. Vis. 8(2–3), 85–283 (2014). https://doi.org/10.1561/0600000058. http://www.nowpublishers.com/article/Details/CGV-058
Maplesoft, a division of Waterloo Maple Inc. http://www.maplesoft.com
V. Papyan, M. Elad, Multi-scale patch-based image restoration. IEEE Trans. Image Process. 25(1), 249–261 (2016). https://doi.org/10.1109/TIP.2015.2499698
V. Papyan, Y. Romano, M. Elad, Convolutional neural networks analyzed via convolutional sparse coding. J. Mach. Learn. Res. 18(83), 1–52 (2017)
V. Papyan, J. Sulam, M. Elad, Working locally thinking globally: theoretical guarantees for convolutional sparse coding. IEEE Trans. Signal Process. 65(21), 5687–5701 (2017)
Y.C. Pati, R. Rezaiifar, P. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, in Asilomar Conference on Signals, Systems and Computers (IEEE, New York, 1993), pp. 40–44
R. Quiroga, Spike sorting. Scholarpedia 2(12), 3583 (2007). https://doi.org/10.4249/scholarpedia.3583
Y. Romano, M. Elad, Boosting of image denoising algorithms. SIAM J. Imag. Sci. 8(2), 1187–1219 (2015)
Y. Romano, M. Elad, Patch-disagreement as a way to improve K-SVD denoising, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, New York, 2015), pp. 1280–1284
Y. Romano, M. Protter, M. Elad, Single image interpolation via adaptive nonlocal sparsity-based modeling. IEEE Trans. Image Process. 23(7), 3085–3098 (2014)
L.I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms. Physica D 60(1), 259–268 (1992). http://www.sciencedirect.com/science/article/pii/016727899290242F
C. Rusu, B. Dumitrescu, S. Tsaftaris, Explicit shift-invariant dictionary learning. IEEE Signal Process. Lett. 21, 6–9 (2014). http://www.schur.pub.ro/Idei2011/Articole/SPL_2014_shifts.pdf
E. Smith, M.S. Lewicki, Efficient coding of time-relative structure using spikes. Neural Comput. 17(1), 19–45 (2005). http://dl.acm.org/citation.cfm?id=1119614
A.M. Snijders, N. Nowak, R. Segraves, S. Blackwood, N. Brown, J. Conroy, G. Hamilton, A.K. Hindle, B. Huey, K. Kimura, S. Law, K. Myambo, J. Palmer, B. Ylstra, J.P. Yue, J.W. Gray, A.N. Jain, D. Pinkel, D.G. Albertson, Assembly of microarrays for genome-wide measurement of DNA copy number. Nat. Genet. 29(3), 263–264 (2001). https://doi.org/10.1038/ng754. https://www.nature.com/ng/journal/v29/n3/full/ng754.html
J. Sulam, M. Elad, Expected patch log likelihood with a sparse prior, in International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (Springer, New York, 2015), pp. 99–111
J. Sulam, B. Ophir, M. Elad, Image denoising through multi-scale learnt dictionaries, in 2014 IEEE International Conference on Image Processing (ICIP) (IEEE, New York, 2014), pp. 808–812
J.J. Thiagarajan, K.N. Ramamurthy, A. Spanias, Shift-invariant sparse representation of images using learned dictionaries, in IEEE Workshop on Machine Learning for Signal Processing, 2008, MLSP 2008 (2008), pp. 145–150. https://doi.org/10.1109/MLSP.2008.4685470
J.A. Tropp, A.C. Gilbert, M.J. Strauss, Algorithms for simultaneous sparse approximation. Part I: greedy pursuit. Signal Process. 86(3), 572–588 (2006)
J. Yang, J. Wright, T.S. Huang, Y. Ma, Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)
G. Yu, G. Sapiro, S. Mallat, Solving inverse problems with piecewise linear estimators: from Gaussian mixture models to structured sparsity. IEEE Trans. Image Process. 21(5), 2481–2499 (2012). https://doi.org/10.1109/TIP.2011.2176743
M.D. Zeiler, D. Krishnan, G.W. Taylor, R. Fergus, Deconvolutional networks, in 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, New York, 2010), pp. 2528–2535. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5539957
M. Zeiler, G. Taylor, R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in 2011 IEEE International Conference on Computer Vision (ICCV) (2011), pp. 2018–2025. https://doi.org/10.1109/ICCV.2011.6126474
D. Zoran, Y. Weiss, From learning models of natural image patches to whole image restoration, in 2011 IEEE International Conference on Computer Vision (ICCV) (IEEE, New York, 2011), pp. 479–486. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6126278
Acknowledgments
The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme, ERC Grant agreement no. 320649. The authors would also like to thank Jeremias Sulam, Vardan Papyan, Raja Giryes, and Gitta Kutyniok for inspiring discussions.
1 Appendix A: Proof of Lemma 1
Proof
Denote \(Z:=\ker M\) and consider the linear map \(A:Z\to \mathbb {R}^{N}\) given by the restriction of the “averaging map” \(D_{G}:\mathbb {R}^{mP}\to \mathbb {R}^{N}\) to Z.
1. Let us first see that \(\mathrm {im}\left (A\right )=\mathbb {R}^{N}\). Indeed, for every \(x\in \mathbb {R}^{N}\), consider its patches \(x_{i}=R_{i}x\). Since D is full rank, there exist \(\left \{ \alpha _{i}\right \} \) for which \(D\alpha _{i}=x_{i}\). Then setting \({\varGamma }:=\left (\alpha _{1},\dots ,\alpha _{P}\right )\), we have both \(D_{G}{\varGamma }=x\) and \(M{\varGamma }=0\) (by construction, see Section 2), i.e., \({\varGamma }\in Z\), and the claim follows.
2. Define
$$\displaystyle \begin{aligned} J:=\ker D\times\ker D\times\dots\times\ker D\subset\mathbb{R}^{mP}. \end{aligned}$$
We claim that \(J=\ker A\).
   a. In one direction, let \({\varGamma }=\left (\alpha _{1},\dots ,\alpha _{P}\right )\in \ker A\), i.e., \(M{\varGamma }=0\) and \(D_{G}{\varGamma }=0\). Immediately we see that \(\frac {1}{n}D\alpha _{i}=0\) for all i, and therefore \(\alpha _{i}\in \ker D\) for all i; thus \({\varGamma }\in J\).
   b. In the other direction, let \({\varGamma }=\left (\alpha _{1},\dots ,\alpha _{P}\right )\in J\), i.e., \(D\alpha _{i}=0\). Then the local representations agree, i.e., \(M{\varGamma }=0\); thus \({\varGamma }\in Z\). Furthermore, \(D_{G}{\varGamma }=0\), and therefore \({\varGamma }\in \ker A\).
3. By the fundamental theorem of linear algebra, we conclude
$$\displaystyle \begin{aligned} \begin{array}{rcl} \dim Z & = & \dim \mathrm{im}\left(A\right)+\dim\ker A=N+\dim J\\ & = & N+\left(m-n\right)N=N\left(m-n+1\right). \end{array} \end{aligned} $$
□
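As a sanity check of this dimension count, one can assemble M explicitly and compute its nullity numerically. The following sketch is ours rather than the chapter's: the sizes N, n, m are arbitrary illustrative choices, and M is built row by row from the cyclic agreement constraints \(S_{B}D\alpha _{i}=S_{T}D\alpha _{i+1}\) (cf. Lemma 2).

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 8, 3, 5                      # signal length, patch size, atoms
D = rng.standard_normal((n, m))        # a generic full-rank local dictionary

S_T = np.eye(n)[: n - 1]               # extracts the top n-1 entries
S_B = np.eye(n)[1:]                    # extracts the bottom n-1 entries

# M stacks the cyclic agreement constraints S_B D a_i - S_T D a_{i+1} = 0
M = np.zeros(((n - 1) * N, m * N))
for i in range(N):
    rows = slice((n - 1) * i, (n - 1) * (i + 1))
    M[rows, m * i:m * (i + 1)] = S_B @ D
    j = (i + 1) % N                    # cyclic neighboring patch
    M[rows, m * j:m * (j + 1)] -= S_T @ D

nullity = m * N - np.linalg.matrix_rank(M)
print(nullity, N * (m - n + 1))        # both equal 24 for these sizes
```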
2 Appendix B: Proof of Lemma 2
We start with an easy observation.
Proposition 9
For any vector \(\rho \in \mathbb {R}^{N}\), we have
Proof
Since
Corollary 3
Given MΓ = 0, we have
Proof
Using Proposition 9, we get
Recall Definition 6. Multiplying the corresponding matrices gives
Proposition 10
We have the following equality for all i = 1, …, P:
To facilitate the proof, we introduce an extension of Definition 6 to multiple shifts, as follows.
Definition 16
Let n be fixed. For k = 0, …, n − 1, let
1. \(S_{T}^{\left (k\right )}:=\begin {bmatrix}I_{n-k} & \boldsymbol {0}\end {bmatrix}\) and \(S_{B}^{\left (k\right )}:=\begin {bmatrix}\boldsymbol {0} & I_{n-k}\end {bmatrix}\) denote the operators extracting the top (resp. bottom) n − k entries from a vector of length n; the matrices have dimension \(\left (n-k\right )\times n\);
2. \(Z_{B}^{\left (k\right )}:=\begin {bmatrix}S_{B}^{\left (k\right )}\\ \boldsymbol {0}_{k\times n} \end {bmatrix}\) and \(Z_{T}^{\left (k\right )}:=\begin {bmatrix}\boldsymbol {0}_{k\times n}\\ S_{T}^{\left (k\right )} \end {bmatrix}\);
3. \(W_{B}^{\left (k\right )}:=\begin {bmatrix}\boldsymbol {0}_{k\times n}\\ S_{B}^{\left (k\right )} \end {bmatrix}\) and \(W_{T}^{\left (k\right )}:=\begin {bmatrix}S_{T}^{\left (k\right )}\\ \boldsymbol {0}_{k\times n} \end {bmatrix}\).
Note that \(S_{B}=S_{B}^{\left (1\right )}\) and \(S_{T}=S_{T}^{\left (1\right )}\). We have several useful consequences of the above definitions. The proofs are carried out via elementary matrix identities and are left to the reader.
Proposition 11
For any \(n\in \mathbb {N}\), the following hold:
1. \(Z_{T}^{\left (k\right )}=\left (Z_{T}^{\left (1\right )}\right )^{k}\) and \(Z_{B}^{\left (k\right )}=\left (Z_{B}^{\left (1\right )}\right )^{k}\) for k = 0, …, n − 1;
2. \(W_{T}^{\left (k\right )}W_{T}^{\left (k\right )}=W_{T}^{\left (k\right )}\) and \(W_{B}^{\left (k\right )}W_{B}^{\left (k\right )}=W_{B}^{\left (k\right )}\) for k = 0, …, n − 1;
3. \(W_{T}^{\left (k\right )}W_{B}^{\left (j\right )}=W_{B}^{\left (j\right )}W_{T}^{\left (k\right )}\) for j, k = 0, …, n − 1;
4. \(Z_{B}^{\left (k\right )}=Z_{B}^{\left (k\right )}W_{B}^{\left (k\right )}\) and \(Z_{T}^{\left (k\right )}=Z_{T}^{\left (k\right )}W_{T}^{\left (k\right )}\) for k = 0, …, n − 1;
5. \(W_{B}^{\left (k\right )}=Z_{T}^{\left (1\right )}W_{B}^{\left (k-1\right )}Z_{B}^{\left (1\right )}\) and \(W_{T}^{\left (k\right )}=Z_{B}^{\left (1\right )}W_{T}^{\left (k-1\right )}Z_{T}^{\left (1\right )}\) for k = 1, …, n − 1;
6. \(Z_{B}^{\left (k\right )}Z_{T}^{\left (k\right )}=W_{T}^{\left (k\right )}\) and \(Z_{T}^{\left (k\right )}Z_{B}^{\left (k\right )}=W_{B}^{\left (k\right )}\) for k = 0, …, n − 1;
7. \(\left (n-1\right )I_{n\times n}=\sum _{k=1}^{n-1}\left (W_{B}^{\left (k\right )}+W_{T}^{\left (k\right )}\right ).\)
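Since the proofs are left to the reader, a quick machine check may be reassuring. The sketch below realizes the operators of Definition 16 in NumPy (n = 5 is an arbitrary choice of ours) and tests items 1, 2, 6, and 7; the remaining items can be checked the same way.

```python
import numpy as np

n = 5                                        # arbitrary illustrative size

def S_T(k): return np.eye(n)[: n - k]        # [I_{n-k} 0]
def S_B(k): return np.eye(n)[k:]             # [0 I_{n-k}]
def Z_B(k): return np.vstack([S_B(k), np.zeros((k, n))])
def Z_T(k): return np.vstack([np.zeros((k, n)), S_T(k)])
def W_B(k): return np.vstack([np.zeros((k, n)), S_B(k)])
def W_T(k): return np.vstack([S_T(k), np.zeros((k, n))])

for k in range(n):
    assert np.array_equal(Z_T(k), np.linalg.matrix_power(Z_T(1), k))  # item 1
    assert np.array_equal(W_T(k) @ W_T(k), W_T(k))                    # item 2
    assert np.array_equal(Z_B(k) @ Z_T(k), W_T(k))                    # item 6
assert np.array_equal(sum(W_B(k) + W_T(k) for k in range(1, n)),
                      (n - 1) * np.eye(n))                            # item 7
```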
Proposition 12
If the vectors \(u_{1},\dots ,u_{N}\in \mathbb {R}^{n}\) satisfy pairwise
then they also satisfy for each k = 0, …, n − 1 the following:
Proof
It is easy to see that the condition \(S_{B}u_{i}=S_{T}u_{i+1}\) directly implies
Let us first prove (23) by induction on k. The base case k = 1 is precisely (25). Assuming validity for k − 1 and ∀i, we have
To prove (24) we proceed as follows:
Example 1
Fig. 11
Illustration of the proof of Proposition 12: the green pair is equal, as is the red pair; it follows that the blue elements are equal as well.
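For intuition, the statement can also be verified mechanically on patches extracted from a common signal. The sketch below is our illustration: it assumes, as in the proof of Lemma 2 below, that (23) and (24) stand for the identities \(Z_{B}^{\left (k\right )}u_{i-k}=W_{T}^{\left (k\right )}u_{i}\) and \(Z_{T}^{\left (k\right )}u_{i+k}=W_{B}^{\left (k\right )}u_{i}\).

```python
import numpy as np

n = 5
def S_T(k): return np.eye(n)[: n - k]
def S_B(k): return np.eye(n)[k:]
def Z_B(k): return np.vstack([S_B(k), np.zeros((k, n))])
def Z_T(k): return np.vstack([np.zeros((k, n)), S_T(k)])
def W_B(k): return np.vstack([np.zeros((k, n)), S_B(k)])
def W_T(k): return np.vstack([S_T(k), np.zeros((k, n))])

rng = np.random.default_rng(1)
x = rng.standard_normal(20)
u = [x[i:i + n] for i in range(len(x) - n + 1)]   # overlapping patches of x

# adjacent patches agree on the overlap: S_B u_i = S_T u_{i+1}
for i in range(len(u) - 1):
    assert np.allclose(S_B(1) @ u[i], S_T(1) @ u[i + 1])

# the multi-shift consequences used in the proof of Lemma 2 below
for k in range(n):
    for i in range(k, len(u) - k):
        assert np.allclose(Z_B(k) @ u[i - k], W_T(k) @ u[i])
        assert np.allclose(Z_T(k) @ u[i + k], W_B(k) @ u[i])
```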
Let us now present the proof of Lemma 2.
Proof
We show equivalence in two directions.
- \(\left (1\right )\Longrightarrow \left (2\right )\): Let MΓ = 0. Define \(x:=D_{G}{\varGamma }\), and further denote \(x_{i}:=R_{i}x\). Then, on the one hand,
$$\displaystyle \begin{aligned} \begin{aligned}{}x_{i} & =R_{i}D_{G}{\varGamma}\\ & =\varOmega_{i}{\varGamma} \qquad \text{(definition of }\varOmega_{i})\\ & =D\alpha_{i}. \qquad \left(M{\varGamma}=0\right) \end{aligned} \end{aligned} $$
On the other hand, because of (22) we have \(S_{B}R_{i}x=S_{T}R_{i+1}x\), and by combining the two, we conclude that \(S_{B}D\alpha _{i}=S_{T}D\alpha _{i+1}\).
- \(\left (2\right )\Longrightarrow \left (1\right )\): In the other direction, suppose that \(S_{B}D\alpha _{i}=S_{T}D\alpha _{i+1}\). Denote \(u_{i}:=D\alpha _{i}\). Now consider the product \(\varOmega _{i}{\varGamma }\), where \(\varOmega _{i}=R_{i}D_{G}\). One can easily verify that in fact
$$\displaystyle \begin{aligned} \varOmega_{i}{\varGamma}=\frac{1}{n}\left(\sum_{k=1}^{n-1}\left(Z_{B}^{\left(k\right)}u_{i-k}+Z_{T}^{\left(k\right)}u_{i+k}\right)+u_{i}\right). \end{aligned}$$
Therefore
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left(\varOmega_{i}-Q_{i}\right){\varGamma} & =&\frac{1}{n}\left(u_{i}+\sum_{k=1}^{n-1}\left(Z_{B}^{\left(k\right)}u_{i-k}+Z_{T}^{\left(k\right)}u_{i+k}\right)\right)-u_{i}\\ & =&\frac{1}{n}\left(\sum_{k=1}^{n-1}\left(W_{T}^{\left(k\right)}u_{i}+W_{B}^{\left(k\right)}u_{i}\right)-\left(n-1\right)u_{i}\right) \qquad \left(\text{by Proposition 12}\right)\\ & =&0.\qquad \left(\text{by Proposition 11, item 7}\right) \end{array} \end{aligned} $$
Since this holds for all i, we have shown that MΓ = 0.
□
3 Appendix C: Proof of Theorem 6
Recall that \(M_{A}=\frac {1}{n}\sum _{i}R_{i}^{T}P_{s_{i}}R_{i}\). We first show that \(M_{A}\) is a contraction.
Proposition 13
\(\left \Vert M_{A}\right \Vert _{2}\leqslant 1\).
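Numerically, this bound is easy to observe. The sketch below is ours: the sizes, the cyclic patch extractors, and the use of random s-dimensional orthogonal projections in place of \(P_{s_{i}}\) are illustrative assumptions; it builds \(M_{A}\) directly and evaluates its spectral norm.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, s = 12, 4, 2                     # illustrative sizes
# cyclic patch extractors R_i (each row selects one signal entry)
R = [np.eye(N)[[(i + t) % N for t in range(n)]] for i in range(N)]

def random_projection():
    U, _ = np.linalg.qr(rng.standard_normal((n, s)))
    return U @ U.T                     # orthogonal projection of rank s

M_A = sum(Ri.T @ random_projection() @ Ri for Ri in R) / n
print(np.linalg.norm(M_A, 2))          # never exceeds 1, per Proposition 13
```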
Proof
Closely following a similar proof in [45], divide the index set \(\left \{ 1,\dots ,N\right \} \) into n groups representing non-overlapping patches: for i = 1, …, n let
Now let us move on to prove Theorem 6.
Proof
Define
4 Appendix D: Proof of Theorem 8
Recall that the signal consists of s constant segments of corresponding lengths \(\ell _{1},\dots ,\ell _{s}\). We would like to compute the MSE for every pixel within every such segment of length \(\alpha :=\ell _{r}\). For each patch, the oracle provides the locations of the jump points within the patch.
Fig. 12
The oracle estimator for the pixel O in the segment (black). The orange line is patch number j = 1, …, n, and the relevant pixels are between \(a_{j}\) and \(b_{j}\). The signal itself is shown to extend beyond the segment (blue line).
Now, the oracle error for the pixel is
Example 2
Let n = 4, α = 3.
- For k = 1:
$$\displaystyle \begin{aligned} \begin{array}{rcl} \hat{x}_{A}^{r,k}-v & = & \frac{1}{4}\left(\frac{1}{2}+\frac{1}{3}+\frac{1}{3}+\frac{1}{2}\right)z_{0}+\frac{1}{4}\left(\frac{1}{2}+\frac{1}{3}+\frac{1}{3}\right)z_{-1}+\frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}+\frac{1}{2}\right)z_{1}\\ & = & \underbrace{\frac{7}{24}}_{d_{-1}}z_{-1}+\underbrace{\frac{5}{12}}_{d_{0}}z_{0}+\underbrace{\frac{7}{24}}_{d_{1}}z_{1} \end{array} \end{aligned} $$
- For k = 2:
$$\displaystyle \begin{aligned} \begin{array}{rcl} \hat{x}_{A}^{r,k}-v & = & \frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}+\frac{1}{2}+1\right)z_{0}+\frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}+\frac{1}{2}\right)z_{-1}+\frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}\right)z_{-2}\\ & = & \frac{13}{24}z_{0}+\frac{7}{24}z_{-1}+\frac{1}{6}z_{-2} \end{array} \end{aligned} $$
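The coefficients above are plain exact arithmetic and can be double-checked mechanically; the following snippet merely re-evaluates the sums displayed in this example.

```python
from fractions import Fraction as F

quarter = F(1, 4)
print(quarter * (F(1, 2) + F(1, 3) + F(1, 3) + F(1, 2)))  # d_0  = 5/12
print(quarter * (F(1, 2) + F(1, 3) + F(1, 3)))            # d_-1 = 7/24
print(quarter * (F(1, 3) + F(1, 3) + F(1, 2) + 1))        # k=2: 13/24
```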
Now consider the optimization problem
Since the z i are i.i.d., we have
By construction (26), it is not difficult to see that if \(n\geqslant \alpha \) then
On the other hand, for \(n\leqslant \frac {\alpha }{2}\) we have
5 Appendix E: Generative Models for Patch-Sparse Signals
In this section we propose a general framework aimed at generating signals from the patch-sparse model. Our approach is to construct a graph-based model for the dictionary and subsequently use this model to generate dictionaries and signals which turn out to be much richer than those considered in Section 4.
Local Support Dependencies
We start by highlighting the importance of the local connections (recall Lemma 2) between the neighboring patches of the signal, and therefore between the corresponding subspaces containing those patches. This in turn allows us to characterize \(\varSigma _{\mathcal {M}}\) as the set of all “realizable” paths in a certain dependency graph derived from the dictionary D. This point of view allows us to describe the model \(\mathcal {M}\) using only the intrinsic properties of the dictionary, in contrast to Theorem 2.
Proposition 14
Let \(0\neq x\in \mathcal {M}\) and \({\varGamma }\in \rho \left (x\right )\) with \({\mathrm {supp}}{\varGamma }=\left (S_{1},\dots ,S_{P}\right )\). Then for i = 1, …, P
where by convention rank∅ = −∞.
Proof
\(x\in \mathcal {M}\) implies by Lemma 2 that for every i = 1, …, P
The preceding result suggests a way to describe all the supports in \(\varSigma _{\mathcal {M}}\).
Definition 17
Given a dictionary D, we define an abstract directed graph \(\mathcal {G}_{D,s}=\left (V,E\right )\), with the vertex set
Remark 4
It might be impossible to compute \(\mathcal {G}_{D,s}\) in practice. However, we set this issue aside for now and only explore the theoretical ramifications of its properties.
Definition 18
The set of all directed paths of length P in \(\mathcal {G}_{D,s}\), not including the self-loop \(\underbrace {\left (\emptyset ,\emptyset ,\dots ,\emptyset \right )}_{\times P}\), is denoted by \(\mathcal {C_{G}}\left (P\right )\).
Definition 19
A path \(\mathcal {S}\in \mathcal {C_{G}}\left (P\right )\) is called realizable if \(\dim \ker A_{\mathcal {S}}>0\). The set of all realizable paths in \(\mathcal {C_{G}}\left (P\right )\) is denoted by \(\mathcal {R_{G}}\left (P\right )\).
Thus we have the following result.
Theorem 9
Suppose \(0\neq x\in \mathcal {M}\). Then
1. Every representation \({\varGamma }=\left (\alpha _{i}\right )_{i=1}^{P}\in \rho \left (x\right )\) satisfies \({\mathrm {supp}}{\varGamma }\in \mathcal {C_{G}}\left (P\right )\), and therefore
$$\displaystyle \begin{aligned} \varSigma_{\mathcal{M}}\subseteq\mathcal{R_{G}}\left(P\right). \end{aligned} $$(30)
2. The model \(\mathcal {M}\) can be characterized “intrinsically” by the dictionary as follows:
$$\displaystyle \begin{aligned} \mathcal{M}=\bigcup_{\mathcal{S}\in\mathcal{R_{G}}\left(P\right)}\ker A_{\mathcal{S}}. \end{aligned} $$(31)
Proof
Let \({\mathrm {supp}}{\varGamma }=\left (S_{1},\dots ,S_{P}\right )\) with \(S_{i}={\mathrm {supp}}\alpha _{i}\) if \(\alpha _{i}\neq 0\), and \(S_{i}=\emptyset \) if \(\alpha _{i}=0\). Then by Proposition 14, we must have that
To show (31), notice that if \({\mathrm {supp}}{\varGamma }\in \mathcal {R_{G}}\left (P\right )\), then for every \(x\in \ker A_{{\mathrm {supp}}{\varGamma }}\), we have \(R_{i}x=P_{S_{i}}R_{i}x\), i.e., \(R_{i}x=D\alpha _{i}\) for some \(\alpha _{i}\) with \({\mathrm {supp}}\alpha _{i}\subseteq S_{i}\). Clearly in this case \(\left |{\mathrm {supp}}\alpha _{i}\right |\leqslant s\) and therefore \(x\in \mathcal {M}\). The other direction of (31) follows immediately from the definitions. □
Definition 20
The dictionary D is called “\(\left (s,P\right )\)-good” if
Theorem 10
The set of “ \(\left (s,P\right )\) -good” dictionaries has measure zero in the space of all n × m matrices.
Proof
Every low-rank condition defines a finite number of algebraic equations on the entries of D (given by the vanishing of all the 2s × 2s minors of \(\begin {bmatrix}S_{B}D_{S_{i}} & S_{T}D_{S_{j}}\end {bmatrix}\)). Since the number of possible graphs is finite (given fixed n, m and s), the resulting solution set is a finite union of semi-algebraic sets of low dimension and hence has measure zero. □
Constructing “Good” Dictionaries
The above considerations suggest that good dictionaries are hard to come by; here we provide an example of an explicit construction.
We start by defining an abstract graph \(\mathcal {G}\) with some desirable properties, and subsequently look for a nontrivial realization D of the graph, so that in addition \(\mathcal {R}_{\mathcal {G}}\neq \emptyset \).
Fig. 13
A possible dependency graph \(\mathcal {G}\) with m = 10. In this example, \( \left |\mathcal {C_{G}} \left (70 \right ) \right |=37614\).
Every edge in \(\mathcal {G}\) corresponds to a condition of the form (29) imposed on the entries of D. As discussed in Theorem 10, this in turn translates to a set of algebraic equations. So the natural idea would be to write out the large system of such equations and look for a solution over the field \(\mathbb {R}\) by well-known algorithms in numerical algebraic geometry [5]. However, this approach is highly impractical because these algorithms have (singly or doubly) exponential running time. We consequently propose a simplified, more direct approach to the problem.
In detail, we replace the low-rank conditions (29) with more explicit and restrictive ones below.
Assumptions (*): For each \(\left (S_{i},S_{j}\right )\in \mathcal {G}\) we have \(\left |S_{i}\right |=\left |S_{j}\right |=k\). We require that \({\mathrm {span}} S_{B}D_{S_{i}}={\mathrm {span}} S_{T}D_{S_{j}}=\varLambda _{i,j}\) with \(\dim \varLambda _{i,j}=k\). Thus there exists a non-singular transfer matrix \(C_{i,j}\in \mathbb {R}^{k\times k}\) such that
$$\displaystyle \begin{aligned} S_{B}D_{S_{i}}=C_{i,j}S_{T}D_{S_{j}}. \end{aligned} $$(32)
In other words, every column in \(S_{B}D_{S_{i}}\) must be a specific linear combination of the columns in \(S_{T}D_{S_{j}}\). This is much more restrictive than the low-rank condition, but on the other hand, given the matrix \(C_{i,j}\), it defines a set of linear constraints on D. To summarize, the final algorithm is presented in Algorithm 5. In general, nothing guarantees that for a particular choice of \(\mathcal {G}\) and the transfer matrices there is a nontrivial solution D; however, in practice we do find such solutions. For example, taking the graph from Figure 13 and augmenting it with the matrices \(C_{i,j}\) (scalars in this case), we obtain a solution over \(\mathbb {R}^{6}\), which is shown in Figure 14. Notice that while the resulting dictionary has a Hankel-type structure similar to what we have seen previously, the additional dependencies between the atoms produce a rich signal space structure, as we shall demonstrate in the following section.
Algorithm 5 Finding a realization D of the graph \(\mathcal {G}\)
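A minimal numerical sketch in the spirit of the linear-constraint approach just described (not the chapter's pseudocode: the sizes, the toy edge set, and the transfer matrices \(C_{i,j}\) below are our illustrative assumptions) assembles the constraints (32) into one linear system and reads a realization D off its null space.

```python
import numpy as np

n, m, k = 6, 10, 1                     # illustrative sizes (k = 1: scalar C)
S_T = np.eye(n)[: n - 1]
S_B = np.eye(n)[1:]

# each edge (S_i, S_j, C) imposes S_B D_{S_i} = (S_T D_{S_j}) C; for
# k = 1 the transfer "matrix" C is a scalar and this is exactly (32)
edges = [([0], [1], np.array([[1.0]])),
         ([1], [2], np.array([[0.5]]))]

def residual(D):
    return np.concatenate([(S_B @ D[:, Si] - (S_T @ D[:, Sj]) @ C).ravel()
                           for Si, Sj, C in edges])

# assemble the linear constraint matrix column by column; its null
# space consists exactly of the admissible dictionaries (flattened)
A = np.column_stack([residual(e.reshape(n, m)) for e in np.eye(n * m)])
_, sv, Vt = np.linalg.svd(A)
null_basis = Vt[np.sum(sv > 1e-10):]
D = null_basis[0].reshape(n, m)        # one (not necessarily useful) solution
assert np.allclose(residual(D), 0)
```

Any further requirements (normalized atoms, a Hankel-type structure, nonemptiness of \(\mathcal {R_G}\)) would have to be imposed on top of this null space.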
Generating Signals
Now suppose the graph \(\mathcal {G}\) is known (or can be easily constructed). Then this gives a simple procedure to generate signals from \(\mathcal {M}\), presented in Algorithm 6.
Algorithm 6 Constructing a signal from \(\mathcal {M}\) via \(\mathcal {G}\)
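The proof of Theorem 9 shows that \(x\in \ker A_{\mathcal {S}}\) exactly when \(R_{i}x=P_{S_{i}}R_{i}x\) for every i, which suggests the following sketch of such a procedure (the cyclic extractors and the random sampling step are our illustrative choices, not necessarily those of the chapter's Algorithm 6):

```python
import numpy as np

rng = np.random.default_rng(3)

def A_S(D, supports):
    """Stack the blocks (I - P_{S_i}) R_i; then A_S x = 0 iff x is in M."""
    n = D.shape[0]
    N = len(supports)                  # cyclic model: P = N patches
    blocks = []
    for i, S in enumerate(supports):
        R_i = np.eye(N)[[(i + t) % N for t in range(n)]]
        D_S = D[:, list(S)]
        P_S = D_S @ np.linalg.pinv(D_S)        # projection onto span(D_S)
        blocks.append((np.eye(n) - P_S) @ R_i)
    return np.vstack(blocks)

def sample_signal(D, supports):
    """Draw a random element of ker A_S; None means S is not realizable."""
    _, sv, Vt = np.linalg.svd(A_S(D, supports))
    null_basis = Vt[np.sum(sv > 1e-10):]
    if null_basis.shape[0] == 0:
        return None                    # dim ker A_S = 0
    return null_basis.T @ rng.standard_normal(null_basis.shape[0])
```

Here `supports` plays the role of a path \(\mathcal {S}\in \mathcal {C_{G}}\left (P\right )\) selected from the dependency graph.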
Fig. 15
Examples of signals from \(\mathcal {M}\) and the corresponding supports \(\mathcal {S}\).
An interesting question arises: given \(\mathcal {S}\in \mathcal {C_G}\left (P\right )\), can we say something about \(\dim \ker A_{\mathcal {S}}\)? In particular, when is it strictly positive (i.e., when is \(\mathcal {S}\in \mathcal {R_G}\left (P\right )\))? While in general the question seems to be difficult, in some special cases this number can be estimated using only the properties of the local connections \(\left (S_{i},S_{i+1}\right )\), by essentially counting the additional “degrees of freedom” when moving from patch i to patch i + 1. To this effect, we prove two results.
Proposition 15
For every \(\mathcal {S}\in \mathcal {R_{G}}\left (P\right )\) , we have
Proof
Notice that
Proposition 16
Assume that the model satisfies Assumptions (*) above. Then for every \(\mathcal {S}\in \mathcal {R_G}\left (P\right )\)
Proof
The idea is to construct a spanning set for \(\ker M_{*}^{\left (\mathcal {S}\right )}\) and invoke Proposition 15. Let us relabel the nodes along \(\mathcal {S}\) to be 1, 2, …, P. Starting from an arbitrary \(\alpha _{1}\) with support \(S_{1}\), \(\left |S_{1}\right |=k\), we use (32) to obtain, for i = 1, 2, …, P − 1, a formula for the next portion of the global representation vector \({\varGamma }\)
We believe that Proposition 16 can be extended to more general graphs, not necessarily satisfying Assumptions (*). In particular, the following estimate appears to hold for a general model \(\mathcal {M}\) and \(\mathcal {S}\in \mathcal {R_G}\left (P\right )\):
Further Remarks
While the model presented in this section is the hardest to analyze theoretically, even in the restricted case of Assumptions (*) (when does a nontrivial realization of a given \(\mathcal {G}\) exist? how does the answer depend on n? when is \(\mathcal {R_G}\left (P\right )\neq \emptyset \)? etc.), we hope that this construction will be most useful in applications such as denoising of natural signals.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Batenkov, D., Romano, Y., Elad, M. (2017). On the Global-Local Dichotomy in Sparsity Modeling. In: Boche, H., Caire, G., Calderbank, R., März, M., Kutyniok, G., Mathar, R. (eds) Compressed Sensing and its Applications. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-69802-1_1
DOI: https://doi.org/10.1007/978-3-319-69802-1_1
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-319-69801-4
Online ISBN: 978-3-319-69802-1