Abstract
The traditional sparse modeling approach, when applied to inverse problems with large data such as images, essentially assumes a sparse model for small overlapping data patches and processes these patches as if they were independent of each other. While this methodology produces state-of-the-art results, it is suboptimal, as it does not attempt to model the entire global signal in any meaningful way—a nontrivial task by itself.
In this paper we propose a way to bridge this theoretical gap by constructing a global model from the bottom up. Given local sparsity assumptions in a dictionary, we show that the global signal representation must satisfy a constrained underdetermined system of linear equations, which forces the patches to agree on the overlaps. Furthermore, we show that the corresponding global pursuit can be solved via local operations. We investigate conditions for unique and stable recovery and provide numerical evidence corroborating the theory.
Notes
1. Notice that while \(R_{i}\) extracts the i-th patch from the signal x, the operator \(\tilde {R_{i}}\) extracts the representation \(\alpha _{i}\) of \(R_{i}x\) from Γ.
2. Notice that \(\alpha _{i}\) might be a minimal representation but not a unique one with minimal sparsity. For a discussion of uniqueness, see Subsection 2.3.
3. In general \(\min \left \{ s:\;\mu _{1}^{*}\left (s-1\right )\geqslant 1\right \} \neq \max \left \{ s:\;\mu _{1}^{*}\left (s\right )<1\right \} \), because the function \(\mu _{1}^{*}\) need not be monotonic.
References
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: large-scale machine learning on heterogeneous systems (2015). http://tensorflow.org/. Software available from tensorflow.org
R. Aceska, J.L. Bouchot, S. Li, Local sparsity and recovery of fusion frames structured signals. preprint (2015). http://www.mathc.rwth-aachen.de/~bouchot/files/pubs/FusionCSfinal.pdf
M. Aharon, M. Elad, Sparse and redundant modeling of image content using an image-signature-dictionary. SIAM J. Imag. Sci. 1(3), 228–247 (2008)
U. Ayaz, S. Dirksen, H. Rauhut, Uniform recovery of fusion frame structured sparse signals. Appl. Comput. Harmon. Anal. 41(2), 341–361 (2016). https://doi.org/10.1016/j.acha.2016.03.006. http://www.sciencedirect.com/science/article/pii/S1063520316000294
S. Basu, R. Pollack, M.F. Roy, Algorithms in Real Algebraic Geometry. Algorithms and Computation in Mathematics, 2nd edn., vol. 10 (Springer, Berlin, 2006)
T. Blumensath, M. Davies, Sparse and shift-invariant representations of music. IEEE Trans. Audio Speech Lang. Process. 14(1), 50–57 (2006). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1561263
T. Blumensath, M.E. Davies, Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Trans. Inf. Theory 55(4), 1872–1882 (2009). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4802322
P. Boufounos, G. Kutyniok, H. Rauhut, Sparse recovery from combined fusion frame measurements. IEEE Trans. Inf. Theory 57(6), 3864–3876 (2011). https://doi.org/10.1109/TIT.2011.2143890
S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011). http://dx.doi.org/10.1561/2200000016
H. Bristow, A. Eriksson, S. Lucey, Fast convolutional sparse coding, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013), pp. 391–398
A.M. Bruckstein, D.L. Donoho, M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009). http://epubs.siam.org/doi/abs/10.1137/060657704
E.J. Candes, Modern statistical estimation via oracle inequalities. Acta Numer. 15, 257–325 (2006). http://journals.cambridge.org/abstract_S0962492906230010
S. Chen, S.A. Billings, W. Luo, Orthogonal least squares methods and their application to non-linear system identification. Int. J. Control. 50(5), 1873–1896 (1989)
W. Dong, L. Zhang, G. Shi, X. Li, Nonlocally centralized sparse representation for image restoration. IEEE Trans. Image Process. 22(4), 1620–1630 (2013)
D.L. Donoho, M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proc. Natl. Acad. Sci. 100(5), 2197–2202 (2003). https://doi.org/10.1073/pnas.0437847100. http://www.pnas.org/content/100/5/2197
C. Ekanadham, D. Tranchina, E.P. Simoncelli, A unified framework and method for automatic neural spike identification. J. Neurosci. Methods 222, 47–55 (2014). https://doi.org/10.1016/j.jneumeth.2013.10.001. http://www.sciencedirect.com/science/article/pii/S0165027013003415
M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing (Springer, New York, 2010)
M. Elad, M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15(12), 3736–3745 (2006)
Y.C. Eldar, M. Mishali, Block sparsity and sampling over a union of subspaces, in 2009 16th International Conference on Digital Signal Processing (IEEE, New York, 2009), pp. 1–8. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5201211
Y.C. Eldar, M. Mishali, Robust recovery of signals from a structured union of subspaces. IEEE Trans. Inf. Theory 55(11), 5302–5316 (2009)
P.G. Casazza, G. Kutyniok (eds.), Finite Frames: Theory and Applications. Applied and Numerical Harmonic Analysis (Birkhäuser, Boston, 2013). http://www.springer.com/birkhauser/mathematics/book/978-0-8176-8372-6
S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing (Springer, New York, 2013). http://link.springer.com/content/pdf/10.1007/978-0-8176-4948-7.pdf
D. Gabay, B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
R. Glowinski, On alternating direction methods of multipliers: a historical perspective, in Modeling, Simulation and Optimization for Science and Technology (Springer, Dordrecht, 2014), pp. 59–82
R. Grosse, R. Raina, H. Kwong, A.Y. Ng, Shift-invariance sparse coding for audio classification (2012). arXiv preprint arXiv:1206.5241
R. Grosse, R. Raina, H. Kwong, A.Y. Ng, Shift-invariance sparse coding for audio classification. arXiv:1206.5241 [cs, stat] (2012). http://arxiv.org/abs/1206.5241
S. Gu, W. Zuo, Q. Xie, D. Meng, X. Feng, L. Zhang, Convolutional sparse coding for image super-resolution, in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1823–1831
F. Heide, W. Heidrich, G. Wetzstein, Fast and flexible convolutional sparse coding, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, New York, 2015), pp. 5135–5143
J. Huang, T. Zhang, D. Metaxas, Learning with structured sparsity. J. Mach. Learn. Res. 12, 3371–3412 (2011)
J. Huang, T. Zhang, et al., The benefit of group sparsity. Ann. Stat. 38(4), 1978–2004 (2010)
K. Kavukcuoglu, P. Sermanet, Y.L. Boureau, K. Gregor, M. Mathieu, Y. LeCun, Learning convolutional feature hierarchies for visual recognition, in Advances in Neural Information Processing Systems, ed. by J.D. Lafferty, C.K.I. Williams, J. Shawe-Taylor, R.S. Zemel, A. Culotta, vol. 23 (Curran Associates, Red Hook, 2010), pp. 1090–1098. http://papers.nips.cc/paper/4133-learning-convolutional-feature-hierarchies-for-visual-recognition.pdf
A. Kyrillidis, L. Baldassarre, M.E. Halabi, Q. Tran-Dinh, V. Cevher, Structured sparsity: discrete and convex approaches, in Compressed Sensing and Its Applications. Applied and Numerical Harmonic Analysis, ed. by H. Boche, R. Calderbank, G. Kutyniok, J. Vybíral (Springer, Cham, 2015), pp. 341–387. http://link.springer.com/chapter/10.1007/978-3-319-16042-9_12. https://doi.org/10.1007/978-3-319-16042-9_12
P.L. Lions, B. Mercier, Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
M.A. Little, N.S. Jones, Generalized methods and solvers for noise removal from piecewise constant signals. II. New methods. Proc. R. Soc. Lond. A: Math. Phys. Eng. Sci. (2011). https://doi.org/10.1098/rspa.2010.0674. http://rspa.royalsocietypublishing.org/content/early/2011/06/07/rspa.2010.0674
Y.M. Lu, M.N. Do, A theory for sampling signals from a union of subspaces. IEEE Trans. Signal Process. 56, 2334–2345 (2007)
J. Mairal, G. Sapiro, M. Elad, Learning multiscale sparse representations for image and video restoration. Multiscale Model. Simul. 7(1), 214–241 (2008)
J. Mairal, F. Bach, J. Ponce, G. Sapiro, A. Zisserman, Non-local sparse models for image restoration, in 2009 IEEE 12th International Conference on Computer Vision (IEEE, New York, 2009), pp. 2272–2279
J. Mairal, F. Bach, J. Ponce, Sparse modeling for image and vision processing. Found. Trends Comput. Graph. Vis. 8(2–3), 85–283 (2014). https://doi.org/10.1561/0600000058. http://www.nowpublishers.com/article/Details/CGV-058
Maplesoft, a division of Waterloo Maple Inc. http://www.maplesoft.com
V. Papyan, M. Elad, Multi-scale patch-based image restoration. IEEE Trans. Image Process. 25(1), 249–261 (2016). https://doi.org/10.1109/TIP.2015.2499698
V. Papyan, Y. Romano, M. Elad, Convolutional neural networks analyzed via convolutional sparse coding. J. Mach. Learn. Res. 18(83), 1–52 (2017)
V. Papyan, J. Sulam, M. Elad, Working locally thinking globally: theoretical guarantees for convolutional sparse coding. IEEE Trans. Signal Process. 65(21), 5687–5701 (2017)
Y.C. Pati, R. Rezaiifar, P. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, in Asilomar Conference on Signals, Systems and Computers (IEEE, New York, 1993), pp. 40–44
R. Quiroga, Spike sorting. Scholarpedia 2(12), 3583 (2007). https://doi.org/10.4249/scholarpedia.3583
Y. Romano, M. Elad, Boosting of image denoising algorithms. SIAM J. Imag. Sci. 8(2), 1187–1219 (2015)
Y. Romano, M. Elad, Patch-disagreement as a way to improve K-SVD denoising, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, New York, 2015), pp. 1280–1284
Y. Romano, M. Protter, M. Elad, Single image interpolation via adaptive nonlocal sparsity-based modeling. IEEE Trans. Image Process. 23(7), 3085–3098 (2014)
L.I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms. Physica D 60(1), 259–268 (1992). http://www.sciencedirect.com/science/article/pii/016727899290242F
C. Rusu, B. Dumitrescu, S. Tsaftaris, Explicit shift-invariant dictionary learning. IEEE Signal Process. Lett. 21, 6–9 (2014). http://www.schur.pub.ro/Idei2011/Articole/SPL_2014_shifts.pdf
E. Smith, M.S. Lewicki, Efficient coding of time-relative structure using spikes. Neural Comput. 17(1), 19–45 (2005). http://dl.acm.org/citation.cfm?id=1119614
A.M. Snijders, N. Nowak, R. Segraves, S. Blackwood, N. Brown, J. Conroy, G. Hamilton, A.K. Hindle, B. Huey, K. Kimura, S. Law, K. Myambo, J. Palmer, B. Ylstra, J.P. Yue, J.W. Gray, A.N. Jain, D. Pinkel, D.G. Albertson, Assembly of microarrays for genome-wide measurement of DNA copy number. Nat. Genet. 29(3), 263–264 (2001). https://doi.org/10.1038/ng754. https://www.nature.com/ng/journal/v29/n3/full/ng754.html
J. Sulam, M. Elad, Expected patch log likelihood with a sparse prior, in International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (Springer, New York, 2015), pp. 99–111
J. Sulam, B. Ophir, M. Elad, Image denoising through multi-scale learnt dictionaries, in 2014 IEEE International Conference on Image Processing (ICIP) (IEEE, New York, 2014), pp. 808–812
J.J. Thiagarajan, K.N. Ramamurthy, A. Spanias, Shift-invariant sparse representation of images using learned dictionaries, in IEEE Workshop on Machine Learning for Signal Processing, 2008, MLSP 2008 (2008), pp. 145–150. https://doi.org/10.1109/MLSP.2008.4685470
J.A. Tropp, A.C. Gilbert, M.J. Strauss, Algorithms for simultaneous sparse approximation. Part I: greedy pursuit. Signal Process. 86(3), 572–588 (2006)
J. Yang, J. Wright, T.S. Huang, Y. Ma, Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)
G. Yu, G. Sapiro, S. Mallat, Solving inverse problems with piecewise linear estimators: from Gaussian mixture models to structured sparsity. IEEE Trans. Image Process. 21(5), 2481–2499 (2012). https://doi.org/10.1109/TIP.2011.2176743
M.D. Zeiler, D. Krishnan, G.W. Taylor, R. Fergus, Deconvolutional networks, in 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, New York, 2010), pp. 2528–2535. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5539957
M. Zeiler, G. Taylor, R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in 2011 IEEE International Conference on Computer Vision (ICCV) (2011), pp. 2018–2025. https://doi.org/10.1109/ICCV.2011.6126474
D. Zoran, Y. Weiss, From learning models of natural image patches to whole image restoration, in 2011 IEEE International Conference on Computer Vision (ICCV) (IEEE, New York, 2011), pp. 479–486. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6126278
Acknowledgments
The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme, ERC Grant agreement no. 320649. The authors would also like to thank Jeremias Sulam, Vardan Papyan, Raja Giryes, and Gitta Kutyniok for inspiring discussions.
1 Appendix A: Proof of Lemma 1
Proof
Denote \(Z:=\ker M\) and consider the linear map \(A:Z\to \mathbb {R}^{N}\) given by the restriction of the “averaging map” \(D_{G}:\mathbb {R}^{mP}\to \mathbb {R}^{N}\) to Z.
1. Let us first see that \(\mathrm {im}\left (A\right )=\mathbb {R}^{N}\). Indeed, for every \(x\in \mathbb {R}^{N}\), consider its patches \(x_{i}=R_{i}x\). Since D is full rank, there exist \(\left \{ \alpha _{i}\right \} \) for which \(D\alpha _{i}=x_{i}\). Then setting \({\varGamma }:=\left (\alpha _{1},\dots ,\alpha _{P}\right )\), we have both \(D_{G}{\varGamma }=x\) and \(M{\varGamma }=0\) (by construction, see Section 2), i.e., \({\varGamma }\in Z\), and the claim follows.
2. Define
$$\displaystyle \begin{aligned} J:=\ker D\times\ker D\times\dots\times\ker D\subset\mathbb{R}^{mP}. \end{aligned}$$
We claim that \(J=\ker A\).
   a. In one direction, let \({\varGamma }=\left (\alpha _{1},\dots ,\alpha _{P}\right )\in \ker A\), i.e., \(M{\varGamma }=0\) and \(D_{G}{\varGamma }=0\). Immediately we see that \(\frac {1}{n}D\alpha _{i}=0\) for all i, and therefore \(\alpha _{i}\in \ker D\) for all i; thus \({\varGamma }\in J\).
   b. In the other direction, let \({\varGamma }=\left (\alpha _{1},\dots ,\alpha _{P}\right )\in J\), i.e., \(D\alpha _{i}=0\). Then the local representations agree, i.e., \(M{\varGamma }=0\); thus \({\varGamma }\in Z\). Furthermore, \(D_{G}{\varGamma }=0\), and therefore \({\varGamma }\in \ker A\).
3. By the fundamental theorem of linear algebra, we conclude
$$\displaystyle \begin{aligned} \begin{array}{rcl} \dim Z & = & \dim \mathrm{im}\left(A\right)+\dim\ker A=N+\dim J\\ & = & N+\left(m-n\right)N=N\left(m-n+1\right). \end{array} \end{aligned} $$
□
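As a sanity check of this dimension count, one can assemble M explicitly and compute its nullity numerically. The following sketch is ours rather than the chapter's: the sizes N, n, m are arbitrary illustrative choices, and M is built row by row from the cyclic agreement constraints \(S_{B}D\alpha _{i}=S_{T}D\alpha _{i+1}\) (cf. Lemma 2).

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 8, 3, 5                      # signal length, patch size, atoms
D = rng.standard_normal((n, m))        # a generic full-rank local dictionary

S_T = np.eye(n)[: n - 1]               # extracts the top n-1 entries
S_B = np.eye(n)[1:]                    # extracts the bottom n-1 entries

# M stacks the cyclic agreement constraints S_B D a_i - S_T D a_{i+1} = 0
M = np.zeros(((n - 1) * N, m * N))
for i in range(N):
    rows = slice((n - 1) * i, (n - 1) * (i + 1))
    M[rows, m * i:m * (i + 1)] = S_B @ D
    j = (i + 1) % N                    # cyclic neighboring patch
    M[rows, m * j:m * (j + 1)] -= S_T @ D

nullity = m * N - np.linalg.matrix_rank(M)
print(nullity, N * (m - n + 1))        # both equal 24 for these sizes
```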
2 Appendix B: Proof of Lemma 2
We start with an easy observation.
Proposition 9
For any vector \(\rho \in \mathbb {R}^{N}\), we have
Proof
Since
Corollary 3
Given MΓ = 0, we have
Proof
Using Proposition 9, we get
Recall Definition 6. Multiplying the corresponding matrices gives
Proposition 10
We have the following equality for all i = 1, …, P:
To facilitate the proof, we introduce an extension of Definition 6 to multiple shifts, as follows.
Definition 16
Let n be fixed. For k = 0, …, n − 1, let
1. \(S_{T}^{\left (k\right )}:=\begin {bmatrix}I_{n-k} & \boldsymbol {0}\end {bmatrix}\) and \(S_{B}^{\left (k\right )}:=\begin {bmatrix}\boldsymbol {0} & I_{n-k}\end {bmatrix}\) denote the operators extracting the top (resp. bottom) n − k entries from a vector of length n; the matrices have dimension \(\left (n-k\right )\times n\);
2. \(Z_{B}^{\left (k\right )}:=\begin {bmatrix}S_{B}^{\left (k\right )}\\ \boldsymbol {0}_{k\times n} \end {bmatrix}\) and \(Z_{T}^{\left (k\right )}:=\begin {bmatrix}\boldsymbol {0}_{k\times n}\\ S_{T}^{\left (k\right )} \end {bmatrix}\);
3. \(W_{B}^{\left (k\right )}:=\begin {bmatrix}\boldsymbol {0}_{k\times n}\\ S_{B}^{\left (k\right )} \end {bmatrix}\) and \(W_{T}^{\left (k\right )}:=\begin {bmatrix}S_{T}^{\left (k\right )}\\ \boldsymbol {0}_{k\times n} \end {bmatrix}\).
Note that \(S_{B}=S_{B}^{\left (1\right )}\) and \(S_{T}=S_{T}^{\left (1\right )}\). We have several useful consequences of the above definitions. The proofs are carried out via elementary matrix identities and are left to the reader.
Proposition 11
For any \(n\in \mathbb {N}\), the following hold:
1. \(Z_{T}^{\left (k\right )}=\left (Z_{T}^{\left (1\right )}\right )^{k}\) and \(Z_{B}^{\left (k\right )}=\left (Z_{B}^{\left (1\right )}\right )^{k}\) for k = 0, …, n − 1;
2. \(W_{T}^{\left (k\right )}W_{T}^{\left (k\right )}=W_{T}^{\left (k\right )}\) and \(W_{B}^{\left (k\right )}W_{B}^{\left (k\right )}=W_{B}^{\left (k\right )}\) for k = 0, …, n − 1;
3. \(W_{T}^{\left (k\right )}W_{B}^{\left (j\right )}=W_{B}^{\left (j\right )}W_{T}^{\left (k\right )}\) for j, k = 0, …, n − 1;
4. \(Z_{B}^{\left (k\right )}=Z_{B}^{\left (k\right )}W_{B}^{\left (k\right )}\) and \(Z_{T}^{\left (k\right )}=Z_{T}^{\left (k\right )}W_{T}^{\left (k\right )}\) for k = 0, …, n − 1;
5. \(W_{B}^{\left (k\right )}=Z_{T}^{\left (1\right )}W_{B}^{\left (k-1\right )}Z_{B}^{\left (1\right )}\) and \(W_{T}^{\left (k\right )}=Z_{B}^{\left (1\right )}W_{T}^{\left (k-1\right )}Z_{T}^{\left (1\right )}\) for k = 1, …, n − 1;
6. \(Z_{B}^{\left (k\right )}Z_{T}^{\left (k\right )}=W_{T}^{\left (k\right )}\) and \(Z_{T}^{\left (k\right )}Z_{B}^{\left (k\right )}=W_{B}^{\left (k\right )}\) for k = 0, …, n − 1;
7. \(\left (n-1\right )I_{n\times n}=\sum _{k=1}^{n-1}\left (W_{B}^{\left (k\right )}+W_{T}^{\left (k\right )}\right ).\)
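Since the proofs are left to the reader, a quick machine check may be reassuring. The sketch below realizes the operators of Definition 16 in NumPy (n = 5 is an arbitrary choice of ours) and tests items 1, 2, 6, and 7; the remaining items can be checked the same way.

```python
import numpy as np

n = 5                                        # arbitrary illustrative size

def S_T(k): return np.eye(n)[: n - k]        # [I_{n-k} 0]
def S_B(k): return np.eye(n)[k:]             # [0 I_{n-k}]
def Z_B(k): return np.vstack([S_B(k), np.zeros((k, n))])
def Z_T(k): return np.vstack([np.zeros((k, n)), S_T(k)])
def W_B(k): return np.vstack([np.zeros((k, n)), S_B(k)])
def W_T(k): return np.vstack([S_T(k), np.zeros((k, n))])

for k in range(n):
    assert np.array_equal(Z_T(k), np.linalg.matrix_power(Z_T(1), k))  # item 1
    assert np.array_equal(W_T(k) @ W_T(k), W_T(k))                    # item 2
    assert np.array_equal(Z_B(k) @ Z_T(k), W_T(k))                    # item 6
assert np.array_equal(sum(W_B(k) + W_T(k) for k in range(1, n)),
                      (n - 1) * np.eye(n))                            # item 7
```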
Proposition 12
If the vectors \(u_{1},\dots ,u_{N}\in \mathbb {R}^{n}\) satisfy pairwise
then they also satisfy for each k = 0, …, n − 1 the following:
Proof
It is easy to see that the condition \(S_{B}u_{i}=S_{T}u_{i+1}\) directly implies
Let us first prove (23) by induction on k. The base case k = 1 is precisely (25). Assuming validity for k − 1 and ∀i, we have
To prove (24) we proceed as follows:
Example 1
Fig. 11
Illustration of the proof of Proposition 12: the green pair is equal, as is the red pair; it follows that the blue elements are equal as well.
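For intuition, the statement can also be verified mechanically on patches extracted from a common signal. The sketch below is our illustration: it assumes, as in the proof of Lemma 2 below, that (23) and (24) stand for the identities \(Z_{B}^{\left (k\right )}u_{i-k}=W_{T}^{\left (k\right )}u_{i}\) and \(Z_{T}^{\left (k\right )}u_{i+k}=W_{B}^{\left (k\right )}u_{i}\).

```python
import numpy as np

n = 5
def S_T(k): return np.eye(n)[: n - k]
def S_B(k): return np.eye(n)[k:]
def Z_B(k): return np.vstack([S_B(k), np.zeros((k, n))])
def Z_T(k): return np.vstack([np.zeros((k, n)), S_T(k)])
def W_B(k): return np.vstack([np.zeros((k, n)), S_B(k)])
def W_T(k): return np.vstack([S_T(k), np.zeros((k, n))])

rng = np.random.default_rng(1)
x = rng.standard_normal(20)
u = [x[i:i + n] for i in range(len(x) - n + 1)]   # overlapping patches of x

# adjacent patches agree on the overlap: S_B u_i = S_T u_{i+1}
for i in range(len(u) - 1):
    assert np.allclose(S_B(1) @ u[i], S_T(1) @ u[i + 1])

# the multi-shift consequences used in the proof of Lemma 2 below
for k in range(n):
    for i in range(k, len(u) - k):
        assert np.allclose(Z_B(k) @ u[i - k], W_T(k) @ u[i])
        assert np.allclose(Z_T(k) @ u[i + k], W_B(k) @ u[i])
```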
Let us now present the proof of Lemma 2.
Proof
We show equivalence in two directions.
- \(\left (1\right )\Longrightarrow \left (2\right )\): Let MΓ = 0. Define \(x:=D_{G}{\varGamma }\), and further denote \(x_{i}:=R_{i}x\). Then, on the one hand,
$$\displaystyle \begin{aligned} \begin{aligned}{}x_{i} & =R_{i}D_{G}{\varGamma}\\ & =\varOmega_{i}{\varGamma} \qquad \text{(definition of }\varOmega_{i})\\ & =D\alpha_{i}. \qquad \left(M{\varGamma}=0\right) \end{aligned} \end{aligned} $$
On the other hand, because of (22) we have \(S_{B}R_{i}x=S_{T}R_{i+1}x\), and by combining the two, we conclude that \(S_{B}D\alpha _{i}=S_{T}D\alpha _{i+1}\).
- \(\left (2\right )\Longrightarrow \left (1\right )\): In the other direction, suppose that \(S_{B}D\alpha _{i}=S_{T}D\alpha _{i+1}\). Denote \(u_{i}:=D\alpha _{i}\). Now consider the product \(\varOmega _{i}{\varGamma }\), where \(\varOmega _{i}=R_{i}D_{G}\). One can easily verify that in fact
$$\displaystyle \begin{aligned} \varOmega_{i}{\varGamma}=\frac{1}{n}\left(\sum_{k=1}^{n-1}\left(Z_{B}^{\left(k\right)}u_{i-k}+Z_{T}^{\left(k\right)}u_{i+k}\right)+u_{i}\right). \end{aligned}$$
Therefore
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left(\varOmega_{i}-Q_{i}\right){\varGamma} & =&\frac{1}{n}\left(u_{i}+\sum_{k=1}^{n-1}\left(Z_{B}^{\left(k\right)}u_{i-k}+Z_{T}^{\left(k\right)}u_{i+k}\right)\right)-u_{i}\\ & =&\frac{1}{n}\left(\sum_{k=1}^{n-1}\left(W_{T}^{\left(k\right)}u_{i}+W_{B}^{\left(k\right)}u_{i}\right)-\left(n-1\right)u_{i}\right) \qquad \left(\text{by Proposition 12}\right)\\ & =&0.\qquad \left(\text{by Proposition 11, item 7}\right) \end{array} \end{aligned} $$
Since this holds for all i, we have shown that MΓ = 0.
□
3 Appendix C: Proof of Theorem 6
Recall that \(M_{A}=\frac {1}{n}\sum _{i}R_{i}^{T}P_{s_{i}}R_{i}\). We first show that \(M_{A}\) is a contraction.
Proposition 13
\(\left \Vert M_{A}\right \Vert _{2}\leqslant 1\).
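Numerically, this bound is easy to observe. The sketch below is ours: the sizes, the cyclic patch extractors, and the use of random s-dimensional orthogonal projections in place of \(P_{s_{i}}\) are illustrative assumptions; it builds \(M_{A}\) directly and evaluates its spectral norm.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, s = 12, 4, 2                     # illustrative sizes
# cyclic patch extractors R_i (each row selects one signal entry)
R = [np.eye(N)[[(i + t) % N for t in range(n)]] for i in range(N)]

def random_projection():
    U, _ = np.linalg.qr(rng.standard_normal((n, s)))
    return U @ U.T                     # orthogonal projection of rank s

M_A = sum(Ri.T @ random_projection() @ Ri for Ri in R) / n
print(np.linalg.norm(M_A, 2))          # never exceeds 1, per Proposition 13
```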
Proof
Closely following a similar proof in [45], divide the index set \(\left \{ 1,\dots ,N\right \} \) into n groups representing non-overlapping patches: for i = 1, …, n let
Now let us move on to prove Theorem 6.
Proof
Define
4 Appendix D: Proof of Theorem 8
Recall that the signal consists of s constant segments of corresponding lengths \(\ell _{1},\dots ,\ell _{s}\). We would like to compute the MSE for every pixel within every such segment of length \(\alpha :=\ell _{r}\). For each patch, the oracle provides the locations of the jump points within the patch.
Fig. 12
The oracle estimator for the pixel O in the segment (black). The orange line is patch number j = 1, …, n, and the relevant pixels are between \(a_{j}\) and \(b_{j}\). The signal itself is shown to extend beyond the segment (blue line).
Now, the oracle error for the pixel is
Example 2
Let n = 4, α = 3.
- For k = 1:
$$\displaystyle \begin{aligned} \begin{array}{rcl} \hat{x}_{A}^{r,k}-v & = & \frac{1}{4}\left(\frac{1}{2}+\frac{1}{3}+\frac{1}{3}+\frac{1}{2}\right)z_{0}+\frac{1}{4}\left(\frac{1}{2}+\frac{1}{3}+\frac{1}{3}\right)z_{-1}+\frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}+\frac{1}{2}\right)z_{1}\\ & = & \underbrace{\frac{7}{24}}_{d_{-1}}z_{-1}+\underbrace{\frac{5}{12}}_{d_{0}}z_{0}+\underbrace{\frac{7}{24}}_{d_{1}}z_{1} \end{array} \end{aligned} $$
- For k = 2:
$$\displaystyle \begin{aligned} \begin{array}{rcl} \hat{x}_{A}^{r,k}-v & = & \frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}+\frac{1}{2}+1\right)z_{0}+\frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}+\frac{1}{2}\right)z_{-1}+\frac{1}{4}\left(\frac{1}{3}+\frac{1}{3}\right)z_{-2}\\ & = & \frac{13}{24}z_{0}+\frac{7}{24}z_{-1}+\frac{1}{6}z_{-2} \end{array} \end{aligned} $$
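The coefficients above are plain exact arithmetic and can be double-checked mechanically; the following snippet merely re-evaluates the sums displayed in this example.

```python
from fractions import Fraction as F

quarter = F(1, 4)
print(quarter * (F(1, 2) + F(1, 3) + F(1, 3) + F(1, 2)))  # d_0  = 5/12
print(quarter * (F(1, 2) + F(1, 3) + F(1, 3)))            # d_-1 = 7/24
print(quarter * (F(1, 3) + F(1, 3) + F(1, 2) + 1))        # k=2: 13/24
```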
Now consider the optimization problem
Since the z i are i.i.d., we have
By construction (26), it is not difficult to see that if \(n\geqslant \alpha \) then
On the other hand, for \(n\leqslant \frac {\alpha }{2}\) we have
5 Appendix E: Generative Models for Patch-Sparse Signals
In this section we propose a general framework aimed at generating signals from the patch-sparse model. Our approach is to construct a graph-based model for the dictionary and subsequently use this model to generate dictionaries and signals which turn out to be much richer than those considered in Section 4.
Local Support Dependencies
We start by highlighting the importance of the local connections (recall Lemma 2) between the neighboring patches of the signal, and therefore between the corresponding subspaces containing those patches. This in turn allows us to characterize \(\varSigma _{\mathcal {M}}\) as the set of all “realizable” paths in a certain dependency graph derived from the dictionary D. This point of view allows us to describe the model \(\mathcal {M}\) using only the intrinsic properties of the dictionary, in contrast to Theorem 2.
Proposition 14
Let \(0\neq x\in \mathcal {M}\) and \({\varGamma }\in \rho \left (x\right )\) with \({\mathrm {supp}}{\varGamma }=\left (S_{1},\dots ,S_{P}\right )\). Then for i = 1, …, P
where by convention rank∅ = −∞.
Proof
\(x\in \mathcal {M}\) implies by Lemma 2 that for every i = 1, …, P
The preceding result suggests a way to describe all the supports in \(\varSigma _{\mathcal {M}}\).
Definition 17
Given a dictionary D, we define an abstract directed graph \(\mathcal {G}_{D,s}=\left (V,E\right )\), with the vertex set
Remark 4
It might be impossible to compute \(\mathcal {G}_{D,s}\) in practice. However, we set this issue aside for now and only explore the theoretical ramifications of its properties.
Definition 18
The set of all directed paths of length P in \(\mathcal {G}_{D,s}\), not including the self-loop \(\underbrace {\left (\emptyset ,\emptyset ,\dots ,\emptyset \right )}_{\times P}\), is denoted by \(\mathcal {C_{G}}\left (P\right )\).
Definition 19
A path \(\mathcal {S}\in \mathcal {C_{G}}\left (P\right )\) is called realizable if \(\dim \ker A_{\mathcal {S}}>0\). The set of all realizable paths in \(\mathcal {C_{G}}\left (P\right )\) is denoted by \(\mathcal {R_{G}}\left (P\right )\).
Thus we have the following result.
Theorem 9
Suppose \(0\neq x\in \mathcal {M}\). Then
1. Every representation \({\varGamma }=\left (\alpha _{i}\right )_{i=1}^{P}\in \rho \left (x\right )\) satisfies \({\mathrm {supp}}{\varGamma }\in \mathcal {C_{G}}\left (P\right )\), and therefore
$$\displaystyle \begin{aligned} \varSigma_{\mathcal{M}}\subseteq\mathcal{R_{G}}\left(P\right). \end{aligned} $$(30)
2. The model \(\mathcal {M}\) can be characterized “intrinsically” by the dictionary as follows:
$$\displaystyle \begin{aligned} \mathcal{M}=\bigcup_{\mathcal{S}\in\mathcal{R_{G}}\left(P\right)}\ker A_{\mathcal{S}}. \end{aligned} $$(31)
Proof
Let \({\mathrm {supp}}{\varGamma }=\left (S_{1},\dots ,S_{P}\right )\) with \(S_{i}={\mathrm {supp}}\alpha _{i}\) if \(\alpha _{i}\neq 0\), and \(S_{i}=\emptyset \) if \(\alpha _{i}=0\). Then by Proposition 14, we must have that
To show (31), notice that if \({\mathrm {supp}}{\varGamma }\in \mathcal {R_{G}}\left (P\right )\), then for every \(x\in \ker A_{{\mathrm {supp}}{\varGamma }}\), we have \(R_{i}x=P_{S_{i}}R_{i}x\), i.e., \(R_{i}x=D\alpha _{i}\) for some \(\alpha _{i}\) with \({\mathrm {supp}}\alpha _{i}\subseteq S_{i}\). Clearly in this case \(\left |{\mathrm {supp}}\alpha _{i}\right |\leqslant s\) and therefore \(x\in \mathcal {M}\). The other direction of (31) follows immediately from the definitions. □
Definition 20
The dictionary D is called “\(\left (s,P\right )\)-good” if
Theorem 10
The set of “ \(\left (s,P\right )\) -good” dictionaries has measure zero in the space of all n × m matrices.
Proof
Every low-rank condition defines a finite number of algebraic equations on the entries of D (given by the vanishing of all the 2s × 2s minors of \(\begin {bmatrix}S_{B}D_{S_{i}} & S_{T}D_{S_{j}}\end {bmatrix}\)). Since the number of possible graphs is finite (given fixed n, m and s), the resulting solution set is a finite union of semi-algebraic sets of low dimension and hence has measure zero. □
Constructing “Good” Dictionaries
The above considerations suggest that good dictionaries are hard to come by; here we provide an example of an explicit construction.
We start by defining an abstract graph \(\mathcal {G}\) with some desirable properties, and subsequently look for a nontrivial realization D of the graph, so that in addition \(\mathcal {R}_{\mathcal {G}}\neq \emptyset \).
Fig. 13
A possible dependency graph \(\mathcal {G}\) with m = 10. In this example, \( \left |\mathcal {C_{G}} \left (70 \right ) \right |=37614\).
Every edge in \(\mathcal {G}\) corresponds to a condition of the form (29) imposed on the entries of D. As discussed in Theorem 10, this in turn translates to a set of algebraic equations. So the natural idea would be to write out the large system of such equations and look for a solution over the field \(\mathbb {R}\) by well-known algorithms in numerical algebraic geometry [5]. However, this approach is highly impractical because these algorithms have (singly or doubly) exponential running time. We consequently propose a simplified, more direct approach to the problem.
In detail, we replace the low-rank conditions (29) with more explicit and restrictive ones below.
Assumptions (*): For each \(\left (S_{i},S_{j}\right )\in \mathcal {G}\) we have \(\left |S_{i}\right |=\left |S_{j}\right |=k\). We require that \({\mathrm {span}} S_{B}D_{S_{i}}={\mathrm {span}} S_{T}D_{S_{j}}=\varLambda _{i,j}\) with \(\dim \varLambda _{i,j}=k\). Thus there exists a non-singular transfer matrix \(C_{i,j}\in \mathbb {R}^{k\times k}\) such that
$$\displaystyle \begin{aligned} S_{B}D_{S_{i}}=C_{i,j}S_{T}D_{S_{j}}. \end{aligned} $$(32)
In other words, every column in \(S_{B}D_{S_{i}}\) must be a specific linear combination of the columns in \(S_{T}D_{S_{j}}\). This is much more restrictive than the low-rank condition, but on the other hand, given the matrix \(C_{i,j}\), it defines a set of linear constraints on D. To summarize, the final algorithm is presented in Algorithm 5. In general, nothing guarantees that for a particular choice of \(\mathcal {G}\) and the transfer matrices there is a nontrivial solution D; however, in practice we do find such solutions. For example, taking the graph from Figure 13 and augmenting it with the matrices \(C_{i,j}\) (scalars in this case), we obtain a solution over \(\mathbb {R}^{6}\), which is shown in Figure 14. Notice that while the resulting dictionary has a Hankel-type structure similar to what we have seen previously, the additional dependencies between the atoms produce a rich signal space structure, as we shall demonstrate in the following section.
Algorithm 5 Finding a realization D of the graph \(\mathcal {G}\)
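A minimal numerical sketch in the spirit of the linear-constraint approach just described (not the chapter's pseudocode: the sizes, the toy edge set, and the transfer matrices \(C_{i,j}\) below are our illustrative assumptions) assembles the constraints (32) into one linear system and reads a realization D off its null space.

```python
import numpy as np

n, m, k = 6, 10, 1                     # illustrative sizes (k = 1: scalar C)
S_T = np.eye(n)[: n - 1]
S_B = np.eye(n)[1:]

# each edge (S_i, S_j, C) imposes S_B D_{S_i} = (S_T D_{S_j}) C; for
# k = 1 the transfer "matrix" C is a scalar and this is exactly (32)
edges = [([0], [1], np.array([[1.0]])),
         ([1], [2], np.array([[0.5]]))]

def residual(D):
    return np.concatenate([(S_B @ D[:, Si] - (S_T @ D[:, Sj]) @ C).ravel()
                           for Si, Sj, C in edges])

# assemble the linear constraint matrix column by column; its null
# space consists exactly of the admissible dictionaries (flattened)
A = np.column_stack([residual(e.reshape(n, m)) for e in np.eye(n * m)])
_, sv, Vt = np.linalg.svd(A)
null_basis = Vt[np.sum(sv > 1e-10):]
D = null_basis[0].reshape(n, m)        # one (not necessarily useful) solution
assert np.allclose(residual(D), 0)
```

Any further requirements (normalized atoms, a Hankel-type structure, nonemptiness of \(\mathcal {R_G}\)) would have to be imposed on top of this null space.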
Generating Signals
Now suppose the graph \(\mathcal {G}\) is known (or can be easily constructed). Then this gives a simple procedure to generate signals from \(\mathcal {M}\), presented in Algorithm 6.
Algorithm 6 Constructing a signal from \(\mathcal {M}\) via \(\mathcal {G}\)
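The proof of Theorem 9 shows that \(x\in \ker A_{\mathcal {S}}\) exactly when \(R_{i}x=P_{S_{i}}R_{i}x\) for every i, which suggests the following sketch of such a procedure (the cyclic extractors and the random sampling step are our illustrative choices, not necessarily those of the chapter's Algorithm 6):

```python
import numpy as np

rng = np.random.default_rng(3)

def A_S(D, supports):
    """Stack the blocks (I - P_{S_i}) R_i; then A_S x = 0 iff x is in M."""
    n = D.shape[0]
    N = len(supports)                  # cyclic model: P = N patches
    blocks = []
    for i, S in enumerate(supports):
        R_i = np.eye(N)[[(i + t) % N for t in range(n)]]
        D_S = D[:, list(S)]
        P_S = D_S @ np.linalg.pinv(D_S)        # projection onto span(D_S)
        blocks.append((np.eye(n) - P_S) @ R_i)
    return np.vstack(blocks)

def sample_signal(D, supports):
    """Draw a random element of ker A_S; None means S is not realizable."""
    _, sv, Vt = np.linalg.svd(A_S(D, supports))
    null_basis = Vt[np.sum(sv > 1e-10):]
    if null_basis.shape[0] == 0:
        return None                    # dim ker A_S = 0
    return null_basis.T @ rng.standard_normal(null_basis.shape[0])
```

Here `supports` plays the role of a path \(\mathcal {S}\in \mathcal {C_{G}}\left (P\right )\) selected from the dependency graph.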
Fig. 15
Examples of signals from \(\mathcal {M}\) and the corresponding supports \(\mathcal {S}\).
An interesting question arises: given \(\mathcal {S}\in \mathcal {C_G}\left (P\right )\), can we say something about \(\dim \ker A_{\mathcal {S}}\)? In particular, when is it strictly positive (i.e., when is \(\mathcal {S}\in \mathcal {R_G}\left (P\right )\))? While in general the question seems to be difficult, in some special cases this number can be estimated using only the properties of the local connections \(\left (S_{i},S_{i+1}\right )\), by essentially counting the additional “degrees of freedom” when moving from patch i to patch i + 1. To this effect, we prove two results.
Proposition 15
For every \(\mathcal {S}\in \mathcal {R_{G}}\left (P\right )\) , we have
Proof
Notice that
Proposition 16
Assume that the model satisfies Assumptions (*) above. Then for every \(\mathcal {S}\in \mathcal {R_G}\left (P\right )\)
Proof
The idea is to construct a spanning set for \(\ker M_{*}^{\left (\mathcal {S}\right )}\) and invoke Proposition 15. Let us relabel the nodes along \(\mathcal {S}\) to be 1, 2, …, P. Starting from an arbitrary \(\alpha _{1}\) with support \(S_{1}\), \(\left |S_{1}\right |=k\), we use (32) to obtain, for i = 1, 2, …, P − 1, a formula for the next portion of the global representation vector \({\varGamma }\)
We believe that Proposition 16 can be extended to more general graphs, not necessarily satisfying Assumptions (*). In particular, the following estimate appears to hold for a general model \(\mathcal {M}\) and \(\mathcal {S}\in \mathcal {R_G}\left (P\right )\):
Further Remarks
While the model presented in this section is the hardest to analyze theoretically, even in the restricted case of Assumptions (*) (when does a nontrivial realization of a given \(\mathcal {G}\) exist? how does the answer depend on n? when is \(\mathcal {R_G}\left (P\right )\neq \emptyset \)? etc.), we hope that this construction will be most useful in applications such as denoising of natural signals.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Batenkov, D., Romano, Y., Elad, M. (2017). On the Global-Local Dichotomy in Sparsity Modeling. In: Boche, H., Caire, G., Calderbank, R., März, M., Kutyniok, G., Mathar, R. (eds) Compressed Sensing and its Applications. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-69802-1_1
DOI: https://doi.org/10.1007/978-3-319-69802-1_1
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-319-69801-4
Online ISBN: 978-3-319-69802-1