Compressed Sensing and its Applications, pp 389–417
Explicit Matrices with the Restricted Isometry Property: Breaking the Square-Root Bottleneck
Abstract
Matrices with the restricted isometry property (RIP) are of particular interest in compressed sensing. To date, the best known RIP matrices are constructed using random processes, while explicit constructions are notorious for performing at the “square-root bottleneck,” i.e., they only accept sparsity levels on the order of the square root of the number of measurements. The only known explicit matrix which surpasses this bottleneck was constructed by Bourgain, Dilworth, Ford, Konyagin, and Kutzarova (Duke Math. J. 159:145–185, 2011). This chapter provides three contributions to advance their groundbreaking work: (i) we develop an intuition for their matrix construction and underlying proof techniques; (ii) we prove a generalized version of their main result; and (iii) we apply this more general result to maximize the extent to which their matrix construction surpasses the square-root bottleneck.
13.1 Introduction
Unfortunately, random matrices are not always RIP, though the failure rate vanishes asymptotically. In applications, one might wish to verify that a randomly drawn matrix actually satisfies RIP before designing a sensing platform around it, but this verification is NP-hard in general [2]. As such, one is forced to blindly assume that the randomly drawn matrix is RIP, and admittedly, this is a reasonable assumption considering the failure rate. Still, this is dissatisfying from a theoretical perspective, and it motivates the construction of explicit RIP matrices:
Definition 1.
For any z > 0, let ExRIP[z] denote the following statement:
There exists an explicit family of M × N matrices with arbitrarily large aspect ratio N∕M which are (K, δ)-RIP with K = Ω(M^{z−ε}) for all ε > 0 and δ < 1∕3.
Since there exist (non-explicit) matrices satisfying z = 1 above, the goal is to prove ExRIP[1]. The most common way to demonstrate that an explicit matrix Φ satisfies RIP is to leverage the pairwise incoherence between the columns of Φ. Indeed, it is straightforward to prove ExRIP[1∕2] by taking Φ to have near-optimally incoherent unit-norm columns and appealing to interpolation of operators or Gershgorin’s circle theorem (e.g., see [1, 11, 12]). The emergence of this “square-root bottleneck” compelled Tao to pose the explicit construction of RIP matrices as an open problem [18]. Since then, only one construction has managed to break the bottleneck: In [8], Bourgain, Dilworth, Ford, Konyagin, and Kutzarova prove ExRIP[1∕2 + ε_{0}] for some undisclosed ε_{0} > 0. This constant has since been estimated as ε_{0} ≈ 5.5169 × 10^{−28} [15].
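To make the coherence argument concrete, here is a minimal numerical sketch (the function names and the random stand-in matrix are ours, not from [8]). By Gershgorin’s circle theorem, the Gram matrix of any K unit-norm columns with pairwise coherence μ has eigenvalues in [1 − (K − 1)μ, 1 + (K − 1)μ], so (K − 1)μ ≤ δ certifies (K, δ)-RIP; by the Welch bound [20], μ = Ω(M^{−1∕2}), which caps this argument at K = O(M^{1∕2}).

```python
import numpy as np

def coherence(Phi):
    """Worst-case coherence mu: largest |<phi_i, phi_j>| over distinct columns."""
    G = np.abs(Phi.T @ Phi)
    np.fill_diagonal(G, 0.0)
    return G.max()

def certified_sparsity(Phi, delta=1/3):
    """Largest K with (K - 1) * mu <= delta; by Gershgorin, any K unit-norm
    columns then have Gram eigenvalues in [1 - delta, 1 + delta], so Phi is
    (K, delta)-RIP."""
    mu = coherence(Phi)
    return int(delta / mu) + 1

# Random stand-in for an explicit construction; at this small size a Gaussian
# matrix is far from optimally incoherent.  An explicit design such as a chirp
# code [1] or Steiner frame [12] achieves mu = Theta(M^{-1/2}), for which this
# certificate yields K = Theta(M^{1/2}): the square-root bottleneck.
rng = np.random.default_rng(0)
M, N = 64, 256
Phi = rng.standard_normal((M, N))
Phi /= np.linalg.norm(Phi, axis=0)  # unit-norm columns
print(coherence(Phi), certified_sparsity(Phi))
```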

What are the proof techniques that Bourgain et al. applied?

Can we optimize the analysis to increase ε_{0}?
13.2 Preliminaries
The goal of this section is to provide some intuition for the main ideas in [8]. We first explain the overall proof technique for demonstrating RIP (this is the vehicle for breaking the square-root bottleneck), and then we introduce some basic ideas from additive combinatorics. Throughout, \(\mathbb{Z}/n\mathbb{Z}\) denotes the cyclic group of n elements, S^{n} denotes the Cartesian power of a set S (i.e., the set of n-tuples with entries from S), \(\mathbb{F}_{p}\) denotes the field of p elements (in this context, p will always be prime), and \(\mathbb{F}_{p}^{{\ast}}\) is the multiplicative group of nonzero elements of \(\mathbb{F}_{p}\).
13.2.1 The BigPicture Techniques
Now let’s discuss the alternative techniques that Bourgain et al. use. The main idea is to convert the RIP statement, which concerns all Ksparse vectors simultaneously, into a statement about finitely many vectors:
Definition 2 (flat RIP).
Lemma 1 (essentially Lemma 3 in [8], cf. Theorem 13 in [3]).
If Φ has (K, θ)-flat RIP and unit-norm columns, then Φ has (K, 150θ log K)-RIP.
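Lemma 1 can be explored numerically. The sketch below brute-forces the smallest θ for which a tiny matrix has (K, θ)-flat RIP, assuming the formulation from [3]: |⟨Σ_{i∈S₁} φ_i, Σ_{j∈S₂} φ_j⟩| ≤ θ√(|S₁||S₂|) for all disjoint S₁, S₂ of size at most K. (The exact normalization is our reading of [3], and the matrix is a random stand-in, so treat this as an illustration rather than a definitive implementation.)

```python
from itertools import combinations
import numpy as np

def flat_rip_constant(Phi, K):
    """Smallest theta such that Phi has (K, theta)-flat RIP, found by
    exhaustively checking all pairs of disjoint column subsets of size <= K."""
    N = Phi.shape[1]
    theta = 0.0
    for k1 in range(1, K + 1):
        for S1 in combinations(range(N), k1):
            v1 = Phi[:, list(S1)].sum(axis=1)  # sum of columns indexed by S1
            rest = [j for j in range(N) if j not in S1]
            for k2 in range(1, K + 1):
                for S2 in combinations(rest, k2):
                    v2 = Phi[:, list(S2)].sum(axis=1)
                    theta = max(theta, abs(v1 @ v2) / np.sqrt(k1 * k2))
    return theta

rng = np.random.default_rng(1)
Phi = rng.standard_normal((8, 6))
Phi /= np.linalg.norm(Phi, axis=0)  # unit-norm columns
print(flat_rip_constant(Phi, K=2))
```

Note that with K = 1 the search reduces to the worst-case coherence, so flat RIP strictly refines the coherence statistic.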
Unlike the coherence argument, flat RIP doesn’t lead to much loss in K. In particular, [3] shows that random matrices satisfy (K, θ)-flat RIP with θ = O(δ∕log K) when \(M =\varOmega ((K/\delta ^{2})\log ^{2}K\log N)\). As such, it makes sense that flat RIP would be a vehicle to break the square-root bottleneck. However, in practice, it’s difficult to control both the left- and right-hand sides of the flat RIP inequality; it would be much easier if we only had to worry about getting cancellations, and not about achieving different levels of cancellation for different-sized subsets. This leads to the following:
Definition 3 (weak flat RIP).
Lemma 2 (essentially Lemma 1 in [8]).
If Φ has (K, θ′)-weak flat RIP and worst-case coherence μ ≤ 1∕K, then Φ has \((K,\sqrt{\theta '})\)-flat RIP.
Proof.
Unfortunately, this coherence requirement puts K back in the square-root bottleneck, since μ ≤ 1∕K is equivalent to K ≤ 1∕μ = O(M^{1∕2}). To rectify this, Bourgain et al. use a trick in which a modest K with tiny δ can be converted into a large K with modest δ:
Lemma 3 (buried in Lemma 3 in [8], cf. Theorem 1 in [14]).
If Φ has (K, δ)-RIP, then Φ has (sK, 2sδ)-RIP for all s ≥ 1.
In [14], this trick is used to get RIP results for larger K when testing RIP for smaller K. For the explicit RIP matrix problem, we are stuck with proving how small δ is when K is on the order of M^{1∕2}. Note that this trick inherently exhibits some loss in K. Assuming the best possible scaling for all N, K, and δ is M = Θ((K∕δ^{2})log(N∕K)), then if N = poly(M), you can get (M^{1∕2}, δ)-RIP only if δ = Ω((log^{1∕2}M)∕M^{1∕4}). In this best-case scenario, you would want to pick s = M^{1∕4−ε} for some ε > 0 and apply Lemma 3 to get K = O(M^{3∕4−ε}). In some sense, this is another manifestation of the square-root bottleneck, but it would still be a huge achievement to saturate this bound.
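The arithmetic in this best-case scenario can be spelled out explicitly:

```latex
K = M^{1/2}, \qquad
\delta = \Theta\!\Big(\frac{\log^{1/2}M}{M^{1/4}}\Big), \qquad
s = M^{1/4-\varepsilon}
\;\Longrightarrow\;
sK = M^{3/4-\varepsilon}, \qquad
2s\delta = \Theta\big(M^{-\varepsilon}\log^{1/2}M\big) = o(1),
```

so Lemma 3 upgrades (M^{1∕2}, δ)-RIP to (M^{3∕4−ε}, o(1))-RIP, matching the K = O(M^{3∕4−ε}) claim above.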
13.2.2 A Brief Introduction to Additive Combinatorics
Lemma 4.
 (i)
|A + A| ≥ |A|
 (ii)
|A − A| ≥ |A|
 (iii)
E(A,A) ≤ |A|^{3}
with equality precisely when A is a translate of some subgroup of G.
Proof.
The proof for (ii) is similar.
The notion of additive structure is somewhat intuitive. You should think of a translate of a subgroup as having maximal additive structure. When the bounds (i), (ii), and (iii) are close to being achieved by A (e.g., when A is an arithmetic progression), you should think of A as having a lot of additive structure. Interestingly, while there are different measures of additive structure (e.g., |A − A| and E(A, A)), they often exhibit certain relationships (perhaps not surprisingly). The following is an example of such a relationship which is used throughout the paper by Bourgain et al. [8]:
Lemma 5 (Corollary 1 in [8]).
If E(A,A) ≥ |A|^{3}∕K, then there exists a set A′ ⊆ A such that |A′| ≥ |A|∕(20K) and |A′ − A′| ≤ 10^{7}K^{9}|A|.
In words, a set with a lot of additive energy necessarily has a large subset with a small difference set. This is proved using a version of the Balog–Szemerédi–Gowers lemma [5].
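A toy computation (our own illustration, over \(\mathbb{Z}/1009\mathbb{Z}\)) contrasts these measures for an arithmetic progression against a generic random set of the same size: the progression has a small sumset and difference set and large additive energy, while the random set sits near the opposite extremes.

```python
from itertools import product
import random

def sumset(A, n):
    """A + A in Z/nZ."""
    return {(a + b) % n for a, b in product(A, A)}

def difference_set(A, n):
    """A - A in Z/nZ."""
    return {(a - b) % n for a, b in product(A, A)}

def additive_energy(A, n):
    """E(A,A): number of quadruples (a,b,c,d) in A^4 with a + b = c + d (mod n)."""
    return sum(1 for a, b, c, d in product(A, A, A, A)
               if (a + b) % n == (c + d) % n)

n = 1009
ap = {5 * k for k in range(20)}                   # arithmetic progression, size 20
rnd = set(random.Random(0).sample(range(n), 20))  # generic random set, size 20

print(len(sumset(ap, n)), len(difference_set(ap, n)), additive_energy(ap, n))
print(len(sumset(rnd, n)), len(difference_set(rnd, n)), additive_energy(rnd, n))
```

For a length-L progression, |A + A| = 2L − 1 and E(A,A) = 2·Σ_{k<L} k² + L², both far from the random-set behavior, which is the dichotomy Lemma 5 quantifies.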
Lemma 6.
In the next section, we will appeal to Lemma 6 to motivate the matrix construction used by Bourgain et al. [8]. We will also use some of the techniques in the proof of Lemma 6 to prove a key lemma (namely, Lemma 7):
Proof (Proof of Lemma 6).
13.3 The Matrix Construction
Lemma 7 (Lemma 9 in [8]).
Proof.
13.3.1 How to Construct \(\mathcal{B}\)
Lemma 7 enables us to prove weakflatRIPtype cancellations in cases where \(\varOmega _{1}(a_{1}),\varOmega _{2}(a_{2}) \subseteq \mathcal{B}\) both lack additive structure. Indeed, the method of [8] is to do precisely this, and the remaining cases (where either Ω_{1}(a_{1}) or Ω_{2}(a_{2}) has more additive structure) will find cancellations by accounting for the dilation weights 1∕(a_{1} − a_{2}). Overall, we will be very close to proving that Φ is RIP if most subsets of \(\mathcal{B}\) lack additive structure. To this end, Bourgain et al. [8] actually prove something much stronger: They design \(\mathcal{B}\) in such a way that all sufficiently large subsets have low additive structure. The following theorem is the first step in the design:
Theorem 1 (Theorem 5 in [8]).
We already know that large subsets of \(\mathcal{C}\) (and \(\mathcal{B}\)) exhibit low additive structure, but the above theorem only gives this in terms of the sumset, whereas Lemma 7 requires low additive structure in terms of additive energy. As such, we will first convert the above theorem into a statement about difference sets, and then apply Lemma 5 to further convert it into a statement about additive energy:
Corollary 1 (essentially Corollary 3 in [8]).
Fix r, M and τ according to Theorem 1, take \(\mathcal{B}\) as defined in (13.2), and pick s and t such that (2τ − 1)s ≥ t. Then every subset \(B \subseteq \mathcal{B}\) with |B| > p^{s} satisfies |B − B| > p^{t}|B|.
Proof.
Corollary 2 (essentially Corollary 4 in [8]).
Fix r, M and τ according to Theorem 1, take \(\mathcal{B}\) as defined in (13.2), and pick γ and ℓ such that (2τ − 1)(ℓ − γ) ≥ 10γ. Then for every ε > 0, there exists P such that for every p ≥ P, every subset \(S \subseteq \mathcal{B}\) with |S| > p^{ℓ} satisfies E(S,S) < p^{−γ+ε}|S|^{3}.
Proof.
Notice that we could weaken our requirements on γ and ℓ if we had a version of Lemma 5 with a smaller exponent on K. This exponent comes from a version of the Balog–Szemerédi–Gowers lemma (Lemma 6 in [8]), which follows from the proof of Lemma 2.2 in [5]. (Specifically, take A = B, and you need to change A −_{E}B to A +_{E}B, but this change doesn’t affect the proof.) Bourgain et al. indicate that it would be desirable to prove a better version of this lemma, but it is unclear how easy that would be.
13.3.2 How to Construct \(\mathcal{A}\)
The previous subsection showed how to construct \(\mathcal{B}\) so as to ensure that all sufficiently large subsets have low additive structure. By Lemma 7, this in turn ensures that Φ exhibits weakflatRIPtype cancellations for most \(\varOmega _{1}(a_{1}),\varOmega _{2}(a_{2}) \subseteq \mathcal{B}\). For the remaining cases, Φ must exhibit weakflatRIPtype cancellations by somehow leveraging properties of \(\mathcal{A}\).
 (i)
\(\varOmega (p^{\alpha }) \leq \vert \mathcal{A}(p)\vert \leq p^{\alpha }\).
 (ii)
For each \(a \in \mathcal{A}\), elements \(a_{1},\ldots,a_{2m} \in \mathcal{A}\setminus \{a\}\) satisfy
$$\displaystyle{ \sum _{j=1}^{m} \frac{1}{a - a_{j}} =\sum _{j=m+1}^{2m} \frac{1}{a - a_{j}} }$$(13.3)
only if (a_{1}, …, a_{m}) and (a_{m+1}, …, a_{2m}) are permutations of each other. Here, division (and addition) is taken in the field \(\mathbb{F}_{p}\).
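Property (ii) can be tested by brute force for small parameters. The sketch below checks the permutation condition attached to (13.3) with m = 2 over a tiny illustrative set (our own choice, not the set built in Lemma 8); for m = 1 the condition is vacuous, since 1∕(a − a₁) = 1∕(a − a₂) already forces a₁ = a₂.

```python
from itertools import product

def inv(x, p):
    """Multiplicative inverse of x in F_p (p prime), via Fermat's little theorem."""
    return pow(x % p, p - 2, p)

def satisfies_condition(A, p, m=2):
    """True iff, for every a in A, equality in (13.3) over a_1, ..., a_{2m}
    drawn from A with a removed forces (a_1,...,a_m) and (a_{m+1},...,a_{2m})
    to be permutations of each other."""
    for a in A:
        others = [x for x in A if x != a]
        for tup in product(others, repeat=2 * m):
            lhs = sum(inv(a - x, p) for x in tup[:m]) % p
            rhs = sum(inv(a - x, p) for x in tup[m:]) % p
            if lhs == rhs and sorted(tup[:m]) != sorted(tup[m:]):
                return False
    return True

# m = 1 holds for any set, as noted above.
print(satisfies_condition([0, 1, 2, 3], 7, m=1))  # True
# m = 2 fails for this tiny set: 1/(0-1) + 1/(0-1) = 1/(0-2) + 1/(0-3) in F_7.
print(satisfies_condition([0, 1, 2, 3], 7, m=2))  # False
```

The cost grows like |A|^{2m+1}, so this is only a sanity check; Lemma 8 instead certifies the property structurally for the construction of [8].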
Unfortunately, these requirements on \(\mathcal{A}\) lend very little intuition compared to our current understanding of \(\mathcal{B}\). Regardless, we will continue by considering how Bourgain et al. construct \(\mathcal{A}\). The following lemma describes their construction and makes a slight improvement to the value of α chosen in [8]:
Lemma 8.
The original proof of this result is sketched after the statement of Lemma 2 in [8]. Unfortunately, this proof contains a technical error: The authors conclude that a prime p does not divide some integer D_{1}D_{2}V since V ≠ 0, p does not divide D_{1}, and |D_{2}V| < p; but the conclusion is invalid since D_{2}V is not necessarily an integer. The following alternative proof removes this difficulty:
Proof (Proof of Lemma 8).
13.4 The Main Result
We are now ready to state the main result of this chapter, which is a generalization of the main result in [8]. Later in this section, we will maximize ε_{0} such that this result implies \(\mathrm{ExRIP}[1/2 +\epsilon _{0}]\) with the matrix construction from [8].
Theorem 2 (The BDFKK restricted isometry machine).
 (a)
For every sufficiently large p, every \(a \in \mathcal{A}\), and every \(a_{1},\ldots,a_{2m} \in \mathcal{A}\setminus \{a\}\), the equality
$$\displaystyle{ \sum _{j=1}^{m} \frac{1}{a - a_{j}} =\sum _{j=m+1}^{2m} \frac{1}{a - a_{j}} }$$
holds only if (a_{1}, …, a_{m}) and (a_{m+1}, …, a_{2m}) are permutations of each other. Here, division (and addition) is taken in the field \(\mathbb{F}_{p}\).
 (b)
For every ε > 0, there exists P = P(ε) such that for every p ≥ P, every subset \(S \subseteq \mathcal{B}(p)\) with |S| ≥ p^{ℓ} satisfies E(S,S) ≤ p^{−γ+ε}|S|^{3}.
What follows is a generalized version of the statement of Lemma 10 in [8], which we then use in the hypothesis of a generalized version of Lemma 2 in [8]:
Definition 4.
Let \(\mathrm{L10} =\mathrm{ L10}[\alpha _{1},\alpha _{2},k_{0},k_{1},k_{2},m,y]\) denote the following statement about subsets \(\mathcal{A} = \mathcal{A}(p)\) and \(\mathcal{B} = \mathcal{B}(p)\) of \(\mathbb{F}_{p}\) for \(p\) prime:
For every ε > 0, there exists P > 0 such that for every prime p ≥ P the following holds:
Lemma 9 (generalized version of Lemma 2 in [8]).
Take \(\varOmega _{1},\varOmega _{2} \subseteq \mathcal{A}\times \mathcal{B}\) for which there exist powers of two M_{1},M_{2} such that (13.17) and (13.18) hold for i = 1,2 and every a_{i} ∈ A_{i}. Then (13.14) satisfies \(\vert S(A_{1},A_{2})\vert \leq p^{1-\epsilon _{1}-\epsilon }\).
The following result gives sufficient conditions for L10, and thus Lemma 9 above:
Lemma 10 (generalized version of Lemma 10 in [8]).
Suppose \(\mathcal{A}\) satisfies hypothesis (a) in Theorem 2. Then L10 is true with k_{0} = c_{0}∕8, k_{1} = 1∕4, and k_{2} = 9∕8, provided (13.12) and (13.13) are satisfied.
These lemmas are proved in Section 13.5. With these in hand, we are ready to prove the main result:
Proof (Proof of Theorem 2).
For this particular construction of \(\mathcal{A}\) and \(\mathcal{B}\), the remaining bottlenecks may lie at the very foundations of additive combinatorics. For example, it is currently known that the constant c_{0} from Proposition 2 in [6] satisfies 1∕10430 ≤ c_{0} < 1. If c_{0} = 1∕2 (say), then taking m = 10,000 leads to ε_{0} being on the order of 10^{−12}. Another source of improvement is Lemma 5 (Corollary 1 in [8]), which is proved using a version of the Balog–Szemerédi–Gowers lemma [5]. Specifically, the power of K in Lemma 5 is precisely the coefficient of x in (13.9), as well as the coefficient of γ (less 1) on the right-hand side of (13.24); as such, decreasing this exponent would in turn enlarge the feasibility region. An alternative construction for \(\mathcal{A}\) is proposed in [7], and it would be interesting to optimize this construction as well, though the bottlenecks involving c_{0} and the Balog–Szemerédi–Gowers lemma are also present in this alternative.
13.5 Proofs of Technical Lemmas
This section contains the proofs of the technical lemmas (Lemmas 9 and 10) which were used to prove the main result (Theorem 2).
13.5.1 Proof of Lemma 9
13.5.2 Proof of Lemma 10
Acknowledgements
The author thanks the anonymous referees for their helpful suggestions. This work was supported by NSF Grant No. DMS-1321779. The views expressed in this chapter are those of the author and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the U.S. Government.
References
1. Applebaum, L., Howard, S.D., Searle, S., Calderbank, R.: Chirp sensing codes: deterministic compressed sensing measurements for fast recovery. Appl. Comput. Harmon. Anal. 26, 283–290 (2009)
2. Bandeira, A.S., Dobriban, E., Mixon, D.G., Sawin, W.F.: Certifying the restricted isometry property is hard. IEEE Trans. Inf. Theory 59, 3448–3450 (2013)
3. Bandeira, A.S., Fickus, M., Mixon, D.G., Wong, P.: The road to deterministic matrices with the restricted isometry property. J. Fourier Anal. Appl. 19, 1123–1149 (2013)
4. Baraniuk, R., Davenport, M., DeVore, R., Wakin, M.: A simple proof of the restricted isometry property for random matrices. Constr. Approx. 28, 253–263 (2008)
5. Bourgain, J., Garaev, M.Z.: On a variant of sum-product estimates and explicit exponential sum bounds in prime fields. Math. Proc. Camb. Philos. Soc. 146, 1–21 (2009)
6. Bourgain, J., Glibichuk, A.: Exponential sum estimate over subgroup in an arbitrary finite field. http://www.math.ias.edu/files/avi/Bourgain_Glibichuk.pdf (2011)
7. Bourgain, J., Dilworth, S.J., Ford, K., Konyagin, S.V., Kutzarova, D.: Breaking the k^{2} barrier for explicit RIP matrices. In: STOC 2011, pp. 637–644 (2011)
8. Bourgain, J., Dilworth, S.J., Ford, K., Konyagin, S., Kutzarova, D.: Explicit constructions of RIP matrices and related problems. Duke Math. J. 159, 145–185 (2011)
9. Cai, T.T., Zhang, A.: Sharp RIP bound for sparse signal and low-rank matrix recovery. Appl. Comput. Harmon. Anal. 35, 74–93 (2013)
10. Casazza, P.G., Fickus, M.: Fourier transforms of finite chirps. EURASIP J. Appl. Signal Process. 2006, 7 pp. (2006)
11. DeVore, R.A.: Deterministic constructions of compressed sensing matrices. J. Complexity 23, 918–925 (2007)
12. Fickus, M., Mixon, D.G., Tremain, J.C.: Steiner equiangular tight frames. Linear Algebra Appl. 436, 1014–1027 (2012)
13. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Springer, Berlin (2013)
14. Koiran, P., Zouzias, A.: Hidden cliques and the certification of the restricted isometry property. arXiv:1211.0665 (2012)
15. Mixon, D.G.: Deterministic RIP matrices: breaking the square-root bottleneck. Short, Fat Matrices (weblog). http://www.dustingmixon.wordpress.com/2013/12/02/deterministic-rip-matrices-breaking-the-square-root-bottleneck/ (2013)
16. Mixon, D.G.: Deterministic RIP matrices: breaking the square-root bottleneck, II. Short, Fat Matrices (weblog). http://www.dustingmixon.wordpress.com/2013/12/11/deterministic-rip-matrices-breaking-the-square-root-bottleneck-ii/ (2013)
17. Mixon, D.G.: Deterministic RIP matrices: breaking the square-root bottleneck, III. Short, Fat Matrices (weblog). http://www.dustingmixon.wordpress.com/2014/01/14/deterministic-rip-matrices-breaking-the-square-root-bottleneck-iii/ (2014)
18. Tao, T.: Open question: deterministic UUP matrices. What’s New (weblog). http://www.terrytao.wordpress.com/2007/07/02/open-question-deterministic-uup-matrices/ (2007)
19. Tao, T., Vu, V.H.: Additive Combinatorics. Cambridge University Press, Cambridge (2006)
20. Welch, L.R.: Lower bounds on the maximum cross correlation of signals. IEEE Trans. Inform. Theory 20, 397–399 (1974)