An Invitation to Compressive Sensing
Abstract
This first chapter formulates the objectives of compressive sensing. It introduces the standard compressive problem studied throughout the book and reveals its ubiquity in many concrete situations by providing a selection of motivations, applications, and extensions of the theory. It concludes with an overview of the book that summarizes the content of each of the following chapters.
Keywords
sparsity · compressibility · algorithms · random matrices · stability · single-pixel camera · magnetic resonance imaging · radar · sampling theory · sparse approximation · error correction · statistics and machine learning · low-rank matrix recovery and matrix completion

This first chapter introduces the standard compressive sensing problem and gives an overview of the content of this book. Since the mathematical theory is highly motivated by real-life problems, we also briefly describe some of the potential applications.
1.1 What is Compressive Sensing?
The matrix \(\mathbf{A}\,\in \,{\mathbb{C}}^{m\times N}\) models the linear measurement (information) process. Then one tries to recover the vector \(\mathbf{x}\,\in \,{\mathbb{C}}^{N}\) by solving the above linear system. Traditional wisdom suggests that the number m of measurements, i.e., the amount of measured data, must be at least as large as the signal length N (the number of components of \(\mathbf{x}\)). This principle is the basis for most devices used in current technology, such as analog-to-digital conversion, medical imaging, radar, and mobile communication. Indeed, if m < N, then classical linear algebra indicates that the linear system (1.1) is underdetermined and that there are infinitely many solutions (provided, of course, that there exists at least one). In other words, without additional information, it is impossible to recover \(\mathbf{x}\) from \(\mathbf{y}\) in the case m < N. This fact also relates to the Shannon sampling theorem, which states that the sampling rate of a continuous-time signal must be at least twice its highest frequency in order to ensure reconstruction.
Thus, it came as a surprise that under certain assumptions it is actually possible to reconstruct signals when the number m of available measurements is smaller than the signal length N. Even more surprisingly, efficient algorithms do exist for the reconstruction. The underlying assumption which makes all this possible is sparsity. The research area associated to this phenomenon has become known as compressive sensing, compressed sensing, compressive sampling, or sparse recovery. This whole book is devoted to the mathematics underlying this field.
Let us consider again the acquisition of a signal and the resulting measured data. With the additional knowledge that the signal is sparse or compressible, the traditional approach of taking at least as many measurements as the signal length seems to waste resources: At first, substantial efforts are devoted to measuring all entries of the signal and then most coefficients are discarded in the compressed version. Instead, one would want to acquire the compressed version of a signal “directly” via significantly fewer measured data than the signal length—exploiting the sparsity or compressibility of the signal. In other words, we would like to compressively sense a compressible signal! This constitutes the basic goal of compressive sensing.
We emphasize that the main difficulty here lies in the locations of the nonzero entries of the vector \(\mathbf{x}\) not being known beforehand. If they were, one would simply reduce the matrix \(\mathbf{A}\) to the columns indexed by this location set. The resulting system of linear equations then becomes overdetermined and one can solve for the nonzero entries of the signal. Not knowing the nonzero locations of the vector to be reconstructed introduces some nonlinearity since s-sparse vectors (those having at most s nonzero coefficients) form a nonlinear set. Indeed, adding two s-sparse vectors gives a 2s-sparse vector in general. Thus, any successful reconstruction method will necessarily be nonlinear.
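To make the "known support" remark concrete, here is a minimal numpy sketch (dimensions and the support set are purely illustrative): once the nonzero locations are given, restricting \(\mathbf{A}\) to those columns yields an overdetermined system that ordinary least squares solves exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, s = 10, 30, 3

# Gaussian measurement matrix and an s-sparse ground truth
A = rng.standard_normal((m, N))
support = [4, 11, 27]                 # nonzero locations, assumed known here
x = np.zeros(N)
x[support] = rng.standard_normal(s)
y = A @ x

# With the support known, restrict A to those columns: the system
# A[:, support] @ x_S = y is overdetermined (m >= s) and solved exactly
# by least squares, since y lies in the span of the selected columns.
x_S, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)

x_hat = np.zeros(N)
x_hat[support] = x_S
assert np.allclose(x_hat, x)
```

The whole difficulty of compressive sensing is that the line `support = [...]` is not available in practice.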
Intuitively, the complexity or “intrinsic” information content of a compressible signal is much smaller than its signal length (otherwise compression would not be possible). So one may argue that the required amount of data (number of measurements) should be proportional to this intrinsic information content rather than the signal length. Nevertheless, it is not immediately clear how to achieve the reconstruction in this scenario.

How should one design the linear measurement process? In other words, what matrices \(\mathbf{A} \in {\mathbb{C}}^{m\times N}\) are suitable?

How can one reconstruct \(\mathbf{x}\) from \(\mathbf{y} = \mathbf{A}\mathbf{x}\)? In other words, what are efficient reconstruction algorithms?
These two questions are not entirely independent, as the reconstruction algorithm needs to take \(\mathbf{A}\) into account, but we will see that one can often separate the analysis of the matrix \(\mathbf{A}\) from the analysis of the algorithm.
Let us note that the first question is far from trivial. In fact, compressive sensing is not suited to arbitrary matrices \(\mathbf{A} \in {\mathbb{C}}^{m\times N}\). For instance, if \(\mathbf{A}\) is made of rows of the identity matrix, then \(\mathbf{y} = \mathbf{A}\mathbf{x}\) simply picks some entries of \(\mathbf{x}\), and hence, it contains mostly zero entries. In particular, no information is obtained about the nonzero entries of \(\mathbf{x}\) not caught in \(\mathbf{y}\), and the reconstruction appears impossible for such a matrix \(\mathbf{A}\). Therefore, compressive sensing is not only concerned with the recovery algorithm—the first question on the design of the measurement matrix is equally important and delicate. We also emphasize that the matrix \(\mathbf{A}\) should ideally be designed for all signals \(\mathbf{x}\) simultaneously, with a measurement process which is nonadaptive in the sense that the type of measurements for the datum \(y_{j}\) (i.e., the jth row of \(\mathbf{A}\)) does not depend on the previously observed data \(y_{1},\ldots,y_{j-1}\). As it turns out, adaptive measurements do not provide better theoretical performance in general (at least in a sense to be made precise in Chap. 10).
According to (1.3), the amount m of data needed to recover ssparse vectors scales linearly in s, while the signal length N only has a mild logarithmic influence. In particular, if the sparsity s is small compared to N, then the number m of measurements can also be chosen small in comparison to N, so that exact solutions of an underdetermined system of linear equations become plausible! This fascinating discovery impacts many potential applications.
The outlined recovery result extends from Gaussian random matrices to the more practical situation encountered in sampling theory. Here, assuming that a function of interest has a sparse expansion in a suitable orthogonal system (in trigonometric monomials, say), it can be recovered from a small number of randomly chosen samples (point evaluations) via ℓ_1-minimization or several other methods. This connection to sampling theory explains the alternative name compressive sampling.
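The ℓ_1-minimization (basis pursuit) program mentioned here can be written as a linear program by splitting the variable into positive and negative parts, a standard reformulation. The following sketch (illustrative dimensions, scipy's generic LP solver rather than a dedicated method) shows exact recovery of a sparse vector from m < N Gaussian measurements.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, N, s = 24, 60, 3

A = rng.standard_normal((m, N)) / np.sqrt(m)   # Gaussian measurement matrix
x = np.zeros(N)
x[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = A @ x

# Basis pursuit  min ||z||_1  s.t.  Az = y  as a linear program:
# write z = u - v with u, v >= 0 and minimize sum(u) + sum(v).
c = np.ones(2 * N)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N))
z = res.x[:N] - res.x[N:]

assert np.allclose(z, x, atol=1e-4)   # exact recovery despite m < N
```

With m well above C s ln(N/s), such exact recovery succeeds with high probability over the draw of A; for a fixed random seed it is expected but not guaranteed.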
1.2 Applications, Motivations, and Extensions
In this section, we highlight a selection of problems that reduce to or can be modeled as the standard compressive sensing problem. We hope to thereby convince the reader of its ubiquity. The variations presented here take different flavors: technological applications (single-pixel camera, magnetic resonance imaging, radar), scientific motivations (sampling theory, sparse approximation, error correction, statistics and machine learning), and theoretical extensions (low-rank recovery, matrix completion). We do not delve into the technical details that would be necessary for a complete understanding. Instead, we adopt an informal style and we focus on the description of an idealized mathematical model. Pointers to references treating the details in much more depth are given in the Notes section concluding the chapter.
1.2.1 Single-Pixel Camera
Compressive sensing techniques are implemented in a device called the single-pixel camera. The idea is to correlate in hardware a real-world image with independent realizations of Bernoulli random vectors and to measure these correlations (inner products) on a single pixel. It suffices to measure only a small number of such random inner products in order to reconstruct images via sparse recovery methods.
For the purpose of this exposition, images are represented via gray values of pixels collected in the vector \(\mathbf{z} \in {\mathbb{R}}^{N}\), where N = N _{1} N _{2} and N _{1}, N _{2} denote the width and height of the image in pixels. Images are not usually sparse in the canonical (pixel) basis, but they are often sparse after a suitable transformation, for instance, a wavelet transform or discrete cosine transform. This means that one can write \(\mathbf{z} = \mathbf{W}\mathbf{x}\), where \(\mathbf{x} \in {\mathbb{R}}^{N}\) is a sparse or compressible vector and \(\mathbf{W} \in {\mathbb{R}}^{N\times N}\) is a unitary matrix representing the transform.
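The compressibility claim above can be illustrated with a one-dimensional stand-in for an image: a smooth signal is dense in the canonical basis, yet keeping only its few largest discrete cosine transform coefficients reproduces it almost perfectly. (The signal, the number of kept coefficients, and the error threshold below are illustrative choices.)

```python
import numpy as np
from scipy.fft import dct, idct

# A smooth signal is not sparse in the canonical (sample) basis, but it is
# compressible in the DCT basis: few coefficients carry almost all the energy.
n = 256
t = np.arange(n) / n
z = np.cos(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)

x = dct(z, norm='ortho')                  # transform coefficients (z = W x)
idx = np.argsort(np.abs(x))[::-1]
x_s = np.zeros(n)
x_s[idx[:25]] = x[idx[:25]]               # keep the 25 largest coefficients

z_s = idct(x_s, norm='ortho')             # reconstruct from the sparse vector
rel_err = np.linalg.norm(z - z_s) / np.linalg.norm(z)
assert rel_err < 0.05                     # under 5% relative error from ~10% of coefficients
```

Here the orthonormal DCT plays the role of the unitary matrix \(\mathbf{W}\); for genuine images one would use a 2D wavelet or cosine transform instead.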
Although the singlepixel camera is more a proof of concept than a new trend in camera design, it is quite conceivable that similar devices will be used for different imaging tasks. In particular, for certain wavelengths outside the visible spectrum, it is impossible or at least very expensive to build chips with millions of sensor pixels on an area of only several square millimeters. In such a context, the potential of a technology based on compressive sensing is expected to really pay off.
1.2.2 Magnetic Resonance Imaging
Magnetic resonance imaging (MRI) is a common technology in medical imaging used for various tasks such as brain imaging, angiography (examination of blood vessels), and dynamic heart imaging. In traditional approaches (essentially based on the Shannon sampling theorem), the measurement time to produce high-resolution images can be excessive (several minutes or hours depending on the task) in clinical situations. For instance, heart patients cannot be expected to hold their breath for too long a time, and children are too impatient to sit still for more than about two minutes. In such situations, the use of compressive sensing to achieve high-resolution images based on few samples appears promising.
MRI relies on the interaction of a strong magnetic field with the hydrogen nuclei (protons) contained in the body’s water molecules. A static magnetic field polarizes the spin of the protons resulting in a magnetic moment. Applying an additional radio frequency excitation field produces a precessing magnetization transverse to the static field. The precession frequency depends linearly on the strength of the magnetic field. The generated electromagnetic field can be detected by sensors. Imposing further magnetic fields with a spatially dependent strength, the precession frequency depends on the spatial position as well. Exploiting the fact that the transverse magnetization depends on the physical properties of the tissue (for instance, proton density) allows one to reconstruct an image of the body from the measured signal.
In conclusion, the signal measured by the MRI system is the Fourier transform of the spatially dependent magnitude of the magnetization X (the image), subsampled on the curve \(\{\mathbf{k}(t): t \in [0,T]\} \subset {\mathbb{R}}^{3}\). By repeating several radio frequency excitations with modified parameters, one obtains samples of the Fourier transform of X along several curves \(\mathbf{k}_{1}, \ldots,\mathbf{k}_{L}\) in \({\mathbb{R}}^{3}\). The required measurement time is proportional to the number L of such curves, and we would like to minimize this number L.
The challenge is to determine good sampling sets K with small size that still ensure recovery of sparse images. The theory currently available predicts that sampling sets K chosen uniformly at random among all possible sets of cardinality m work well (at least when \(\mathbf{W}\) is the identity matrix). Indeed, the results of Chap. 12 guarantee that an s-sparse \(\mathbf{x} \in {\mathbb{C}}^{N}\) can be reconstructed by ℓ_1-minimization if m ≥ C s ln N.
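The measurement model just described is a random partial Fourier matrix: m rows of the unitary DFT matrix, chosen uniformly at random. A short numpy sketch (with illustrative sizes) confirms that applying this matrix is the same as subsampling the FFT of the signal on the random set K.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 64, 16

# Random partial Fourier measurements: sample m frequencies of x
# uniformly at random, i.e., pick m rows of the unitary DFT matrix.
F = np.fft.fft(np.eye(N)) / np.sqrt(N)     # unitary DFT matrix
K = rng.choice(N, m, replace=False)        # random sampling set
A = F[K, :]                                # the measurement matrix

x = np.zeros(N)
x[rng.choice(N, 3, replace=False)] = rng.standard_normal(3)

# Applying the matrix agrees with subsampling the (normalized) FFT of x.
y = A @ x
assert np.allclose(y, np.fft.fft(x)[K] / np.sqrt(N))
```

In actual MRI the sampling set is constrained to lie on continuous trajectories, which is exactly the practical obstruction discussed next.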
Unfortunately, such random sets K are difficult to realize in practice due to the continuity constraints of the trajectory curves \(\mathbf{k}_{1}, \ldots,\mathbf{k}_{L}\). Therefore, good realizable sets K are investigated empirically. One option that seems to work well takes the trajectories as parallel lines in \({\mathbb{R}}^{3}\) whose intersections with a coordinate plane are chosen uniformly at random. This gives some sort of approximation to the case where K is "completely" random. Other choices such as perturbed spirals are also possible.
1.2.3 Radar
Although the Alltop window works well in practice, the theoretical guarantees currently available are somewhat limited due to the fact that \(\mathbf{g}\) is deterministic. As an alternative consistent with the general philosophy of compressive sensing, one can choose \(\mathbf{g} \in {\mathbb{C}}^{m}\) at random, for instance, as a Bernoulli vector with independent ± 1 entries. In this case, it is known that an s-sparse vector \(\mathbf{x} \in {\mathbb{C}}^{{m}^{2} }\) can be recovered from \(\mathbf{y} = \mathbf{B}\mathbf{x} \in {\mathbb{C}}^{m}\) provided s ≤ C m ∕ ln m. More information can be found in the Notes section of Chap. 12.
1.2.4 Sampling Theory
Reconstructing a continuous-time signal from a discrete set of samples is an important task in many technological and scientific applications. Examples include image processing, sensor technology in general, and analog-to-digital conversion appearing, for instance, in audio entertainment systems or mobile communication devices. Currently, most sampling techniques rely on the Shannon sampling theorem, which states that a function of bandwidth B has to be sampled at a rate of at least 2B in order to ensure reconstruction.
1.2.5 Sparse Approximation
Compressive sensing builds on the empirical observation that many types of signals can be approximated by sparse ones. In this sense, compressive sensing can be seen as a subfield of sparse approximation. There is a specific problem in sparse approximation similar to the standard compressive sensing problem of recovering a sparse vector \(\mathbf{x} \in {\mathbb{C}}^{N}\) from the incomplete information \(\mathbf{y} = \mathbf{A}\mathbf{x} \in {\mathbb{C}}^{m}\) with m < N.
Suppose that a vector \(\mathbf{y} \in {\mathbb{C}}^{m}\) (usually a signal or an image in applications) is to be represented as a linear combination of prescribed elements \(\mathbf{a}_{1},\ldots,\mathbf{a}_{N} \in {\mathbb{C}}^{m}\) such that \(\mathrm{span}\{\mathbf{a}_{1},\ldots,\mathbf{a}_{N}\} = {\mathbb{C}}^{m}\). The system \((\mathbf{a}_{1},\ldots,\mathbf{a}_{N})\) is often called a dictionary. Note that this system may be linearly dependent (redundant) since we allow N > m. Redundancy may be desired when linear independence is too restrictive. For instance, in time–frequency analysis, bases of time–frequency shift elements are only possible if the generator has poor time–frequency concentration—this is the Balian–Low theorem. Unions of several bases are also of interest. In such situations, a representation \(\mathbf{y} =\sum _{ j=1}^{N}x_{j}\mathbf{a}_{j}\) is not unique. Traditionally, one removes this drawback by considering a representation with the smallest number of terms, i.e., a sparsest representation.
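A small numerical illustration of redundancy (dimensions chosen for exposition): take the dictionary to be the union of the canonical basis and the unitary DFT basis. The constant signal then admits both a 1-sparse Fourier representation and a fully dense canonical one, so representations are far from unique and one prefers the sparsest.

```python
import numpy as np

m = 8
# A redundant dictionary: union of the canonical basis and the unitary
# DFT basis, giving N = 2m columns that span C^m.
I = np.eye(m)
F = np.fft.fft(np.eye(m)) / np.sqrt(m)
A = np.hstack([I, F])

# The constant signal is 1-sparse in the Fourier part (one column) but
# needs all m canonical columns: the representation is not unique.
y = np.ones(m, dtype=complex)

x_fourier = np.zeros(2 * m, dtype=complex)
x_fourier[m] = np.sqrt(m)          # the first DFT column is constant 1/sqrt(m)
x_canonical = np.concatenate([y, np.zeros(m)])

assert np.allclose(A @ x_fourier, y)     # 1-sparse representation
assert np.allclose(A @ x_canonical, y)   # m-sparse representation of the same y
```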
There are, however, some differences in philosophy compared to the compressive sensing problem. In the latter, one is often free to design the matrix \(\mathbf{A}\) with appropriate properties, while \(\mathbf{A}\) is usually prescribed in the context of sparse approximation. In particular, it is not realistic to rely on randomness as in compressive sensing. Since it is hard to verify the conditions ensuring sparse recovery in the optimal parameter regime (m linear in s up to logarithmic factors), the theoretical guarantees fall short of the ones encountered for random matrices. An exception to this rule of thumb will be covered in Chap. 14 where recovery guarantees are obtained for randomly chosen signals.
The second difference between sparse approximation and compressive sensing appears in the targeted error estimates. In compressive sensing, one is interested in the error \(\|\mathbf{x} -{\mathbf{x}}^{\sharp }\|\) at the coefficient level, where \(\mathbf{x}\) and \({\mathbf{x}}^{\sharp }\) are the original and reconstructed coefficient vectors, respectively, while in sparse approximation, the goal is to approximate a given \(\mathbf{y}\) with a sparse expansion \({\mathbf{y}}^{\sharp } =\sum _{j}x_{j}^{\sharp }\mathbf{a}_{j}\), so one is rather interested in \(\|\mathbf{y} -{\mathbf{y}}^{\sharp }\|\). An estimate for \(\|\mathbf{x} -{\mathbf{x}}^{\sharp }\|\) often yields an estimate for \(\|\mathbf{y} -{\mathbf{y}}^{\sharp }\| =\| \mathbf{A}(\mathbf{x} -{\mathbf{x}}^{\sharp })\|\), but the converse is not generally true.

Compression. Suppose that we have found a sparse approximation \(\hat{\mathbf{y}} = \mathbf{A}\hat{\mathbf{x}}\) of a signal \(\mathbf{y}\) with a sparse vector \(\hat{\mathbf{x}}\). Then storing \(\hat{\mathbf{y}}\) amounts to storing only the nonzero coefficients of \(\hat{\mathbf{x}}\). Since \(\hat{\mathbf{x}}\) is sparse, significantly less memory is required than for storing the entries of the original signal \(\mathbf{y}\).
 Denoising. Suppose that we observe a noisy version \(\tilde{\mathbf{y}} = \mathbf{y} + \mathbf{e}\) of a signal \(\mathbf{y}\), where \(\mathbf{e}\) represents a noise vector with \(\|\mathbf{e}\| \leq \eta\). The task is then to remove the noise and to recover a good approximation of the original signal \(\mathbf{y}\). In general, if nothing is known about \(\mathbf{y}\), this problem becomes ill-posed. However, assuming that \(\mathbf{y}\) can be well represented by a sparse expansion, a reasonable approach consists in taking a sparse approximation of \(\tilde{\mathbf{y}}\). More precisely, we ideally choose the solution \(\hat{\mathbf{x}}\) of the ℓ_0-minimization problem (P_{0,η }) with \(\mathbf{y}\) replaced by the known signal \(\tilde{\mathbf{y}}\). Then we form \(\hat{\mathbf{y}} = \mathbf{A}\hat{\mathbf{x}}\) as the denoised version of \(\mathbf{y}\). For a computationally tractable approach, one replaces the NP-hard problem (P_{0,η }) by one of the compressive sensing (sparse approximation) algorithms, for instance, the ℓ_1-minimization variant (1.4) which takes noise into account, or the so-called basis pursuit denoising problem $$\displaystyle{\mathrm{minimize\;}\lambda \|\mathbf{z}\|_{1} +\| \mathbf{A}\mathbf{z} -\mathbf{y}\|_{2}^{2}.}$$
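The basis pursuit denoising objective can be minimized by simple proximal-gradient iterations (iterative soft-thresholding). The sketch below is one elementary way to do this, with illustrative dimensions, noise level, and regularization parameter λ; it is not the book's prescribed solver.

```python
import numpy as np

def soft(z, tau):
    """Soft-thresholding: the proximal map of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(A, y, lam, n_iter=2000):
    """Iterative soft-thresholding for  min_z  lam*||z||_1 + ||A z - y||_2^2.
    A basic proximal-gradient sketch, not an optimized solver."""
    L = 2 * np.linalg.norm(A, 2) ** 2     # Lipschitz constant of the gradient
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2 * A.T @ (A @ z - y)      # gradient of the quadratic term
        z = soft(z - grad / L, lam / L)   # gradient step + shrinkage
    return z

rng = np.random.default_rng(3)
m, N, s = 30, 80, 4
A = rng.standard_normal((m, N)) / np.sqrt(m)
x = np.zeros(N)
x[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = A @ x + 0.01 * rng.standard_normal(m)   # noisy measurements

x_hat = ista(A, y, lam=0.02)
assert np.linalg.norm(x_hat - x) < 0.3      # close to the sparse ground truth
```

The shrinkage step is what promotes sparsity; λ trades off data fidelity against the ℓ_1 penalty and would be tuned to the noise level in practice.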

Data Separation. Suppose that a vector \(\mathbf{y} \in {\mathbb{C}}^{m}\) is the composition of two (or more) components, say \(\mathbf{y} = \mathbf{y}_{1} + \mathbf{y}_{2}\). Given \(\mathbf{y}\), we wish to extract the unknown vectors \(\mathbf{y}_{1},\mathbf{y}_{2} \in {\mathbb{C}}^{m}\). This problem appears in several signal processing tasks. For instance, astronomers would like to separate point structures (stars, galaxy clusters) from filaments in their images. Similarly, an audio processing task consists in separating harmonic components (pure sinusoids) from short peaks.
Without additional assumption, this separation problem is ill-posed. However, if both components \(\mathbf{y}_{1}\) and \(\mathbf{y}_{2}\) have sparse representations in dictionaries \((\mathbf{a}_{1},\ldots,\mathbf{a}_{N_{1}})\) and \((\mathbf{b}_{1},\ldots,\mathbf{b}_{N_{2}})\) of different nature (for instance, sinusoids and spikes), then the situation changes. We can then write $$\displaystyle{\mathbf{y} =\sum _{ j=1}^{N_{1} }x_{1,j}\mathbf{a}_{j} +\sum _{ j=1}^{N_{2} }x_{2,j}\mathbf{b}_{j} = \mathbf{A}\mathbf{x},}$$ where the matrix \(\mathbf{A} \in {\mathbb{C}}^{m\times (N_{1}+N_{2})}\) has columns \(\mathbf{a}_{1},\ldots,\mathbf{a}_{N_{1}},\mathbf{b}_{1},\ldots,\mathbf{b}_{N_{2}}\) and the vector \(\mathbf{x} = {[x_{1,1},\ldots,x_{1,N_{1}},x_{2,1},\ldots,x_{2,N_{2}}]}^{\top }\) is sparse. The compressive sensing methodology then allows one—under certain conditions—to determine the coefficient vector \(\mathbf{x}\), hence to derive the two components \(\mathbf{y}_{1} =\sum _{ j=1}^{N_{1}}x_{1,j}\mathbf{a}_{j}\) and \(\mathbf{y}_{2} =\sum _{ j=1}^{N_{2}}x_{2,j}\mathbf{b}_{j}\).
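As a toy instance of this separation principle (illustrative sizes; the "sinusoid" dictionary is taken as DCT atoms and the "spike" dictionary as the canonical basis), ℓ_1-minimization over the concatenated dictionary pulls out both components exactly.

```python
import numpy as np
from scipy.fft import idct
from scipy.optimize import linprog

m = 32
# Concatenated dictionary for separation: DCT atoms (sinusoids) | spikes.
D = idct(np.eye(m), norm='ortho', axis=0)   # orthonormal DCT-atom columns
A = np.hstack([D, np.eye(m)])               # A in R^{m x 2m}

# Compose y from one sinusoid and one spike.
x_true = np.zeros(2 * m)
x_true[5] = 1.0                             # one DCT atom
x_true[m + 3] = 0.8                         # one spike at position 3
y = A @ x_true

# Basis pursuit (min ||x||_1 s.t. Ax = y) as a linear program with x = u - v.
c = np.ones(4 * m)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (4 * m))
x_hat = res.x[:2 * m] - res.x[2 * m:]

assert np.allclose(x_hat, x_true, atol=1e-5)   # components separated exactly
```

Here the two dictionaries are incoherent (their atoms have small inner products), which is what makes the sparse composition identifiable.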
1.2.6 Error Correction
In every realistic data transmission device, pieces of data are occasionally corrupted. To overcome this unavoidable issue, one designs schemes for the correction of such errors provided they do not occur too often.
Suppose that we have to transmit a vector \(\mathbf{z} \in {\mathbb{R}}^{n}\). A standard strategy is to encode it into a vector \(\mathbf{v} = \mathbf{B}\mathbf{z} \in {\mathbb{R}}^{N}\) of length N = n + m, where \(\mathbf{B} \in {\mathbb{R}}^{N\times n}\). Intuitively, the redundancy in \(\mathbf{B}\) (due to N > n) should help in identifying transmission errors. The number m reflects the amount of redundancy.
For concreteness of the scheme, we may choose a matrix \(\mathbf{A} \in {\mathbb{R}}^{m\times N}\) as a suitable compressive sensing matrix, for instance, a Gaussian random matrix. Then we select the matrix \(\mathbf{B} \in {\mathbb{R}}^{N\times n}\) with n + m = N in such a way that its columns span the orthogonal complement of the row space of \(\mathbf{A}\), thus guaranteeing that \(\mathbf{A}\mathbf{B} = \mathbf{0}\). With these choices, we are able to correct a number s of transmission errors as large as C m ∕ ln(N ∕ m).
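A compact simulation of this error-correction scheme (dimensions and the number of corruptions are illustrative): since \(\mathbf{A}\mathbf{B} = \mathbf{0}\), applying \(\mathbf{A}\) to the corrupted transmission yields a "syndrome" that depends only on the sparse error vector, which ℓ_1-minimization then recovers.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import linprog

rng = np.random.default_rng(5)
n, m = 40, 60
N = n + m

A = rng.standard_normal((m, N)) / np.sqrt(m)   # compressive sensing matrix
B = null_space(A)                              # columns span ker(A); B is N x n
assert np.allclose(A @ B, 0)

z = rng.standard_normal(n)
v = B @ z                                      # encoded message of length N
e = np.zeros(N)
e[rng.choice(N, 4, replace=False)] = rng.standard_normal(4)  # sparse corruptions

# Receiver side: the syndrome y = A(v + e) = Ae depends only on the error,
# so the sparse error can be found by l1-minimization (as a linear program).
y = A @ (v + e)
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=[(0, None)] * (2 * N))
e_hat = res.x[:N] - res.x[N:]

# Remove the estimated error and decode by least squares.
z_hat, *_ = np.linalg.lstsq(B, v + e - e_hat, rcond=None)
assert np.allclose(z_hat, z, atol=1e-4)
```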
1.2.7 Statistics and Machine Learning
In practice, the number N of parameters is often much larger than the number m of observations, so even without noise, the problem of fitting the parameter \(\mathbf{x}\) is illposed without further assumption. In many cases, however, only a small number of parameters contribute towards the effect to be predicted, but it is a priori unknown which of these parameters are influential. This leads to sparsity in the vector \(\mathbf{x}\), and again we arrive at the standard compressive sensing problem. In statistical terms, determining a sparse parameter vector \(\mathbf{x}\) corresponds to selecting the relevant explanatory variables, i.e., the support of \(\mathbf{x}\). One also speaks of model selection.
1.2.8 LowRank Matrix Recovery and Matrix Completion
Let us finally describe an extension of compressive sensing together with some of its applications. Rather than recovering a sparse vector \(\mathbf{x} \in {\mathbb{C}}^{N}\), we now aim at recovering a matrix \(\mathbf{X} \in {\mathbb{C}}^{n_{1}\times n_{2}}\) from incomplete information. Sparsity is replaced by the assumption that \(\mathbf{X}\) has low rank. Indeed, the small complexity of the set of matrices with a given low rank compared to the set of all matrices makes the recovery of such matrices plausible.
As a popular special case, the matrix completion problem seeks to fill in missing entries of a lowrank matrix. Thus, the measurement map \(\mathcal{A}\) samples the entries \(\mathcal{A}(\mathbf{X})_{\ell} = X_{j,k}\) for some indices j, k depending on ℓ. This setup appears, for example, in consumer taste prediction. Assume that an (online) store sells products indexed by the rows of the matrix and consumers—indexed by the columns—are able to rate these products. Not every consumer will rate every product, so only a limited number of entries of this matrix are available. For purposes of individualized advertisement, the store is interested in predicting the whole matrix of consumer ratings. Often, if two customers both like some subset of products, then they will also both like or dislike other subsets of products (the “types” of customers are essentially limited). For this reason, it can be assumed that the matrix of ratings has (at least approximately) low rank, which is confirmed empirically. Therefore, methods from lowrank matrix recovery, including the nuclear norm minimization approach, apply in this setup.
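A minimal matrix completion experiment (a rank-one "ratings" matrix with roughly half its entries observed): the sketch below uses a simple alternating-projection heuristic — alternately project onto rank-r matrices via a truncated SVD and re-impose the observed entries — rather than the nuclear norm program, which requires a convex solver.

```python
import numpy as np

rng = np.random.default_rng(6)
n1, n2, r = 20, 20, 1

# Rank-1 "ratings" matrix and a random set of observed entries.
X = np.outer(rng.standard_normal(n1), rng.standard_normal(n2))
mask = rng.random((n1, n2)) < 0.5           # about half the entries observed

def complete(Y, mask, r, n_iter=500):
    """Alternating-projection heuristic for matrix completion: project onto
    rank-r matrices, then restore the observed entries. Only the masked
    entries of Y are ever used."""
    Z = np.where(mask, Y, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Z = (U[:, :r] * s[:r]) @ Vt[:r]      # nearest rank-r matrix
        Z = np.where(mask, Y, Z)             # enforce data consistency
    return Z

X_hat = complete(X, mask, r)
rel = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
assert rel < 0.01                            # unobserved entries filled in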
Although certainly interesting, we will not treat lowrank recovery extensively in this book. Nevertheless, due to the close analogy with sparse recovery, the main results are covered in exercises, and the reader is invited to work through them.
1.3 Overview of the Book
Before studying the standard compressive sensing problem on a technical level, it is beneficial to draw a road map of the basic results and solving strategies presented in this book.
In order to circumvent the computational bottleneck of ℓ_0-minimization, we introduce several tractable alternatives in Chap. 3. Here, rather than a detailed analysis, we only present some intuitive justification and elementary results for these recovery algorithms. They can be subsumed under roughly three categories: optimization methods, greedy methods, and thresholding-based methods. The optimization approaches include the ℓ_1-minimization (1.2) (also called basis pursuit) and the quadratically constrained ℓ_1-minimization (1.4) (sometimes also called basis pursuit denoising in the literature), which takes potential measurement error into account. These minimization problems can be solved with various methods from convex optimization such as interior-point methods. We will also present specialized numerical methods for ℓ_1-minimization later in Chap. 15.
Orthogonal matching pursuit is a greedy method that builds up the support set of the reconstructed sparse vector iteratively by adding one index to the current support set at each iteration. The selection process is greedy because the index is chosen to minimize the residual at each iteration. Another greedy method is compressive sampling matching pursuit (CoSaMP). At each iteration, it selects several elements of the support set and then refines this selection.
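The orthogonal matching pursuit loop just described can be sketched in a few lines (illustrative dimensions; the greedy index choice here is the standard largest-correlation rule, which minimizes the residual after re-projection).

```python
import numpy as np

def omp(A, y, s):
    """Orthogonal matching pursuit: grow the support one index at a time by
    picking the column most correlated with the residual, then re-fit all
    selected coefficients by least squares."""
    m, N = A.shape
    support = []
    residual = y.copy()
    for _ in range(s):
        j = int(np.argmax(np.abs(A.T @ residual)))   # greedy index selection
        support.append(j)
        x_S, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_S           # orthogonal residual
    x = np.zeros(N)
    x[support] = x_S
    return x

rng = np.random.default_rng(7)
m, N, s = 30, 100, 4
A = rng.standard_normal((m, N)) / np.sqrt(m)
x = np.zeros(N)
x[rng.choice(N, s, replace=False)] = rng.standard_normal(s)

x_hat = omp(A, A @ x, s)
assert np.allclose(x_hat, x, atol=1e-6)
```

Because the residual is kept orthogonal to the span of the selected columns, no index is ever chosen twice; CoSaMP differs by selecting several candidate indices per iteration and then pruning.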
Chapter 4 is devoted to the analysis of basis pursuit (ℓ_1-minimization). First, we derive conditions for the exact recovery of sparse vectors. The null space property of order s is a necessary and sufficient condition (on the matrix \(\mathbf{A}\)) for the success of exact recovery of all s-sparse vectors \(\mathbf{x}\) from \(\mathbf{y} = \mathbf{A}\mathbf{x}\) via ℓ_1-minimization. It basically requires that every vector in the null space of \(\mathbf{A}\) is far from being sparse. This is natural, since a nonzero vector \(\mathbf{x} \in \ker \mathbf{A}\) cannot be distinguished from the zero vector using \(\mathbf{y} = \mathbf{A}\mathbf{x} = \mathbf{0}\). Next, we refine the null space property—introducing the stable null space property and the robust null space property—to ensure that ℓ_1-recovery is stable under sparsity defect and robust under measurement error. We also derive conditions that ensure the ℓ_1-recovery of an individual sparse vector. These conditions (on the vector \(\mathbf{x}\) and the matrix \(\mathbf{A}\)) are useful in later chapters to establish so-called nonuniform recovery results for randomly chosen measurement matrices. The chapter is brought to an end with two small detours. The first one is a geometric interpretation of conditions for exact recovery. The second one considers low-rank recovery and the nuclear norm minimization (1.10). The success of the latter is shown to be equivalent to a suitable adaptation of the null space property. Further results concerning low-rank recovery are treated in exercises spread throughout the book.
Chapter 6 starts with basic results on the restricted isometry constants. For instance, there is the relation δ_2 = μ with the coherence when the columns of \(\mathbf{A}\) are ℓ_2-normalized. In this sense, restricted isometry constants generalize the coherence by considering all s-tuples rather than all pairs of columns. Other relations include the simple (and quite pessimistic) bound δ_s ≤ (s − 1)μ, which can be derived directly from Gershgorin's disk theorem.
Algorithm   BP                 IHT                HTP                OMP                 CoSaMP
RIP bound   δ_{2s} < 0.6248    δ_{3s} < 0.5773    δ_{3s} < 0.5773    δ_{13s} < 0.1666    δ_{4s} < 0.4782
At the time of writing, finding explicit (deterministic) constructions of matrices satisfying (1.13) in the regime where m scales linearly in s up to logarithmic factors is an open problem. The reason lies in the fact that usual tools (such as Gershgorin’s theorem) to estimate condition numbers essentially involve the coherence (or ℓ _{1}coherence function), as in δ _{ κ s } ≤ (κ s − 1)μ. Bounding the latter by a fixed δ _{ ∗ } still faces the quadratic bottleneck (1.12).
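The Gershgorin-type bound δ_s ≤ (s − 1)μ is easy to verify numerically: compute the coherence μ of a column-normalized matrix and check that the restricted isometry constant of random column submatrices never exceeds (s − 1)μ. (Dimensions below are illustrative; the bound itself is deterministic.)

```python
import numpy as np

rng = np.random.default_rng(8)
m, N = 40, 80
A = rng.standard_normal((m, N))
A /= np.linalg.norm(A, axis=0)            # l2-normalize the columns

# Coherence: largest inner product between distinct normalized columns.
G = A.T @ A
mu = np.max(np.abs(G - np.eye(N)))

# Check delta_s <= (s-1)*mu on random s-column submatrices: the restricted
# isometry constant of A_S is the deviation of eig(A_S^T A_S) from 1, and
# Gershgorin's disk theorem bounds this deviation by (s-1)*mu.
s = 5
for _ in range(20):
    S = rng.choice(N, s, replace=False)
    eigs = np.linalg.eigvalsh(A[:, S].T @ A[:, S])
    delta_S = max(eigs.max() - 1, 1 - eigs.min())
    assert delta_S <= (s - 1) * mu + 1e-12
```

The quadratic bottleneck is visible here: μ cannot be much smaller than about 1/√m, so forcing (s − 1)μ below a fixed δ requires m to grow like s².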
We resolve this issue by passing to random matrices. Then a whole new set of tools from probability theory becomes available. When the matrix \(\mathbf{A}\) is drawn at random, these tools enable one to show that the restricted isometry property or other conditions ensuring recovery hold with high probability provided m ≥ C s ln(N ∕ s). Chapters 7 and 8 introduce all the necessary background on probability theory.
We start in Chap. 7 by recalling basic concepts such as expectation, moments, Gaussian random variables and vectors, and Jensen’s inequality. Next, we treat the relation between the moments of a random variable and its tails. Bounds on the tails of sums of independent random variables will be essential later, and Cramér’s theorem provides general estimates involving the moment generating functions of the random variables. Hoeffding’s inequality specializes to the sum of independent bounded meanzero random variables. Gaussian and Rademacher/Bernoulli variables (the latter taking the values + 1 or − 1 with equal probability) fall into the larger class of subgaussian random variables, for which we also present basic results. Finally, Bernstein inequalities refine Hoeffding’s inequality by taking into account the variance of the random variables. Furthermore, they extend to possibly unbounded subexponential random variables.
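Hoeffding's inequality can be checked by simulation. For a sum of n independent Rademacher (±1) variables it reads P(S ≥ t) ≤ exp(−t²/(2n)); the sketch below (sample sizes are illustrative) compares this bound with an empirical tail frequency.

```python
import numpy as np

rng = np.random.default_rng(11)
n, trials, t = 100, 50000, 25

# Hoeffding for a Rademacher sum S = sum_j eps_j with eps_j = +-1:
#   P(S >= t) <= exp(-t^2 / (2n)).
S = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)
empirical = np.mean(S >= t)
bound = np.exp(-t ** 2 / (2 * n))

assert empirical <= bound   # the tail bound holds (here with room to spare)
```

The gap between the empirical tail and the bound reflects that Hoeffding ignores the variance; the Bernstein inequalities mentioned above tighten it.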
For many compressive sensing results with Gaussian or Bernoulli random matrices—that is, for large parts of Chaps. 9 and 11, including bounds for the restricted isometry constants—the relatively simple tools of Chap. 7 are already sufficient. Several topics in compressive sensing, however, notably the analysis of random partial Fourier matrices, build on more advanced tools from probability theory. Chapter 8 presents the required material. For instance, we cover Rademacher sums of the form \(\sum _{j}\epsilon _{j}a_{j}\) where the ε _{ j } = ± 1 are independent Rademacher variables and the symmetrization technique leading to such sums. Khintchine inequalities bound the moments of Rademacher sums. The noncommutative Bernstein inequality provides a tail bound for the operator norm of independent meanzero random matrices. Dudley’s inequality bounds the expected supremum over a family of random variables by a geometric quantity of the set indexing the family. Concentration of measure describes the highdimensional phenomenon which sees functions of random vectors concentrating around their means. Such a result is presented for Lipschitz functions of Gaussian random vectors.
We close Chap. 9 with a detour to the Johnson–Lindenstrauss lemma which states that a finite set of points in a high-dimensional space can be mapped to a significantly lower-dimensional space while almost preserving all mutual distances (no sparsity assumption is involved here). This is somewhat equivalent to the concentration inequality (1.16). In this sense, the Johnson–Lindenstrauss lemma implies the RIP. We will conversely show that if a matrix satisfies the RIP, then randomizing the signs of its columns yields a Johnson–Lindenstrauss embedding with high probability.
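The Johnson–Lindenstrauss phenomenon is easy to observe numerically: a scaled Gaussian map sends points from dimension 1000 down to dimension 300 while distorting every pairwise distance by only a few percent. (The point count, dimensions, and distortion threshold are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(9)
n_points, dim, m = 50, 1000, 300

# Johnson-Lindenstrauss sketch: a scaled Gaussian random map to dimension m
# nearly preserves all pairwise distances among the points.
X = rng.standard_normal((n_points, dim))
Phi = rng.standard_normal((m, dim)) / np.sqrt(m)
Y = X @ Phi.T

for i in range(n_points):
    for j in range(i + 1, n_points):
        d_old = np.linalg.norm(X[i] - X[j])
        d_new = np.linalg.norm(Y[i] - Y[j])
        assert abs(d_new / d_old - 1) < 0.25   # small relative distortion
```

The embedding dimension needed grows only logarithmically in the number of points, independently of the ambient dimension.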
Subgaussian random matrices are of limited practical use, because specific applications may impose a structure on the measurement matrix that totally random matrices lack. As mentioned earlier, deterministic measurement matrices providing provable recovery guarantees are missing from the current theory. This motivates the study of structured random matrices. In Chap. 12, we investigate a particular class of structured random matrices arising in sampling problems. This includes random partial Fourier matrices.
Deriving recovery guarantees for the random sampling matrix \(\mathbf{A}\) in (1.19) is more involved than for subgaussian random matrices, where all the entries are independent. In fact, the matrix \(\mathbf{A}\) has mN entries, but it is generated by only m independent random variables. We proceed by increasing level of difficulty and start by showing nonuniform sparse recovery guarantees for \(\ell_1\)-minimization. The number of samples allowing one to recover a fixed s-sparse coefficient vector \(\mathbf{x}\) with high probability is then \(m \geq CK^2 s \ln N\).
As a second method, we treat Chambolle and Pock's primal–dual algorithm. This algorithm applies to a large class of optimization problems including \(\ell_1\)-minimization. It consists of a simple iterative procedure which updates a primal, a dual, and an auxiliary variable at each step. All of the computations are easy to perform. We show convergence of the sequence of primal variables generated by the algorithm to the minimizer of the given functional and outline its specific form for three types of \(\ell_1\)-minimization problems. In contrast to the homotopy method, it also applies in the complex-valued case.
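A minimal sketch of such a primal–dual iteration for the equality-constrained problem \(\min \|x\|_1\) subject to \(Ax = y\) might look as follows; the step sizes, iteration count, and test problem are illustrative choices, not the book's prescriptions:

```python
import numpy as np

def chambolle_pock_l1(A, y, n_iter=5000):
    """Primal-dual iteration for min ||x||_1 subject to Ax = y (illustrative)."""
    m, N = A.shape
    L = np.linalg.norm(A, 2)                 # operator norm of A
    tau = sigma = 0.9 / L                    # step sizes with tau*sigma*L**2 < 1
    theta = 1.0
    x = np.zeros(N)
    xbar = x.copy()                          # auxiliary (extrapolated) variable
    xi = np.zeros(m)                         # dual variable
    soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    for _ in range(n_iter):
        xi = xi + sigma * (A @ xbar - y)     # dual step for the constraint term
        x_new = soft(x - tau * (A.T @ xi), tau)  # primal step: soft thresholding
        xbar = x_new + theta * (x_new - x)   # extrapolation
        x = x_new
    return x

# tiny demo: recover a 3-sparse vector from 32 Gaussian measurements
rng = np.random.default_rng(1)
N, m = 64, 32
A = rng.standard_normal((m, N)) / np.sqrt(m)
x0 = np.zeros(N)
x0[[5, 20, 40]] = [1.0, -2.0, 1.5]
x_hat = chambolle_pock_l1(A, A @ x0)
print(np.linalg.norm(x_hat - x0))
```

Each iteration uses only matrix–vector products and a componentwise soft thresholding, which is what makes the per-step cost so low.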
Finally, we discuss a method that iteratively solves weighted \(\ell_2\)-minimization problems. The weights are suitably updated in each iteration based on the solution of the previous iteration. Since weighted \(\ell_2\)-minimization can be performed efficiently (in fact, this is a linear problem), each step of the algorithm can be computed quickly. Although this algorithm is strongly motivated by \(\ell_1\)-minimization, its convergence to the \(\ell_1\)-minimizer is not guaranteed. Nevertheless, under the null space property of the matrix \(\mathbf{A}\) (which is equivalent to sparse recovery via \(\ell_1\)-minimization), we show that the iteratively reweighted least squares algorithm recovers every s-sparse vector from \(\mathbf{y} = \mathbf{A}\mathbf{x}\). Recovery is stable when passing to compressible vectors. Moreover, we give an estimate of the convergence rate in the exactly sparse case.
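The scheme can be sketched as follows; the smoothing-parameter update below is a simple halving heuristic for illustration, not the update rule analyzed in the book:

```python
import numpy as np

def irls_l1(A, y, n_iter=100):
    """Iteratively reweighted least squares for l1-minimization (sketch;
    the epsilon schedule is a halving heuristic, not the book's rule)."""
    x = np.linalg.lstsq(A, y, rcond=None)[0]   # minimum-norm starting point
    eps = 1.0
    for _ in range(n_iter):
        d = np.sqrt(x**2 + eps**2)             # smoothed inverse weights
        # weighted l2 minimizer subject to Ax = y: x = D A^T (A D A^T)^{-1} y
        AD = A * d                             # A @ diag(d), via broadcasting
        x = d * (A.T @ np.linalg.solve(AD @ A.T, y))
        eps = max(0.5 * eps, 1e-8)             # gradually sharpen the weights
    return x

rng = np.random.default_rng(2)
N, m = 64, 32
A = rng.standard_normal((m, N)) / np.sqrt(m)
x0 = np.zeros(N)
x0[[3, 17, 50]] = [2.0, -1.0, 0.5]
x_hat = irls_l1(A, A @ x0)
print(np.linalg.norm(x_hat - x0))
```

The inner step is indeed linear: each iteration amounts to solving one m-by-m symmetric linear system.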
The book is concluded with three appendices. Appendix A covers background material from linear algebra and matrix analysis, including vector and matrix norms, eigenvalues and singular values, and matrix functions. Basic concepts and results from convex analysis and convex optimization are presented in Appendix B. We also treat matrix convexity and present a proof of Lieb’s theorem on the concavity of the matrix function \(\mathbf{X}\mapsto \mathrm{tr\,}\exp (\mathbf{H} +\ln \mathbf{X})\) on the set of positive definite matrices. Appendix C presents miscellaneous material including covering numbers, Fourier transforms, elementary estimates on binomial coefficients, the Gamma function and Stirling’s formula, smoothing of Lipschitz functions via convolution, distributional derivatives, and differential inequalities.
Notation is usually introduced when it first appears. Additionally, a collection of symbols used in the text can be found on pp. 589. All the constants in this book are universal unless stated otherwise. This means that they do not depend on any other quantity. Often, the value of a constant is given explicitly or it can be deduced from the proof.
1.4 Notes
The field of compressive sensing was initiated with the papers [94] by Candès, Romberg, and Tao and [152] by Donoho, who coined the term compressed sensing. Even though there had been precursors to various aspects of the field, these papers seem to be the first to combine the ideas of \(\ell_1\)-minimization with a random choice of measurement matrix and to realize the effectiveness of this combination for solving underdetermined systems of equations. They also emphasized the potential of compressive sensing for many signal processing tasks.
We now list some of the highlights from preceding works and earlier developments connected to compressive sensing. Details and references on the advances of compressive sensing itself will be given in the Notes sections at the end of each subsequent chapter. References [29, 84, 100, 182, 204, 411, 427] provide overview articles on compressive sensing.
Arguably, the first contribution connected to sparse recovery was made by de Prony [402] as far back as 1795. He developed a method for identifying the frequencies \(\omega_j \in \mathbb{R}\) and the amplitudes \(x_j \in \mathbb{C}\) in a nonharmonic trigonometric sum of the form \(f(t) =\sum_{j=1}^{s}x_{j}{e}^{2\pi i\omega_{j}t}\). His method takes equidistant samples and solves an eigenvalue problem to compute the \(\omega_j\). This method is related to Reed–Solomon decoding covered in the next chapter; see Theorem 2.15. For more information on the Prony method, we refer to [344, 401].
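Prony's idea can be sketched numerically; the classical variant below exploits a linear recurrence and polynomial rooting rather than formulating an eigenvalue problem, assumes exact noiseless samples, and is known to be sensitive to noise:

```python
import numpy as np

def prony(samples, s):
    """Prony's method (sketch): from 2s equispaced samples
    f(k) = sum_{j=1}^s x_j * exp(2*pi*1j*omega_j*k), k = 0, ..., 2s-1,
    recover the frequencies omega_j and amplitudes x_j."""
    f = np.asarray(samples, dtype=complex)
    # the samples obey a linear recurrence whose characteristic polynomial
    # has roots z_j = exp(2*pi*1j*omega_j); solve for its coefficients
    M = np.array([f[k:k + s] for k in range(s)])
    c = np.linalg.solve(M, -f[s:2 * s])
    roots = np.roots(np.concatenate(([1.0 + 0j], c[::-1])))
    omegas = np.angle(roots) / (2 * np.pi)
    # amplitudes from the Vandermonde system f_k = sum_j x_j z_j^k
    V = np.vander(roots, N=2 * s, increasing=True).T
    amps = np.linalg.lstsq(V, f, rcond=None)[0]
    return omegas, amps

# demo with three well-separated frequencies
omega_true = np.array([-0.20, 0.12, 0.31])
x_true = np.array([0.5, 1.0, 2.0])
k = np.arange(6)
f = np.exp(2j * np.pi * np.outer(k, omega_true)) @ x_true
omegas, amps = prony(f, s=3)
order = np.argsort(omegas)
print(np.round(omegas[order], 6), np.round(amps[order].real, 6))
```

Six samples suffice here because a sum of three exponentials satisfies a length-three recurrence, whose coefficients and roots determine everything else.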
The use of \(\ell_1\)-minimization appeared in the 1965 Ph.D. thesis [332] of Logan in the context of sparse frequency estimation, and an early theoretical work on \(L_1\)-minimization is the paper [161] by Donoho and Logan. Geophysicists observed in the late 1970s that \(\ell_1\)-minimization can be successfully used to compute a sparse reflection function indicating changes between subsurface layers [469, 441]. The use of total-variation minimization, which is closely connected to \(\ell_1\)-minimization, appeared in the 1990s in the work on image processing by Rudin, Osher, and Fatemi [436]. The use of \(\ell_1\)-minimization and related greedy methods in statistics was greatly popularized by the work of Tibshirani [473] on the LASSO (Least Absolute Shrinkage and Selection Operator).
The theory of sparse approximation and associated algorithms began in the 1990s with the papers [342, 359, 114]. The theoretical understanding of conditions allowing greedy methods and \(\ell_1\)-minimization to recover the sparsest solution developed with the work in [158, 181, 155, 239, 224, 215, 476, 479].
Compressive sensing has connections with the area of information-based complexity, which considers the general question of how well functions f from a class \(\mathcal{F}\) can be approximated from m sample values, or more generally from the evaluation of m linear or nonlinear functionals applied to f; see [474]. The optimal recovery error, defined as the maximal reconstruction error for the best sampling and recovery methods over all functions in the class \(\mathcal{F}\), is closely related to the so-called Gelfand width of \(\mathcal{F}\) [370]; see also Chap. 10. Of particular interest in compressive sensing is the \(\ell_1\)-ball \(B_1^N\) in \({\mathbb{R}}^{N}\). Famous results due to Kashin [299] and Gluskin and Garnaev [219, 227] sharply bound the Gelfand widths of \(B_1^N\) from above and below; see also Chap. 10. Although the original interest of Kashin was to estimate m-widths of Sobolev classes, these results give precise performance bounds on how well any method may recover (approximately) sparse vectors from linear measurements. It is remarkable that [299, 219] already employed Bernoulli and Gaussian random matrices in ways similar to their use in compressive sensing (see Chap. 9).
In computer science, too, sparsity appeared before the advent of compressive sensing through the area of sketching. Here, one is not only interested in recovering huge data sets (such as data streams on the Internet) from vastly undersampled data, but one requires in addition that the associated algorithms have sublinear runtime in the signal length. There is no a priori contradiction in this desideratum because one only needs to report locations and values of nonzero entries. Such algorithms often use ideas from group testing [173], which dates back to World War II, when Dorfman [171] devised an efficient method for detecting draftees with syphilis. One usually designs the matrix and the fast algorithm simultaneously [131, 225] in this setup. Lossless expanders as studied in Chap. 13 play a key role in some of the constructions [41]. Quite remarkably, sublinear algorithms are also available for sparse Fourier transforms [223, 519, 287, 288, 262, 261].
Applications of Compressive Sensing. We next provide comments and references on the applications and motivations described in Sect. 1.2.
Single-pixel camera. The single-pixel camera was developed by Baraniuk and coworkers [174] as an elegant proof of concept that the ideas of compressive sensing can be implemented in hardware.
Magnetic resonance imaging. The initial paper [94] on compressive sensing was motivated by medical imaging—although Candès et al. in fact treated the very similar problem of computerized tomography. The application of compressive sensing techniques to magnetic resonance imaging (MRI) was investigated in [338, 255, 497, 358]. Background on the theoretical foundations of MRI can be found, for instance, in [252, 267, 512]. Applications of compressive sensing to the related problem of nuclear magnetic resonance spectroscopy are contained in [278, 447]. Background on the methods related to Fig. 1.6 is described in the work of Lustig, Vasanawala, and coworkers [497, 358].
Radar. The particular radar application outlined in Sect. 1.2 is described in more detail in [268]. The same mathematical model appears also in sonar and in the channel estimation problem of wireless communications [384, 412, 385]. The application of compressive sensing to other radar scenarios can be found, for instance, in [185, 189, 397, 455, 283].
Sampling theory. The classical sampling theorem (1.5) is associated with the names of Shannon, Nyquist, Whittaker, and Kotelnikov. Sampling theory is a broad and well-developed area; we refer to [39, 195, 271, 272, 294] for further information on the classical aspects. The use of sparse recovery techniques in sampling problems appeared early in the development of compressive sensing theory [94, 97, 408, 409, 411, 416]. In fact, the alternative name compressive sampling indicates that compressive sensing can be viewed as a part of sampling theory—although it draws on quite different mathematical tools than classical sampling theory itself.
Sparse approximation. The theory of compressive sensing can also be viewed as a part of sparse approximation, with roots in signal processing, harmonic analysis [170], and numerical analysis [122]. General sources for background on sparse approximation and its applications are the books [179, 451, 472] as well as the survey paper [73].
The principle of representing a signal by a small number of terms in a suitable basis in order to achieve compression is realized, for instance, in the ubiquitous compression standards JPEG, MPEG, and MP3. Wavelets [137] are known to provide a good basis for images, and the analysis of the best (nonlinear) approximation reaches into the area of function spaces, more precisely Besov spaces [508]. Similarly, Gabor expansions [244] may compress audio signals. Since good Gabor systems are always redundant systems (frames) and never bases, computational tools to compute the sparsest representation of a signal are essential. It was realized in [359, 342] that this problem is in general NP-hard. The greedy approach via orthogonal matching pursuit was then introduced in [342] (although it had appeared earlier in different contexts), while basis pursuit (\(\ell_1\)-minimization) was introduced in [114].
The use of the uncertainty principle for deducing a positive statement on the data separation problem with respect to the Fourier and canonical bases appeared in [164, 163]. For further information on the separation problem, we refer the reader to [181, 92, 158, 160, 238, 331, 482]. Background on denoising via sparse representations can be found in [180, 450, 105, 150, 159, 407].
The analysis of conditions allowing algorithms such as \(\ell_1\)-minimization or orthogonal matching pursuit to recover the sparsest representation started with the contributions [155, 157, 158, 156, 224, 476, 479], and these early results are the basis for the advances in compressive sensing.
Error correction. The idealized setup of error correction and the compressive sensing approach described in Sect. 1.2 appeared in [96, 167, 431]. For more background on error correction, we refer to [282].
Statistics and machine learning. Sparsity has a long history in statistics and in linear regression models in particular. The corresponding area is sometimes referred to as high-dimensional statistics or model selection because the support set of the coefficient vector \(\mathbf{x}\) determines the relevant explanatory variables and thereby selects a model. Stepwise forward regression methods are closely related to greedy algorithms such as (orthogonal) matching pursuit. The LASSO, i.e., the minimization problem (1.8), was introduced by Tibshirani in [473]. Candès and Tao introduced the Dantzig selector (1.9) in [98] and realized that methods of compressive sensing (the restricted isometry property) are useful for the analysis of sparse regression methods. We refer to [48] and the monograph [76] for details. For more information on machine learning, we direct the reader to [18, 133, 134, 444]. Connections between sparsity and machine learning can be found, for instance, in [23, 147, 513].
Low-rank matrix recovery. The extension of compressive sensing to the recovery of low-rank matrices from incomplete information emerged with the papers [90, 99, 418]. The idea of replacing the rank-minimization problem by nuclear norm minimization appeared in the Ph.D. thesis of Fazel [190]. The matrix completion problem is treated in [90, 417, 99], and the more general problem of quantum state tomography in [246, 245, 330].
Let us briefly mention further applications and relations to other fields.
In inverse problems, sparsity has also become an important concept for regularization methods. Instead of Tikhonov regularization with a Hilbert space norm [186], one uses an \(\ell_1\)-norm regularization approach [138, 406]. In many practical applications, this improves the recovered solutions. Ill-posed inverse problems appear, for instance, in geophysics, where \(\ell_1\)-norm regularization was already used in [469, 441], although without rigorous mathematical theory at that time. We refer to the survey papers [269, 270] dealing with compressive sensing in seismic exploration.
Total-variation minimization is a classical and successful approach for image denoising and other tasks in image processing [106, 436, 104]. Since the total variation is the \(\ell_1\)-norm of the gradient, the minimization problem is closely related to basis pursuit. In fact, the motivating example for the first contribution [94] of Candès, Romberg, and Tao to compressive sensing came from total-variation minimization in computerized tomography. The restricted isometry property can be used to analyze image recovery via total-variation minimization [364]. The primal–dual algorithm of Chambolle and Pock, to be presented in Chap. 15, was originally motivated by total-variation minimization as well [107].
Further applications of compressive sensing and sparsity in general include imaging (tomography, ultrasound, photoacoustic imaging, hyperspectral imaging, etc.), analog-to-digital conversion [488, 353], DNA microarray processing, astronomy [507], and wireless communications [27, 468].
Topics not Covered in this Book. It is impossible to give a detailed account of all the directions that have so far cropped up around compressive sensing. This book certainly makes a selection, but we believe that we cover the most important aspects and mathematical techniques. With this basis, the reader should be well equipped to read the original references on further directions, generalizations, and applications. Let us only give a brief account of additional topics together with the relevant references. Again, no claim about completeness of the list is made.
Structured sparsity models. One often has a priori knowledge beyond pure sparsity, in the sense that the support set of the sparse vector to be recovered possesses a certain structure, i.e., only specific support sets are allowed. Let us briefly describe the joint-sparsity and block-sparsity models.
Suppose that we take measurements not only of a single signal but of a collection of signals that are somewhat coupled. Rather than assuming that each signal is sparse (or compressible) on its own, we assume that the unknown support set is the same for all signals in the collection. In this case, we speak of joint sparsity. A motivating example is color images, where each signal corresponds to a color channel of the image, say red, green, or blue. Since edges usually appear at the same location in all channels, the gradient features some joint sparsity. Instead of the usual \(\ell_1\)-minimization problem, one considers mixed \(\ell_1/\ell_2\)-norm minimization or greedy algorithms exploiting the joint-sparsity structure. A similar setup is described by the block-sparsity (or group-sparsity) model, where certain indices of the sparse vector are grouped together. A signal is then block sparse if most groups (blocks) of coefficients vanish; in other words, nonzero coefficients appear in groups. Recovery algorithms may exploit this prior knowledge to improve the recovery performance. A theory can be developed along lines similar to those for usual sparsity [143, 183, 184, 203, 487, 241, 478]. The so-called model-based compressive sensing [30] provides a further, very general structured sparsity setup.
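A small numerical example illustrates why the mixed \(\ell_1/\ell_2\)-norm favors jointly sparse coefficient matrices over scattered supports of the same entrywise \(\ell_1\)-norm (the matrices below are arbitrary toy data):

```python
import numpy as np

def mixed_l21(X):
    """l1/l2 mixed norm: sum over rows (indices) of the l2-norms taken
    across columns (channels); small when only few rows are nonzero."""
    return float(np.sum(np.linalg.norm(X, axis=1)))

# two 10-index, 3-channel coefficient matrices with the same entrywise
# l1-norm (both equal 6), but very different support structure
jointly_sparse = np.zeros((10, 3))
jointly_sparse[[2, 7]] = 1.0                               # two shared rows
scattered = np.zeros((10, 3))
scattered[[0, 1, 3, 4, 5, 6], [0, 1, 2, 0, 1, 2]] = 1.0    # six different rows

print(mixed_l21(jointly_sparse), mixed_l21(scattered))     # 2*sqrt(3) vs 6.0
```

Minimizing the mixed norm therefore pushes the nonzero coefficients into a common, small set of rows, which is exactly the joint-sparsity prior.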
Sublinear algorithms. Algorithms of this type have been developed in computer science for quite some time. The fact that only the locations and values of the nonzero entries of a sparse vector have to be reported makes it possible to design recovery algorithms whose runtime is sublinear in the vector length. Such recovery methods are also known under the names streaming algorithms or heavy hitters. We only cover a toy sublinear algorithm in Chap. 13 and refer to [41, 131, 223, 289, 285, 222, 225, 261] for more information.
Connection with the geometry of random polytopes. Donoho and Tanner [166, 165, 154, 167] approached the analysis of sparse recovery via \(\ell_1\)-minimization through polytope geometry. In fact, the recovery of s-sparse vectors via \(\ell_1\)-minimization is equivalent to a geometric property—called neighborliness—of the projected \(\ell_1\)-ball under the action of the measurement matrix; see also Corollary 4.39. When the measurement matrix is a Gaussian random matrix, Donoho and Tanner give a precise analysis of so-called phase transitions that predict in which ranges of (s, m, N) sparse recovery is successful or unsuccessful with high probability. In particular, their analysis provides the value of the optimal constant C such that \(m \approx Cs\ln(N/s)\) allows for s-sparse recovery via \(\ell_1\)-minimization. We only give a brief account of their work in the Notes of Chap. 9.
Compressive sensing and quantization. If compressive sensing is used for signal acquisition, then a realistic sensor must quantize the measured data. This means that only a finite number of values for the measurements \(y_\ell\) are possible. For instance, 8 bits provide \(2^8 = 256\) values for an approximation of \(y_\ell\) to be stored. If the quantization is coarse, then this additional source of error cannot be ignored, and a revised theoretical analysis becomes necessary. We refer to [316, 520, 249] for background information. We also mention the extreme case of 1-bit compressed sensing, where only the signs of the measurements are available via \(\mathbf{y} =\mathrm{sgn}(\mathbf{A}\mathbf{x})\) [290, 393, 394].
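A toy uniform quantizer makes the "8 bits give 256 values" statement concrete; the rounding scheme below is an illustrative choice, not a model of any particular sensor:

```python
import numpy as np

def quantize(y, bits=8):
    """Uniform scalar quantizer (toy sketch): rounds each measurement to one
    of 2**bits equispaced levels spanning the observed range of y."""
    lo, hi = float(np.min(y)), float(np.max(y))
    step = (hi - lo) / (2**bits - 1)
    return lo + step * np.round((y - lo) / step), step

rng = np.random.default_rng(3)
y = rng.standard_normal(1000)          # stand-in for measurements A @ x
y_q, step = quantize(y, bits=8)
print(f"step {step:.4f}, worst-case quantization error {np.max(np.abs(y - y_q)):.4f}")
```

Rounding to the nearest level bounds the componentwise error by half the step size; coarser quantization (fewer bits) widens the step and hence this error floor.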
Dictionary learning. Sparsity usually occurs in a specific basis or redundant dictionary. In certain applications, it may not be immediately clear which dictionary is suitable to sparsify the signals of interest. Dictionary learning tries to identify a good dictionary using training signals. Algorithmic approaches include the K-SVD algorithm [5, 429] and optimization methods [242]. Optimizing over both the dictionary and the coefficients in the expansions results in a nonconvex program, even when using \(\ell_1\)-minimization. It is therefore notoriously hard to establish a rigorous mathematical theory of dictionary learning, despite the fact that the algorithms perform well in practice. Nevertheless, a few interesting mathematical results are available in the spirit of compressive sensing [221, 242].
Recovery of functions of many variables. Techniques from compressive sensing can be exploited for the reconstruction of functions on a high-dimensional space from point samples. Traditional approaches suffer from the curse of dimensionality: the number of samples required to achieve a certain reconstruction accuracy scales exponentially with the spatial dimension, even for classes of infinitely differentiable functions [371, 474]. It is often a reasonable assumption in practice that the function to be reconstructed depends only on a small number of (a priori unknown) variables. This model is investigated in [125, 149], where ideas of compressive sensing allow one to dramatically reduce the number of required samples. A more general model considers functions of the form \(f(\mathbf{x}) = g(\mathbf{A}\mathbf{x})\), where \(\mathbf{x}\) belongs to a subset \(\mathcal{D}\subset {\mathbb{R}}^{N}\) with N large, \(\mathbf{A} \in {\mathbb{R}}^{m\times N}\) with m ≪ N, and g is a smooth function on an m-dimensional domain. Both g and \(\mathbf{A}\) are unknown a priori and are to be reconstructed from suitable samples of f. Again, under suitable assumptions on g and \(\mathbf{A}\), one can build on methods from compressive sensing to recover f from a relatively small number of samples. We refer to [266, 124, 206] for details.

For a comprehensive treatment of the deterministic issues, Chaps. 2–6 complemented by Chap. 10 are appropriate. If a proof of the restricted isometry property for random matrices is desired, one can add the simple arguments of Sect. 9.1, which rely only on a few tools from Chap. 7. In a class lasting only one quarter rather than one semester, one can remove Sect. 4.5 and mention only briefly the stability and robustness results of Chaps. 4 and 6. One can also concentrate on \(\ell_1\)-minimization alone and discard Chap. 3 as well as Sects. 5.3, 5.5, 6.3, and 6.4 if the variety of algorithms is not a priority.

On the other hand, for a course focusing on algorithmic aspects, Chaps. 2–6 as well as (parts of) Chap. 15 are appropriate, possibly replacing Chap. 5 by Chap. 13 and including (parts of) Appendix B.

For a course focusing on probabilistic issues, we recommend Chaps. 7–9 and Chaps. 11, 12, and 14. This can represent a second one-semester class. However, if this material has to be delivered as a first course, Chap. 4 (especially Sects. 4.1 and 4.4) and Chap. 6 (especially Sects. 6.1 and 6.2) need to be included.
Of course, parts of particular chapters may also be dropped depending on the desired emphasis.
We will be happy to receive feedback on these suggestions from instructors using this book in their class. They may also contact us to obtain typed-out solutions for some of the exercises.
References
 5.M. Aharon, M. Elad, A. Bruckstein, The KSVD: An algorithm for designing of overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006)CrossRefGoogle Scholar
 18.M. Anthony, P. Bartlett, Neural Network Learning: Theoretical Foundations (Cambridge University Press, Cambridge, 1999)MATHCrossRefGoogle Scholar
 23.F. Bach, R. Jenatton, J. Mairal, G. Obozinski, Optimization with sparsityinducing penalties. Found. Trends Mach. Learn. 4(1), 1–106 (2012)CrossRefGoogle Scholar
 27.W. Bajwa, J. Haupt, A.M. Sayeed, R. Nowak, Compressed channel sensing: a new approach to estimating sparse multipath channels. Proc. IEEE 98(6), 1058–1076 (June 2010)CrossRefGoogle Scholar
 29.R.G. Baraniuk, Compressive sensing. IEEE Signal Process. Mag. 24(4), 118–121 (2007)CrossRefGoogle Scholar
 30.R.G. Baraniuk, V. Cevher, M. Duarte, C. Hedge, Modelbased compressive sensing. IEEE Trans. Inform. Theor. 56, 1982–2001 (April 2010)CrossRefGoogle Scholar
 39.J.J. Benedetto, P.J.S.G. Ferreira (eds.), Modern Sampling Theory: Mathematics and Applications. Applied and Numerical Harmonic Analysis (Birkhäuser, Boston, MA, 2001)Google Scholar
 41.R. Berinde, A. Gilbert, P. Indyk, H. Karloff, M. Strauss, Combining geometry and combinatorics: A unified approach to sparse signal recovery. In Proc. of 46th Annual Allerton Conference on Communication, Control, and Computing, pp. 798–805, 2008Google Scholar
 48.P. Bickel, Y. Ritov, A. Tsybakov, Simultaneous analysis of lasso and Dantzig selector. Ann. Stat. 37(4), 1705–1732 (2009)MathSciNetMATHCrossRefGoogle Scholar
 73.A. Bruckstein, D.L. Donoho, M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009)MathSciNetMATHCrossRefGoogle Scholar
 76.P. Bühlmann, S. van de Geer, Statistics for Highdimensional Data. Springer Series in Statistics (Springer, Berlin, 2011)Google Scholar
 84.E.J. Candès, Compressive sampling. In Proceedings of the International Congress of Mathematicians, Madrid, Spain, 2006Google Scholar
 90.E.J. Candès, B. Recht, Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717–772 (2009)MathSciNetMATHCrossRefGoogle Scholar
 92.E.J. Candès, J. Romberg, Quantitative robust uncertainty principles and optimally sparse decompositions. Found. Comput. Math. 6(2), 227–254 (2006)MathSciNetMATHCrossRefGoogle Scholar
 94.E.J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theor. 52(2), 489–509 (2006)MATHCrossRefGoogle Scholar
 96.E.J. Candès, T. Tao, Decoding by linear programming. IEEE Trans. Inform. Theor. 51(12), 4203–4215 (2005)MATHCrossRefGoogle Scholar
 97.E.J. Candès, T. Tao, Near optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theor. 52(12), 5406–5425 (2006)CrossRefGoogle Scholar
 98.E.J. Candès, T. Tao, The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35(6), 2313–2351, (2007)Google Scholar
 99.E.J. Candès, T. Tao, The power of convex relaxation: nearoptimal matrix completion. IEEE Trans. Inform. Theor. 56(5), 2053–2080 (2010)CrossRefGoogle Scholar
 100.E.J. Candès, M. Wakin, An introduction to compressive sampling. IEEE Signal Process. Mag. 25(2), 21–30 (2008)CrossRefGoogle Scholar
 104.A. Chambolle, V. Caselles, D. Cremers, M. Novaga, T. Pock, An introduction to total variation for image analysis. In Theoretical Foundations and Numerical Methods for Sparse Recovery, ed. by M. Fornasier. Radon Series on Computational and Applied Mathematics, vol. 9 (de Gruyter, Berlin, 2010), pp. 263–340Google Scholar
 105.A. Chambolle, R.A. DeVore, N.Y. Lee, B.J. Lucier, Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans. Image Process. 7(3), 319–335 (1998)MathSciNetMATHCrossRefGoogle Scholar
 106.A. Chambolle, P.L. Lions, Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)MathSciNetMATHCrossRefGoogle Scholar
 107.A. Chambolle, T. Pock, A firstorder primaldual algorithm for convex problems with applications to imaging. J. Math. Imag. Vis. 40, 120–145 (2011)MathSciNetMATHCrossRefGoogle Scholar
 114.S.S. Chen, D.L. Donoho, M.A. Saunders, Atomic decomposition by Basis Pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1999)MathSciNetMATHCrossRefGoogle Scholar
 122.A. Cohen, Numerical Analysis of Wavelet Methods (NorthHolland, Amsterdam, 2003)MATHGoogle Scholar
 124.A. Cohen, I. Daubechies, R. DeVore, G. Kerkyacharian, D. Picard, Capturing Ridge Functions in High Dimensions from Point Queries. Constr. Approx. 35, 225–243 (2012)MathSciNetMATHCrossRefGoogle Scholar
 125.A. Cohen, R. DeVore, S. Foucart, H. Rauhut, Recovery of functions of many variables via compressive sensing. In Proc. SampTA 2011, Singapore, 2011Google Scholar
 131.G. Cormode, S. Muthukrishnan, Combinatorial algorithms for compressed sensing. In CISS, Princeton, 2006Google Scholar
 133.F. Cucker, S. Smale, On the mathematical foundations of learning. Bull. Am. Math. Soc., New Ser. 39(1), 1–49 (2002)Google Scholar
 134.F. Cucker, D.X. Zhou, Learning Theory: An Approximation Theory Viewpoint. Cambridge Monographs on Applied and Computational Mathematics (Cambridge University Press, Cambridge, 2007)Google Scholar
 137.I. Daubechies, Ten Lectures on Wavelets, CBMSNSF Regional Conference Series in Applied Mathematics, vol. 61 (SIAM, Philadelphia, 1992)Google Scholar
 138.I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57(11), 1413–1457 (2004)MathSciNetMATHCrossRefGoogle Scholar
 143.M. Davies, Y. Eldar, Rank awareness in joint sparse recovery. IEEE Trans. Inform. Theor. 58(2), 1135–1146 (2012)MathSciNetCrossRefGoogle Scholar
 147.C. De Mol, E. De Vito, L. Rosasco, Elasticnet regularization in learning theory. J. Complex. 25(2), 201–230 (2009)MATHCrossRefGoogle Scholar
 149.R.A. DeVore, G. Petrova, P. Wojtaszczyk, Approximation of functions of few variables in high dimensions. Constr. Approx. 33(1), 125–143 (2011)MathSciNetMATHCrossRefGoogle Scholar
 150.D.L. Donoho, Denoising by softthresholding. IEEE Trans. Inform. Theor. 41(3), 613–627 (1995)MathSciNetMATHCrossRefGoogle Scholar
 152.D.L. Donoho, Compressed sensing. IEEE Trans. Inform. Theor. 52(4), 1289–1306 (2006)MathSciNetCrossRefGoogle Scholar
 154.D.L. Donoho, Highdimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geom. 35(4), 617–652 (2006)MathSciNetMATHCrossRefGoogle Scholar
 155.D.L. Donoho, M. Elad, Optimally sparse representations in general (nonorthogonal) dictionaries via ℓ ^{1} minimization. Proc. Nat. Acad. Sci. 100(5), 2197–2202 (2003)MathSciNetMATHCrossRefGoogle Scholar
 156.D.L. Donoho, M. Elad, On the stability of the basis pursuit in the presence of noise. Signal Process. 86(3), 511–532 (2006)MATHCrossRefGoogle Scholar
 157.D.L. Donoho, M. Elad, V.N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theor. 52(1), 6–18 (2006)MathSciNetCrossRefGoogle Scholar
 158.D.L. Donoho, X. Huo, Uncertainty principles and ideal atomic decompositions. IEEE Trans. Inform. Theor. 47(7), 2845–2862 (2001)MathSciNetMATHCrossRefGoogle Scholar
 159.D.L. Donoho, I.M. Johnstone, Minimax estimation via wavelet shrinkage. Ann. Stat. 26(3), 879–921 (1998)MathSciNetMATHCrossRefGoogle Scholar
 160.D.L. Donoho, G. Kutyniok, Microlocal analysis of the geometric separation problem. Comm. Pure Appl. Math. 66(1), 1–47 (2013)MathSciNetMATHCrossRefGoogle Scholar
 161.D.L. Donoho, B. Logan, Signal recovery and the large sieve. SIAM J. Appl. Math. 52(2), 577–591 (1992)MathSciNetMATHCrossRefGoogle Scholar
 163.D.L. Donoho, P. Stark, Recovery of a sparse signal when the low frequency information is missing. Technical report, Department of Statistics, University of California, Berkeley, June 1989Google Scholar
 164.D.L. Donoho, P. Stark, Uncertainty principles and signal recovery. SIAM J. Appl. Math. 48(3), 906–931 (1989)MathSciNetCrossRefGoogle Scholar
 165.D.L. Donoho, J. Tanner, Neighborliness of randomly projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA 102(27), 9452–9457 (2005)MathSciNetMATHCrossRefGoogle Scholar
 166.D.L. Donoho, J. Tanner, Sparse nonnegative solutions of underdetermined linear equations by linear programming. Proc. Natl. Acad. Sci. 102(27), 9446–9451 (2005)MathSciNetCrossRefGoogle Scholar
 167.D.L. Donoho, J. Tanner, Counting faces of randomlyprojected polytopes when the projection radically lowers dimension. J. Am. Math. Soc. 22(1), 1–53 (2009)MathSciNetMATHCrossRefGoogle Scholar
 170.D.L. Donoho, M. Vetterli, R.A. DeVore, I. Daubechies, Data compression and harmonic analysis. IEEE Trans. Inform. Theor. 44(6), 2435–2476 (1998)MathSciNetMATHCrossRefGoogle Scholar
 171. R. Dorfman, The detection of defective members of large populations. Ann. Math. Stat. 14(4), 436–440 (1943)
 173. D.Z. Du, F. Hwang, Combinatorial Group Testing and Its Applications (World Scientific, Singapore, 1993)
 174. M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, R.G. Baraniuk, Single-pixel imaging via compressive sampling. IEEE Signal Process. Mag. 25(2), 83–91 (2008)
 179. M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing (Springer, New York, 2010)
 180. M. Elad, M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15(12), 3736–3745 (2006)
 181. M. Elad, A.M. Bruckstein, A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Trans. Inform. Theor. 48(9), 2558–2567 (2002)
 182. Y. Eldar, G. Kutyniok (eds.), Compressed Sensing: Theory and Applications (Cambridge University Press, New York, 2012)
 183. Y. Eldar, M. Mishali, Robust recovery of signals from a structured union of subspaces. IEEE Trans. Inform. Theor. 55(11), 5302–5316 (2009)
 184. Y. Eldar, H. Rauhut, Average case analysis of multichannel sparse recovery using convex relaxation. IEEE Trans. Inform. Theor. 56(1), 505–519 (2010)
 185. J. Ender, On compressive sensing applied to radar. Signal Process. 90(5), 1402–1414 (2010)
 186. H.W. Engl, M. Hanke, A. Neubauer, Regularization of Inverse Problems (Springer, New York, 1996)
 189. A. Fannjiang, P. Yan, T. Strohmer, Compressed remote sensing of sparse objects. SIAM J. Imag. Sci. 3(3), 596–618 (2010)
 190. M. Fazel, Matrix Rank Minimization with Applications. PhD thesis, Stanford University, 2002
 195. P.J.S.G. Ferreira, J.R. Higgins, The establishment of sampling as a scientific principle—a striking case of multiple discovery. Not. AMS 58(10), 1446–1450 (2011)
 203. M. Fornasier, H. Rauhut, Recovery algorithms for vector valued data with joint sparsity constraints. SIAM J. Numer. Anal. 46(2), 577–613 (2008)
 204. M. Fornasier, H. Rauhut, Compressive sensing. In Handbook of Mathematical Methods in Imaging, ed. by O. Scherzer (Springer, New York, 2011), pp. 187–228
 206. M. Fornasier, K. Schnass, J. Vybiral, Learning functions of few arbitrary linear parameters in high dimensions. Found. Comput. Math. 12, 229–262 (2012)
 215. J.J. Fuchs, On sparse representations in arbitrary redundant bases. IEEE Trans. Inform. Theor. 50(6), 1341–1344 (2004)
 219. A. Garnaev, E. Gluskin, On widths of the Euclidean ball. Sov. Math. Dokl. 30, 200–204 (1984)
 221. Q. Geng, J. Wright, On the local correctness of ℓ1-minimization for dictionary learning. Preprint (2011)
 222. A. Gilbert, M. Strauss, Analysis of data streams. Technometrics 49(3), 346–356 (2007)
 223. A.C. Gilbert, S. Muthukrishnan, S. Guha, P. Indyk, M. Strauss, Near-optimal sparse Fourier representations via sampling. In Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing, STOC '02, pp. 152–161, ACM, New York, NY, USA, 2002
 224. A.C. Gilbert, S. Muthukrishnan, M.J. Strauss, Approximation of functions over redundant dictionaries using coherence. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '03, pp. 243–252, SIAM, Philadelphia, PA, 2003
 225. A.C. Gilbert, M. Strauss, J.A. Tropp, R. Vershynin, One sketch for all: fast algorithms for compressed sensing. In Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, STOC '07, pp. 237–246, ACM, New York, NY, USA, 2007
 227. E. Gluskin, Norms of random matrices and widths of finite-dimensional sets. Math. USSR-Sb. 48, 173–182 (1984)
 238. R. Gribonval, Sparse decomposition of stereo signals with matching pursuit and application to blind separation of more than two sources from a stereo mixture. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'02), vol. 3, pp. 3057–3060, 2002
 239. R. Gribonval, M. Nielsen, Sparse representations in unions of bases. IEEE Trans. Inform. Theor. 49(12), 3320–3325 (2003)
 241. R. Gribonval, H. Rauhut, K. Schnass, P. Vandergheynst, Atoms of all channels, unite! Average case analysis of multichannel sparse recovery using greedy algorithms. J. Fourier Anal. Appl. 14(5), 655–687 (2008)
 242. R. Gribonval, K. Schnass, Dictionary identification—sparse matrix-factorisation via ℓ1-minimisation. IEEE Trans. Inform. Theor. 56(7), 3523–3539 (2010)
 244. K. Gröchenig, Foundations of Time-Frequency Analysis. Applied and Numerical Harmonic Analysis (Birkhäuser, Boston, MA, 2001)
 245. D. Gross, Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theor. 57(3), 1548–1566 (2011)
 246. D. Gross, Y.K. Liu, S.T. Flammia, S. Becker, J. Eisert, Quantum state tomography via compressed sensing. Phys. Rev. Lett. 105, 150401 (2010)
 249. C. Güntürk, M. Lammers, A. Powell, R. Saab, Ö. Yilmaz, Sobolev duals for random frames and ΣΔ quantization of compressed sensing measurements. Found. Comput. Math. 13(1), 1–36 (2013)
 252. M. Haacke, R. Brown, M. Thompson, R. Venkatesan, Magnetic Resonance Imaging: Physical Principles and Sequence Design (Wiley-Liss, New York, 1999)
 255. J. Haldar, D. Hernando, Z. Liang, Compressed-sensing MRI with random encoding. IEEE Trans. Med. Imag. 30(4), 893–903 (2011)
 261. H. Hassanieh, P. Indyk, D. Katabi, E. Price, Nearly optimal sparse Fourier transform. In Proceedings of the 44th Symposium on Theory of Computing, STOC '12, pp. 563–578, ACM, New York, NY, USA, 2012
 262. H. Hassanieh, P. Indyk, D. Katabi, E. Price, Simple and practical algorithm for sparse Fourier transform. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '12, pp. 1183–1194, SIAM, 2012
 266. T. Hemant, V. Cevher, Learning nonparametric basis independent models from point queries via low-rank methods. Preprint (2012)
 267. W. Hendee, C. Morgan, Magnetic resonance imaging. Part I—Physical principles. West J. Med. 141(4), 491–500 (1984)
 268. M. Herman, T. Strohmer, High-resolution radar via compressed sensing. IEEE Trans. Signal Process. 57(6), 2275–2284 (2009)
 269. F. Herrmann, M. Friedlander, O. Yilmaz, Fighting the curse of dimensionality: compressive sensing in exploration seismology. IEEE Signal Process. Mag. 29(3), 88–100 (2012)
 270. F. Herrmann, H. Wason, T. Lin, Compressive sensing in seismic exploration: an outlook on a new paradigm. CSEG Recorder 36(4), 19–33 (2011)
 271. J.R. Higgins, Sampling Theory in Fourier and Signal Analysis: Foundations, vol. 1 (Clarendon Press, Oxford, 1996)
 272. J.R. Higgins, R.L. Stens, Sampling Theory in Fourier and Signal Analysis: Advanced Topics, vol. 2 (Oxford University Press, Oxford, 1999)
 278. D. Holland, M. Bostock, L. Gladden, D. Nietlispach, Fast multidimensional NMR spectroscopy using compressed sensing. Angew. Chem. Int. Ed. 50(29), 6548–6551 (2011)
 282. W. Huffman, V. Pless, Fundamentals of Error-Correcting Codes (Cambridge University Press, Cambridge, 2003)
 283. M. Hügel, H. Rauhut, T. Strohmer, Remote sensing via ℓ1-minimization. Found. Comput. Math., to appear (2012)
 285. P. Indyk, A. Gilbert, Sparse recovery using sparse matrices. Proc. IEEE 98(6), 937–947 (2010)
 287. M. Iwen, Combinatorial sublinear-time Fourier algorithms. Found. Comput. Math. 10(3), 303–338 (2010)
 288. M. Iwen, Improved approximation guarantees for sublinear-time Fourier algorithms. Appl. Comput. Harmon. Anal. 34(1), 57–82 (2013)
 289. M. Iwen, A. Gilbert, M. Strauss, Empirical evaluation of a sublinear time sparse DFT algorithm. Commun. Math. Sci. 5(4), 981–998 (2007)
 290. L. Jacques, J. Laska, P. Boufounos, R. Baraniuk, Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Trans. Inform. Theor. 59(4), 2082–2102 (2013)
 294. A.J. Jerri, The Shannon sampling theorem—its various extensions and applications: A tutorial review. Proc. IEEE 65(11), 1565–1596 (1977)
 299. B. Kashin, Diameters of some finite-dimensional sets and classes of smooth functions. Math. USSR, Izv. 11, 317–333 (1977)
 316. J. Laska, P. Boufounos, M. Davenport, R. Baraniuk, Democracy in action: quantization, saturation, and compressive sensing. Appl. Comput. Harmon. Anal. 31(3), 429–443 (2011)
 330. Y. Liu, Universal low-rank matrix recovery from Pauli measurements. In NIPS, pp. 1638–1646, 2011
 331. A. Llagostera Casanovas, G. Monaci, P. Vandergheynst, R. Gribonval, Blind audiovisual source separation based on sparse redundant representations. IEEE Trans. Multimed. 12(5), 358–371 (2010)
 332. B. Logan, Properties of High-Pass Signals. PhD thesis, Columbia University, New York, 1965
 338. M. Lustig, D.L. Donoho, J. Pauly, Sparse MRI: The application of compressed sensing for rapid MR imaging. Magn. Reson. Med. 58(6), 1182–1195 (2007)
 342. S. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)
 344. S. Marple, Digital Spectral Analysis with Applications (Prentice-Hall, Englewood Cliffs, 1987)
 353. M. Mishali, Y.C. Eldar, From theory to practice: Sub-Nyquist sampling of sparse wideband analog signals. IEEE J. Sel. Top. Signal Process. 4(2), 375–391 (2010)
 358. M. Murphy, M. Alley, J. Demmel, K. Keutzer, S. Vasanawala, M. Lustig, Fast ℓ1-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime. IEEE Trans. Med. Imag. 31(6), 1250–1262 (2012)
 359. B.K. Natarajan, Sparse approximate solutions to linear systems. SIAM J. Comput. 24, 227–234 (1995)
 364. D. Needell, R. Ward, Stable image reconstruction using total variation minimization. Preprint (2012)
 370. E. Novak, Optimal recovery and n-widths for convex classes of functions. J. Approx. Theor. 80(3), 390–408 (1995)
 371. E. Novak, H. Woźniakowski, Tractability of Multivariate Problems. Vol. 1: Linear Information. EMS Tracts in Mathematics, vol. 6 (European Mathematical Society (EMS), Zürich, 2008)
 384. G. Pfander, H. Rauhut, J. Tanner, Identification of matrices having a sparse representation. IEEE Trans. Signal Process. 56(11), 5376–5388 (2008)
 385. G. Pfander, H. Rauhut, J. Tropp, The restricted isometry property for time-frequency structured random matrices. Probab. Theory Relat. Fields, to appear
 393. Y. Plan, R. Vershynin, One-bit compressed sensing by linear programming. Comm. Pure Appl. Math. 66(8), 1275–1297 (2013)
 394. Y. Plan, R. Vershynin, Robust 1-bit compressed sensing and sparse logistic regression: a convex programming approach. IEEE Trans. Inform. Theor. 59(1), 482–494 (2013)
 397. L. Potter, E. Ertin, J. Parker, M. Cetin, Sparsity and compressed sensing in radar imaging. Proc. IEEE 98(6), 1006–1020 (2010)
 401. D. Potts, M. Tasche, Parameter estimation for exponential sums by approximate Prony method. Signal Process. 90(5), 1631–1642 (2010)
 402. R. Prony, Essai expérimental et analytique sur les lois de la Dilatabilité des fluides élastiques et sur celles de la Force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures. J. École Polytechnique 1, 24–76 (1795)
 406. R. Ramlau, G. Teschke, Sparse recovery in inverse problems. In Theoretical Foundations and Numerical Methods for Sparse Recovery, ed. by M. Fornasier. Radon Series on Computational and Applied Mathematics, vol. 9 (de Gruyter, Berlin, 2010), pp. 201–262
 407. M. Raphan, E. Simoncelli, Optimal denoising in redundant representation. IEEE Trans. Image Process. 17(8), 1342–1352 (2008)
 408. H. Rauhut, Random sampling of sparse trigonometric polynomials. Appl. Comput. Harmon. Anal. 22(1), 16–42 (2007)
 409. H. Rauhut, On the impossibility of uniform sparse reconstruction using greedy methods. Sampl. Theor. Signal Image Process. 7(2), 197–215 (2008)
 411. H. Rauhut, Compressive sensing and structured random matrices. In Theoretical Foundations and Numerical Methods for Sparse Recovery, ed. by M. Fornasier. Radon Series on Computational and Applied Mathematics, vol. 9 (de Gruyter, Berlin, 2010), pp. 1–92
 412. H. Rauhut, G.E. Pfander, Sparsity in time-frequency representations. J. Fourier Anal. Appl. 16(2), 233–260 (2010)
 416. H. Rauhut, R. Ward, Sparse Legendre expansions via ℓ1-minimization. J. Approx. Theor. 164(5), 517–533 (2012)
 417. B. Recht, A simpler approach to matrix completion. J. Mach. Learn. Res. 12, 3413–3430 (2011)
 418. B. Recht, M. Fazel, P. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
 427. J.K. Romberg, Imaging via compressive sampling. IEEE Signal Process. Mag. 25(2), 14–20 (2008)
 429. R. Rubinstein, M. Zibulevsky, M. Elad, Double sparsity: learning sparse dictionaries for sparse signal approximation. IEEE Trans. Signal Process. 58(3, part 2), 1553–1564 (2010)
 431. M. Rudelson, R. Vershynin, Geometric approach to error-correcting codes and reconstruction of signals. Int. Math. Res. Not. 64, 4019–4041 (2005)
 436. L. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)
 441. F. Santosa, W. Symes, Linear inversion of band-limited reflection seismograms. SIAM J. Sci. Stat. Comput. 7(4), 1307–1330 (1986)
 444. B. Schölkopf, A. Smola, Learning with Kernels (MIT Press, Cambridge, 2002)
 447. Y. Shrot, L. Frydman, Compressed sensing and the reconstruction of ultrafast 2D NMR data: Principles and biomolecular applications. J. Magn. Reson. 209(2), 352–358 (2011)
 450. J.L. Starck, E.J. Candès, D.L. Donoho, The curvelet transform for image denoising. IEEE Trans. Image Process. 11(6), 670–684 (2002)
 451. J.L. Starck, F. Murtagh, J. Fadili, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity (Cambridge University Press, Cambridge, 2010)
 455. T. Strohmer, B. Friedlander, Analysis of sparse MIMO radar. Preprint (2012)
 468. G. Tauböck, F. Hlawatsch, D. Eiwen, H. Rauhut, Compressive estimation of doubly selective channels in multicarrier systems: leakage effects and sparsity-enhancing processing. IEEE J. Sel. Top. Sig. Process. 4(2), 255–271 (2010)
 469. H. Taylor, S. Banks, J. McCoy, Deconvolution with the ℓ1-norm. Geophysics 44(1), 39–52 (1979)
 472. V. Temlyakov, Greedy Approximation. Cambridge Monographs on Applied and Computational Mathematics, vol. 20 (Cambridge University Press, Cambridge, 2011)
 473. R. Tibshirani, Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B 58(1), 267–288 (1996)
 474. J. Traub, G. Wasilkowski, H. Woźniakowski, Information-based Complexity. Computer Science and Scientific Computing (Academic Press, Boston, MA, 1988). With contributions by A.G. Werschulz and T. Boult
 476. J.A. Tropp, Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theor. 50(10), 2231–2242 (2004)
 478. J.A. Tropp, Algorithms for simultaneous sparse approximation. Part II: Convex relaxation. Signal Process. 86(3), 589–602 (2006)
 479. J.A. Tropp, Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theor. 52(3), 1030–1051 (2006)
 482. J.A. Tropp, On the linear independence of spikes and sines. J. Fourier Anal. Appl. 14(5–6), 838–858 (2008)
 487. J.A. Tropp, A.C. Gilbert, M.J. Strauss, Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Process. 86(3), 572–588 (2006)
 488. J.A. Tropp, J.N. Laska, M.F. Duarte, J.K. Romberg, R.G. Baraniuk, Beyond Nyquist: Efficient sampling of sparse bandlimited signals. IEEE Trans. Inform. Theor. 56(1), 520–544 (2010)
 497. S. Vasanawala, M. Alley, B. Hargreaves, R. Barth, J. Pauly, M. Lustig, Improved pediatric MR imaging with compressed sensing. Radiology 256(2), 607–616 (2010)
 507. Y. Wiaux, L. Jacques, G. Puy, A. Scaife, P. Vandergheynst, Compressed sensing imaging techniques for radio interferometry. Mon. Not. Roy. Astron. Soc. 395(3), 1733–1742 (2009)
 508. P. Wojtaszczyk, A Mathematical Introduction to Wavelets (Cambridge University Press, Cambridge, 1997)
 512. G. Wright, Magnetic resonance imaging. IEEE Signal Process. Mag. 14(1), 56–66 (1997)
 513. J. Wright, A. Yang, A. Ganesh, S. Sastry, Y. Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)
 519. J. Zou, A.C. Gilbert, M. Strauss, I. Daubechies, Theoretical and experimental analysis of a randomized algorithm for sparse Fourier transform analysis. J. Comput. Phys. 211, 572–595 (2005)
 520. A. Zymnis, S. Boyd, E.J. Candès, Compressed sensing with quantized measurements. IEEE Signal Process. Lett. 17(2), 149–152 (2010)