Non-uniform recovery guarantees for binary measurements and infinite-dimensional compressed sensing

Due to the many applications in Magnetic Resonance Imaging (MRI), Nuclear Magnetic Resonance (NMR), radio interferometry, helium atom scattering etc., the theory of compressed sensing with Fourier transform measurements has reached a mature level. However, for binary measurements via the Walsh transform, the theory has been almost non-existent, despite the large number of applications such as fluorescence microscopy, single pixel cameras, lensless cameras, compressive holography, laser-based failure-analysis etc. Binary measurements are a mainstay in signal and image processing and can be modelled by the Walsh transform and Walsh series, which are binary cousins of the respective Fourier counterparts. We help bridge the theoretical gap by providing non-uniform recovery guarantees for infinite-dimensional compressed sensing with Walsh samples and wavelet reconstruction. The theoretical results demonstrate that compressed sensing with Walsh samples, as long as the sampling strategy is highly structured and follows the structured sparsity of the signal, is as effective as in the Fourier case. However, there is a fundamental difference in the asymptotic results when the smoothness and the number of vanishing moments of the wavelet increase. In the Fourier case, this changes the optimal sampling patterns, whereas this is not the case in the Walsh setting.


INTRODUCTION
Since Shannon's classical sampling theorem [50,54], sampling theory has been a widely studied field in signal and image processing. Infinite-dimensional compressed sensing [4,9,18,36,37,48,49] is part of this rich theory and offers a method that allows infinite-dimensional signals to be recovered from undersampled linear measurements. This gives a non-linear alternative to other methods, like generalized sampling [2,5,6,8,32,34,41] and the Parametrized-Background Data-Weak (PBDW)-method [13,14,24,42-44], that reconstruct infinite-dimensional objects from linear measurements. However, these methods do not allow for subsampling, and hence are dependent on consecutive samples of, for example, the Fourier transform. Infinite-dimensional compressed sensing, on the other hand, is similar to generalized sampling and the PBDW-method, but utilises an $\ell^1$ optimisation problem that allows for subsampling.
Besides the typical flagship of modern compressed sensing, namely MRI [31,40], there is also a myriad of other applications, such as fluorescence microscopy [47,51], single pixel cameras [29], medical imaging devices like computed tomography [19], electron microscopy [38], lensless cameras [35], compressive holography [20] and laser-based failure-analysis [52], among others. The applications divide into three groups: those that are modelled by Fourier measurements, those that are based on the Radon transform, and those that are represented by binary measurements. For Fourier measurements there exists a long history of research; for Radon and binary measurements, however, the theoretical results are scarce. In this paper we consider binary measurements and provide the first non-uniform recovery guarantees for infinite-dimensional compressed sensing.
The setup of infinite-dimensional compressed sensing is as follows. We consider an orthonormal basis $\{\phi_j\}_{j\in\mathbb{N}}$ of a Hilbert space $\mathcal{H}$ and an element $f = \sum_{j\in\mathbb{N}} x_j \phi_j \in \mathcal{H}$, $x_j \in \mathbb{C}$, to be recovered from linear measurements. That is, we have another orthonormal basis $\{\omega_i\}_{i\in\mathbb{N}}$ of $\mathcal{H}$ and we can access the linear measurements given by $l_i(f) = \langle f, \omega_i\rangle$. Although the Hilbert space can be arbitrary, in applications we will mostly consider function spaces. Hence, we will often refer to the object $f$ as well as the basis elements as functions. We call the functions $\omega_i$, $i\in\mathbb{N}$, sampling functions and the space spanned by them, $\mathcal{S} = \operatorname{span}\{\omega_i : i\in\mathbb{N}\}$, the sampling space. Accordingly, $\phi_j$, $j\in\mathbb{N}$, are called reconstruction functions and $\mathcal{R} = \operatorname{span}\{\phi_j : j\in\mathbb{N}\}$ the reconstruction space. Generalized sampling [2-4] and the PBDW-method [44] use the change of basis matrix $U = \{u_{i,j}\}_{i,j\in\mathbb{N}} \in \mathcal{B}(\ell^2(\mathbb{N}))$ with $u_{i,j} = \langle \phi_i, \omega_j\rangle$ to find a solution to the problem of reconstructing coefficients in the reconstruction space from measurements in the sampling space. This is also the case in infinite-dimensional compressed sensing. In particular, we consider the following reconstruction problem. Let $f = \sum_j x_j \phi_j$ be a fixed signal and let the measurements be $g = P_\Omega U f + z$, where $\Omega \subset \{0, 1, \ldots, N_r\}$ is the subsampling set, $P_\Omega$ is the orthogonal projection onto the elements indexed by $\Omega$, and $z$ is additional noise with $\|z\|_2 \le \delta$. The reconstruction problem is to find a minimiser of
$$\min_{\xi\in\ell^2(\mathbb{N})} \|\xi\|_1 \quad \text{subject to} \quad \|P_\Omega U \xi - g\|_2 \le \delta. \tag{1.1}$$

PRELIMINARIES
2.1. Setting and Definitions. In this section we recall the settings from [9] that are needed to establish the main results. First, note that we will use $a \lesssim b$ to describe that $a$ is smaller than $b$ up to a constant, i.e. there exists some $C > 0$ such that $a \le Cb$. Moreover, for a set $\Omega \subset \mathbb{N}$ the orthogonal projection corresponding to the elements of the canonical basis of $\ell^2(\mathbb{N})$ with indices in $\Omega$ is denoted by $P_\Omega$. Similarly, for $N \in \mathbb{N}$ the orthogonal projection onto the first $N$ elements of the canonical basis of $\ell^2(\mathbb{N})$ is denoted by $P_N$. Finally, $P_a^b$ stands for the orthogonal projection onto the basis vectors with indices $\{a+1, \ldots, b\}$. Note that (1.1) is an infinite-dimensional optimisation problem; in practice, (1.1) is replaced by
$$\min_{\xi\in\ell^2(\mathbb{N})} \|\xi\|_1 \quad \text{subject to} \quad \|P_\Omega U P_N \xi - g\|_2 \le \delta.$$
We denote the set of $(s, M)$-sparse vectors by $\Sigma_{s,M}$.
Most natural signals are not perfectly sparse but instead have a small tail in the representation system. Hence, in a large number of applications it is more realistic to ask for compressibility rather than sparsity.

Definition 2.2 ([9]). Let $f = \sum_{j\in\mathbb{N}} x_j \phi_j$, where $x = (x_j)_{j\in\mathbb{N}} \in \ell^2(\mathbb{N})$. We say that $f$ is $(s, M)$-compressible with respect to $\{\phi_j\}_{j\in\mathbb{N}}$ if $\sigma_{s,M}(f)$ is small, where
$$\sigma_{s,M}(f) = \min_{\eta\in\Sigma_{s,M}} \|x - \eta\|_1.$$
With this more detailed description of the signal, in place of classical sparsity, it is possible to adapt the sampling scheme accordingly. Completely random sampling is replaced by multilevel random sampling. This allows us later to treat the different levels separately. Moreover, it represents sampling schemes that are used in practice.
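The quantity $\sigma_{s,M}(f)$ can be evaluated level by level. The following is a minimal sketch, assuming the convention from [9] that an $(s, M)$-sparse vector has at most $s_k$ non-zero entries among the indices $M_{k-1}+1, \ldots, M_k$ (with $M_0 = 0$) and none beyond $M_r$. The function name `sigma_s_M` and the list-based interface are illustrative choices, not part of the original text.

```python
def sigma_s_M(x, s, M):
    """l1 error of the best (s, M)-term approximation of the coefficients x.

    x : sequence of (real or complex) coefficients x_1, x_2, ...
    s : local sparsities (s_1, ..., s_r)
    M : level boundaries (M_1, ..., M_r); M_0 = 0 is implied
    """
    if len(s) != len(M):
        raise ValueError("s and M must have the same number of levels")
    error = 0.0
    start = 0
    for s_k, M_k in zip(s, M):
        # In each level the best approximation keeps the s_k largest
        # magnitudes; everything else in the level contributes to the error.
        magnitudes = sorted((abs(c) for c in x[start:M_k]), reverse=True)
        error += sum(magnitudes[s_k:])
        start = M_k
    # Coefficients beyond the last level M_r cannot be captured at all.
    error += sum(abs(c) for c in x[M[-1]:])
    return error
```

For example, with two levels of size four and local sparsities $(2, 1)$, only the two largest coefficients of the first level and the largest coefficient of the second level are kept; the remaining magnitudes add up to $\sigma_{s,M}$.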

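Definition 2.3. Let $r \in \mathbb{N}$, $N = (N_1, \ldots, N_r) \in \mathbb{N}^r$ with $1 \le N_1 < \ldots < N_r$, and $m = (m_1, \ldots, m_r) \in \mathbb{N}^r$ with $m_k \le N_k - N_{k-1}$, $k = 1, \ldots, r$, and suppose that
$$\Omega_k \subset \{N_{k-1}+1, \ldots, N_k\}, \qquad |\Omega_k| = m_k, \qquad k = 1, \ldots, r,$$
are chosen uniformly at random, where $N_0 = 0$. We refer to the set $\Omega = \Omega_{N,m} = \Omega_1 \cup \ldots \cup \Omega_r$ as an $(N, m)$-multilevel sampling scheme.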
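A multilevel sampling scheme is straightforward to draw in practice. The sketch below assumes that each level $k$ consists of the indices $N_{k-1}+1, \ldots, N_k$ with $N_0 = 0$, and that $m_k$ distinct indices are drawn uniformly at random from each level. The function name and the `seed` parameter are illustrative and not taken from the paper.

```python
import random


def multilevel_sampling(N, m, seed=None):
    """Draw an (N, m)-multilevel sampling scheme.

    N : level boundaries (N_1, ..., N_r), strictly increasing; N_0 = 0
    m : number of samples per level (m_1, ..., m_r), m_k <= N_k - N_{k-1}
    Returns the sorted list of sampled indices in {1, ..., N_r}.
    """
    rng = random.Random(seed)
    omega = []
    lower = 0  # plays the role of N_{k-1}
    for N_k, m_k in zip(N, m):
        if m_k > N_k - lower:
            raise ValueError("m_k exceeds the size of level k")
        # Level k consists of the indices N_{k-1}+1, ..., N_k.
        level = range(lower + 1, N_k + 1)
        # Sample m_k distinct indices uniformly at random from the level.
        omega.extend(rng.sample(level, m_k))
        lower = N_k
    return sorted(omega)
```

Choosing $m_k = N_k - N_{k-1}$ in the first levels and a small $m_k$ in the later ones gives full sampling of the coarse levels and subsampling of the fine ones.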
Remark 2.4. To avoid pathological examples we assume, as in [9], that the total sparsity satisfies $s = s_1 + \ldots + s_r \ge 3$. This implies that $\log(s) \ge 1$ and therefore also $m_k \ge 1$ for all $k = 1, \ldots, r$.

MAIN RESULTS: NON-UNIFORM RECOVERY FOR THE WALSH-WAVELET CASE
In this paper we focus on the reconstruction from binary measurements. This arises naturally in examples like those mentioned in the introduction and in applications where the sampling is performed with an apparatus that has an "on-off" behaviour. We focus on the setting of recovering data in $L^2([0, 1])$; however, the theory builds on general results for Hilbert spaces as presented in §2. Linear measurements are typically represented by inner products between sampling functions and the data of interest. Binary measurements can be represented with functions that take values in $\{0, 1\}$ or, after the well-known and convenient trick of subtracting a measurement with the constant function one, with functions that take values in $\{-1, 1\}$. For practical reasons it is sensible to consider functions that provide fast transforms. Additionally, the function system should correspond well to the reconstruction space. For the reconstruction with wavelets, Walsh functions have proven to be a sensible choice, and are discussed in more detail in §3.2.1. Sampling from binary measurements has been analysed for the linear case in [12,33,53] and for the non-linear case in [1,46]. We extend these results to non-uniform recovery guarantees in the non-linear case. By filling this gap we gain broad knowledge about linear and non-linear reconstruction for two of the three main measurement systems: Fourier and binary. Let $U$ be the change of basis matrix (3.1), where $\{\omega_i\}_{i\in\mathbb{N}}$ denotes the Walsh functions on $[0, 1]$ as described in §3.2.1, and $\{\phi_j\}_{j\in\mathbb{N}}$ denotes the Daubechies boundary wavelets on $[0, 1]$ described in §3.2.2. We are now able to state the recovery guarantees for the Walsh-wavelet case.
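The subtraction trick for passing from $\{0, 1\}$-valued to $\{-1, 1\}$-valued masks can be made concrete in a discrete setting. The sketch below is an illustration under the assumption that the signal is a finite vector and that one extra measurement with the all-ones mask is available; the function names are hypothetical. If $b \in \{0,1\}^n$ and $w = 2b - 1 \in \{-1,1\}^n$, then $\langle f, w\rangle = 2\langle f, b\rangle - \langle f, 1\rangle$.

```python
def measure(f, mask):
    """Discrete inner product of the signal f with a sampling mask."""
    return sum(f_i * w_i for f_i, w_i in zip(f, mask))


def pm_one_measurement(f, mask01):
    """Measurement with the {-1, 1} mask w = 2*b - 1, obtained from the
    {0, 1} mask b and one additional all-ones measurement."""
    if any(b not in (0, 1) for b in mask01):
        raise ValueError("mask01 must only contain 0 and 1")
    y_mask = measure(f, mask01)          # what an on-off device records
    y_ones = measure(f, [1] * len(f))    # constant-one measurement
    return 2 * y_mask - y_ones
```

The all-ones measurement only has to be acquired once and can then be reused for every mask.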
Theorem 3.1 (Main theorem). Given the notation above, let $N = (N_0, \ldots, N_r)$ define the sampling levels as in (3.7) and let $M = (M_0, \ldots, M_r)$ represent the levels of the reconstruction space as in (3.6). Consider $U$ as in (3.1), $\epsilon > 0$, and let $\Omega = \Omega_{N,m}$ be a multilevel sampling scheme such that conditions (3.2) and (3.3) hold. Then with probability exceeding $1 - s\epsilon$, any minimizer $\xi \in \ell^1(\mathbb{N})$ of (1.1)

for some constant $c$, where

This result allows one to exploit the asymptotic sparsity structure of most natural images under the wavelet transform. It was observed in [9] that the ratio of non-zero coefficients per level decreases very fast with increasing level, while at the same time the level size increases. Hence, most images are not that sparse in the first levels, and sampling patterns should adapt to that. However, they are very sparse in the higher levels. Therefore, the number of measurements in a given level depends mainly on the sparsity in that level, and the influence of the sparsity in the other levels decays exponentially.

Remark 3.2. With potential extensions of this work to higher dimensions or other reconstruction and sampling spaces in mind, we kept the factor $(N_k - N_{k-1})/N_{k-1}$ in (3.3). However, for the Walsh-wavelet case in one dimension this factor reduces to 1 when the values from (3.7) are used. Hence, (3.3) can be further simplified to

however, in general one needs the factor

Remark 3.3. We would like to highlight the differences to the Fourier-wavelet case, i.e. to Theorem 6.2 in [9]. The most striking difference is the squared factor in (3.2). In the Fourier-wavelet case this depends on the smoothness of the wavelet and is shown to be

where $\alpha$ denotes the decay rate under the Fourier transform, i.e. the smoothness of the wavelet. For very smooth wavelets this can be improved to

Hence, for very smooth wavelets we get the optimal linear relation, up to log terms. However, for non-smooth wavelets like the Haar wavelet, we get a squared relation instead of a linear one. The reason why we do not observe a similar dependence on the smoothness in the sampling relation is that smoothness of a wavelet does not translate into a faster decay under the Walsh transform. This is also related to the fact that for Fourier measurements (3.3) becomes (3.4), where $A_{k,l}$ and $B_{k,l}$ are positive numbers, $\tilde{N} = (K\sqrt{s})^{1+1/v} N$, where $v$ denotes the number of vanishing moments, and $\hat{s}_k = \max\{s_{k-1}, s_k, s_{k+1}\}$. In particular, smoothness and vanishing moments of the wavelet do have an impact in the Fourier case, but not in the Walsh case. This is also confirmed in Figure 1, where we have plotted the absolute values of sections of $U$, where $U$ is the infinite matrix from (3.1). As can be seen in Figure 1, the matrix $U$ becomes more block diagonal in the Fourier case with more vanishing moments, confirming the dependence on $\alpha$ and $v$ in (3.4). Note that for a completely block diagonal matrix $U$ the $m_k$ in (3.4) will only depend on $s_k$ and not on any of the $s_l$ with $l \ne k$. In contrast, this effect is not visible in the Walsh situation, suggesting that the estimate in (3.3) captures the correct behaviour by not depending on $\alpha$ and $v$. The reason is that a function needs to be smooth in the dyadic sense to have a faster decay rate under the Walsh transform. However, this is not related to classical smoothness. Finally, numerical examples in §5 suggest that the squared relation in (3.2) is not sharp and that it is also possible to reconstruct images with a reduced relation between the maximal sample and the maximal reconstructed coefficient.
3.1. Connection to related work. Reconstruction methods are mainly divided into two major classes: linear and non-linear methods. For the linear case, generalized sampling [2] and the PBDW-method [44] are prominent examples. Prior to generalized sampling, consistent sampling was investigated by Aldroubi, Eldar, Unser and others [11,25-28,55]. Generalized sampling was then studied by Adcock, Hansen, Hrycak, Gröchenig, Kutyniok, Ma, Poon and Shadrin in [2,5,6,8,32,34,41]. The PBDW-method evolved from the work of Maday, Patera, Penn and Yano in [43], first under the name generalized empirical interpolation method. This was then further analysed and extended to the PBDW-method by Binev, Cohen, Dahmen, DeVore, Petrova and Wojtaszczyk [13,14,24,42,44]. The stability and accuracy of both methods is secured by the stable sampling rate (SSR), which controls the number of samples needed for a stable and accurate reconstruction of a certain number of coefficients in the reconstruction space. It was shown that the SSR is linear for the Fourier-wavelet [7], Fourier-shearlet [41] and Walsh-wavelet case [33]. However, this is not always the case, as the Fourier-polynomial situation shows [34]. In the non-linear setting the most prominent reconstruction technique is infinite-dimensional compressed sensing [18], as analysed in the Fourier case by Adcock, Hansen, Kutyniok, Lim, Poon and Roman [4,9,37,48,49]. There exists widespread knowledge in this area. For the Fourier-wavelet case, uniform recovery guarantees [39] and non-uniform recovery guarantees [9,10] are known. For Walsh measurements there are uniform recovery guarantees from Adcock, Antun and Hansen [1] and an analysis of variable and multilevel density sampling strategies for the Walsh-Haar case and finite-dimensional signals by Moshtaghpour, Dias and Jacques in [46]. In this paper we present the non-uniform results for the Walsh-wavelet case, analogous to those for the Fourier case in [9,10].

3.2. Sampling and Reconstruction Space.
3.2.1. Sampling Space. We start with the sampling space. To model binary measurements, Walsh functions have proven to be a good choice. They behave similarly to Fourier measurements, with the difference that they work in dyadic rather than decimal analysis. They also have an increasing number of zero crossings. As a consequence, the change of basis matrix has a block diagonal structure, as can be seen in Figure 1, and one can exploit the asymptotic sparsity and incoherence. However, the fact that they are defined in dyadic analysis leads to some difficulties and peculiarities in the proof.
Let us now define the Walsh functions, which form the kernel of the Hadamard matrix.Then we proceed with their properties and the definition of the Walsh transform.
This definition can also be extended to negative inputs by Wal(−n, x) = Wal(n, −x) = − Wal(n, x). Walsh functions are one-periodic in the second input if the first one is an integer. Moreover, the definition is extended to arbitrary inputs $n \in \mathbb{R}$ instead of the more classical definition for $n \in \mathbb{N}$. We would like to make the reader aware of the different orderings of the Walsh functions. The one presented here is the Walsh-Kaczmarz ordering, in which the functions are ordered by increasing number of zero crossings. This has the advantage that it relates nicely to the scaling of wavelets. Two other possible orderings are Walsh-Paley and Walsh-Kronecker. Both have the drawback that the number of zero crossings is not increasing. Therefore, we would not get the block diagonal structure in the change of basis matrix. The Walsh-Kronecker ordering is also rarely used in practice because one has to fix the largest input $n$ in advance and the ordering changes depending on this value, i.e. there is a third input $n_{\max}$ which also affects the ordering.
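The ordering by zero crossings can be illustrated numerically. The sketch below does not reproduce the paper's definition for real-valued $n$; it assumes integer $n$ and $x \in [0, 1)$ and uses the standard relation that the sequency-ordered Walsh function with index $n$ equals the Walsh-Paley function whose index is the Gray code of $n$. The function names and the `bits` and `grid` parameters are illustrative.

```python
def walsh(n, x, bits=20):
    """Sequency-ordered Walsh function Wal(n, x) for an integer
    0 <= n < 2**bits and x in [0, 1), with values in {-1, 1}."""
    gray = n ^ (n >> 1)           # Gray code: sequency index -> Paley index
    k = int(x * (1 << bits))      # first `bits` dyadic digits of x
    parity = 0
    digit = 1                     # bit i of `gray` pairs with digit i+1 of x
    while gray:
        if gray & 1:
            parity ^= (k >> (bits - digit)) & 1
        gray >>= 1
        digit += 1
    return -1 if parity else 1


def sign_changes(n, grid=64):
    """Count the sign changes of Wal(n, .) on a dyadic grid of [0, 1)."""
    values = [walsh(n, (j + 0.5) / grid) for j in range(grid)]
    return sum(1 for a, b in zip(values, values[1:]) if a != b)
```

On a grid with 64 cells, `sign_changes(n)` returns `n` for every `n` below 64, which is exactly the ordering by increasing number of zero crossings described above. Replacing `gray` by `n` itself gives the Walsh-Paley ordering, for which this monotonicity fails.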
For the sampling pattern we divide the sequency parameter $n$ into blocks of size $2^j$ with $j \in \mathbb{N}$. This results in an insightful relationship between the wavelets and the Walsh functions. Additionally, it is directly related to the block structure observed in numerical experiments.
After this small excursion on orderings we now define the sampling space in one dimension by

In general it is not possible to acquire or save an infinite number of samples. Therefore, we restrict ourselves to the sampling space according to $\Omega_{N,m}$, i.e.

The Walsh functions obey some interesting properties: the scaling property, i.e. $\mathrm{Wal}(2^j n, x) = \mathrm{Wal}(n, 2^j x)$ for all $j \in \mathbb{N}$ and $n, x \in \mathbb{R}$, and the multiplicative identity, i.e. $\mathrm{Wal}(n, x)\,\mathrm{Wal}(n, y) = \mathrm{Wal}(n, x \oplus y)$, where $\oplus$ is the dyadic addition. With the Walsh functions we are able to define the continuous Walsh transform almost everywhere:

The properties of the Walsh functions are easily transferred to the Walsh transform. We now state some further results about the Walsh functions and the Walsh transform, which are necessary for the main proof.

Lemma 3.5 ([33]). Let $t \in \mathbb{N}$ and $x \in [0, 1)$, then the following holds:

Note that this only holds because $x$ and $t$ do not have non-zero digits at the same position in their dyadic representations, and therefore the dyadic addition equals the decimal addition. Next, we consider Walsh polynomials and see how we can relate the sum of squares of the polynomial to the sum of squares of its coefficients.
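The multiplicative identity and the remark on dyadic versus ordinary addition can be checked on a dyadic grid. The sketch below is an illustration under simplifying assumptions: inputs are truncated to `bits` dyadic digits after the binary point, dyadic addition is then a bitwise XOR of the digit strings, and `walsh` is the integer-index, sequency-ordered Walsh function on $[0, 1)$ obtained via the Gray code. All names are hypothetical.

```python
def dyadic_add(x, y, bits=16):
    """Dyadic addition x (+) y: digit-wise addition modulo 2 of the dyadic
    expansions, truncated to `bits` digits after the binary point."""
    scale = 1 << bits
    return (int(x * scale) ^ int(y * scale)) / scale


def walsh(n, x, bits=16):
    """Sequency-ordered Walsh function for integer 0 <= n < 2**bits and
    x in [0, 1), computed via the Gray code of n."""
    gray = n ^ (n >> 1)
    k = int(x * (1 << bits))
    parity = 0
    digit = 1
    while gray:
        if gray & 1:
            parity ^= (k >> (bits - digit)) & 1
        gray >>= 1
        digit += 1
    return -1 if parity else 1


def identity_holds(n, x, y):
    """Check Wal(n, x) * Wal(n, y) == Wal(n, x (+) y) for x, y in [0, 1)."""
    return walsh(n, x) * walsh(n, y) == walsh(n, dyadic_add(x, y))
```

For an integer $t$ and $x \in [0, 1)$ the digit strings do not overlap, so `dyadic_add(3, 0.625)` coincides with the ordinary sum, whereas `dyadic_add(0.75, 0.5)` does not.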

Definition 3.6 ([33]). Let $A, B \in \mathbb{Z}$ such that $A \le B$ and $\alpha_j \in \mathbb{R}$. Then for $z \in \mathbb{R}_+$ we define the Walsh polynomial of order $n = B$ by $\Phi(z) = \sum_{j=A}^{B} \alpha_j \mathrm{Wal}(j, z)$. The set of all Walsh polynomials up to degree $n$ is given by

In the proof we will combine the shifts of the wavelet into a Walsh polynomial. With this lemma at hand, this is then easily bounded.
3.2.2. Reconstruction Space. Next, we define the reconstruction space. As we are mainly interested in the reconstruction of images and audio signals, we use Daubechies wavelets. They provide good smoothness and support properties. Moreover, they form a multi-resolution analysis (MRA). The wavelet space is described by the wavelet $\psi$ at different levels and shifts, $\psi_{j,m}(x) = 2^{j/2} \psi(2^j x - m)$ for $j, m \in \mathbb{N}$, i.e. the wavelet space at level $j$ is $W_j := \operatorname{span}\{\psi_{j,m},\ m \in \mathbb{N}\}$.
They build a representation system for $L^2(\mathbb{R})$, i.e. $\overline{\bigcup_{j\in\mathbb{N}} W_j} = L^2(\mathbb{R})$. For the MRA we also define the scaling function $\varphi$ and the corresponding scaling space $V_j := \operatorname{span}\{\varphi_{j,m},\ m \in \mathbb{N}\}$, where $\varphi_{j,m}(x) = 2^{j/2} \varphi(2^j x - m)$. We then have that $V_j = V_{j-1} \oplus W_{j-1}$ and $L^2(\mathbb{R}) = \overline{V_J \oplus \bigcup_{j\ge J} W_j}$. The Daubechies scaling function and wavelet have the same smoothness properties. This allows us to deal with them interchangeably, as we only need the decay rate under the Walsh transform for the analysis.
However, the classical definition of Daubechies wavelets has a major drawback for our setting. Normally, they are defined on the whole line $\mathbb{R}$. Since Walsh functions are defined on $[0, 1]$, it is necessary to restrict the wavelets to $[0, 1]$ as well. Otherwise there would be elements in the reconstruction space which are not in the sampling space, and therefore the solution might not be unique. Hence, we use boundary corrected wavelets ([21, §4]). The relation between the reconstruction and sampling space is discussed at greater length in [33], to which we refer the interested reader.
For the definition of boundary wavelets we have to correct those that intersect with the boundary. We start with the definition of the scaling space and continue with the wavelet space. Let $\varphi$ be the scaling function of order $p$ with support in $[-p+1, p]$. First consider the lowest level $J_0$ such that the scaling functions intersect with only one of the boundaries 0 or 1, i.e. $2^{J_0} \ge 2p - 1$. Then we can keep the interior scaling functions. The exterior ones are changed according to [21] and are denoted by

The left and right functions still have the same smoothness properties and staggered support, such that the new system has the same properties as before. Additionally, it was proved in [21] that $V_j$ can be spanned by the scaling function, its translates and the reflected version $\varphi^\#(x) = \varphi(-x + 1)$, i.e.

The new system still forms an MRA, hence the boundary wavelets can be deduced from the boundary corrected scaling functions. Fortunately, we only need the smoothness properties of the wavelet. The boundary wavelet will be denoted by $\psi^b$, with $\psi^b_{j,m}(x) = 2^{j/2} \psi^b(2^j x - m)$. Because the smoothness properties are preserved by the boundary correction, we do not go into the details of the construction of the wavelet. The interested reader is referred to [21] for a detailed analysis.
With this information at hand we can now have a look at the reconstruction space. To analyse $L^2([0, 1])$ we represent the low frequency part by the scaling space at level $J_0$ and the higher frequencies by the wavelet spaces with $j \ge J_0$. Hence, the reconstruction space is given by

In the finite-dimensional setting we only consider wavelets up to a certain scale $R = \log_2(N)$; we denote the space of the first $N = 2^R$ elements by

Remark 3.8. We consider here only the case of Daubechies wavelets of order $p \ge 3$. The theory also holds for order $p = 1, 2$. Nevertheless, in that case we get unpleasant exponents $\alpha$ depending on the wavelet and different cases to consider. For the Haar wavelet, we can get even better estimates due to the perfect block structure of the change of basis matrix in that case. A detailed analysis of the relation between Haar wavelets and Walsh functions can be found in [53], and we discuss the recovery guarantees for this special case in §4.4.
For the subsequent analysis we want to transfer the shift in the wavelets to the Walsh function. For this purpose we use Lemma 3.5. However, its assumptions require $t \in \mathbb{N}$ and $x \in [0, 1)$. Due to the larger support of the wavelets this does not hold true, i.e.

Therefore, we have to separate the wavelets into parts which have support in $[0, 1]$. Note that this is not a contradiction to the construction of the boundary wavelets. They are indeed supported in $[0, 1]$, but only from the scale $J_0$ onwards, which does not include the mother scaling function. Therefore, we represent the mother scaling function as follows

This can be done analogously for the reflected function $\varphi^\#$. More detailed information about this problem can be found in [33].

We now discuss the ordering of the sampling and reconstruction space. We order the reconstruction space according to the levels, as in (3.5). With this we get the multilevel subsampling scheme with the level structure. To this end, we bring the scaling functions at level $J_0$ and the wavelets at level $J_0$ together into one block of size $2^{J_0+1}$. The next level consists of the wavelets at level $J_0 + 1$, of size $2^{J_0+1}$, and so forth. Therefore, we define
$$M = (M_0, M_1, \ldots, M_r) = (0, 2^{J_0+1}, \ldots, 2^{J_0+r}) \tag{3.6}$$
to represent the level structure of the reconstruction space. For the sampling space we use the same partition. We only allow oversampling through the choice of $q \ge 0$. Let

We then get for the reconstruction matrix $U$ in (3.1) with $u_{i,j} = \langle \phi_i, \omega_j \rangle$ that $\omega_j(x) = \mathrm{Wal}(j, x)$, and for the first block we have $\phi_i = \varphi_{J_0,i}$ for $i = 0, \ldots, 2^{J_0} - p - 1$ and $\phi_i = \varphi^\#_{J_0,i}$ for $i = 2^{J_0} - p, \ldots, 2^{J_0} - 1$. For the next levels, i.e. for $i \ge 2^{J_0}$, we get for $l$ with $2^l \le i < 2^{l+1}$ and $m = i - 2^l$ that $\phi_i = \psi^b_{l,m}$. The proof of the main theorem relies mainly on the analysis of the change of basis matrix. Numerical examples and rigorous mathematics [53] show that it is perfectly block diagonal for the Walsh-Haar case. It is also close to block diagonal for other Daubechies wavelets, as can be seen in Figure 1. An intuition for this phenomenon is given in Figure 2, where we plotted Haar wavelets at different scales together with Walsh functions at different sequencies. In Figure 2a the scale of the Haar wavelet is higher than the sequency of the Walsh function. Therefore, the Walsh function does not change the wavelet on its support and hence the product integrates to zero. Figure 2b shows a wavelet and a Walsh function at similar scale and sequency, which corresponds to parts of the change of basis matrix in the inner block. Here the two functions add up nicely to give a non-zero output. Last, in Figure 2c the Walsh function oscillates faster than the wavelet; hence the Walsh function is not disturbed by the wavelet and the product integrates to zero.
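The perfect block structure in the Walsh-Haar case can be reproduced with a few lines of code. The sketch below is an illustration, not the construction used in the paper: it assumes integer sequencies, the sequency-ordered Walsh functions obtained via the Gray code, and Haar wavelets $\psi_{j,m}(x) = 2^{j/2}\psi(2^j x - m)$ on $[0, 1)$. The inner products are evaluated exactly on a dyadic grid, since both functions are piecewise constant there. All names and the `grid` parameter are hypothetical.

```python
def walsh(n, x, bits=20):
    """Sequency-ordered Walsh function for integer n and x in [0, 1)."""
    gray = n ^ (n >> 1)
    k = int(x * (1 << bits))
    parity = 0
    digit = 1
    while gray:
        if gray & 1:
            parity ^= (k >> (bits - digit)) & 1
        gray >>= 1
        digit += 1
    return -1 if parity else 1


def haar_sign(j, m, x):
    """Sign pattern of the Haar wavelet psi_{j,m}: +1 on the first half of
    its support, -1 on the second half, 0 elsewhere."""
    t = (2 ** j) * x - m
    if 0 <= t < 0.5:
        return 1
    if 0.5 <= t < 1:
        return -1
    return 0


def inner_product(j, m, n, grid=64):
    """<psi_{j,m}, Wal(n, .)> on [0, 1); exact whenever 2**(j+1) <= grid
    and n < grid, because both functions are constant on the grid cells."""
    signs = sum(haar_sign(j, m, (i + 0.5) / grid) * walsh(n, (i + 0.5) / grid)
                for i in range(grid))
    return (2 ** (j / 2)) * signs / grid
```

With these definitions, `inner_product(j, m, n)` vanishes unless $2^j \le n < 2^{j+1}$, and inside that block its absolute value is $2^{-j/2}$. This matches the three situations described for Figure 2.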

PROOF OF THE MAIN RESULT
4.1. Preliminaries. It is important to make sure that the uneven finite sections of the change of basis matrix are close to an isometry. In the finite-dimensional setting this is assured by the stable sampling rate. A detailed analysis of the stable sampling rate for Walsh functions can be found in [33]. The analysis for Fourier measurements is conducted in [7,30,41]. In the infinite-dimensional case the balancing property controls the relation between the number of samples and the number of reconstructed coefficients, such that the matrix $P_N U P_M$ is close to an isometry.

Definition 4.1 ([4]). Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry. Then $N \in \mathbb{N}$ and $K \ge 1$ satisfy the strong balancing property with respect to $U$, $M \in \mathbb{N}$ and $s \in \mathbb{N}$ if

where $\|\cdot\|_{\ell^\infty \to \ell^\infty}$ is the norm on $\mathcal{B}(\ell^\infty(\mathbb{N}))$.
In this setting we use the notation as in [9]. Let

In the rest of the analysis we are interested in the number of samples needed for stable and accurate recovery. Besides known quantities, this value depends on the local coherence and the relative sparsity, which are defined next. We start with the (global) coherence.
With this it is possible to define the local coherence for every level band.

Definition 4.3 ([9]). Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry. The $(k, l)$th local coherence of $U$ with respect to $M, N$ is given by

We also define

Definition 4.4 ([9]). Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry, $s = (s_1, \ldots, s_r) \in \mathbb{N}^r$ and $1 \le k \le r$. The $k$th relative sparsity is given by

After clarifying the notation and settings we are now able to state the main theorem from [9].
Suppose that $\xi \in \ell^1(\mathbb{N})$ is a minimizer of (1.1). Then, with probability exceeding $1 - s\epsilon$,

for some constant $c$ and

This result gives a mathematical justification for the use of structured sampling schemes, in contrast to the first compressed sensing results, which promoted the use of random sampling masks.

4.2. Key estimates.
In this section we discuss the important estimates that are needed for the proof of Theorem 3.1. They are also interesting in themselves and allow a deeper understanding of the relation between Walsh functions and wavelets.

4.2.1. Local coherence estimate. We start by restating the results about the decay rate of wavelets under the Walsh transform.

Lemma 4.6 ([12]). Let $f$ be a Hölder continuous function of order $\alpha \ge 1$. Then the constant $C_f = \sup_{t\in[0,1]} |f'(t)|$ exists and we have that

This leads directly to the following estimate of the decay rate of wavelets under the Walsh transform.
Corollary 4.7. Let $\varphi$ be the mother scaling function of order $p \ge 3$ and $\varphi^\#$ be its reflected version. Moreover, let $\psi$ be the corresponding mother wavelet. Then we have that

We denote by $C_{\varphi,\psi}$ the maximum of $C_\varphi$, $C_{\varphi^\#}$ and $C_\psi$.
Proof. The corollary follows directly from the Hölder continuity of the wavelet.
This decay rate is important in many of the following proofs. Next, we use it to estimate $\mu(P_{N_{k-1}}^{N_k} U)$.

Lemma 4.8. Let $U$ be the change of basis matrix for the boundary Daubechies wavelets and Walsh functions. Moreover, let $M$ and $N$ be defined by (3.6) and (3.7). Then we have that

Proof. The proof follows the lines in [9] and uses the decay estimates in Corollary 4.7. First, we have that

Then we get, using the arguments in Theorem 7.15 (ii) in [9] and the tensor product structure,

Together with the first estimate, this gives the desired result.

Now, we recall the result from [12] about the local coherence. Note that the local coherence has a different definition in [9] and [12].

Theorem 4.9 ([12]). Let $U$ be the change of basis matrix for Walsh functions and boundary wavelets of order $p \ge 3$ and minimal wavelet decomposition $J_0$. Moreover, let $M$ and $N$ be as in (3.6) and (3.7). Then let

We have that

With these two results at hand we can now give an estimate for the local coherence.
Combining this with the result in Lemma 4.8 we get, again first for $l \le k$,

Because of the infinite-dimensional setting we also have to estimate $\mu_{N,M}(k, \infty)$ from (4.2). This is done in the following corollary.
Proof. We have that

We know from Lemma 4.8 that $\mu(P_{N_{k-1}}^{N_k} U) \le C/N_{k-1}$; moreover, we have with Theorem 4.9 that

Hence, we get

Note that the same local coherence estimate was found for the Fourier-Haar case in [10].
4.2.2. Relative sparsity estimate. Now we want to estimate the relative sparsity of the change of basis matrix $U$ in the Walsh-wavelet case. To do so, remember

Hence, we need to bound

To this end we first bound $\|P_N^\perp U P_M\|_2$.
Lemma 4.12. Let $U$ be the change of basis matrix for the Walsh measurements and boundary wavelets of order $p \ge 3$. Let the number of samples $N$ be larger than the number of reconstructed coefficients $M$. Then we have that

where

Proof. We start with bounding $\|P_N^\perp U P_M\|_2$. We rewrite it as follows

It is clear that this value gets smaller if $N$ grows in relation to $M$. However, from a practical perspective it is desirable to take as few samples $N$ as possible while reconstructing a large number $M$ of coefficients. For the further analysis we define the ratio of these two by $S = N/M$. We include for completeness the intermediate steps, which are similar to the proof of the main theorem in [33]. We believe that this gives a deeper understanding. In particular, it is interesting to understand the constant $C_{rs}$ and to see what affects its size.
We first use the MRA property to rewrite $\phi \in \mathcal{R}_M$ as a sum of elements in the related scaling space. Keep in mind at this point that we only consider values of $M = 2^R$. Hence, we only jump from level to level. We get for $\phi$

This reduces the problem of the sum of the inner products for the orthogonal projection from many different wavelets to shifted scaling functions at the same level. We have

Hence, we start with controlling the inner product $\langle \mathrm{Wal}(k, \cdot), \varphi_{R,n} \rangle$ and analogously $\langle \mathrm{Wal}(k, \cdot), \varphi^\#_{R,n} \rangle$. Our aim is to remove the scaling and the shift from the wavelet and to obtain instead the product of the Walsh transform of the original mother wavelet and a Walsh polynomial. For this we follow the ideas in [33]. Remember first that the mother scaling function is divided into a sum of functions that are supported in

This allows us to deal only with $\langle \mathrm{Wal}(k, \cdot), \varphi_{i,R,n} \rangle$. We get

Next, we use Lemma 3.5 to get the shift out of the integral. We define $p_R \colon \mathbb{Z} \to \mathbb{N}$ to map $z$ to the smallest integer with $p_R(z) 2^R + z > 0$.
This allows us to use Lemma 3.5 because $x \in [0, 1]$ and

With this we are able to represent the inner product of every shifted version of $\varphi_{i,R,n}$ with the Walsh function as the product of the Walsh transform of $\varphi_i$ and a Walsh function which contains the shift information. In the following we want to rewrite the inner products such that we are left with a Walsh polynomial and the Walsh transform of the mother scaling function. For this define

We get

After this evaluation we can go back to estimating the norm $\|P_N^\perp U P_M\|_2$. We have with the Cauchy-Schwarz inequality

After multiplying out the brackets we are left with (4.5) and the analogue for $\varphi^\#$, as well as their product. Because $\varphi$ and $\varphi^\#$ share the same decay rate, it is sufficient to deal only with (4.5) and deduce the rest from it. To estimate these values we use the one-periodicity of the Walsh functions. To this end let $M = 2^R$. We always want to reconstruct a full level, as we do not know in which part of the level the information is located. Then we write $k = mM + j$, where $j = 0, \ldots, M - 1$ and $m \ge S = N/M$. This leads to

We estimate with Corollary 4.7

Here $C_\varphi$ depends on the choice of the wavelet. In contrast to the Fourier case there is no known relationship between the smoothness of the wavelet and the decay rate or the behaviour of $C_\varphi$, as discussed in Remark 3.3.
For the first sum we get from technical computations in [33] and Lemma 3.7 that

The analogue holds true for the $\varphi^\#$ part. Hence, we get together

When we now replace $S = N/M$ and set $C_{rs} = (8p - 8)^2 \max\{C_\varphi^2, C_{\varphi^\#}^2\}$ we get

With this estimate at hand we can now prove the next lemma.
Lemma 4.13. Let $U$ be the change of basis matrix given by the Walsh measurements and boundary wavelets of order $p \geq 3$. Then we have that where $C_{\max} = \max\{C_{\mu}, C_{rs}\}$.
Proof. We use similar estimates as in Corollary 4.10. We know from Lemma 4.12 that $\|P_N^{\perp} U P_M\|_2^2 \leq C_{rs} \cdot \frac{M}{N}$, whenever $N \geq M$. With this we get for $k > l$: For $l \geq k$ we get .
Hence, we conclude With these lemmas at hand we can now bound the relative sparsity $S_k(N, M, s)$.
Corollary 4.14. For the setting as before we have
Proof. With the estimates from before and the Cauchy-Schwarz inequality we get we make the following calculation with m = 2 The last two steps show that (4.3) and (4.4) are fulfilled and Theorem 4.5 can be applied. We have where we used the estimate of $\mu_{N,M}$ from Corollaries 4.10 and 4.11, (3.3) and (4.6). Moreover, C and Equation (4.3) is fulfilled. Now, we consider Equation (4.4) Due to the fact that the geometric series is bounded we have

4.4. Recovery guarantees for the Walsh-Haar case. In this section we turn to the Walsh-Haar case. This pairing is of particular interest because Walsh functions and Haar wavelets behave very similarly. As seen earlier, this results in perfect block diagonality of the change of basis matrix, see Figure 1a. For a detailed analysis we refer the reader to [53]. Due to this structure, the off-diagonal blocks do not impact the coherence and sparsity structure at any one level. Therefore, the number of samples per level depends only on the incoherence in the given level and the relative sparsity within it. With this, the main theorem simplifies for the Walsh-Haar case to Corollary 4.17.
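The block diagonality described above can be checked numerically. The following is a minimal sketch, not the construction used in the paper: it assumes Paley-ordered Walsh functions and Haar wavelets on $[0, 1]$, both sampled at the midpoints of $2^R$ dyadic cells (which makes the inner products exact, since both families are piecewise constant on these cells). The helper names `paley_walsh_matrix` and `haar_matrix` are hypothetical. The first $2^j$ Walsh functions span the same space in the sequency ordering, so the block pattern does not depend on this choice.

```python
import numpy as np

def paley_walsh_matrix(R):
    """Row k holds Wal(k, x) in Paley order at the midpoints of 2^R dyadic cells."""
    n = 2 ** R
    j = np.arange(n)
    # Bit i of `rev` is the (i+1)-th binary digit of x = (j + 0.5) / n.
    rev = np.zeros(n, dtype=np.int64)
    for b in range(R):
        rev |= ((j >> b) & 1) << (R - 1 - b)
    overlap = np.arange(n)[:, None] & rev[None, :]
    # Wal(k, x) = (-1)^(number of common one-bits of k and rev).
    parity = np.zeros_like(overlap)
    for b in range(R):
        parity ^= (overlap >> b) & 1
    return 1 - 2 * parity

def haar_matrix(R):
    """Row 0 is the scaling function, row 2^j + m is the Haar wavelet psi_{j,m}."""
    n = 2 ** R
    x = (np.arange(n) + 0.5) / n
    H = np.zeros((n, n))
    H[0] = 1.0
    for j in range(R):
        for m in range(2 ** j):
            t = 2 ** j * x - m
            left = ((0 <= t) & (t < 0.5)).astype(float)
            right = ((0.5 <= t) & (t < 1)).astype(float)
            H[2 ** j + m] = 2 ** (j / 2) * (left - right)
    return H

R = 6
n = 2 ** R
W = paley_walsh_matrix(R)
H = haar_matrix(R)

# U[k, l] = <Wal(k, .), haar_l>, computed exactly by the midpoint rule.
U = W @ H.T / n

# Dyadic level of each index: 0 -> 0, 1 -> 1, {2, 3} -> 2, {4, ..., 7} -> 3, ...
level = np.array([int(i).bit_length() for i in range(n)])
same_block = level[:, None] == level[None, :]
off_block_max = np.abs(U[~same_block]).max()
print("largest entry outside the diagonal blocks:", off_block_max)
```

Under these assumptions every entry outside the dyadic diagonal blocks vanishes up to rounding, and inside the block of indices $2^j, \ldots, 2^{j+1} - 1$ all entries have modulus $2^{-j/2}$, so the coherence of a level is determined by that level alone.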
Corollary 4.17. Let the notation be as before, but let the wavelet be the Haar wavelet. Moreover, let $\epsilon > 0$ and $\Omega = \Omega_{N,m}$ be a multilevel sampling scheme such that: (1) The number of samples is greater than or equal to the number of reconstructed coefficients, i.e. $N \geq M$. $\mu_{N,M}(k, l)\, s_l \log(K M \sqrt{s})$

NUMERICAL EXPERIMENTS
In this section we show some examples which illustrate the performance gain obtained from compressed sensing in contrast to direct inversion. Additionally, we discuss the impact of the sampling pattern and show that the coherence structure needs to be taken into account. For this purpose we use a modification of the flip test introduced in [9].
First, we look again at the coherence structure of the change of basis matrix in Figure 1 for different types of Daubechies wavelets. One can directly spot the block structure of the matrices. This is especially striking for the Haar-Walsh case, but also for other wavelets it is easy to see that the first block has the largest values, with values close to zero outside the blocks and a decay along the diagonal. Together with the structured sparsity of functions and images under the wavelet transform, we can apply the main theorem to improve the reconstruction quality. To demonstrate this, let
(5.1) $f(x) = \cos(2\pi x) + 0.2 \cos(10\pi x)$.
Then, we can see the reconstruction in Figure 4. Due to the discontinuous behaviour of the Walsh functions and the smoothness of the function $f$, the direct inversion has a lot of block artefacts. Here, CS gives nearly 2) is not sharp, which can also be observed in the numerical examples. Next, we demonstrate that the structure is very important. For this purpose we conducted the same experiment with a flipped sampling pattern, see Figure 5b. Then, the reconstruction is nowhere close to perfect and the original signal is not even identifiable.
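The block artefacts of direct inversion can be reproduced with a few lines of code. The sketch below is an illustration under stated assumptions and not the experiment behind the figures: it uses the test function (5.1), Paley-ordered Walsh functions on a grid of $2^8$ midpoints, and the first $N = 16$ Walsh coefficients. The grid size, the value of $N$ and the helper name `paley_walsh_matrix` are choices made here for illustration. It does not implement the compressed sensing reconstruction.

```python
import numpy as np

def f(x):
    # Test function (5.1).
    return np.cos(2 * np.pi * x) + 0.2 * np.cos(10 * np.pi * x)

def paley_walsh_matrix(R):
    """Row k holds Wal(k, x) in Paley order at the midpoints of 2^R dyadic cells."""
    n = 2 ** R
    j = np.arange(n)
    # Bit i of `rev` is the (i+1)-th binary digit of x = (j + 0.5) / n.
    rev = np.zeros(n, dtype=np.int64)
    for b in range(R):
        rev |= ((j >> b) & 1) << (R - 1 - b)
    overlap = np.arange(n)[:, None] & rev[None, :]
    parity = np.zeros_like(overlap)
    for b in range(R):
        parity ^= (overlap >> b) & 1
    return 1 - 2 * parity

R = 8
n = 2 ** R
x = (np.arange(n) + 0.5) / n
fx = f(x)
W = paley_walsh_matrix(R)

# Midpoint-rule approximation of the Walsh coefficients <Wal(k, .), f>.
coeffs = W @ fx / n

# Direct inversion: truncated Walsh series built from the first N coefficients.
N = 16
direct = W[:N].T @ coeffs[:N]

# The truncated series is constant on each of the N dyadic cells and equals
# the local mean of f there, which is the source of the block artefacts.
block_means = np.repeat(fx.reshape(N, -1).mean(axis=1), n // N)
max_error = np.abs(direct - fx).max()
print("maximal pointwise error of direct inversion:", max_error)
```

Because the first $2^j$ Walsh functions span the piecewise constant functions on $2^j$ dyadic cells, the direct reconstruction of a smooth $f$ is a staircase approximation, regardless of how smooth $f$ is.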

CONCLUSION
In this paper we have completed the theory about linear and non-linear recovery guarantees for the reconstruction from binary and Fourier measurements with wavelets.We underlined the results about the structured sampling and sparsity theory with the special case of Walsh and wavelet reconstruction.Additionally, we showed the numerical gains and the problems that arise when the theory is not taken into account.

FIGURE 1. Absolute values of $P_N U P_N$ with $N = 256$, where $U$ is the infinite matrix from (3.1), with Daubechies wavelets with different numbers of vanishing moments, and Walsh (upper row) and Fourier measurements (lower row). In the Fourier case, $U$ becomes more block diagonal as smoothness and the number of vanishing moments increase. This is not the case in the Walsh setting, suggesting that the non-dependence of smoothness and vanishing moments in the estimate (3.3) is correct.

FIGURE 2. Intuition for block diagonal structure of the change of basis matrix

$\sum_{k=1}^{r} 2^{-|l-k|/2} \leq C_{geo}$, for all $l = 1, \ldots, r$. We are left with bounding $s_k / m_k$ for all $k = 1, \ldots, r$. Denote the constant from $\lesssim$ in Theorem 4.5 by $C$. With the estimate in Corollary 4.14 we can then bound $s_k$ with (3.3) by $s_k$

(2) Let $K = \max_{k=1,\ldots,r} \left\{ \frac{N_k - N_{k-1}}{m_k} \right\}$, $M = M_r$, $N = N_r$ and $s = s_1 + \ldots + s_r$, and for each $k = 1, \ldots, r$: $m_k \gtrsim \log(\epsilon^{-1}) \log(K \sqrt{s} N) \cdot s_k$. Then, with probability exceeding $1 - s\epsilon$, any minimizer $\xi \in \ell^1(\mathbb{N})$ satisfies $\|\xi - x\|_2 \leq c \cdot \left( \delta \sqrt{K} (1 + L \sqrt{s}) + \sigma_{s,M}(f) \right)$, for some constant $c$, where $L = c \cdot \left( 1 + \frac{\log_2(6 \epsilon^{-1})}{\log_2(4 K M \sqrt{s})} \right)$. If $m_k = N_k - N_{k-1}$ for $1 \leq k \leq r$ then this holds with probability 1.
Proof. Due to the block diagonality we have for $N \geq M$ that $\|P_N^{\perp} U P_M\|_2^2 = \max_{\phi \in R_M} \sum_{k > N} |\langle \mathrm{Wal}(k, \cdot), \phi \rangle|^2 = 0$ and therefore the balancing property is satisfied for any $K$. Hence assumption (1) in Theorem 4.5 is fulfilled. Next, for the same reason, where we replace $P_N^{\perp}$ with $P_m$, we have that $M = N$ and $\mu_{N,M}(k, l) = 0$ for $k \neq l$. This allows us to get rid of the sum in the estimate in the main theorem, i.e. $\frac{N_k - N_{k-1}}{m_k} \log(\epsilon^{-1}) \sum_{l=1}^{r}$