On Recovery Guarantees for One-Bit Compressed Sensing on Manifolds

This paper studies the problem of recovering a signal from one-bit compressed sensing measurements under a manifold model; that is, assuming that the signal lies on or near a manifold of low intrinsic dimension. We provide a convex recovery method based on the Geometric Multi-Resolution Analysis and prove recovery guarantees with a near-optimal scaling in the intrinsic manifold dimension. Our method is the first tractable algorithm with such guarantees for this setting. The results are complemented by numerical experiments confirming the validity of our approach.


Introduction
Linear inverse problems are ubiquitous in many applications in science and engineering. Starting with the seminal works of Candès, Romberg and Tao [10] as well as Donoho [14], a new paradigm in their analysis became an active area of research in the last decades. Namely, rather than considering the linear model as entirely given by the application, one seeks to actively choose remaining degrees of freedom, often using a randomized strategy, to make the problem less ill-posed. This approach gave rise to a number of recovery guarantees for random linear measurement models under structural data assumptions. The first works considered the recovery of sparse signals; subsequent works analyzed more general union-of-subspaces models [17] and the recovery of low rank matrices [37], a model that can also be employed when studying phaseless reconstruction problems [11] or bilinear inverse problems [1].
Another line of work following this approach studies manifold models. That is, one assumes that the structural constraints are given by (unions of finitely many) manifolds. While this model is considerably richer than, say, sparsity, its rather general formulation makes a unified study, at least in some cases, somewhat more involved. The first work to study random linear projections of smooth manifolds was [5], where the authors show that Gaussian linear dimension reductions typically preserve the geometric structure. In [25], these results are refined and complemented by a recovery algorithm, which is based on the concept of the Geometric Multi-Resolution Analysis as introduced in [3] (cf. Section 2.1 below). These results were again substantially improved in [16]; these latest results no longer explicitly depend on the ambient dimension.
Arguably, working with manifold models is better adapted to real-world data than sparsity and hence may allow one to work with smaller embedding dimensions. For that, however, other practical issues need to be considered as well. In particular, to our knowledge there are almost no works to date that study the effects of quantization, i.e., representing the measurements using only a finite number of bits (the only remotely connected work that we are aware of is [32], but this paper does not consider dimension reduction and exclusively focuses on the special case of Grassmann manifolds).
For sparse signal models, in contrast, quantization of subsampled random measurements is an active area of research. On the one hand, a number of works considered the scenario of memoryless scalar quantization (MSQ), that is, each of the measurements is quantized independently. In particular, the special case of representing each measurement only by a single bit, its sign (often referred to as one-bit compressed sensing), has received considerable attention. In [27], it was shown that one-bit compressed sensing with Gaussian measurements approximately preserves the geometry, and a heuristic recovery scheme was presented. In [34], recovery guarantees for a linear method, again with Gaussian measurements, were derived. Subsequently, these results were generalized to subgaussian measurements [2], and partial random circulant measurements [13]. In [35], the authors provided a recovery procedure for noisy one-bit Gaussian measurements which provably works on more general signal sets (essentially arbitrary subsets of the Euclidean ball). This procedure, however, becomes NP-hard as soon as the signal set is non-convex, a common property of manifolds.
Another line of works studied so-called feedback quantizers, that is, the bit sequence encoding the measurements is computed using a recursive procedure. These works adapt the Sigma-Delta modulation approach originally introduced in the context of bandlimited signals [21,33] and later generalized to frame expansions [6,7] to the sparse recovery framework. A first such approach was introduced and analyzed for Gaussian measurements in [22]; subsequent works generalize the results to subgaussian random measurements [28,19]. Recovery guarantees for a more stable reconstruction scheme based on convex optimization were proved for subgaussian measurements in [38] and extended to partial random circulant matrices in [20]. For more details on the mathematical analysis available for different scenarios, we refer the reader to the overview chapter [9]. In this paper we focus on the MSQ approach and leave the study of Sigma-Delta quantizers under manifold model assumptions for future work.

Contribution
We provide the first tractable one-bit compressed sensing algorithm for signals which are well approximated by manifold models. It is simple to implement and comes with error bounds that basically match the state-of-the-art recovery guarantees in [35]. In contrast to the minimization problem introduced in [35], which does not come with a minimization algorithm, our approach always admits a convex formulation and hence allows for tractable recovery. Our approach is based on the Geometric Multi-Resolution Analysis (GMRA) introduced in [3], and hence combines the approaches of [25] with the general results for one-bit quantized linear measurements provided in [35,36].

Outline
We begin with a detailed description of our problem in Section 2 and fix notation for the rest of the paper. The section also includes a complete axiomatic definition of GMRA. Section 3 states our main results. The proofs can be found in Section 4. In Section 5 we present some numerical experiments testing the recovery in practice and conclude with Section 6. Technical parts of the proofs as well as the adaptation of the results to GMRAs from random samples are deferred to the Appendix.

Problem Formulation, Notation, and Setup
The problem we address is the following. We consider a given union of low-dimensional manifolds (i.e., signal class) $M$ of intrinsic dimension $d$ that is a subset of the unit sphere $S^{D-1}$ of a higher dimensional space $\mathbb{R}^D$, $d \ll D$. Furthermore, we imagine that we do not know $M$ perfectly, and so instead we only have approximate information about $M$ represented in terms of a structured dictionary model $\mathcal{D}$ for the manifold. Our goal is now to recover an unknown signal $x \in M$ from $m$ one-bit measurements

$$y = \operatorname{sign}(Ax), \qquad (1)$$

where $A \in \mathbb{R}^{m \times D}$ has Gaussian i.i.d. entries of variance $1/\sqrt{m}$, using as few measurements, $m$, as possible. Each single measurement $\operatorname{sign}(\langle a_i, x \rangle)$ can be interpreted as the random hyperplane $\{z \in \mathbb{R}^D : \langle a_i, z \rangle = 0\}$.

Figure 1(a): Tessellation of the sphere by random hyperplanes.

Definition 2.1 (GMRA Approximation to $M$, [25]). Let $J \in \mathbb{N}$ and $K_0, K_1, \dots, K_J \in \mathbb{N}$. Then a Geometric Multi-Resolution Analysis (GMRA) approximation of $M$ is a collection $\{(C_j, P_j)\}$, $j \in [J] := \{0, \dots, J\}$, of sets $C_j = \{c_{j,k}\}_{k=1}^{K_j} \subset \mathbb{R}^D$ of centers and of sets $P_j = \{P_{j,k}\}_{k=1}^{K_j}$ of affine projectors which approximate $M$ at scale $j$, such that the following assumptions (1)-(3) hold.
(1) Affine Projections: Every $P_{j,k} \in P_j$ has both an associated center $c_{j,k} \in C_j$ and an orthogonal matrix $\Phi_{j,k} \in \mathbb{R}^{d \times D}$, such that

$$P_{j,k}(z) = \Phi_{j,k}^T \Phi_{j,k} (z - c_{j,k}) + c_{j,k},$$

i.e., $P_{j,k}$ is the projector onto some affine $d$-dimensional subspace $P_{j,k}$ containing $c_{j,k}$.
(2) Dyadic Structure: The number of centers at each level is bounded by $|C_j| = K_j \le C_C 2^{dj}$ for an absolute constant $C_C \ge 1$. There exist $C_1 > 0$ and $C_2 \in (0, 1]$, such that the following conditions are satisfied:
(3) Multiscale Approximation: The projectors in $P_j$ approximate $M$ at scale $j$, i.e., when $M$ is sufficiently smooth the affine spaces $P_{j,k}$ locally approximate $M$ pointwise with error $O(2^{-2j})$. More precisely:
(a) There exists $j_0 \in [J - 1]$, such that $c_{j,k} \in \operatorname{tube}_{C_1 \cdot 2^{-j-2}}(M)$, for all $j > j_0 \ge 1$ and $k \in [K_j]$.
(b) For each $j \in [J]$ and $z \in \mathbb{R}^D$ let $c_{j,k_j(z)}$ be one of the centers closest to $z$, i.e.,

$$k_j(z) \in \arg\min_{k \in [K_j]} \|z - c_{j,k}\|_2.$$

Then, for each $z \in M$ there exists a constant $C_z > 0$ such that

$$\|z - P_{j,k_j(z)}(z)\|_2 \le C_z 2^{-2j}$$

for all $j \in [J]$. Moreover, for each $z \in M$ there exists $\tilde{C}_z > 0$ such that

Remark 2.2. By property (1) a GMRA approximation represents $M$ as a combination of several anchor points (the centers $c_{j,k}$) and corresponding low-dimensional affine spaces $P_{j,k}$. The levels $j$ control the accuracy of the approximation. The centers are organized in a tree-like structure as stated in property (2). Property (3) then characterizes approximation criteria to be fulfilled on different refinement levels. Note that centers do not have to lie on $M$ (compare Figure 1b) but their distance to $M$ is controlled by property (3a).

Figure 2: The closest center $c_{j,k_j(x)}$ is not identified by measurements. Dotted lines represent one-bit hyperplanes.
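To make property (1) concrete, the following minimal sketch builds one affine projector and checks its basic properties numerically. It assumes the projector has the form $z \mapsto \Phi^T \Phi (z - c) + c$ with $\Phi$ having orthonormal rows; the helper name `affine_projector` and the randomly drawn subspace and center are purely illustrative and not part of any GMRA implementation.

```python
import numpy as np

def affine_projector(phi, center):
    """Return z -> phi^T phi (z - center) + center.

    Assumes phi has orthonormal rows (shape d x D), so the map is the
    orthogonal projector onto the affine space center + rowspace(phi).
    """
    def project(z):
        return phi.T @ (phi @ (z - center)) + center
    return project

rng = np.random.default_rng(0)
D, d = 20, 2

# Hypothetical subspace: orthonormalize d random directions via QR.
q, _ = np.linalg.qr(rng.standard_normal((D, d)))
phi = q.T                                   # shape (d, D), orthonormal rows

# Hypothetical center of norm 1/2.
center = rng.standard_normal(D)
center /= 2 * np.linalg.norm(center)

P = affine_projector(phi, center)
z = rng.standard_normal(D)

# The residual z - P(z) is orthogonal to the subspace directions.
residual = z - P(z)
```

Applying `P` twice gives the same point as applying it once, and `P(center)` returns `center`, which matches the description of $P_{j,k}$ as a projector onto an affine space containing its center.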

Additional Notation
Let us now fix some additional notation. Throughout the remainder of this paper we will work with several different metrics. Perhaps most importantly, we will quantify the distance between two points $z, z' \in \mathbb{R}^D$ with respect to their one-bit measurements by

$$d_A(z, z') := \frac{1}{m}\, d_H(\operatorname{sign}(Az), \operatorname{sign}(Az')),$$

where $d_H$ counts the number of differing entries between the two sign patterns (i.e., $d_A(z, z')$ is the normalized Hamming distance between the signs of $Az$ and $Az'$). Furthermore, let $P_S$ denote orthogonal projection onto the unit sphere $S^{D-1}$, and more generally let $P_K$ denote orthogonal (i.e., nearest neighbor) projection onto the closure of an arbitrary set $K \subset \mathbb{R}^D$ wherever it is defined. Then, for all $z, z' \in \mathbb{R}^D$ we will denote by $d_G(z, z') = d_G(P_S(z), P_S(z'))$ the geodesic distance between $P_S(z)$ and $P_S(z')$ on $S^{D-1}$ normalized to fulfill $d_G(z', -z') = 1$ for all $z' \in \mathbb{R}^D$.
Herein the Euclidean ball with center $z$ and radius $r$ is denoted by $B(z, r)$. In addition, the scale-$j$ GMRA approximation to $M$, denoted $M_j$, will refer to the portions of the affine subspaces introduced in Definition 2.1 for each fixed $j$ which are potentially relevant as approximations to some portion of $M \subset S^{D-1}$. To prevent the $M_j$ above from being empty we will further assume in our results that we only use scales $j > j_0$ large enough to guarantee that $\operatorname{tube}_{C_1 2^{-j-2}}(M) \subset B(0, 2)$. Hence we will have $c_{j,k} \in B(0, 2)$ for all $k \in [K_j]$, and so $C_j \subset M_j$. This further guarantees that no sets $P_{j,k} \cap B(0, 2)$ are empty, and that $P_{j,k} \cap B(0, 2) \subset M_j$ for all $k \in [K_j]$.
Finally, we write $a \gtrsim b$ if $a \ge Cb$ for some constant $C > 0$. The diameter of a set $K \subset \mathbb{R}^D$ will be denoted by $\operatorname{diam}(K) := \sup_{z, z' \in K} \|z - z'\|_2$, where $\|\cdot\|_2$ is the Euclidean norm. We use $\operatorname{dist}(A, B) = \inf_{a \in A, b \in B} \|a - b\|_2$ for the distance of two sets $A, B \subset \mathbb{R}^D$ and by abuse of notation $\operatorname{dist}(0, A) = \inf_{a \in A} \|a\|_2$. The operator norm of a matrix $A \in \mathbb{R}^{n_1 \times n_2}$ is denoted by $\|A\| = \sup_{x \in \mathbb{R}^{n_2}, \|x\|_2 \le 1} \|Ax\|_2$. We will write $N(K, \varepsilon)$ to denote the Euclidean covering number of a set $K \subset \mathbb{R}^D$ by Euclidean balls of radius $\varepsilon$ (i.e., $N(K, \varepsilon)$ is the minimum number of $\varepsilon$-balls that are required to cover $K$). Finally, the operators $\lfloor r \rfloor$ (resp. $\lceil r \rceil$) return the closest integer smaller (resp. larger) than $r \in \mathbb{R}$.
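The two metrics used most often in this section can be computed directly. The sketch below is illustrative only: the function names are hypothetical, the matrix has standard Gaussian entries (the scaling of $A$ does not affect sign patterns), and the two test points are arbitrary. It evaluates the normalized Hamming distance between sign patterns and the normalized geodesic distance for one pair of points.

```python
import numpy as np

def normalized_hamming(A, z1, z2):
    """Fraction of rows a_l of A with sign(<a_l, z1>) != sign(<a_l, z2>)."""
    return float(np.mean(np.sign(A @ z1) != np.sign(A @ z2)))

def normalized_geodesic(z1, z2):
    """Angle between z1 and z2 divided by pi, so antipodal points have distance 1."""
    u1 = z1 / np.linalg.norm(z1)
    u2 = z2 / np.linalg.norm(z2)
    cosine = np.clip(u1 @ u2, -1.0, 1.0)    # guard against rounding outside [-1, 1]
    return float(np.arccos(cosine) / np.pi)

rng = np.random.default_rng(1)
D, m = 30, 20000
A = rng.standard_normal((m, D))             # hypothetical Gaussian measurement matrix

z1 = rng.standard_normal(D)
z2 = z1 + 0.5 * rng.standard_normal(D)      # a second point at moderate angle

d_a = normalized_hamming(A, z1, z2)
d_g = normalized_geodesic(z1, z2)
```

For a single Gaussian hyperplane the probability of separating two points equals their normalized angle, so with many measurements `d_a` should be close to `d_g`.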

The Proposed Computational Approach
Combining prior GMRA-based compressed sensing results [25] with the one-bit results of Plan and Vershynin in [35] suggests the following strategy for recovering an unknown $x \in M$ from the measurements given in (1): First, choose a center $c_{j,k}$ whose one-bit measurements agree with as many one-bit measurements of $x$ as possible. Due to the varying shape of the tessellation cells this is not an optimal choice in general (see Figure 2). Nevertheless, one can expect $P_{j,k}$ to be a good approximation to $M$ near $x$. Thus, in the second step a modified version of Plan and Vershynin's noisy one-bit recovery method using $P_{j,k}$ should yield an approximation of $P_{j,k}(x)$ which is close to $x$. See OMS-simple for pseudocode.
Algorithm OMS-simple: OnebitManifoldSensing - Simple Version
I. Identify a center $c_{j,k}$ close to $x$ via

$$c_{j,k} \in \arg\min_{c \in C_j} d_H(\operatorname{sign}(Ac), y), \qquad (3)$$

where $d_H$ is the Hamming distance, i.e., $d_H(z, z') := |\{l : z_l \ne z'_l\}|$. If $d_H(\operatorname{sign}(Ac_{j,k}), y) = 0$, directly choose $x^* = c_{j,k}$ and omit II.
II. If there is no center in the same cell as $x$ (as in Figure 2), solve a noisy one-bit recovery problem as in [35], i.e., where $R$ is a suitable parameter.
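Step I amounts to an exhaustive nearest-neighbor search in Hamming distance over the sign patterns of the centers. The following sketch illustrates it on synthetic data; the function name `select_center`, the random unit-norm "centers", and the problem sizes are hypothetical stand-ins rather than an actual GMRA.

```python
import numpy as np

def select_center(A, y, centers):
    """Return (index, Hamming distance) of the center whose sign pattern
    sign(A c) differs from y in the fewest entries (ties: lowest index)."""
    signs = np.sign(centers @ A.T)               # shape (K, m)
    hamming = np.sum(signs != y, axis=1)         # Hamming distance per center
    k = int(np.argmin(hamming))
    return k, int(hamming[k])

rng = np.random.default_rng(4)
D, m, K = 10, 2000, 50

# Hypothetical centers: random points on the unit sphere.
centers = rng.standard_normal((K, D))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)

# Signal: a small perturbation of one center, renormalized to the sphere.
k_true = 17
x = centers[k_true] + 0.02 * rng.standard_normal(D)
x /= np.linalg.norm(x)

A = rng.standard_normal((m, D))
y = np.sign(A @ x)

k_sel, dist = select_center(A, y, centers)
```

If the returned distance is zero, the algorithm would stop and output the selected center; otherwise it would continue with step II.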
Remark 2.3. The minimization in (3) can be efficiently calculated by exploiting tree structures in C j . Numerical experiments (see Section 5) suggest this strategy to yield adequate approximation for the center c j,kj (x) in (2), while being considerably faster (we observed differences in runtime up to a factor of 10).
Though simple to understand, the constraints in (4) have two issues that we need to address: First, in some cases the minimization problem (4) empirically exhibits suboptimal recovery performance (see Section 5.1 for details). Second, the parameter R in (4) is unknown a priori (i.e., OMS-simple requires parameter tuning, making it less practical than one might like). Indeed, our analysis shows that making an optimal choice for R in OMS-simple requires a priori knowledge about P j,k (x) 2 which is only approximately known in advance.
To address this issue, we will modify the constraints in (4) and instead minimize over the convex hull of the nearest neighbor projection of $P_{j,k} \cap B(0, 2)$ onto $S^{D-1}$, $\operatorname{conv}(P_S(P_{j,k} \cap B(0, 2)))$, to remove the $R$ dependence. If $0 \in P_{j,k}$ one has $\operatorname{conv}(P_S(P_{j,k} \cap B(0, 2))) = P_{j,k} \cap B(0, 1)$. If $0 \notin P_{j,k}$ the set $\operatorname{conv}(P_S(P_{j,k} \cap B(0, 2)))$ is described by the following set of convex constraints which are straightforward to implement in practice. Denote by $P_c$ the projection onto the vector $c = P_{j,k}(0)$. Then, The first two conditions above restrict $z$ to $B(0, 1)$ and $\operatorname{span}(P_{j,k})$, respectively. The third condition then removes all points that are too close to the origin (see Figure 3). A rigorous proof of equivalence can be found in Appendix A. Our analysis uses that the noisy one-bit recovery results of Plan and Vershynin apply to arbitrary subsets of the unit ball $B(0, 1) \subset \mathbb{R}^D$ which will allow us to adapt our recovery approach. Replacing the constraints in (4) with those in (5) we obtain the following modified recovery approach, OMS.

Figure 3: Two views of an admissible set $\operatorname{conv}(P_S(P_{j,k} \cap B(0, 2)))$ from (5) for a case with $\|c\|_2 = \|P_{j,k}(0)\|_2 < 1$.
Algorithm OMS: OnebitManifoldSensing
I. Identify a center $c_{j,k}$ close to $x$ via

$$c_{j,k} \in \arg\min_{c \in C_j} d_H(\operatorname{sign}(Ac), y), \qquad (6)$$

where $d_H$ is the Hamming distance, i.e., $d_H(z, z') := |\{l : z_l \ne z'_l\}|$. If $d_H(\operatorname{sign}(Ac_{j,k}), y) = 0$, directly choose $x^* = c_{j,k}$ and omit II.
II. If there is no center lying in the same cell as $x$ (see Figure 2), recover the projection of $x$ onto $P_{j,k}$, i.e., $P_{j,k}(x)$. To do so solve the convex optimization

$$x^* = \arg\min_{z} \sum_{l=1}^{m} (-y_l) \langle a_l, z \rangle, \quad \text{subject to } z \in \operatorname{conv}(P_S(P_{j,k} \cap B(0, 2))). \qquad (7)$$
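In the special case $0 \in P_{j,k}$ mentioned above, the feasible set is the unit ball of a linear subspace, and the linear objective $\sum_l (-y_l)\langle a_l, z\rangle = -\langle A^T y, z\rangle$ has a closed-form minimizer: the normalized projection of $A^T y$ onto the subspace. The sketch below covers only this special case (the general case $0 \notin P_{j,k}$ would need a convex solver); the function name and the synthetic subspace are hypothetical.

```python
import numpy as np

def recover_in_subspace(A, y, phi):
    """Minimize sum_l (-y_l) <a_l, z> over {z in rowspace(phi), ||z||_2 <= 1}.

    Assumes phi (d x D) has orthonormal rows, i.e. the affine space passes
    through the origin. The minimizer is the projected A^T y, normalized.
    """
    coeffs = phi @ (A.T @ y)                 # coordinates of A^T y in the subspace
    norm = np.linalg.norm(coeffs)
    if norm == 0.0:                          # degenerate case: objective is constant
        return np.zeros(A.shape[1])
    return phi.T @ (coeffs / norm)

rng = np.random.default_rng(7)
D, d, m = 50, 3, 5000

# Hypothetical d-dimensional subspace and a unit-norm signal inside it.
q, _ = np.linalg.qr(rng.standard_normal((D, d)))
phi = q.T
x = phi.T @ rng.standard_normal(d)
x /= np.linalg.norm(x)

A = rng.standard_normal((m, D))
y = np.sign(A @ x)                           # noiseless one-bit measurements

x_hat = recover_in_subspace(A, y, phi)
error = float(np.linalg.norm(x - x_hat))
```

In this toy setting the error is driven by the ratio of the subspace dimension to the number of measurements, which is in line with the role the intrinsic dimension plays in the guarantees discussed in this paper.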
As we shall see, theoretical error bounds for both OMS-simple and OMS can be obtained by nearly the same analysis despite their differences.

Main Results
In this section we present the main results of our work, namely that both OMS-simple and OMS approximate a signal on $M$ to arbitrary precision with a near-optimal number of measurements. More precisely, we obtain the following theorem.

Theorem 3.1. There exist absolute constants $E, E', c > 0$ such that the following holds. Let $\varepsilon \in (0, 1/16]$ and assume the GMRA's maximum refinement level $J \ge j := c \log(1/\sqrt{\varepsilon})$ for $c > 0$ as below. Further suppose that one has $\operatorname{dist}(0, M_j) \ge 1/2$, $0 < C_1 < 2^j$, and $\sup_{x \in M} \tilde{C}_x < 2^{j-2}$. If then with probability at least $1 - 12\exp(-c C_1^2 \varepsilon^2 m)$ for all $x \in M \subset S^{D-1}$ the approximations $x^*$ obtained by OMS satisfy

Proof: See the proofs of Corollary 4.16 and Theorem 4.14 in Section 4.
Remark 3.2. The restrictions on $C_1$ and $\tilde{C}_x$ are easily satisfied, e.g., if the centers form a maximal $2^{-j}$ packing of $M$ at each scale $j$ or if the GMRA is constructed from manifold samples as discussed in [31] (cf. Appendix E). In both these cases $C_1$ and $\tilde{C}_x$ are in fact bounded by absolute constants. Numerical simulations (see Section 5) suggest that a slightly modified version of OMS performs better in some scenarios even though we cannot provide a rigorous theoretical justification for the modification's improved performance at present.
Note that Theorem 3.1 depends on the Gaussian width of $M$. For general sets this quantity provides a useful measure of the set's complexity. In the case of compact Riemannian submanifolds of $\mathbb{R}^D$ it might be more convenient to have a dependence on the geometric properties of $M$ instead (e.g., its volume and reach). Indeed, one can show by means of [16] that $w(M)$ can be upper bounded in terms of the manifold's intrinsic dimension $d$, its $d$-dimensional volume $\operatorname{Vol}(M)$, and the inverse of its reach. Intuitively, these dependencies are to be expected as a manifold with fixed intrinsic dimension $d$ can become more complex as either its volume or curvature (which can be bounded by the inverse of its reach) grows. The following theorem, which is a combination of different results in [16], formalizes this intuition by bounding the Gaussian width of a manifold in terms of its geometric properties.
Then one can replace $w(M)$ in the above theorem by where $C, c > 0$ are absolute constants.
Proof: See Appendix B.

Finally, we point out that Theorem 3.1 assumes access to a GMRA approximation to $M \subset S^{D-1}$ which satisfies all of the axioms listed in Definition 2.1. Following the work of Maggioni, Minsker, and Strawn [31], however, one can also ask whether a similar result will still hold if the GMRA approximation one has access to has been learned by randomly sampling points from $M$ without the assumptions of Definition 2.1 being guaranteed a priori. Indeed, such a setting is generally more realistic. In fact it turns out that a version of Theorem 3.1 still holds for such empirical GMRA approximations under suitable conditions; see Theorem E.7. We refer the interested reader to Appendix D and Appendix E for additional details and discussion regarding the use of such empirically learned GMRA approximations.

Proofs
This section provides proofs of the main result in both settings described above and establishes several technical lemmas. First, properties of the Gaussian width and the geodesic distance are collected and shown. Then, the main results are proven for a given GMRA approximation fulfilling the axioms.

Toolbox
We start by connecting slightly different definitions of dimensionality measures similar to the Gaussian width and clarify how they relate to each other. This is necessary as the tools we make use of appear in their original versions referring to different definitions of Gaussian width.
(ii) the Gaussian mean width to be the Gaussian width of $K - K$ and (iii) the Gaussian complexity: By combining Properties 5. and 6. of Proposition 2.1 in [35] one has In this sense, the Gaussian width extends the concept of dimension to general sets $K$. Furthermore, for a finite set $K$ the Gaussian width is bounded by This can be deduced directly from the definition (see, e.g., §2 of [35]).

Now that we have introduced the notion of Gaussian width, we can use it to characterize the union of the given manifold and a single level of its GMRA approximation $M \cup M_j$ (recall the definition of $M_j$ in Section 2).
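For intuition, the Gaussian width of a finite set can be estimated by Monte Carlo simulation. The sketch below assumes the common definition $w(K) = \mathbb{E} \sup_{z \in K} \langle g, z \rangle$ with $g$ a standard Gaussian vector, and compares the estimate with the standard bound $\sqrt{2 \log |K|}$ for finite subsets of the unit ball; the constant in the bound stated in the text may differ. The function name and the random point set are hypothetical.

```python
import numpy as np

def gaussian_width_mc(points, n_trials=2000, seed=0):
    """Monte Carlo estimate of E max_i <g, points[i]> for standard Gaussian g."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_trials, points.shape[1]))
    return float(np.mean(np.max(g @ points.T, axis=1)))

rng = np.random.default_rng(2)

# Hypothetical finite set K: 64 random points on the unit sphere of R^10.
points = rng.standard_normal((64, 10))
points /= np.linalg.norm(points, axis=1, keepdims=True)

w_est = gaussian_width_mc(points)

# Standard bound for a finite set of unit-norm points (maximum of |K| Gaussians).
finite_bound = float(np.sqrt(2.0 * np.log(len(points))))
```

The estimate grows only logarithmically with the number of points, which is the sense in which the width behaves like a dimension rather than a cardinality.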
Remark 4.4. Note that the first inequality holds for general sets, not only $M$ and $M_j$. Moreover, one only uses $M_j \subset B(0, 2)$ to prove the second inequality. It thus holds for $M_j$ replaced with arbitrary subsets of $B(0, 2)$. We might use both variations referring to Lemma 4.3.
Proof: The first inequality follows by noting that To obtain the second inequality observe that where we used (10), the fact that $M \subset S^{D-1}$, and that $M_j \subset B(0, 2)$.
For the last inequality we bound $w(M_j)$. First, note that (2). By Dudley's inequality (see, e.g., [15]) we conclude via Jensen's inequality that where $C'$ is a constant depending on $C_{\mathrm{Dudley}}$ and $C_C$. Choosing $C = 2C' + 3$ yields the claim as

The following two lemmas concerning width bounds for fine scales will also be useful. Their proofs (see Appendix C), though more technical, use similar ideas to the proof of Lemma 4.3. The first lemma improves on Lemma 4.3 for large values of $j$ by considering a more geometrically precise approximation to $M$, $M_j^{\mathrm{rel}} \subset M_j$.
It is not surprising that for general M ∈ S D−1 the width bound for w(M j ) (resp. w(M rel j )) depends on either j or log(D). When using the proximity of M rel j to M in Lemma 4.5 we only use the information that M rel j ⊂ tube C M 2 −2j and a large ambient dimension D will lead to a higher complexity of the tube. In the case of Lemma 4.3 we omit the proximity argument by using the maximal number of affine d-dimensional spaces in M j and hence do not depend on D but on the refinement level j.
The next lemma just below utilizes even more geometric structure by assuming that M is a Riemannian Manifold. It improves on both Lemma 4.3 and 4.5 for such M by yielding a width bound which is independent of both j and D for all j sufficiently large.
. Then, there exist absolute constants C, c > 0 such that Here the constants C z and C 1 are from properties (3b) and (3a), respectively.
Finally, the following lemma quantifies the equivalence between Euclidean and normalized geodesic distance on the sphere.
Proof: First observe that $\langle z, z' \rangle = \cos \angle(z, z') = \cos(\pi d_G(z, z'))$. This yields For the upper bound note the relation between the geodesic distance and the normalized geodesic distance $d_G$

We now have the preliminary results necessary in order to prove Theorem 3.1.
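The relation between the Euclidean and the normalized geodesic distance can be checked numerically. For unit vectors, $\langle z, z' \rangle = \cos(\pi d_G(z, z'))$ gives $\|z - z'\|_2 = 2 \sin(\pi d_G(z, z')/2)$, and the elementary bounds $2t/\pi \le \sin t \le t$ on $[0, \pi/2]$ then yield $2 d_G \le \|z - z'\|_2 \le \pi d_G$. This is one standard form of such an equivalence; the constants in the lemma above may be stated differently. The sketch verifies these relations on random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_unit_vectors(n, dim, rng):
    """Draw n points uniformly from the unit sphere of R^dim."""
    v = rng.standard_normal((n, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

Z = random_unit_vectors(1000, 8, rng)
Zp = random_unit_vectors(1000, 8, rng)

# Normalized geodesic distance: angle divided by pi.
cosines = np.clip(np.sum(Z * Zp, axis=1), -1.0, 1.0)
d_g = np.arccos(cosines) / np.pi

# Euclidean (chordal) distance between the same pairs.
chord = np.linalg.norm(Z - Zp, axis=1)

# Exact chord-angle identity on the unit sphere.
chord_from_angle = 2.0 * np.sin(np.pi * d_g / 2.0)
```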

Proof of Theorem 3.1 with Axiomatic GMRA
Recall that our theoretical result concerns OMS-simple with recovery performed using (3) and (4). The proof is based on the following idea. We first control the error $\|c_{j,k} - x\|_2$ made by (3) in approximating a GMRA center closest to $x$. To do so we make use of Plan and Vershynin's result on $\delta$-uniform tessellations in [36]. Recall the equivalence between one-bit measurements and random hyperplanes.

Let $K \subset S^{D-1}$ and an arrangement of $m$ hyperplanes in $\mathbb{R}^D$ be given via a matrix $A$ (i.e., the $j$-th row of $A$ is the normal to the $j$-th hyperplane). Let $d_A(x, y) \in [0, 1]$ denote the fraction of hyperplanes separating $x$ and $y$ in $K$ and let $d_G$ be the normalized geodesic distance on the sphere, i.e., opposite poles have distance one. Given $\delta > 0$, the hyperplanes provide a $\delta$-uniform tessellation of $K$ if

$$|d_A(x, y) - d_G(x, y)| \le \delta \quad \text{for all } x, y \in K.$$

In words, Theorem 4.9 states that if the number of one-bit measurements scales at least linearly in the intrinsic dimension of a set $K \subset S^{D-1}$, then with high probability the percentage of differing measurements of two points $x, y \in K$ is closely related to their distance on the sphere. Implicitly the diameter of all tessellation cells is bounded by $\delta$.
The original version of Theorem 4.9 uses $\gamma(K)$ instead of $w(K)$. However, note that by (10) we get for $K \subseteq S^{D-1}$ that $\gamma(K) \le w(K - K) + \sqrt{2/\pi} \le 3w(K)$ as long as $w(K) \ge \sqrt{2/\pi}$, which is reasonable to assume. Hence, if the constant in the theorem is changed by a factor of 9, Theorem 4.9 can be stated as above.
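The tessellation property can be observed empirically on a finite point set: the largest gap between the fraction of separating hyperplanes and the normalized geodesic distance shrinks as the number of hyperplanes grows. The sketch below is a toy illustration under assumed parameters (a random finite set on the sphere and standard Gaussian hyperplane normals); the function name is hypothetical.

```python
import numpy as np

def tessellation_deviation(A, points):
    """Largest |d_A(x, y) - d_G(x, y)| over all pairs of unit-norm rows of `points`."""
    signs = np.sign(points @ A.T)                              # shape (N, m)
    # Fraction of hyperplanes separating each pair of points.
    d_a = np.mean(signs[:, None, :] != signs[None, :, :], axis=2)
    # Normalized geodesic distance for each pair.
    d_g = np.arccos(np.clip(points @ points.T, -1.0, 1.0)) / np.pi
    return float(np.max(np.abs(d_a - d_g)))

rng = np.random.default_rng(8)
D, N = 15, 40

# Hypothetical finite set K on the unit sphere.
points = rng.standard_normal((N, D))
points /= np.linalg.norm(points, axis=1, keepdims=True)

A = rng.standard_normal((8000, D))

# Empirical delta for an increasing number of hyperplanes (nested rows of A).
deviations = {m: tessellation_deviation(A[:m], points) for m in (200, 2000, 8000)}
```

The values in `deviations` play the role of the smallest $\delta$ for which the first `m` hyperplanes form a $\delta$-uniform tessellation of this particular finite set.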
Using these results we will show in Lemma 4.13 that the center $c_{j,k}$ identified in step I. of the algorithm OMS-simple satisfies $\|x - c_{j,k}\|_2 \le 16 \max\{\|x - c_{j,k_j(x)}\|_2, C_1 2^{-j-1}\}$. Therefore, the GMRA property (3b) provides an upper bound on $\|x - P_{j,k}(x)\|_2$. What remains is then to bound the gap between $P_{j,k}(x)$ and the approximation $x^*$. This happens in two steps. First, Plan and Vershynin's result on noisy one-bit sensing (see Theorem 4.11) is applied to a scaled version of (4), bounding the distance between $P_{j,k}(x)$ and the minimizer of the scaled version. This argument works by interpreting the true measurements $y$ as a noisy version of the non-accessible one-bit measurements of $P_{j,k}(x)$. The rescaling becomes necessary as Theorem 4.11 is restricted to the unit ball in Euclidean norm. Lastly, a geometric argument is used to bound the distance between the minimizer of the scaled version and $x^*$ in order to conclude the proof.
Then with probability at least $1 - 8\exp(-c\delta^2 m)$, the following event occurs. Consider a signal $x \in K$ satisfying $\|x\|_2 = 1$ and its (unknown) uncorrupted one-bit measurements $\tilde{y} = (\tilde{y}_1, \dots, \tilde{y}_m)$ given as Then the solution to the optimization problem

Remark 4.12. Theorem 4.11 yields guaranteed recovery of unknown signals $x \in K \subset B(0, 1)$ up to a certain error by the formulation we use in (4) from one-bit measurements if the number of measurements scales linearly with the intrinsic dimension of $K$. The recovery is robust to noise on the measurements. Note that the original version of Theorem 4.11 uses $w(K - K)$ instead of $w(K)$. As $w(K - K) \le 2w(K)$ by (10), the result stated above also holds with a slightly modified constant.
We begin by proving Lemma 4.13.
Lemma 4.13. If $m \ge \tilde{C} C_1^{-6} 2^{6(j+1)} \max\{w(M \cup P_S(C_j))^2, 2/\pi\}$ the center $c_{j,k}$ chosen in step I. of Algorithm OMS-simple fulfills Noting that Gaussian random vectors and Haar random vectors yield identically distributed hyperplanes, Theorem 4.9 now transfers this bound to the normalized geodesic distance, namely As by property (3a) the centers are close to the manifold, they are also close to the sphere and we have $\|P_S(c_{j,k}) - c_{j,k}\|_2 < C_1 2^{-j-2}$, for all $c_{j,k} \in C_j$. Hence, we conclude

We can now prove a detailed version of Theorem 3.1 for the given axiomatic GMRA and deduce Theorem 3.1 as a corollary.
Theorem 4.14 (Uniform Recovery -Axiomatic Case). Let M ⊂ S D−1 be given by its GMRA for some levels j 0 < j ≤ J, such that C 1 < 2 j0+1 where C 1 is the constant from GMRA properties (2b) and (3a). Fix j and assume that dist(0, where C is the constant from Theorem 4.11,C from Theorem 4.9, and C > 3 from Lemma 4.3. Then, with probability at least 1 − 12 exp(−c(C 1 2 −j−1 ) 2 m) the following holds for all x ∈ M with one-bit measurements y = sign(Ax) and GMRA constantsC x from property (3b) satisfyingC x < 2 j−1 : The approximations x * obtained by OMS fulfill Here C x := 2C x + C 1 .
Remark 4.15. For obtaining the lower bounds on m in (12) and (8) we made use of Lemma 4.3 leading to the influence of j which is suboptimal for fine scales (i.e., j large). To improve on this for large j one can exploit the alternative versions of the lemma, namely, Lemma 4.5 and Lemma 4.6. Then, however, some minor modifications become necessary in the proof of Theorem 4.14 as the lemmas only apply to M rel j : In (I), e.g., one has to guarantee that C j ⊂ M rel j , i.e., that each center c j,k is a best approximation for some part of the manifold. This is a reasonable assumption especially if the centers are constructed as means of small manifold patches which is a common approach in empirical applications (cf. Appendix D). Also, when working with M rel j it is essential in (II) to have a near-best approximation subspace of x, i.e., the k obtained in (I) has to fulfill k ≈ k j (x) as M rel j does not include many near-optimal centers for each point on M. Here, one can exploit the minimal distance of centers c j,k to each other as described in GMRA property (2b) and choose δ slightly smaller (in combination with a correspondingly strengthened upper bound in Lemma 4.13) to obtain the necessary guarantees for (I). As we are principally concerned with the case where j = O(log(D)) in this paper, however, we will leave such variants to future work.
Proof of Theorem 4.14: Recall that $k$ is the index chosen by OMS in (6). The proof consists of three steps. First, we apply Lemma 4.13 in (I). By the GMRA axioms this supplies an estimate for $\|x - P_{j,k}(x)\|_2$ with high probability. In (II) we use Theorem 4.11 to bound the distance between $P_{j,k}(x)/\|P_{j,k}(x)\|_2$ and the minimizer $x^*$ given by

$$x^* = \arg\min_{z} \sum_{l=1}^{m} (-y_l) \langle a_l, z \rangle, \quad \text{subject to } z \in K := \operatorname{conv}(P_S(P_{j,k} \cap B(0, 2)))$$

with high probability. Using a union bound over all events, part (III) then concludes with an estimate of the distance $\|x - x^*\|_2$ combining (I) and (II).
Hence, we can apply Theorem 4.11 to obtain with probability at least $1 - 8\exp(-c\delta^2 m)$ that the estimate (19) now follows.
(III) To conclude the proof we apply a union bound and obtain with probability at least $1 - 12\exp(-c\delta^2 m)$ that GMRA property (3b) combined with (19) now yields the final desired error bound.
We are now prepared to explore the numerical performance of the proposed methods.

Numerical Simulation
In this section we present various numerical experiments to benchmark OMS. The GMRAs we work with are constructed using the GMRA code provided by Maggioni (see footnote 2). We compared the performance of OMS for two exemplary choices of $M$, namely, a simple 2-dimensional sphere embedded in $\mathbb{R}^{20}$ (20000 data points sampled from the 2-dimensional sphere $M$ embedded in $S^{20-1}$) and the MNIST data set [29] of handwritten digits "1" (3000 data points in $\mathbb{R}^{784}$). In each of the experiments 5.1-5.4 we first computed a GMRA up to refinement level $j_{\max} = 10$ and then recovered 100 randomly chosen $x \in M$ from their one-bit measurements by applying OMS. Depicted is the averaged relative error between $x$ and its approximation $x^*$, i.e., $\|x - x^*\|_2 / \|x\|_2$, which is equal to the absolute error $\|x - x^*\|_2$ for $M \subset S^{D-1}$. Note the different approximation error ranges of the sphere and the MNIST experiments when comparing both settings.

OMS-simple vs. OMS
The first test compares recovery performance of the two algorithms presented above, namely OMS-simple for R ∈ {0.5, 1, 1.5} and OMS. The results are depicted in Figure 4. Note that only R = 1.5 and, in the case of the 2-sphere, R = 1 are depicted as in the respective other cases for each number of measurements most of the trials did not yield a feasible solution in (4) so the average was not well-defined. One can observe that for both data sets OMS outperforms OMS-simple which is not surprising as OMS does not rely on a suitable parameter choice. This observation is also the reason for us to restrict the theoretical analysis to OMS. The more detailed approximation of the toy example (2-dimensional sphere) is due to its simpler structure and lower dimensional setting and can also be observed in 5.2-5.4.

[Figure 4: average error plotted against the number of measurements.]

Modifying OMS
In a second experiment we compared OMS to a slightly different version in which (7) is replaced by

$$x^* = \arg\min_{z} \sum_{l=1}^{m} \left[(-y_l) \langle a_l, z \rangle\right]_+, \quad \text{subject to } z \in \operatorname{conv}(P_S(P_{j,k} \cap B(0, 2))),$$

where $[t]_+ = \max\{0, t\}$ denotes the positive part of $t \in \mathbb{R}$. This is motivated by the following observation: As stated in Theorem 4.11, Plan and Vershynin showed that

$$\arg\min_{z \in K} \sum_{l=1}^{m} (-y_l) \langle a_l, z \rangle \qquad (21)$$

can recover unknown signals from noisy one-bit measurements if $K \subset B(0, 1)$ is a subset of the $D$-dimensional Euclidean ball. The minimization in (21) can be re-stated equivalently as

$$\arg\min_{z \in K} \left( \sum_{l \,:\, y_l \ne \operatorname{sign}(\langle a_l, z \rangle)} \|a_l\|_2 \, \|z - P_{H_{a_l}}(z)\|_2 \; - \sum_{l \,:\, y_l = \operatorname{sign}(\langle a_l, z \rangle)} \|a_l\|_2 \, \|z - P_{H_{a_l}}(z)\|_2 \right), \qquad (22)$$

where $P_{H_{a_l}}$ denotes the orthogonal projection onto the $(D-1)$-dimensional subspace $H_{a_l}$ perpendicular to $a_l$. To see this note that $\langle a_l, z \rangle / \|a_l\|_2 = \operatorname{sign}(\langle a_l, z \rangle) \, \|z - P_{H_{a_l}}(z)\|_2$. Hence, (21) punishes incorrect measurements of a feasible point $z \in K$ by its distance to the 'measurement border' $H_{a_l}$ while rewarding correct ones. The second part, which rewards, might cause problems as it pushes minimizers away from the hyperplanes $H_{a_l}$ of correct measurements. If the true $x$, however, lies close to one of them, this may be suboptimal. Hence, we dropped the rewarding term in (22) leading to

$$\arg\min_{z \in K} \sum_{l \,:\, y_l \ne \operatorname{sign}(\langle a_l, z \rangle)} \|a_l\|_2 \, \|z - P_{H_{a_l}}(z)\|_2,$$

which is still convex but performs better numerically in some cases. As depicted in Figure 5, the version with $[\cdot]_+$ clearly outperforms the one without if $M$ is the 2-dimensional sphere. In contrast, if $M$ is more complex (MNIST data), the $[\cdot]_+$ formulation clearly fails. We have no satisfactory explanation for this difference in behavior so far.

2 The code is available at http://www.math.jhu.edu/~mauro/#tab_code.
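The relation between the linear objective, its positive-part variant, and the distances to the measurement hyperplanes can be verified numerically. The sketch below evaluates both objectives at an arbitrary point and checks that the positive-part objective keeps only the penalty for violated measurements, whereas the linear objective additionally subtracts a reward for satisfied ones. Function names and test data are hypothetical.

```python
import numpy as np

def linear_objective(A, y, z):
    """sum_l (-y_l) <a_l, z>."""
    return float(np.sum(-y * (A @ z)))

def hinge_objective(A, y, z):
    """sum_l [(-y_l) <a_l, z>]_+ , i.e. only violated measurements contribute."""
    return float(np.sum(np.maximum(0.0, -y * (A @ z))))

def distance_to_hyperplanes(A, z):
    """Euclidean distance from z to each hyperplane {w : <a_l, w> = 0}."""
    return np.abs(A @ z) / np.linalg.norm(A, axis=1)

rng = np.random.default_rng(6)
m, D = 400, 12
A = rng.standard_normal((m, D))

x = rng.standard_normal(D)
x /= np.linalg.norm(x)
y = np.sign(A @ x)                      # one-bit measurements of x

z = rng.standard_normal(D)              # arbitrary comparison point
z /= np.linalg.norm(z)

wrong = np.sign(A @ z) != y             # measurements that z violates
weighted = np.linalg.norm(A, axis=1) * distance_to_hyperplanes(A, z)
penalty = float(np.sum(weighted[wrong]))
reward = float(np.sum(weighted[~wrong]))
```

At the true signal `x` every measurement is satisfied, so the positive-part objective vanishes there while the linear objective is strictly negative.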

[Figure 5: average error plotted against the number of measurements.]

Are Two Steps Necessary?
One might wonder if the two steps in OMS-simple and OMS are necessary at all. Wouldn't it be sufficient to use the center c j,k determined in step I. as an approximation for x? If the GMRA is fine enough, this indeed is the case. If one only has access to a rather rough GMRA, the simulations in Figure 6 show that the second step makes a notable difference in approximation quality. This behavior suits Lemma 4.13. The lemma guarantees a good approximation of x by c j,k as long as x is well approximated by an optimal center. In the MNIST case one can observe that the second step only improves performance if the number of one-bit measurements is sufficiently high. For a small set of measurements the centers might yield better approximation as they lie close to M by GMRA property (3a). On the other hand, only parts of the affine spaces are practical for approximation and a certain number of measurements is necessary to restrict II. to the relevant parts.

Figure 6: Comparison of the following: Approximation by step I. of OMS when using tree structure (dashed, blue) and when comparing all centers (solid, red); approximation by step I.+II. of OMS when using tree structure (dashed with points, yellow) and when comparing all centers (solid with points, purple). Axes: Number of Measurements, Average Error.

Tree vs. No Tree
In the fourth test we checked whether the approximation still works when not all possible centers are compared in step I. of OMS but their tree structure is used instead. That is, to find a center one compares all centers on the first refinement level and then continues on each subsequent level solely with the children of the k best centers (in the presented experiments we chose k = 10). Of course, the chosen center will in general not be optimal as not all centers are compared (see Figure 6). In the simple 2-dimensional sphere setting, however, step II. can compensate for the worse approximation quality of step I. with tree search: Figure 6 shows hardly any difference in final approximation quality between the two cases. In the MNIST setting one can observe a considerable difference even when performing both steps.
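The tree-restricted search described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: centers are compared by the Hamming distance between their sign patterns and the measurements, and the names `levels`, `children`, `tree_center`, and `exhaustive_center` are hypothetical.

```python
import numpy as np

def hamming(u, v):
    """Number of positions in which two sign vectors differ."""
    return int(np.count_nonzero(u != v))

def exhaustive_center(A, y, centers):
    """Compare all centers of one level; return the index of the best one."""
    dists = [hamming(np.sign(A @ c), y) for c in centers]
    return int(np.argmin(dists))

def tree_center(A, y, levels, children, k_best=10):
    """Tree search keeping the k_best centers per level.

    levels[i]      : array (K_i, D) with the centers of refinement level i
    children[i][k] : indices (in level i+1) of the children of center k
    """
    candidates = list(range(len(levels[0])))  # all centers on the first level
    for i in range(len(levels)):
        dists = [hamming(np.sign(A @ levels[i][k]), y) for k in candidates]
        order = np.argsort(dists, kind="stable")
        best = [candidates[t] for t in order[:k_best]]
        if i == len(levels) - 1:
            return best[0]
        # descend only into the children of the retained centers
        candidates = [c for k in best for c in children[i][k]]

rng = np.random.default_rng(1)
D, m = 3, 200
A = rng.standard_normal((m, D))
coarse = rng.standard_normal((4, D))
coarse /= np.linalg.norm(coarse, axis=1, keepdims=True)
fine = np.repeat(coarse, 2, axis=0) + 0.1 * rng.standard_normal((8, D))
fine /= np.linalg.norm(fine, axis=1, keepdims=True)
levels = [coarse, fine]
children = [[[2 * k, 2 * k + 1] for k in range(4)]]  # two children per coarse center

x = fine[5] + 0.05 * rng.standard_normal(D)
x /= np.linalg.norm(x)
y = np.sign(A @ x)

k_tree = tree_center(A, y, levels, children, k_best=2)
k_full = exhaustive_center(A, y, fine)
```

With `k_best` at least as large as the number of first-level centers, the tree search visits every center and attains the same Hamming distance as the exhaustive comparison; smaller values trade accuracy for fewer comparisons.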

A Change of Refinement Level
The last experiment (see Figure 7) examines the influence of the refinement level j on the approximation error. For small j (corresponding to a rough GMRA) a high number of measurements can hardly improve the approximation quality, while for large j (corresponding to a fine GMRA) the approximation error decreases with increasing measurement rates. This behavior is as expected: a rough GMRA cannot profit much from many measurements as the GMRA approximation itself yields a lower bound on the obtainable approximation error. For fine GMRAs the behavior along the measurement axis is similar to that in the experiments above. Note that a further increase of j for the same range of measurements did not improve accuracy.

Discussion
In this paper we proposed OMS, a tractable algorithm to approximate data lying on low-dimensional manifolds from compressive one-bit measurements, thereby complementing the theoretical results of Plan and Vershynin on one-bit sensing for general sets in [35] in this important setting. We then proved (uniform) worst-case error bounds for approximations computed by OMS under slightly stronger assumptions than [35], and also performed numerical experiments on both toy examples and real-world data. As a byproduct of our theoretical analysis (see, e.g., §4) we have further linked the theoretical understanding of one-bit measurements as tessellations of the sphere [36] to the GMRA techniques introduced in [3] by analyzing the interplay between a given manifold and the complexity of its GMRA approximation, measured in terms of the Gaussian mean width. Finally, to indicate the applicability of our results we showed that they hold even if only random samples from the manifold are available, as opposed to the entire manifold (see, e.g., Appendix D and E). Several interesting questions remain for future research, however. First, the experiments in Section 5.4 suggest a possible benefit from using the tree structure within $C_j$. Indeed, OMS still yields comparable results if step I. is restricted to a tree-based search, which has the advantage of being computable much faster than the minimization over all possible centers. It would be desirable to obtain theoretical error bounds even in this case, as well as to consider the use of other related fast nearest neighbor methods from computer science [23].
Second, the attentive reader might have noticed in the empirical setting of Appendix D and E that (A2) in combination with Lemma E.6 seems to imply that step II. of OMS may be unnecessary. As can be seen from Section 5.3, though, the second step of OMS yields a notable improvement even with an empirically constructed GMRA. This hints that the empirical GMRA techniques remain valid, and step II. of OMS remains of value, even when (A2) is not strictly fulfilled. Understanding this phenomenon might lead to more relaxed assumptions than (A1)-(A4).
Third, it could be rewarding to also consider versions of OMS for additional empirical GMRA variants including, e.g., those which rely on adaptive constructions [30], GMRA constructions in which subspaces that minimize different criteria are used to approximate the data in each partition element (see, e.g., [24]), and distributed GMRA constructions which are built up across networks using distributed clustering [4] and SVD [26] algorithms. Such variants could prove valuable with respect to reducing the overall computational storage and/or runtime requirements of OMS in different practical situations.
Finally, as already pointed out in Section 5.2 we do not yet understand how inserting the positive part [·] + in II. affects recovery. There seem to be cases in which a massive improvement can be observed and others in which the performance completely deteriorates. The explanation might be decoupled from this work and OMS.

A Characterization of Convex Hull
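Lemma A.1. Let $P_{j,k}$ be the affine subspace chosen in step I. of OMS-simple and define $c = \mathbb{P}_{j,k}(0)$. If $0 \notin P_{j,k}$, the following equivalence holds:
$$z \in \mathrm{conv}\big(P_S(P_{j,k} \cap B(0,2))\big) \iff \|z\|_2 \le 1, \quad \Phi_{j,k}^T \Phi_{j,k} z + P_c(z) = z, \quad \langle z, c \rangle \ge \tfrac{1}{2}\|c\|_2^2 .$$
Proof: First, assume $z \in \mathrm{conv}(P_S(P_{j,k} \cap B(0,2)))$. Obviously, $\|z\|_2 \le 1$. As projecting onto the sphere is a simple rescaling, $\mathrm{conv}(P_S(P_{j,k} \cap B(0,2))) \subset \mathrm{span}(P_{j,k})$, implying that $\Phi_{j,k}^T \Phi_{j,k} z + P_c(z) = z$. To show the third constraint, note that any $z' \in P_{j,k}$ can be written as $z' = c + (z' - c)$ where $z' - c$ is perpendicular to $c$. If in addition $\|z'\|_2 \le 2$, we get
$$\langle P_S(z'), c \rangle = \frac{\langle z', c \rangle}{\|z'\|_2} = \frac{\|c\|_2^2}{\|z'\|_2} \ge \frac{\|c\|_2^2}{2}.$$
As $z$ is a convex combination of different $P_S(z')$, the constraint also holds for $z$.
Conversely, let $z$ fulfill the three constraints. Then $z' = (\|c\|_2^2 / \langle z, c \rangle) \cdot z$ satisfies $z' \in P_{j,k}$ because of the second constraint and $\langle z', c \rangle = \|c\|_2^2$. Furthermore, by the first and third constraint, $\|z'\|_2 \le \|c\|_2^2 / \langle z, c \rangle \le 2$ and hence $z' \in P_{j,k} \cap B(0, \|c\|_2^2 / \langle z, c \rangle) \subset P_{j,k} \cap B(0, 2)$. As $P_{j,k} \cap B(0, \|c\|_2^2 / \langle z, c \rangle)$ is the convex hull of $P_{j,k} \cap (\|c\|_2^2 / \langle z, c \rangle) \cdot S^{D-1}$, there are $z_1, \dots, z_n \in P_{j,k}$ and $\lambda_1, \dots, \lambda_n \ge 0$ with $\|z_k\|_2 = \|c\|_2^2 / \langle z, c \rangle$ and $\sum_k \lambda_k = 1$ such that $(\|c\|_2^2 / \langle z, c \rangle) \cdot z = \sum_k \lambda_k z_k$. Hence, $z = \sum_k \lambda_k (\langle z, c \rangle / \|c\|_2^2) \cdot z_k$. As $(\langle z, c \rangle / \|c\|_2^2) \cdot z_k \in P_S(P_{j,k} \cap B(0, 2))$, we get $z \in \mathrm{conv}(P_S(P_{j,k} \cap B(0, 2)))$.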

B Proof of Theorem 3.3
Denote by $\tau$ the reach of $M$ and by $\rho$ the diameter $\mathrm{diam}(M)$. First, note that for a set $K \subset \mathbb{R}^D$, by Dudley's inequality [15], $w(K) \le C \int_0^{\mathrm{diam}(K)/2} \sqrt{\log N(K, \varepsilon)}\, d\varepsilon$, where $C$ is an absolute constant. Second, [16, Lemma 14] states that the covering number $N(M, \varepsilon)$ of a d-dimensional Riemannian manifold M can be bounded by for all d ≥ 1 for an absolute constant β > 1, this expression may be simplified to We can combine these facts to obtain w(M) ≤ C , by using the Cauchy-Schwarz inequality for the second inequality. We now bound the first integral by

Given a subset $S \subset \mathbb{R}^D$ we will let $N(S, \varepsilon)$ denote the cardinality of a minimal $\varepsilon$-cover of $S$ by D-dimensional Euclidean balls of radius $\varepsilon > 0$, each centered in $S$. Similarly, we will let $P(S, \varepsilon)$ denote the maximal packing number of $S$ (i.e., the maximum cardinality of a subset of $S$ that contains points all of which are at least Euclidean distance $\varepsilon > 0$ from one another). The following lemmas bound $N(M^{\mathrm{rel}}_j, \varepsilon)$ for various ranges of $j$ and $\varepsilon$.
Proof: By properties (3a) and (2b) every center $c_{j,k}$ has an associated $p_{j,k} \in M$ such that both where $L_j$ is defined as in the proof of Lemma 4.3 (this proof also discusses its covering numbers). As a result we have that

C.1 Proof of Lemma 4.5
We aim to bound $w(M^{\mathrm{rel}}_j)$ in terms of $w(M)$. By the two-sided Sudakov inequality [39] and Lemma C.1 we get that where the last inequality follows from $\mathrm{tube}_{C_M 2^{-2j}}(M) \subseteq B(0, 1 + C_M)$ and Lemma C.2. Appealing to the Sudakov inequality once more to bound the second term above we learn that To bound the first term above we note that the covering number of $B(0, 1 + C_M)$ can be bounded as follows As $\varepsilon \mapsto \varepsilon \log\big(\tfrac{4C_M + 4}{\varepsilon}\big)$ is non-decreasing for $\varepsilon \in (0, 2C_M 2^{-2j})$, we obtain by assuming that where C is an absolute constant. Appealing to (11) now finishes the proof.

C.2 Proof of Lemma 4.6
Let $2C_M 2^{-2j} \le \varepsilon \le \tfrac{1}{4} C_1 2^{-j}$. We aim to bound $w(M^{\mathrm{rel}}_j)$ in terms of covering numbers for M. To do this we will use Dudley's inequality in combination with the knowledge that $M^{\mathrm{rel}}_j \subset B(0, 2)$ (by definition). By Dudley's inequality where C is an absolute constant. Appealing now to Lemmas C.3 and C.2 for the first and second terms above, respectively, we can see that where the last bound follows from Jensen's inequality.
We can now bound the second term as in the proof of Theorem 3.3 in Appendix B. Doing so we obtain where $\tau$ is the reach of M, and C, c are absolute constants. Appealing to (11) together with Theorem 3.3 now finishes the proof.

D Data-Driven GMRA
The axiomatic definition of GMRA proves useful in deducing theoretical results but lacks connection to concrete applications where the structure of M is not known a priori. Hence, in the following we first describe a probabilistic definition of GMRA which can be well approximated by empirical data (see [3,12,31]) and is connected to the above axioms by applying results from [31]. In fact, we will see that under suitable assumptions the probabilistic GMRA fulfills the axiomatic requirements and its empirical approximation allows one to obtain a version of Theorem 3.1 even when only samples from M are known.

D.1 Probabilistic GMRA
A probabilistic GMRA of M with respect to a Borel probability measure Π, as introduced in [31], is a family of (piecewise linear) operators {P j : R D → R D } j≥0 of the form Here, 1 M denotes the indicator function of a set M and, for each refinement level j ≥ 0, the collection of pairs of measurable subsets and affine projections {(C j,k , P j,k )} Kj k=1 has the following structure. The subsets C j,k ⊂ R D for k = 1, . . . , K j form a partition of R D , i.e., they are pairwise disjoint and their union is R D . The affine projectors are defined by where the minimum is taken over all linear spaces V of dimension d. From now on we will assume uniqueness of these subspaces V j,k . To point out parallels to the axiomatic GMRA definition, think of Π being supported on the tube of a d-dimensional manifold. The axiomatic centers c j,k are then considered to be approximately equal to the conditional means c j,k of some cells C j,k partitioning the space, and the corresponding affine projection spaces P j,k are spanned by eigenvectors of the d leading eigenvalues of the conditional covariance matrix Defined in this way, the P j correspond to projectors onto the GMRA approximations M j introduced above if c j,k = c j,k . From [31] we adopt the following assumptions on the entities defined above, and hence, on the distribution Π. From now on we suppose that for all integers j min ≤ j ≤ j max (A1)-(A4) (see Table 1) hold true.
Remark D.1. Assumption (A1) ensures that each partition element contains a reasonable amount of Π-mass. Assumption (A2) guarantees that all samples from $\Pi_{j,k}$ will lie close to its expectation/center. As a result, each $c_{j,k}$ must be somewhat geometrically central within $C_{j,k}$. Together, (A1) and (A2) have the combined effect of ensuring that the probability mass of Π is somewhat equally distributed onto the different sets $C_{j,k}$, i.e., the number of points in each set $C_{j,k}$ is approximately the same, at each scale j. The third and fourth assumptions (A3) and (A4) essentially constrain the geometry of the support of Π to being effectively d-dimensional and somewhat regular (e.g., close to a smooth d-dimensional submanifold of $\mathbb{R}^D$). We refer the reader to [31] for more detailed information regarding these assumptions.
An important class of probability measures Π fulfilling (A1)-(A4) is presented in [31]. For the sake of completeness we repeat it here and also discuss a method of constructing the partitions $\{C_{j,k}\}_{k=1}^{K_j}$ from such probability measures. From here on let M be a smooth d-dimensional submanifold of $S^{D-1} \subset \mathbb{R}^D$. Let $U_K$ denote the uniform distribution on a given set K. We have the following definition.
(A3) Denote the eigenvalues of the covariance matrix $\Sigma_{j,k}$ by $\lambda^{j,k}_1 \ge \dots \ge \lambda^{j,k}_D \ge 0$. Then there exist $\sigma = \sigma(\Pi) \ge 0$, $\theta_3 = \theta_3(\Pi)$, $\theta_4 = \theta_4(\Pi) > 0$, and some $\alpha > 0$ such that for all $k = 1, \dots, K_j$,
(A4) There exists $\theta_5 = \theta_5(\Pi)$ such that

Let us now discuss the construction of suitable partitions $\{C_{j,k}\}$ by making use of cover trees. A cover tree T on a finite set of samples $S \subset M$ is a hierarchy of levels with the starting level containing the root point and the last level containing every point in S. To every level a set of nodes is assigned which is associated with a subset of points in S. To be precise, let S be a set of n distinct points in some metric space $(X, d_X)$. A cover tree T on S is a sequence of subsets $T_i \subset S$, $i = 0, 1, \dots$, that satisfies the following, see [8]:
(i) Nesting: $T_i \subseteq T_{i+1}$, i.e., once a point appears in $T_i$ it is in every $T_j$ for $j \ge i$.
(ii) Covering: For every x ∈ T i+1 there exists exactly one y ∈ T i such that d X (x, y) ≤ 2 −i . Here y is called the parent of x.
(iii) Separation: For all distinct points x, y ∈ T i , d X (x, y) > 2 −i .
The set $T_i$ denotes the set of points in S associated with nodes at level i. Note that there exists $N \in \mathbb{N}$ such that $T_i = S$ for all $i \ge N$. Herein we will presume that S is large enough to contain an $\varepsilon$-cover of M for $\varepsilon > 0$ sufficiently small.
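The three cover tree properties can be made concrete with a small checker. This is a hedged sketch rather than a cover tree construction (for that see [8]): the function name `is_cover_tree` and the representation of levels as index lists are assumptions for illustration, the metric is taken to be Euclidean, and the covering test only verifies that some admissible parent exists, since the choice of the unique parent is a designation made during construction.

```python
import numpy as np

def is_cover_tree(points, levels):
    """Check nesting, covering and separation for levels T_0, T_1, ...

    points : array (n, D) of sample points
    levels : list of index lists into points, levels[i] representing T_i
    """
    def dist(a, b):
        return float(np.linalg.norm(points[a] - points[b]))

    for i, T in enumerate(levels):
        # (iii) separation: distinct points of T_i are more than 2^{-i} apart
        for p in range(len(T)):
            for q in range(p + 1, len(T)):
                if dist(T[p], T[q]) <= 2.0 ** (-i):
                    return False
        if i + 1 < len(levels):
            nxt = levels[i + 1]
            # (i) nesting: T_i is contained in T_{i+1}
            if not set(T) <= set(nxt):
                return False
            # (ii) covering: every point of T_{i+1} has a possible parent
            # in T_i within distance 2^{-i}
            for x in nxt:
                if min(dist(x, y) for y in T) > 2.0 ** (-i):
                    return False
    return True

# Four points on the real line, embedded as a (4, 1) array
points = np.array([[0.0], [0.9], [-0.7], [0.3]])
valid = is_cover_tree(points, [[0], [0, 1, 2], [0, 1, 2, 3]])
invalid = is_cover_tree(points, [[0], [0, 3]])  # 0.0 and 0.3 are too close at level 1
```

In the example the point 0.3 may only enter at level 2, because at level 1 it would violate the separation bound of 1/2 with respect to the root.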
Moreover, the axioms characterizing cover trees are strongly connected to the dyadic structure of GMRA. For a given cover tree (for construction see [8]) on a set $X_n = \{X_1, \dots, X_n\}$ of i.i.d. samples from the distribution Π with respect to the Euclidean distance, let $a_{j,k}$ for $k = 1, \dots, K_j$ be the elements of the jth level of the cover tree, i.e., $T_j = \{a_{j,k}\}_{k=1}^{K_j}$, and define With this a partition of $\mathbb{R}^D$ into Voronoi regions can be defined. Maggioni et al. showed in [31, Theorem 7] that by this construction all assumptions (A1)-(A4) can be fulfilled. The question arises whether the properties of the axiomatic definition of GMRA in Definition 2.1 are equally met. As only parts of the axioms are relevant for our analysis, we refrain from giving rigorous justification for all properties.
1. GMRA property (1) holds by construction if the matrices Φ j,k are defined, s.t. Φ T j,k Φ j,k = P V j,k along with any reasonable choice of centers c j,k .
2. The dyadic structure axioms (2a) -(2c) also hold as a trivial consequence of the cover tree properties (i) -(iii) above if the axiomatic centers c j,k are chosen to be the elements of the cover tree set T j (i.e., the a j,k elements). By the (ρ, σ)-model assumption samples drawn from Π will have a quite uniform distribution all over supp(Π). Hence, the probabilistic centers c j,k of each C j,k -set will also tend to be close to the axiomatic centers c j,k = a j,k proposed here for small σ (see, e.g., assumption (A2) above).
3. One can deduce GMRA property (3a) from the fact that our chosen centers a j,k belong to M if supp(Π) = M (or to a small tube around M if σ is small).
4. The first part of (3b) is implied by (A4) with the uniform constant θ 5 for all x ∈ M if a j,k is sufficiently close to c j,k . To show the second part of (3b) note that where in the second last step we used our cover tree properties (recall that c j,k = a j,k ). Again, the constants C, C > 0 do not depend on the chosen x ∈ M as long as S is well chosen (e.g., contains a sufficiently fine cover of M).
Considering the GMRA axioms above we can now see that only the first part of (3b) may not hold in a satisfactory manner if we choose to set $\Phi_{j,k}^T \Phi_{j,k} = P_{V_{j,k}}$ and $c_{j,k} = a_{j,k}$. And, even when it does not hold with $C_z$ being independent of j, it will at least still hold with a worse j dependence due to assumption (A2).
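The Voronoi partition induced by the cover tree elements of one level, as used in this subsection, can be sketched as a nearest-center assignment. The snippet is illustrative only: the function name `voronoi_labels` is hypothetical, and ties are broken in favor of the smallest index, which is one of several admissible conventions.

```python
import numpy as np

def voronoi_labels(X, centers):
    """Assign each row of X to the index of its nearest center (Euclidean)."""
    # squared distances between all samples and all centers, shape (n, K)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

rng = np.random.default_rng(2)
centers = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # stand-ins for the a_{j,k}
X = rng.standard_normal((20, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)              # samples on the unit circle
labels = voronoi_labels(X, centers)
```

Each cell then collects the samples `X[labels == k]`, which plays the role of the set of sample points falling into the k-th partition element.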

D.2 Empirical GMRA
The axiomatic properties only hold above, of course, if the GMRA is constructed with knowledge of the true P V j,k -subspaces. In reality, however, this won't be the case and we are rather given some training data consisting of n samples from near/on M, X n = {X 1 , ..., X n }, which we assume to be i.i.d. with distribution Π. These samples are used to approximate the real GMRA subspaces based on Π such that the operators P j can be replaced by their estimators where {C j,k } Kj k=1 is a suitable partition of R d obtained from the data, and X j,k = C j,k ∩ X n . In other words, working with above model we have one perfect GMRA that cannot be computed (unless Π is known) but fulfills all important axiomatic properties, and an estimated GMRA that is at hand but that is only an approximation to the perfect one. Thankfully, the main results of [31] stated in Appendix E give error bounds on the difference between perfect and estimated GMRA with c j,k = c j,k ≈ c j,k ≈ a j,k that only depend on the number of samples from Π one can acquire. Following their notational convention we will denote the empirical GMRA approximation at level j, i.e., the set P j projects onto, by M j = { P j (z) : z ∈ B(0, 2)}∩B(0, 2) and the affine subspaces by P j,k = { P j,k (z) : z ∈ R D }. We again restrict the approximation to B(0, 2). The single affine spaces will be non-empty as all c j,k lie by definition close to B(0, 1) if supp(Π) is close to M, which we assume.
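For a single cell, the empirical quantities described above can be sketched as follows. This is an assumption-laden illustration and not the construction of [31]: the empirical center is taken to be the sample mean of the cell, the subspace is spanned by the d leading principal directions of the centered samples, and the affine projector is assumed to have the form $z \mapsto \hat c + \hat\Phi^T \hat\Phi (z - \hat c)$. The names `empirical_cell` and `affine_project` are hypothetical.

```python
import numpy as np

def empirical_cell(X_cell, d):
    """Sample mean and orthonormal basis of the top-d principal directions.

    X_cell : array (n, D) of samples in one partition element, n >= d
    returns (c_hat, Phi) with Phi of shape (d, D) and orthonormal rows
    """
    c_hat = X_cell.mean(axis=0)
    Y = X_cell - c_hat
    # right singular vectors of Y are eigenvectors of the sample covariance,
    # ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return c_hat, Vt[:d]

def affine_project(z, c_hat, Phi):
    """Project z onto the affine space c_hat + rowspan(Phi)."""
    return c_hat + Phi.T @ (Phi @ (z - c_hat))

# Toy cell: noiseless samples on a line segment in R^3 (so d = 1)
rng = np.random.default_rng(3)
v = np.array([1.0, 2.0, 2.0]) / 3.0          # unit direction of the segment
base = np.array([0.0, 0.0, 1.0])
t = rng.uniform(-0.2, 0.2, size=50)
X_cell = base + t[:, None] * v

c_hat, Phi = empirical_cell(X_cell, d=1)
w = np.array([2.0, -1.0, 0.0]) / np.sqrt(5.0)  # unit vector orthogonal to v
z_off = X_cell[0] + 0.3 * w                    # point moved off the segment
z_proj = affine_project(z_off, c_hat, Phi)
```

In this noiseless toy case the projection recovers the point on the segment exactly up to rounding; with noisy samples the estimated center and subspace deviate from their population counterparts, which is what the error bounds of [31] quantify.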
In the empirical setting OMS has to be slightly modified to conform to our empirical GMRA notation. Hence, (6) and (7) become Theorem E.1 states that under assumptions (A1)-(A4) the empirical GMRA approximates M as well as the perfect probabilistic one as long as the number of samples n is sufficiently large. For the proof of our main theorem we only need the following two bounds which can be deduced from (20) and (21) in [31] by setting t = 2 jd . As both appear in the proof of Theorem E.1, we state them as a corollary. The interested reader may note that n j,k appearing in the original statements can be lower bounded by θ 1 n2 −jd .
Corollary E.2. Under the assumptions of Theorem E.1 the following holds for any C 1 > 0 as long as j, α are sufficiently large and σ is sufficiently small: if n ≥ n min = 2 jd + log(max{d, 8}) min 144 Remark E.3. By Corollary E.2 with probability of at least 1 − O(2 jd exp(−2 jd )) the empirical centers c j,k of one level j have a worst case distance to the perfect centers c j,k of at most O(2 −j−2 ) if n O(2 3jd ). As a result, the empirical centers c j,k will also be at most O(2 −j−2 ) distance from their associated cover tree centers a j,k if n O(2 3jd ) by assumption (A2). The same holds true for the projectors P V j,k and P V j,k in operator norm.
The proof of Theorem 3.1 in this setting follows the same steps as in the axiomatic one. First, we give an empirical version of Lemma 4.13. Then we link $x$ and $x^*$ as described in Section 4.2 while controlling the difference between empirical and axiomatic but unknown GMRA by Corollary E.2. The following extension of Lemma 4.3 will be regularly used. Note that we are now setting our empirical GMRA centers c j,k to be the associated mean estimates c j,k as a means of approximating the axiomatic GMRA structure we would have if we had instead chosen our centers to be the true expectations c j,k (recall Appendix D). We also implicitly assume below that there exists a constant $C_1 > 0$ for which the associated axiomatic GMRA properties in Section 2 hold when the centers c j,k are chosen as these true expectations c j,k and the $\Phi_{j,k}^T \Phi_{j,k}$ as $P_{V_{j,k}}$.
Lemma E.6. Fix j sufficiently large. Under the assumptions of Theorem E.1 and $n \ge n_{\min}$, if $m \ge C C_1^{-6} 2^{6(j+1)} w(M \cup P_S(C_j))^2$, the index k of the center $c_{j,k}$ chosen in step I of the algorithm fulfills $\|x - c_{j,k}\|_2 \le 16 \max\{\|x - c_{j,k_j(x)}\|_2, C_1 2^{-j-1}\}$.
Proof: The proof will be similar to the one of Lemma 4.13. By definition we have $d_H(\mathrm{sign}(A c_{j,k}), y) \le d_H(\mathrm{sign}(A c_{j,k_j(x)}), y)$.
As, for all $z, z' \in S^{D-1}$, $d_H(\mathrm{sign}(Az), \mathrm{sign}(Az')) = m \cdot d_A(z, z')$, this is equivalent to $d_A(P_S(c_{j,k}), x) \le d_A(P_S(c_{j,k_j(x)}), x)$.
A union bound over both probabilities yields the result.
Having Lemma E.6 at hand we can now show a detailed version of Theorem 3.1 in this case. For convenience please first read the proof of Theorem 4.14. As above, choosing $\varepsilon = 2\sqrt{j}\, 2^{-j}$ yields Theorem 3.1 for OMS-simple with a slightly modified probability of success and slightly different dependencies on $C_1$ and $C_x$ in (9).
Theorem E.7. Let M ⊂ S D−1 be given by its empirical GMRA for some levels j 0 ≤ j ≤ J from samples X 1 , ..., X n for n ≥ n min (defined in Corollary E.2), such that 0 < C 1 < 2 j0+1 where C 1 is the constant from GMRA properties (2b) and (3a) for a GMRA structure constructed with centers c j,k and with the Φ T j,k Φ j,k as P V j,k . Fix j and assume that dist(0, M j ) ≥ 1/2. Further let m ≥ 64 max{C ,C}C −6 1 2 6(j+1) (w(M) + C dj) 2 .
(III) We conclude as in Theorem 4.14.