$\varepsilon$-isometric dimension reduction for incompressible subsets of $\ell_p$

Fix $p\in[1,\infty)$, $K\in(0,\infty)$ and a probability measure $\mu$. We prove that for every $n\in\mathbb{N}$, $\varepsilon\in(0,1)$ and $x_1,\ldots,x_n\in L_p(\mu)$ with $\big\| \max_{i\in\{1,\ldots,n\}} |x_i| \big\|_{L_p(\mu)} \leq K$, there exists $d\leq \frac{32e^2 (2K)^{2p}\log n}{\varepsilon^2}$ and vectors $y_1,\ldots, y_n \in \ell_p^d$ such that $$\forall \ i,j\in\{1,\ldots,n\}, \qquad \|x_i-x_j\|^p_{L_p(\mu)}- \varepsilon \leq \|y_i-y_j\|_{\ell_p^d}^p \leq \|x_i-x_j\|^p_{L_p(\mu)}+\varepsilon.$$ Moreover, the argument implies the existence of a greedy algorithm which outputs $\{y_i\}_{i=1}^n$ after receiving $\{x_i\}_{i=1}^n$ as input. The proof relies on a derandomized version of Maurey's empirical method (1981) combined with a combinatorial idea of Ball (1990) and classical factorization theory of $L_p(\mu)$ spaces. Motivated by the above embedding, we introduce the notion of $\varepsilon$-isometric dimension reduction of the unit ball ${\bf B}_E$ of a normed space $(E,\|\cdot\|_E)$ and we prove that ${\bf B}_{\ell_p}$ does not admit $\varepsilon$-isometric dimension reduction by linear operators for any value of $p\neq2$.

1. Introduction

1.1. Metric dimension reduction. Using standard terminology from metric embeddings (see [Ost13]), we say that a mapping between metric spaces $f:(\mathcal{M},d_{\mathcal{M}})\to(\mathcal{N},d_{\mathcal{N}})$ is a bi-Lipschitz embedding with distortion at most $\alpha\in[1,\infty)$ if there exists a scaling factor $\sigma\in(0,\infty)$ such that $$\forall\ x,y\in\mathcal{M},\qquad \sigma\, d_{\mathcal{M}}(x,y)\leq d_{\mathcal{N}}\big(f(x),f(y)\big)\leq\alpha\sigma\, d_{\mathcal{M}}(x,y). \tag{2}$$ The classical Johnson-Lindenstrauss lemma [JL84] asserts that if $(H,\|\cdot\|_H)$ is a Hilbert space and $x_1,\ldots,x_n\in H$, then for every $\varepsilon\in(0,1)$ there exist $d\leq\frac{C\log n}{\varepsilon^2}$ and $y_1,\ldots,y_n\in\ell_2^d$ such that $$\forall\ i,j\in\{1,\ldots,n\},\qquad \|x_i-x_j\|_H\leq\|y_i-y_j\|_{\ell_2^d}\leq(1+\varepsilon)\|x_i-x_j\|_H,$$ where $C\in(0,\infty)$ is a universal constant. In the above embedding terminology, the Johnson-Lindenstrauss lemma states that for every $\varepsilon\in(0,1)$, $n\in\mathbb{N}$ and $d\geq\frac{C\log n}{\varepsilon^2}$, any $n$-point subset of Hilbert space admits a bi-Lipschitz embedding into $\ell_2^d$ with distortion at most $1+\varepsilon$. In order to prove their result, Johnson and Lindenstrauss introduced in [JL84] the influential random projection method, which has since had many important applications in metric geometry and theoretical computer science and kickstarted the field of metric dimension reduction (see the recent survey [Nao18] of Naor), which lies at the intersection of those two subjects.
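For readers who prefer a concrete illustration, the random projection method underlying the Johnson-Lindenstrauss lemma can be sketched as follows. This is only an illustrative sketch: the constant $C=8$ and the Gaussian ensemble are our assumptions for the demonstration, not part of the lemma's statement, which only guarantees the existence of a universal constant.

```python
# Illustrative sketch of Johnson-Lindenstrauss via a Gaussian random
# projection.  The constant C = 8 is an assumption made for the demo.
import numpy as np

def jl_embed(points, eps, C=8.0, seed=0):
    """Map n points in R^m to R^d with d = ceil(C log n / eps^2),
    using a Gaussian matrix scaled so that E ||Gx||^2 = ||x||^2."""
    rng = np.random.default_rng(seed)
    n, m = points.shape
    d = int(np.ceil(C * np.log(n) / eps**2))
    G = rng.standard_normal((m, d)) / np.sqrt(d)
    return points @ G, d

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 1000))   # 50 points in R^1000
Y, d = jl_embed(X, eps=0.5)           # 50 points in R^d, d ~ C log(50)/0.25
```

Since the projection is linear and oblivious to the point set, the same matrix works for all pairs simultaneously with high probability; this is exactly the feature that, as discussed in Section 1.3 below, the present paper's (non-oblivious) construction does not share.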
Following [Nao18], we say that an infinite-dimensional Banach space $(E,\|\cdot\|_E)$ admits bi-Lipschitz dimension reduction if there exists $\alpha=\alpha(E)\in[1,\infty)$ such that for every $n\in\mathbb{N}$, there exists $k_n=k_n(E,\alpha)\in\mathbb{N}$ satisfying $$\lim_{n\to\infty}\frac{\log k_n}{\log n}=0 \tag{4}$$ and such that any $n$-point subset $S$ of $E$ admits a bi-Lipschitz embedding with distortion at most $\alpha$ in a finite-dimensional linear subspace $F$ of $E$ with $\dim F\leq k_n$. The only non-Hilbertian space that is known to admit bi-Lipschitz dimension reduction is the 2-convexification of the classical Tsirelson space, as proven by Johnson and Naor in [JN10]. Turning to negative results, Matoušek proved in [Mat96] the impossibility of bi-Lipschitz dimension reduction in $\ell_\infty$, whereas Brinkman and Charikar [BC05] (see also [LN04] for a shorter proof) constructed an $n$-point subset of $\ell_1$ which does not admit a bi-Lipschitz embedding into any $n^{o(1)}$-dimensional subspace of $\ell_1$. Their theorem was recently refined by Naor, Pisier and Schechtman [NPS20], who showed that the same $n$-point subset of $\ell_1$ does not embed into any $n^{o(1)}$-dimensional subspace of the trace class $\mathsf{S}_1$ (see also the striking recent work [RV20] of Regev and Vidick, where the impossibility of polynomial almost isometric dimension reduction in $\mathsf{S}_1$ is established). We refer to [Nao18, Theorem 16] for a summary of the best known bounds quantifying the aforementioned qualitative statements. Despite the lapse of almost four decades since the proof of the Johnson-Lindenstrauss lemma, the following natural question remains stubbornly open.

The author was supported by a Junior Research Fellowship from Trinity College, Cambridge. A conference version of this article will be presented in SoCG 2022.
Question 1. For which values of $p\notin\{1,2,\infty\}$ does $\ell_p$ admit bi-Lipschitz dimension reduction?

1.2. Dimensionality and structure. An important feature of the formalism of bi-Lipschitz dimension reduction in a Banach space $E$ is that both the distortion $\alpha(E)$ of the embedding and the dimension $k_n(E,\alpha)$ of the target subspace $F$ are independent of the given $n$-point subset $S$ of $E$. Nevertheless, there are instances in which one can construct delicate embeddings whose distortion or the dimension of their targets depends on subtle geometric parameters of $S$. For instance, we mention an important theorem of Schechtman [Sch06, Theorem 5] (which built on work of Klartag and Mendelson [KM05]), who constructed a linear embedding of an arbitrary subset $S$ of $\ell_2$ into any Banach space $E$ whose distortion depends only on the Gaussian width of $S$ and the $\ell$-norm of the identity operator $\mathrm{id}_E:E\to E$. In the special case that $E$ is a Hilbert space, a substantially richer family of such embeddings was devised in [LMPV17].
Let $\mu$ be a probability measure. For a subset $S$ of $L_p(\mu)$, we shall denote $$\mathrm{I}(S)\stackrel{\mathrm{def}}{=}\Big\|\sup_{x\in S}|x|\Big\|_{L_p(\mu)} \tag{5}$$ and we will say that $S$ is $K$-incompressible if $\mathrm{I}(S)\leq K$. The main contribution of the present paper is the following dimension reduction theorem for incompressible subsets of $L_p(\mu)$ which, in contrast to all the results discussed earlier, is valid for any value of $p\in[1,\infty)$.
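For a finite measure space, the quantity (5) is straightforward to compute. The following sketch (with illustrative names and data of our choosing) evaluates $\mathrm{I}(S)$ when $\mu$ is a discrete probability measure and each $x_i$ is a vector of values on the atoms:

```python
# Minimal sketch of the incompressibility parameter I(S) for a finite
# measure space: mu is given by atom weights summing to 1, and each x_i
# is the vector of its values on the atoms.  All names are illustrative.
import numpy as np

def incompressibility(X, weights, p):
    """I(S) = || max_i |x_i| ||_{L_p(mu)} for the rows x_i of X."""
    envelope = np.abs(X).max(axis=0)            # pointwise sup of |x_i|
    return (weights @ envelope**p) ** (1.0 / p)

mu = np.full(4, 0.25)                           # uniform probability measure
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
K = incompressibility(X, mu, p=2)               # envelope = (1, 1, 0, 0)
```

Note that $\mathrm{I}(S)$ controls the whole family at once: in this toy example each $x_i$ has $\|x_i\|_{L_2(\mu)}=\tfrac12$, while $\mathrm{I}(S)=\sqrt{1/2}$, illustrating that incompressibility is a joint property of the configuration rather than of the individual vectors.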
Theorem 2 ($\varepsilon$-isometric dimension reduction for incompressible subsets of $L_p(\mu)$). Fix parameters $p\in[1,\infty)$, $n\in\mathbb{N}$, $K\in(0,\infty)$ and let $\{x_i\}_{i=1}^n$ be a $K$-incompressible family of vectors in $L_p(\mu)$ for some probability measure $\mu$. Then for every $\varepsilon\in(0,1)$, there exists $d\in\mathbb{N}$ with $d\leq\frac{32e^2(2K)^{2p}\log n}{\varepsilon^2}$ and points $y_1,\ldots,y_n\in\ell_p^d$ such that $$\forall\ i,j\in\{1,\ldots,n\},\qquad \|x_i-x_j\|_{L_p(\mu)}^p-\varepsilon\leq\|y_i-y_j\|_{\ell_p^d}^p\leq\|x_i-x_j\|_{L_p(\mu)}^p+\varepsilon. \tag{6}$$

Besides the appearance of the incompressibility parameter $K$ in the bound for the dimension $d$ of the target space, Theorem 2 differs from the Johnson-Lindenstrauss lemma in that the error in (6) is additive rather than multiplicative. Recall that a map between metric spaces $f:(\mathcal{M},d_{\mathcal{M}})\to(\mathcal{N},d_{\mathcal{N}})$ is a $\delta$-isometric embedding if $\big|d_{\mathcal{N}}(f(x),f(y))-d_{\mathcal{M}}(x,y)\big|\leq\delta$ for every $x,y\in\mathcal{M}$. Embeddings with additive errors occur naturally in metric geometry and, more specifically, in metric dimension reduction (see e.g. [Ver18, Section 9.3]). We mention for instance a result [PV14, Theorem 1.5] of Plan and Vershynin, who showed that any subset $S$ of the unit sphere of $\ell_2^n$ admits a $\delta$-isometric embedding into the $d$-dimensional Hamming cube $(\{-1,1\}^d,\|\cdot\|_1)$, where $d$ depends polynomially on $\delta^{-1}$ and the Gaussian width of $S$. In the above embedding terminology and in view of the elementary inequality $|\alpha-\beta|\leq|\alpha^p-\beta^p|^{1/p}$, which holds for every $\alpha,\beta>0$, Theorem 2 asserts that any $n$-point $K$-incompressible subset of $L_p(\mu)$ admits an $\varepsilon^{1/p}$-isometric embedding into $\ell_p^d$ for the above choice of dimension $d$. For further occurrences of $\varepsilon$-isometric embeddings in the dimension reduction and compressed sensing literatures, we refer to [PV14, Jac15, Jac17, LMPV17, Ver18, BG19] and the references therein.
1.3. Method of proof. A large part of the (vast) literature on metric dimension reduction focuses on showing that a typical low-rank linear operator chosen randomly from a specific ensemble acts as an approximate isometry on a given set $S$ with high probability. For subsets $S$ of Euclidean space, this principle has been confirmed for random projections [JL84, FM88, DG03, Nao18], matrices with Gaussian [Gor88, IM99, Sch06], Rademacher [AV99, Ach03] and subgaussian [KM05, IN07, Dir16, LMPV17] entries, randomizations of matrices with the RIP [KW11], as well as more computationally efficient models [Mat08, AC09, AL13, KN14, BDN15] which are based on sparse matrices. Beyond its inherent interest as an $\ell_p$-dimension reduction theorem (albeit, for specific configurations of points), Theorem 2 also differs from the aforementioned works in its method of proof. The core of the argument, rather than sampling from a random matrix ensemble, relies on Maurey's empirical method [Pis81] (see Section 2.1), which is a dimension-free way to approximate points in bounded convex subsets of Banach spaces by convex combinations of extreme points with prescribed length. An application of the method to the positive cone of $L_p$-distance matrices (the use of which in this context is inspired by classical work of Ball [Bal90]) equipped with the supremum norm allows us to deduce (see Proposition 7) the conclusion of Theorem 2 under the stronger assumption that $$\max_{i\in\{1,\ldots,n\}}\|x_i\|_{L_\infty(\mu)}\leq K. \tag{8}$$ While Maurey's empirical method is an a priori existential statement that is proven via the probabilistic method, recent works (see [Bar18, Iva21]) have focused on derandomizing its proof for specific Banach spaces. In the setting of Theorem 2, we can use these tools to show (see Corollary 13) that there exists a greedy algorithm which receives as input the high-dimensional data $\{x_i\}_{i=1}^n$ and produces as output the low-dimensional points $\{y_i\}_{i=1}^n$. Finally, using a suitable change of measure [Mau74] (see Section 2.3), we are able to relax the
stronger assumption (8) to that of $K$-incompressibility and derive the conclusion of Theorem 2. We also emphasize that, in contrast to most of the dimension reduction algorithms (randomized or not) discussed earlier, the one which gives Theorem 2 is not oblivious: it is tailored to the specific configuration of points $\{x_i\}_{i=1}^n$, as it relies on the use of Maurey's empirical method.
1.4. $\varepsilon$-isometric dimension reduction. Given two moduli $\omega,\Omega:[0,\infty)\to[0,\infty)$, we say (following [Nao18]) that a Banach space $(E,\|\cdot\|_E)$ admits metric dimension reduction with moduli $(\omega,\Omega)$ if for any $n\in\mathbb{N}$ there exists $k_n=k_n(E,\omega,\Omega)\in\mathbb{N}$ with $\frac{\log k_n}{\log n}\to0$ as $n\to\infty$, such that for any $x_1,\ldots,x_n\in E$, there exist a subspace $F$ of $E$ with $\dim F\leq k_n$ and $y_1,\ldots,y_n\in F$ satisfying $$\forall\ i,j\in\{1,\ldots,n\},\qquad \omega\big(\|x_i-x_j\|_E\big)\leq\|y_i-y_j\|_E\leq\Omega\big(\|x_i-x_j\|_E\big).$$ In view of Theorem 2, we would be interested in formulating a suitable notion of dimension reduction via $\varepsilon$-isometric embeddings which would be fitting to the moduli appearing in (6).
Observe first that if such an additive guarantee were available for arbitrary points of $E$, it could be combined with a rescaling: given $x_1,\ldots,x_n\in E$, one applies it to the dilated points $sx_1,\ldots,sx_n$ for a scaling factor $s\in(0,\infty)$ and rescales the output by $1/s$. For any $\eta\in(0,1)$, we can then choose $s$ large enough (as a function of $\eta$ and the $x_i$) such that the additive error becomes negligible relative to the pairwise distances. Therefore, we conclude that $E$ would also admit bi-Lipschitz dimension reduction.
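For concreteness, the rescaling computation behind this observation can be written out as follows (our rendering, in the notation above):

```latex
% If E admitted an additive (\varepsilon-isometric) guarantee for all of E,
% apply it to the dilated points sx_1,\dots,sx_n and rescale by 1/s:
\|sx_i-sx_j\|_E-\varepsilon \le \|y_i-y_j\|_E \le \|sx_i-sx_j\|_E+\varepsilon
\;\Longrightarrow\;
\|x_i-x_j\|_E-\tfrac{\varepsilon}{s}
  \le \Big\|\tfrac{y_i}{s}-\tfrac{y_j}{s}\Big\|_E
  \le \|x_i-x_j\|_E+\tfrac{\varepsilon}{s}.
```

Letting $s\to\infty$ drives the additive error $\varepsilon/s$ below any fraction of $\min_{i\neq j}\|x_i-x_j\|_E$, so the rescaled embedding has distortion arbitrarily close to $1$ in the same target dimension.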
This simple scaling argument suggests that any reasonable notion of $\varepsilon$-isometric dimension reduction can differ from the corresponding bi-Lipschitz theory only at small scales, thus motivating the following definition. We denote by $\mathbf{B}_E$ the unit ball of a normed space $(E,\|\cdot\|_E)$.
Definition 4 ($\varepsilon$-isometric dimension reduction). Fix $\varepsilon\in(0,1)$, $r\in(0,\infty)$ and let $(E,\|\cdot\|_E)$ be an infinite-dimensional Banach space. We say that $\mathbf{B}_E$ admits $\varepsilon$-isometric dimension reduction with power $r$ if for every $n\in\mathbb{N}$ there exists $k_n=k_n^r(E,\varepsilon)\in\mathbb{N}$ with $\frac{\log k_n}{\log n}\to0$ as $n\to\infty$ for which the following condition holds. For every $n$ points $x_1,\ldots,x_n\in\mathbf{B}_E$ there exist a linear subspace $F$ of $E$ with $\dim F\leq k_n$ and points $y_1,\ldots,y_n\in F$ satisfying $$\forall\ i,j\in\{1,\ldots,n\},\qquad \|x_i-x_j\|_E^r-\varepsilon\leq\|y_i-y_j\|_E^r\leq\|x_i-x_j\|_E^r+\varepsilon.$$ The fact that the whole space $\ell_2$ admits $\varepsilon$-isometric dimension reduction with $r=1$ and corresponding target dimension $k_n^1(\ell_2,\varepsilon)\lesssim\frac{\log n}{\varepsilon^2}$ follows from the additive version of the Johnson-Lindenstrauss lemma, first proven by Liaw, Mehrabian, Plan and Vershynin [LMPV17] (see also [Ver18, Proposition 9.3.2]). In Corollary 9 we obtain the same conclusion for its unit ball $\mathbf{B}_{\ell_2}$, with a slightly weaker bound for the target dimension, using our Theorem 2.
It is clear from the definitions that if a Banach space $E$ admits bi-Lipschitz dimension reduction with distortion $\frac{1+\varepsilon}{1-\varepsilon}$, where $\varepsilon\in(0,1)$, then $\mathbf{B}_E$ admits $2\varepsilon$-isometric dimension reduction with power $r=1$. The $\varepsilon$-isometric analogue of Question 1 deserves further investigation.

Question 5. For which values of $p\neq2$ does $\mathbf{B}_{\ell_p}$ admit $\varepsilon$-isometric dimension reduction?
Even though the $K$-incompressibility assumption of Theorem 2 may a priori seem restrictive, it is satisfied by most configurations of points in $\mathbf{B}_{\ell_p}$. Suppose that $n,N\in\mathbb{N}$ are such that $N$ is polynomial in $n$. Then, standard considerations (see Remark 10) show that, with high probability, a uniformly chosen $n$-point subset $S$ of $N^{1/p}\mathbf{B}_{\ell_p^N}$ is $O(\log n)^{1/p}$-incompressible.
1.5. $\varepsilon$-isometric dimension reduction by linear maps. A close inspection of the proof of Theorem 2 (see Remark 12) reveals that in fact the low-dimensional points $\{y_i\}_{i=1}^n$ can be realized as images of the initial data $\{x_i\}_{i=1}^n$ under a carefully chosen linear operator. Nevertheless, we will show that for any $p\neq2$ and $n$ large enough, there exists an $n$-point subset of $\mathbf{B}_{\ell_p}$ whose image under any fixed linear $\varepsilon$-isometric embedding has rank which is linear in $n$. In fact, we shall prove the following more general statement, which refines a theorem that Lee, Mendel and Naor proved in [LMN05] for bi-Lipschitz embeddings.
Theorem 6 (Impossibility of linear dimension reduction in $\mathbf{B}_{\ell_p}$). Fix $p\neq2$ and two moduli $\omega,\Omega:[0,\infty)\to[0,\infty)$ with $\omega(1)>0$. For arbitrarily large $n\in\mathbb{N}$, there exists an $n$-point subset $S_{n,p}$ of $\mathbf{B}_{\ell_p}$ such that the following holds. If $T:\mathrm{span}(S_{n,p})\to\ell_p^d$ is a linear operator satisfying $$\forall\ x,y\in S_{n,p},\qquad \omega\big(\|x-y\|_{\ell_p}\big)\leq\|Tx-Ty\|_{\ell_p^d}\leq\Omega\big(\|x-y\|_{\ell_p}\big),$$ then $d\geq cn$ for some constant $c=c(p,\omega,\Omega)\in(0,\infty)$.

2. Proof of Theorem 2
We say that a normed space $(E,\|\cdot\|_E)$ has Rademacher type $p$ if there exists a constant $T\in(0,\infty)$ such that for every $n\in\mathbb{N}$ and every $x_1,\ldots,x_n\in E$, $$\mathbb{E}\Big\|\sum_{i=1}^n\varepsilon_i x_i\Big\|_E^p\leq T^p\sum_{i=1}^n\|x_i\|_E^p, \tag{15}$$ where $\varepsilon_1,\ldots,\varepsilon_n$ are i.i.d. symmetric random signs. The least constant $T$ such that (15) is satisfied is denoted by $T_p(E)$. A standard symmetrization argument (see [LT91, Proposition 9.11]) shows that if $X_1,\ldots,X_n$ are independent $E$-valued random variables with $\mathbb{E}[X_i]=0$ for every $i\in\{1,\ldots,n\}$, then $$\mathbb{E}\Big\|\sum_{i=1}^n X_i\Big\|_E^p\leq\big(2T_p(E)\big)^p\sum_{i=1}^n\mathbb{E}\|X_i\|_E^p. \tag{16}$$ Since $X$ takes values in $\mathcal{T}$, if $\mathcal{T}\subseteq R\mathbf{B}_E$, we then deduce that there exist $x_1,\ldots,x_d\in\mathcal{T}$ such that $$\Big\|z-\frac{x_1+\cdots+x_d}{d}\Big\|_E\leq\frac{4T_p(E)R}{d^{1-1/p}}.$$ While the above argument is probabilistic, recent works have focused on derandomizing Maurey's sampling lemma for smaller classes of Banach spaces, thus constructing deterministic algorithms which output the empirical approximation $\frac{x_1+\cdots+x_d}{d}$ of $z$. The first result in this direction is due to Barman [Bar18], who treated the case where $E$ is an $L_r(\mu)$-space, $r\in(1,\infty)$. This assumption was recently generalized by Ivanov in [Iva21], who built a greedy algorithm which constructs the desired empirical mean in an arbitrary $p$-uniformly smooth space.

2.2. Dimension reduction in $L_p(\mu)$ for uniformly bounded vectors. With Maurey's empirical method at hand, we are ready to proceed to the first part of the proof of Theorem 2, namely the $\varepsilon$-isometric dimension reduction property of $L_p(\mu)$ under the strong assumption that the given point set consists of functions which are bounded in $L_\infty(\mu)$.
As discussed in Section 1.3, Proposition 7 yields the conclusion (6) of Theorem 2 under the stronger hypothesis $\max_i\|x_i\|_{L_\infty(\mu)}\leq L$, with $K$ replaced by $L$ in the bound on $d$.

Proof of Proposition 7. Without loss of generality, we will assume that the given points $x_1,\ldots,x_n\in L_p(\mu)$ are simple functions with $\|x_i\|_{L_\infty(\mu)}\leq L$. Let $\{S_1,\ldots,S_m\}$ be a partition of the underlying measure space such that each $x_i$ is constant on each $S_k$, and suppose that $x_i=\sum_{k=1}^m a(i,k)\mathbf{1}_{S_k}$, where $a(i,k)\in[-L,L]$ for every $i\in\{1,\ldots,n\}$ and $k\in\{1,\ldots,m\}$. Then, for every $i,j\in\{1,\ldots,n\}$, we have $$\|x_i-x_j\|_{L_p(\mu)}^p=\sum_{k=1}^m\mu(S_k)\,|a(i,k)-a(j,k)|^p, \tag{25}$$ where the values of the functions on $S_k$ form the vector $y(k)\stackrel{\mathrm{def}}{=}(a(1,k),\ldots,a(n,k))\in\mathbb{R}^n$. As $\mu$ is a probability measure and $\{S_1,\ldots,S_m\}$ is a partition, identity (25) implies that the matrix of $p$-th powers of distances is a convex combination $$\big(\|x_i-x_j\|_{L_p(\mu)}^p\big)_{i,j}=\sum_{k=1}^m\mu(S_k)\,D_k,\qquad\text{where}\quad D_k\stackrel{\mathrm{def}}{=}\big(|a(i,k)-a(j,k)|^p\big)_{i,j}\in\mathbb{R}^{n\times n}. \tag{27}$$ Observe that since $a(i,k)\in[-L,L]$ for every $i\in\{1,\ldots,n\}$ and $k\in\{1,\ldots,m\}$, we have $D_k\in(2L)^p\mathbf{B}_{\ell_\infty^{n^2}}$. Moreover, $\ell_\infty^N$ is $e$-isomorphic to $\ell_{\log N}^N$ and $T_2(\ell_p)\leq\sqrt{p-1}$ for every $p\geq2$, and thus $$T_2\big(\ell_\infty^{n^2}\big)\leq e\sqrt{2\log n}. \tag{28}$$ Applying Maurey's sampling lemma (Section 2.1) while taking into account (27) and (28), we deduce that for every $d\geq1$ there exist $k_1,\ldots,k_d\in\{1,\ldots,m\}$ such that $$\Big\|\big(\|x_i-x_j\|_{L_p(\mu)}^p\big)_{i,j}-\frac{1}{d}\sum_{s=1}^d D_{k_s}\Big\|_{\ell_\infty^{n^2}}\leq\frac{4e\sqrt{2\log n}\,(2L)^p}{\sqrt{d}}. \tag{30}$$ In particular, there exists $d$ with $d\leq\frac{32e^2(2L)^{2p}\log n}{\varepsilon^2}$ for which the right-hand side of (30) is at most $\varepsilon$. Finally, consider for each $i\in\{1,\ldots,n\}$ the vector $y_i=(y_i(1),\ldots,y_i(d))\in\ell_p^d$ given by $$\forall\ s\in\{1,\ldots,d\},\qquad y_i(s)\stackrel{\mathrm{def}}{=}\frac{a(i,k_s)}{d^{1/p}},$$ and notice that (30) can be equivalently rewritten as $$\forall\ i,j\in\{1,\ldots,n\},\qquad \big|\,\|y_i-y_j\|_{\ell_p^d}^p-\|x_i-x_j\|_{L_p(\mu)}^p\big|\leq\varepsilon,$$ concluding the proof of the proposition.
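The randomized core of the construction above can be illustrated as follows. This is a sketch of the sampling step only, with illustrative data of our choosing; the actual proof derandomizes this step and fixes $d$ via the type-2 bound rather than by chance:

```python
# Hedged sketch of the sampling step behind Proposition 7: draw d cell
# indices with probabilities mu(S_k) and record the rescaled values of
# each x_i on the sampled cells.  Randomized illustration only.
import numpy as np

def sample_embedding(A, weights, p, d, seed=0):
    """A[i, k] = a(i, k): value of x_i on the cell S_k; weights[k] = mu(S_k).
    Returns y_1, ..., y_n in ell_p^d with ||y_i - y_j||_p^p equal to an
    empirical average of |a(i,k) - a(j,k)|^p over the sampled cells."""
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(weights), size=d, p=weights)
    return A[:, ks] / d ** (1.0 / p)            # y_i(s) = a(i, k_s) / d^{1/p}

w = np.array([0.5, 0.3, 0.2])                   # mu(S_1), mu(S_2), mu(S_3)
A = np.array([[1.0, -1.0, 0.0],                 # values of x_1 on the cells
              [0.0,  1.0, 1.0]])                # values of x_2 on the cells
Y = sample_embedding(A, w, p=1, d=2000)
true = (w * np.abs(A[0] - A[1])).sum()          # ||x_1 - x_2||_{L_1(mu)}
emp = np.abs(Y[0] - Y[1]).sum()                 # ||y_1 - y_2||_{ell_1^d}
```

By the law of large numbers the empirical quantity concentrates around the true distance at rate $d^{-1/2}$, which is exactly the rate that the type-2 bound (28) converts into the dimension bound of the proposition.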
Remark 8. It is worth emphasizing that the coordinates of the vectors $y_1,\ldots,y_n$ produced in Proposition 7 consist (up to rescaling) of values of the functions $x_1,\ldots,x_n$. Such low-dimensional embeddings via sampling are a central object of study in approximation theory; see e.g. the recent survey [KKLT21] and the references therein.
The additive version of the Johnson-Lindenstrauss lemma, first observed in [LMPV17] as a consequence of a deep matrix deviation inequality (see also [Ver18, Chapter 9]), asserts that for every $n$ points $x_1,\ldots,x_n$ in a Hilbert space $H$ and every $\varepsilon\in(0,1)$, there exist $d\leq\frac{C\log n}{\varepsilon^2}$ and points $y_1,\ldots,y_n\in\ell_2^d$ such that $$\forall\ i,j\in\{1,\ldots,n\},\qquad \|x_i-x_j\|_H-\varepsilon\leq\|y_i-y_j\|_{\ell_2^d}\leq\|x_i-x_j\|_H+\varepsilon,$$ where $C\in(0,\infty)$ is a universal constant. We will now observe that the spherical symmetry of $\mathbf{B}_{\ell_2}$ allows us to deduce a similar conclusion for points in $\mathbf{B}_H$ by removing the incompressibility assumption from Proposition 7 when $p=2$. We shall use the standard notation $L_p^N$ for the space $L_p(\mu_N)$, where $\mu_N$ is the normalized counting measure on the finite set $\{1,\ldots,N\}$, that is, $$\forall\ x\in\mathbb{R}^N,\qquad \|x\|_{L_p^N}=\Big(\frac{1}{N}\sum_{k=1}^N|x_k|^p\Big)^{1/p}.$$ Observe that for $0<p<q\leq\infty$, we have $\mathbf{B}_{L_q^N}\subseteq\mathbf{B}_{L_p^N}$.

Corollary 9. There exists a universal constant $C\in(0,\infty)$ such that the following statement holds. Fix $n\in\mathbb{N}$ and let $\{x_i\}_{i=1}^n$ be a family of vectors in $\mathbf{B}_H$ for some Hilbert space $H$. Then for every $\varepsilon\in(0,1)$, there exists $d\in\mathbb{N}$ with $d\leq\frac{C(\log n)^3}{\varepsilon^4}$ and points $y_1,\ldots,y_n\in\ell_2^d$ such that $$\forall\ i,j\in\{1,\ldots,n\},\qquad \|x_i-x_j\|_H-\varepsilon\leq\|y_i-y_j\|_{\ell_2^d}\leq\|x_i-x_j\|_H+\varepsilon. \tag{35}$$

Before proceeding to the derivation of (35), we emphasize that since the given points $\{x_i\}_{i=1}^n$ belong to $\mathbf{B}_H$, Corollary 9 is formally weaker than the Johnson-Lindenstrauss lemma. However, we include it here since it differs from [JL84] in that the low-dimensional point set $\{y_i\}_{i=1}^n$ is not obtained as an image of $\{x_i\}_{i=1}^n$ under a typical low-rank matrix from a specific ensemble.

Proof of Corollary 9. Since any $n$-point subset $\{x_1,\ldots,x_n\}$ of $H$ embeds linearly and isometrically in $L_2^n$, we may assume that $x_1,\ldots,x_n\in\mathbf{B}_{L_2^n}$. We will need the following claim.

Claim. Suppose that $X_1,\ldots$
, $X_n$ are (not necessarily independent) random vectors, each uniformly distributed on the unit sphere $S^{n-1}$ of $L_2^n$. Then, for some universal constant $S\in(0,\infty)$, $$\mathbb{E}\Big[\max_{i\in\{1,\ldots,n\}}\|X_i\|_{L_\infty^n}^2\Big]\leq S\log n.$$

Proof of the Claim. By a standard estimate of Schechtman and Zinn [SZ90, Theorem 3], for a uniformly distributed random vector $X$ on the unit sphere $S^{n-1}$ of $L_2^n$, we have $$\forall\ t>0,\qquad \mathbb{P}\big\{\|X\|_{L_\infty^n}\geq t\big\}\leq\gamma_1 e^{-\gamma_2 t^2} \tag{38}$$ for some absolute constants $\gamma_1,\gamma_2\in(0,\infty)$. Let $W\stackrel{\mathrm{def}}{=}\max_{i\in\{1,\ldots,n\}}\|X_i\|_{L_\infty^n}$ and notice that $$\mathbb{E}[W^2]=\int_0^\infty 2t\,\mathbb{P}\{W\geq t\}\,\mathrm{d}t. \tag{39}$$ By the union bound, we have $$\forall\ t>0,\qquad \mathbb{P}\{W\geq t\}\leq\min\big\{1,\ n\gamma_1 e^{-\gamma_2 t^2}\big\}.$$ Combining (38) and (39), we therefore get $$\mathbb{E}[W^2]\leq K^2\log n+n\gamma_1\int_{K\sqrt{\log n}}^\infty 2t\,e^{-\gamma_2 t^2}\,\mathrm{d}t=K^2\log n+\frac{n\gamma_1}{\gamma_2}\,e^{-\gamma_2K^2\log n}.$$ Choosing $K$ such that $K^2\gamma_2>1$, we get $ne^{-\gamma_2K^2\log n}=n^{1-\gamma_2K^2}\leq1$, thus $\mathbb{E}[W^2]\leq S\log n$ for a large enough constant $S\in(0,\infty)$, and the claim follows.

Remark 10. Fix $p\in[1,\infty)$. The isometric embedding theorem of Ball [Bal90] asserts that any $n$-point subset of $\ell_p$ admits an isometric embedding into $\ell_p^N$, where $N=\binom{n}{2}+1$. Suppose, more generally, that $n,N\in\mathbb{N}$ are such that $N$ is polynomial in $n$. Considerations in the spirit of the proof of Corollary 9 (e.g. relying on [SZ90]) then show that if $x_1,\ldots,x_n$ are independent uniformly random points in $\mathbf{B}_{L_p^N}$, then the random set $\{x_1,\ldots,x_n\}$ is $O(\log n)^{1/p}$-incompressible. In other words, incompressibility is a generic property of random $n$-point subsets of $\mathbf{B}_{L_p^N}$. On the other hand, a typical $n$-point subset of $\mathbf{B}_{L_p^N}$ is known to be approximately a simplex due to work of Arias-de-Reyna, Ball and Villa [AdRBV98] and so, in particular, it can be bi-Lipschitzly embedded in $O(\log n)$ dimensions.
2.3. Factorization and proof of Theorem 2. Observe that Proposition 7 is rather non-canonical, as its conclusion depends on the pairwise distances between the points $\{x_i\}_{i=1}^n$ in $L_p(\mu)$ whereas the bound on the dimension depends on $L=\max_i\|x_i\|_{L_\infty(\mu)}$. In order to deduce Theorem 2 from this (a priori weaker) statement, we shall leverage the fact that Proposition 7 holds for any probability measure $\mu$ by optimizing this parameter $L$ over all lattice-isomorphic images of $\{x_i\}_{i=1}^n$. The optimal such change of measure, which allows us to replace $L$ by $\big\|\max_i|x_i|\big\|_{L_p(\mu)}$, is a special case of a classical factorization theorem of Maurey (see [Mau74] or [JS01, Theorem 5] for the general statement), whose short proof we include for completeness.
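In the notation above, the change of measure can be sketched as follows (our rendering of the standard computation, included only as a reading aid):

```latex
% Let h = (\max_i |x_i|)^p / \|\max_i |x_i|\|_{L_p(\mu)}^p, so that
% d\nu = h\,d\mu is a probability measure, and set
% \tilde{x}_i = x_i / h^{1/p} on the support of h.  Then
\|\tilde{x}_i-\tilde{x}_j\|_{L_p(\nu)}^p
  = \int \frac{|x_i-x_j|^p}{h}\, h\, \mathrm{d}\mu
  = \|x_i-x_j\|_{L_p(\mu)}^p,
\qquad
\|\tilde{x}_i\|_{L_\infty(\nu)}
  \le \Big\| \max_{k} |x_k| \Big\|_{L_p(\mu)},
```

where the last inequality holds since $|x_i|/h^{1/p}=|x_i|\cdot\|\max_k|x_k|\|_{L_p(\mu)}/\max_k|x_k|\leq\|\max_k|x_k|\|_{L_p(\mu)}$ pointwise. Hence the pairwise distances are preserved while the $L_\infty$-bound of Proposition 7 improves from $L$ to $\mathrm{I}(\{x_i\}_{i=1}^n)\leq K$.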
2.1. Maurey's empirical method and its algorithmic counterparts. A classical theorem of Carathéodory asserts that if $\mathcal{T}$ is a subset of $\mathbb{R}^m$, then any point $z\in\mathrm{conv}(\mathcal{T})$ can be expressed as a convex combination of at most $m+1$ points of $\mathcal{T}$. Maurey's empirical method is a powerful dimension-free approximate version of Carathéodory's theorem, first popularized in [Pis81], that has numerous applications in geometry and theoretical computer science. Let $(E,\|\cdot\|_E)$ be a Banach space, consider a bounded subset $\mathcal{T}$ of $E$ and fix $z\in\mathrm{conv}(\mathcal{T})$. Since $z$ is a convex combination of elements of $\mathcal{T}$, there exist $m\in\mathbb{N}$, $\lambda_1,\ldots,\lambda_m\in(0,\infty)$ and $t_1,\ldots,t_m\in\mathcal{T}$ such that $$z=\sum_{k=1}^m\lambda_k t_k\qquad\text{and}\qquad\sum_{k=1}^m\lambda_k=1. \tag{17}$$ Let $X$ be an $E$-valued discrete random variable with $\mathbb{P}\{X=t_k\}=\lambda_k$ for all $k\in\{1,\ldots,m\}$ and consider $X_1,\ldots,X_d$ i.i.d. copies of $X$. Then, conditions (17) ensure that $X$ is well defined and $\mathbb{E}[X]=z$. Therefore, applying the Rademacher type condition (16) to the centered random variables $\{X_s-z\}_{s=1}^d$ and normalizing, we get $$\mathbb{E}\Big\|\frac{1}{d}\sum_{s=1}^d X_s-z\Big\|_E^p\leq\frac{\big(2T_p(E)\big)^p}{d^{p-1}}\,\mathbb{E}\|X-z\|_E^p.$$

Now let $U\in O(n)$ be a uniformly chosen random rotation of $\mathbb{R}^n$. The aforementioned claim shows that, since $\|x_i\|_{L_2^n}\leq1$ for every $i\in\{1,\ldots,n\}$, writing $\widetilde{x}_i=Ux_i/\|x_i\|_{L_2^n}$ we have $$\mathbb{E}\Big[\max_{i\in\{1,\ldots,n\}}\|Ux_i\|_{L_\infty^n}^2\Big]\leq\mathbb{E}\Big[\max_{i\in\{1,\ldots,n\}}\|\widetilde{x}_i\|_{L_\infty^n}^2\Big]\leq S\log n. \tag{42}$$ Therefore, by (42) and Proposition 7, there exist a constant $C\in(0,\infty)$ and a rotation $U\in O(n)$ such that for every $\varepsilon\in(0,1)$ there exist $d\leq\frac{C(\log n)^3}{\varepsilon^4}$ and points $y_1,\ldots,y_n\in\ell_2^d$ for which (35) holds with $Ux_i$ in place of $x_i$; since $U$ is an isometry of $L_2^n$, this completes the proof of Corollary 9.
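As a minimal illustration of the greedy, derandomized viewpoint attributed above to Barman [Bar18] and Ivanov [Iva21], the following Euclidean sketch builds the empirical mean one sample at a time. This simple variant is ours, for illustration only; it is not the algorithm of either paper:

```python
# Greedy approximate Caratheodory sketch: at each step pick the point of
# T whose inclusion brings the running empirical mean closest to z.
# Deterministic illustration of the derandomization idea (our variant).
import numpy as np

def greedy_caratheodory(T, z, d):
    """Greedily pick t_1, ..., t_d among the rows of T so that the
    empirical mean (t_1 + ... + t_s)/s tracks z in the Euclidean norm."""
    total = np.zeros_like(z)
    picks = []
    for s in range(1, d + 1):
        cand = (total + T) / s                  # candidate means, one per row
        j = int(np.argmin(np.linalg.norm(cand - z, axis=1)))
        total += T[j]
        picks.append(j)
    return total / d, picks

T = np.eye(5)                                   # vertices of the simplex
z = np.full(5, 0.2)                             # their barycenter
approx, picks = greedy_caratheodory(T, z, d=100)
```

On this symmetric example the greedy rule cycles through the vertices, so the empirical mean recovers the barycenter exactly; in general, the point is that a deterministic selection can match the $d^{-1/2}$ accuracy that the probabilistic argument only guarantees in expectation.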