1 Introduction

1.1 Metric Dimension Reduction

Using standard terminology from metric embeddings (see [38]), we say that a mapping \(f:(X,d_X)\rightarrow (Y,d_Y)\) between metric spaces is a bi-Lipschitz embedding with distortion at most \(\alpha \in [1,\infty )\) if there exists a scaling factor \(\sigma \in (0,\infty )\) such that

$$\begin{aligned} {\forall }x,y\in X, \qquad \sigma \, d_X(x,y) \le d_Y\big (f(x),f(y)\big ) \le \alpha \sigma \, d_X(x,y). \end{aligned}$$
(1)

Throughout this paper, we shall denote by \(\ell _p^d\) the linear space \(\mathbb {R}^d\) equipped with the p-norm,

$$\begin{aligned} {\forall }a=(a_1,\ldots ,a_d)\in \mathbb {R}^d, \qquad \Vert a\Vert _{\ell _p^d} = \Big ( \sum _{i=1}^d |a_i|^p\Big )^{1/p}. \end{aligned}$$
(2)

The classical Johnson–Lindenstrauss lemma [21] asserts that if \((\mathcal {H},\Vert \cdot \Vert _{\mathcal {H}})\) is a Hilbert space and \(x_1,\ldots ,x_n\in \mathcal {H}\), then for every \(\varepsilon \in (0,1)\) there exists \(d\le \tfrac{C\log n}{\varepsilon ^2}\) and \(y_1,\ldots ,y_n\in \ell _2^d\) such that

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\},\qquad \Vert x_i-x_j\Vert _{\mathcal {H}} \le \Vert y_i-y_j\Vert _{\ell _2^d} \le (1+\varepsilon )\cdot \Vert x_i-x_j\Vert _\mathcal {H}, \end{aligned}$$
(3)

where \(C\in (0,\infty )\) is a universal constant. In the above embedding terminology, the Johnson–Lindenstrauss lemma states that for every \(\varepsilon \in (0,1)\), \(n\in \mathbb {N}\), and \(d\ge \tfrac{C\log n}{\varepsilon ^2}\), any n-point subset of Hilbert space admits a bi-Lipschitz embedding into \(\ell _2^d\) with distortion at most \(1+\varepsilon \). To prove their result, Johnson and Lindenstrauss introduced in [21] the influential random projection method, which has since found many important applications in metric geometry and theoretical computer science and launched the field of metric dimension reduction, a field lying at the intersection of those two subjects (see the recent survey [36] of Naor).
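To illustrate the random projection method, the following sketch projects points through a Gaussian matrix scaled by \(1/\sqrt{d}\) and measures the pairwise distortion empirically. This is a standard Gaussian variant rather than the original construction of [21], and the constant \(C=8\) and all variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n, D, eps = 64, 1000, 0.5
d = int(8 * np.log(n) / eps**2)               # d = C log(n) / eps^2 with C = 8 (illustrative)

x = rng.standard_normal((n, D))               # n points of a high-dimensional Hilbert space
G = rng.standard_normal((d, D)) / np.sqrt(d)  # Gaussian random projection, scaled by 1/sqrt(d)
y = x @ G.T                                   # low-dimensional images in l_2^d

# empirical pairwise distortion ratios ||y_i - y_j|| / ||x_i - x_j||
ratios = np.array([
    np.linalg.norm(y[i] - y[j]) / np.linalg.norm(x[i] - x[j])
    for i in range(n) for j in range(i + 1, n)
])
```

With overwhelming probability the ratios fall in \([1-\varepsilon ,1+\varepsilon ]\); rescaling the images by \(1/(1-\varepsilon )\) then yields the one-sided form (3).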

Following [36], we say that an infinite dimensional Banach space \((E,\Vert \cdot \Vert _E)\) admits bi-Lipschitz dimension reduction if there exists \(\alpha = \alpha (E)\in [1,\infty )\) such that for every \(n\in \mathbb {N}\), there exists \(k_n=k_n(E,\alpha )\in \mathbb {N}\) satisfying

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\log k_n}{\log n} = 0 \end{aligned}$$
(4)

and such that any n-point subset \(\mathcal {S}\) of E admits a bi-Lipschitz embedding with distortion at most \(\alpha \) in a finite-dimensional linear subspace F of E with \(\textrm{dim}F\le k_n\). The only non-Hilbertian space that is known to admit bi-Lipschitz dimension reduction is the 2-convexification of the classical Tsirelson space, as proven by Johnson and Naor in [22]. Turning to negative results, Matoušek proved in [32] the impossibility of bi-Lipschitz dimension reduction in \(\ell _\infty \), whereas Brinkman and Charikar [10] (see also [30] for a shorter proof) constructed an n-point subset of \(\ell _1\) which does not admit a bi-Lipschitz embedding into any \(n^{o(1)}\)-dimensional subspace of \(\ell _1\). Their theorem was recently refined by Naor et al. [37] who showed that the same n-point subset of \(\ell _1\) does not embed into any \(n^{o(1)}\)-dimensional subspace of the trace class \(\textsf{S}_1\) (see also the striking recent work [41] of Regev and Vidick, where the impossibility of polynomial almost isometric dimension reduction in \(\textsf{S}_1\) is established). We refer to [36, Thm. 16] for a summary of the best known bounds quantifying the aforementioned qualitative statements. Despite the lapse of almost four decades since the proof of the Johnson–Lindenstrauss lemma, the following natural question remains stubbornly open.

Question 1.1

For which values of \(p\notin \{1,2,\infty \}\) does \(\ell _p\) admit bi-Lipschitz dimension reduction?

1.2 Dimensionality and Structure

An important feature of the formalism of bi-Lipschitz dimension reduction in a Banach space E is that both the distortion \(\alpha (E)\) of the embedding and the dimension \(k_n(E,\alpha )\) of the target subspace F are independent of the given n-point subset \(\mathcal {S}\) of E. Nevertheless, there are instances in which one can construct delicate embeddings whose distortion or the dimension of their targets depends on subtle geometric parameters of \(\mathcal {S}\). For instance, we mention an important theorem of Schechtman [42, Thm. 5] (which built on work of Klartag and Mendelson [26]) who constructed a linear embedding of an arbitrary subset \(\mathcal {S}\) of \(\ell _2\) into any Banach space E whose distortion depends only on the Gaussian width of \(\mathcal {S}\) and the \(\ell \)-norm of the identity operator \(\textsf{id}_E:E\rightarrow E\). In the special case that E is a Hilbert space, a substantially richer family of such embeddings was devised in [31].

Let \(\mu \) be a probability measure. For a subset \(\mathcal {S}\) of \(L_p(\mu )\), we shall denote

$$\begin{aligned} \mathcal {I}(\mathcal {S}) {\mathop {=}\limits ^{\textrm{def}}}\big \Vert \max _{x\in \mathcal {S}}|x|\big \Vert _{L_p(\mu )} \end{aligned}$$
(5)

and we will say that \(\mathcal {S}\) is K-incompressible if \(\mathcal {I}(\mathcal {S})\le K\). The main contribution of the present paper is the following dimensionality reduction theorem for incompressible subsets of \(L_p(\mu )\) which, in contrast to all the results discussed earlier, is valid for any value of \(p\in [1,\infty )\).
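When \(\mu \) is the normalized counting measure on \(\{1,\ldots ,N\}\) (the space \(L_p^N\) of (34) below), the quantity \(\mathcal {I}(\mathcal {S})\) of (5) is simply the \(L_p\)-norm of the pointwise maximum of \(|x|\) over \(x\in \mathcal {S}\) and can be computed directly. The following sketch (the function name is ours) illustrates the definition.

```python
import numpy as np

def incompressibility(points, p):
    """Compute I(S) = || max_{x in S} |x| ||_{L_p(mu)} as in (5), for the
    normalized counting measure mu on {1, ..., N} (the space L_p^N)."""
    envelope = np.max(np.abs(np.asarray(points, dtype=float)), axis=0)
    return float(np.mean(envelope ** p) ** (1 / p))

# the pointwise maximum of (1, 0) and (0, 1) is (1, 1), whose L_1 norm under
# the normalized counting measure on two atoms is (1 + 1) / 2 = 1
print(incompressibility([[1.0, 0.0], [0.0, 1.0]], p=1))  # -> 1.0
```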

Theorem 1.2

(\(\varepsilon \)-isometric dimension reduction for incompressible subsets of \(L_p(\mu )\)) Fix parameters \(p\in [1,\infty )\), \(n\in \mathbb {N}\), \(K\in (0,\infty )\) and let \(\{x_i\}_{i=1}^n\) be a K-incompressible family of vectors in \(L_p(\mu )\) for some probability measure \(\mu \). Then for every \(\varepsilon \in (0,1)\), there exist \(d\in \mathbb {N}\) with \(d\le \tfrac{32e^2(2K)^{2p}\log n}{\varepsilon ^2}\) and points \(y_1,\ldots ,y_n\in \ell _p^d\) such that

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \quad \Vert x_i-x_j\Vert ^p_{L_p(\mu )}- \varepsilon \le \Vert y_i-y_j\Vert _{\ell _p^d}^p \le \Vert x_i-x_j\Vert ^p_{L_p(\mu )}+\varepsilon . \end{aligned}$$
(6)

Besides the appearance of the incompressibility parameter K in the bound for the dimension d of the target space, Theorem 1.2 differs from the Johnson–Lindenstrauss lemma in that the error in (6) is additive rather than multiplicative. Recall that a map \(f:(X,d_X)\rightarrow (Y,d_Y)\) between metric spaces is called an \(\varepsilon \)-isometric embedding if

$$\begin{aligned} {\forall }x,y\in X, \qquad d_X(x,y) - \varepsilon \le d_Y\big (f(x),f(y)\big ) \le d_X(x,y)+\varepsilon . \end{aligned}$$
(7)

Embeddings with additive errors occur naturally in metric geometry and, more specifically, in metric dimension reduction (see e.g. [44, Sect. 9.3]). We mention for instance a result [40, Thm. 1.5] of Plan and Vershynin who showed that any subset \(\mathcal {S}\) of the unit sphere in \(\ell _2^n\) admits a \(\delta \)-isometric embedding into the d-dimensional Hamming cube \((\{-1,1\}^d,\Vert \cdot \Vert _1)\), where d depends polynomially on \(\delta ^{-1}\) and the Gaussian width of \(\mathcal {S}\). In the above embedding terminology and in view of the elementary inequality \(|\alpha -\beta | \le |\alpha ^p-\beta ^p|^{1/p}\) which holds for every \(\alpha ,\beta >0\), Theorem 1.2 asserts that any n-point K-incompressible subset of \(L_p(\mu )\) admits an \(\varepsilon ^{1/p}\)-isometric embedding into \(\ell _p^d\) for the above choice of dimension d. For further occurrences of \(\varepsilon \)-isometric embeddings in the dimensionality reduction and compressed sensing literatures, we refer to [8, 19, 20, 31, 40, 44] and the references therein.
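The elementary inequality \(|\alpha -\beta |\le |\alpha ^p-\beta ^p|^{1/p}\) invoked above follows from the superadditivity of \(t\mapsto t^p\) on \([0,\infty )\) for \(p\ge 1\), and is easy to sanity-check numerically (the helper name is ours):

```python
def power_mean_ineq(a, b, p):
    # |a - b| <= |a^p - b^p|^(1/p) for a, b > 0 and p >= 1; the tiny
    # tolerance only guards against floating-point rounding
    return abs(a - b) <= abs(a**p - b**p) ** (1.0 / p) + 1e-12

# exhaustive check over a small grid of positive reals and exponents p >= 1
grid = [0.1 * k for k in range(1, 51)]
assert all(power_mean_ineq(a, b, p)
           for p in (1.0, 1.5, 2.0, 3.0)
           for a in grid for b in grid)
```

Applied coordinatewise, this is exactly how the additive bound (6) on p-th powers upgrades to an \(\varepsilon ^{1/p}\)-isometric embedding.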

1.3 Method of Proof

A large part of the (vast) literature on metric dimension reduction focuses on showing that a typical low-rank linear operator chosen randomly from a specific ensemble acts as an approximate isometry on a given set \(\mathcal {S}\) with high probability. For subsets \(\mathcal {S}\) of Euclidean space, this principle has been confirmed for random projections [12, 14, 21, 36], matrices with Gaussian [15, 16, 42], Rademacher [1, 5], and subgaussian [13, 17, 26, 31] entries, randomizations of matrices with the RIP [27] as well as more computationally efficient models [2, 3, 9, 24, 33] which are based on sparse matrices. Beyond its inherent interest as an \(\ell _p\)-dimension reduction theorem (albeit, for specific configurations of points), Theorem 1.2 also differs from the aforementioned works in its method of proof. The core of the argument, rather than sampling from a random matrix ensemble, relies on Maurey’s empirical method [39] (see Sect. 2.1) which is a dimension-free way to approximate points in bounded convex subsets of Banach spaces by convex combinations of extreme points with prescribed length. An application of the method to the positive cone of \(L_p\)-distance matrices (the use of which in this context is inspired by classical work of Ball [6]) equipped with the supremum norm allows us to deduce (see Proposition 2.1) the conclusion of Theorem 1.2 under the stronger assumption that

$$\begin{aligned} K\ge \max _{i\in \{1,\ldots ,n\}} \Vert x_i\Vert _{L_\infty (\mu )}. \end{aligned}$$
(8)

While Maurey’s empirical method is an a priori existential statement that is proven via the probabilistic method, recent works (see [7, 18]) have focused on derandomizing its proof for specific Banach spaces. In the setting of Theorem 1.2, we can use these tools to show (see Corollary 2.7) that there exists a greedy algorithm which receives as input the high-dimensional data \(\{x_i\}_{i=1}^n\) and produces as output the low-dimensional points \(\{y_i\}_{i=1}^n\). Finally, using a suitable change of measure [34] (see Sect. 2.3) we are able to relax the stronger assumption (8) to that of K-incompressibility and derive the conclusion of Theorem 1.2. We emphasize that, in contrast to most of the dimension reduction algorithms (randomized or not) discussed earlier, the one which gives Theorem 1.2 is not oblivious but is rather tailored to the specific configuration of points \(\{x_i\}_{i=1}^n\) as it relies on the use of Maurey’s empirical method.

1.4 \(\varepsilon \)-Isometric Dimension Reduction

Given two moduli \(\omega ,\Omega :[0,\infty )\rightarrow [0,\infty )\), we say (following [36]) that a Banach space \((E,\Vert \cdot \Vert _E)\) admits metric dimension reduction with moduli \((\omega ,\Omega )\) if for any \(n\in \mathbb {N}\) there exists \(k_n=k_n(E)\in \mathbb {N}\) with \(k_n=n^{o(1)}\) as \(n\rightarrow \infty \) such that for any \(x_1,\ldots ,x_n\in E\), there exist a subspace F of E with \(\textrm{dim}F\le k_n\) and \(y_1,\ldots ,y_n \in F\) satisfying

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \quad \omega (\Vert x_i-x_j\Vert _E) \le \Vert y_i-y_j\Vert _E \le \Omega (\Vert x_i-x_j\Vert _E). \end{aligned}$$
(9)

In view of Theorem 1.2, we would be interested in formulating a suitable notion of dimension reduction via \(\varepsilon \)-isometric embeddings which would be fitting to the moduli appearing in (6).

Remark 1.3

Let \(a,b\in (0,\infty )\), suppose that \(\omega ,\Omega :[0,\infty )\rightarrow [0,\infty )\) are two moduli satisfying

$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{\omega (t)}{t} = a \qquad \text{ and } \qquad \lim _{t\rightarrow \infty } \frac{\Omega (t)}{t}=b \end{aligned}$$
(10)

and that the Banach space \((E,\Vert \cdot \Vert _E)\) admits metric dimension reduction with moduli \((\omega ,\Omega )\). Fix \(n\in \mathbb {N}\) and \(x_1,\ldots ,x_n\in E\). Applying the assumption (9) to the points \(sx_1,\ldots ,sx_n\) where \(s\gg 1\), we deduce that there exist points \(y_1(s),\ldots ,y_n(s)\) in a \(k_n\)-dimensional subspace F(s) of E such that

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \quad \omega (s\Vert x_i-x_j\Vert _E) \le \big \Vert y_i(s)-y_j(s)\big \Vert _E \le \Omega (s\Vert x_i-x_j\Vert _E). \end{aligned}$$
(11)

For any \(\eta \in (0,1)\), we can then choose s large enough (as a function of \(\eta \) and the \(x_i\)) such that

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\},\quad (1-\eta )a\Vert x_i-x_j\Vert _E \le \frac{\Vert y_i(s)-y_j(s)\Vert _E}{s} \le (1+\eta )b\Vert x_i-x_j\Vert _E.\nonumber \\ \end{aligned}$$
(12)

Therefore, we conclude that E also admits bi-Lipschitz dimension reduction (with distortion b/a).

This simple scaling argument suggests that any reasonable notion of \(\varepsilon \)-isometric dimension reduction can differ from the corresponding bi-Lipschitz theory only in small scales, thus motivating the following definition. We denote by \({\textbf {B}}_E\) the unit ball of a normed space \((E,\Vert \cdot \Vert _E)\).

Definition 1.4

(\(\varepsilon \)-isometric dimension reduction) Fix \(\varepsilon \in (0,1)\), \(r\in (0,\infty )\) and let \((E,\Vert \cdot \Vert _E)\) be an infinite-dimensional Banach space. We say that \({\textbf {B}}_E\) admits \(\varepsilon \)-isometric dimension reduction with power r if for every \(n\in \mathbb {N}\) there exists \(k_n=k_n^r(E,\varepsilon )\in \mathbb {N}\) with \(k_n=n^{o(1)}\) as \(n\rightarrow \infty \) for which the following condition holds. For every n points \(x_1,\ldots ,x_n\in {\textbf {B}}_E\) there exist a linear subspace F of E with \(\textrm{dim}F\le k_n\) and points \(y_1,\ldots ,y_n\in F\) satisfying

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \qquad \Vert x_i-x_j\Vert _E^r - \varepsilon \le \Vert y_i-y_j\Vert _E^r \le \Vert x_i-x_j\Vert _E^r +\varepsilon . \end{aligned}$$
(13)

The fact that even high-dimensional infinite subsets of Euclidean space \(\ell _2\) may admit \(\varepsilon \)-isometric embeddings into low-dimensional subspaces follows from the additive version of the Johnson–Lindenstrauss lemma, first proven by Liaw, Mehrabian, Plan, and Vershynin [31] (see also [44, Prop. 9.3.2]). In contrast to that, combining the scaling argument of Remark 1.3 with the fact that any d-dimensional subspace of \(\ell _2\) is isometric to \(\ell _2^d\), we deduce that if \(k_n(\varepsilon )\) is the least dimension such that any n points in \(\ell _2\) embed \(\varepsilon \)-isometrically in \(\ell _2^{k_n(\varepsilon )}\), then \(k_n(\varepsilon )= n-1\). This justifies the restriction of Definition 1.4 to the unit ball \({\textbf {B}}_E\) of E.

It is clear from the definitions that if a Banach space E admits bi-Lipschitz dimension reduction with distortion \(\tfrac{1+\varepsilon }{1-\varepsilon }\), where \(\varepsilon \in (0,1)\), then \({\textbf {B}}_E\) admits \(2\varepsilon \)-isometric dimension reduction with power \(r=1\). The \(\varepsilon \)-isometric analogue of Question 1.1 deserves further investigation.

Question 1.5

For which values of \(p\ne 2\) does \({\textbf {B}}_{\ell _p}\) admit \(\varepsilon \)-isometric dimension reduction?

Even though the K-incompressibility assumption of Theorem 1.2 may a priori seem restrictive, it is satisfied for most configurations of points in \({\textbf {B}}_{\ell _p}\). Suppose that \(n,N\in \mathbb {N}\) are such that N is polynomial in n. Then, standard considerations show that with high probability, a uniformly chosen n-point subset \(\mathcal {S}\) of \(N^{1/p}\,{\textbf {B}}_{\ell _p^N}\) is \(O(\log n)^{1/p}\)-incompressible. We refer to Remark 2.4 for more information on this and related generic properties of finite subsets of rescaled p-balls.

1.5 \(\varepsilon \)-Isometric Dimension Reduction by Linear Maps

A close inspection of the proof of Theorem 1.2 (see Remark 2.6) reveals that in fact the low-dimensional points \(\{y_i\}_{i=1}^n\) can be realized as images of the initial data \(\{x_i\}_{i=1}^n\) under a carefully chosen linear operator. Nevertheless, we will show that for any \(p\ne 2\) and n large enough, there exists an n-point subset of \({{\textbf {B}}}_{\ell _p}\) whose image under any fixed linear \(\varepsilon \)-isometric embedding has rank which is linear in n. In fact, we shall prove the following more general statement which refines a theorem that Lee, Mendel and Naor proved in [29] for bi-Lipschitz embeddings.

Theorem 1.6

(Impossibility of linear dimension reduction in \({\textbf {B}}_{\ell _p}\)) Fix \(p\ne 2\) and two moduli \(\omega ,\Omega :[0,\infty )\rightarrow [0,\infty )\) with \(\omega (1)>0\). For arbitrarily large \(n\in \mathbb {N}\), there exists an n-point subset \(\mathcal {S}_{n,p}\) of \({\textbf {B}}_{\ell _p}\) such that the following holds. If \(T:\textrm{span}(\mathcal {S}_{n,p})\rightarrow \ell _p^d\) is a linear operator satisfying

$$\begin{aligned} {\forall }x,y\in \mathcal {S}_{n,p}, \qquad \omega (\Vert x-y\Vert _{\ell _p}) \le \Vert Tx-Ty\Vert _{\ell _p^d} \le \Omega (\Vert x-y\Vert _{\ell _p}), \end{aligned}$$
(14)

then \(d\ge \left( \tfrac{\omega (1)}{\Omega (1)}\right) ^\frac{2p}{|p-2|} \cdot \tfrac{n-1}{2}\).

2 Proof of Theorem 1.2

We say that a normed space \((E,\Vert \cdot \Vert _E)\) has Rademacher type p if there exists a universal constant \(T\in (0,\infty )\) such that for every \(n\in \mathbb {N}\) and every \(x_1,\ldots ,x_n\in E\),

$$\begin{aligned} \frac{1}{2^n} \sum _{\varepsilon \in \{-1,1\}^n} \Big \Vert \sum _{i=1}^n \varepsilon _i x_i\Big \Vert _E^p \le T^p \sum _{i=1}^n \Vert x_i\Vert _E^p. \end{aligned}$$
(15)

The least constant T such that (15) is satisfied is denoted by \(T_p(E)\). A standard symmetrization argument (see [28, Prop. 9.11]) shows that if \(X_1,\ldots ,X_n\) are independent E-valued random variables with \(\mathbb {E}[X_i]=0\) for every \(i\in \{1,\ldots ,n\}\), then

$$\begin{aligned} \mathbb {E}\Big \Vert \sum _{i=1}^n X_i\Big \Vert _E^p \le \big (2T_p(E)\big )^p \sum _{i=1}^n \mathbb {E} \Vert X_i\Vert _E^p. \end{aligned}$$
(16)

2.1 Maurey’s Empirical Method and Its Algorithmic Counterparts

A classical theorem of Carathéodory asserts that if \(\mathcal {T}\) is a subset of \(\mathbb {R}^m\), then any point z in the convex hull \(\textrm{conv}(\mathcal {T})\) (that is, a convex combination of finitely many elements of \(\mathcal {T}\)) can be expressed as a convex combination of at most \(m+1\) points of \(\mathcal {T}\). Maurey’s empirical method is a powerful dimension-free approximate version of Carathéodory’s theorem, first popularized in [39], that has numerous applications in geometry and theoretical computer science. Let \((E,\Vert \cdot \Vert _E)\) be a Banach space, consider a bounded subset \(\mathcal {T}\) of E and fix \(z\in \textrm{conv}(\mathcal {T})\). Since z is a convex combination of elements of \(\mathcal {T}\), there exist \(m\in \mathbb {N}\), \(\lambda _1,\ldots ,\lambda _m\in (0,\infty )\), and \(t_1,\ldots ,t_m\in \mathcal {T}\) such that

$$\begin{aligned} \sum _{k=1}^m \lambda _k = 1 \quad \text{ and } \quad z=\sum _{k=1}^m \lambda _kt_k. \end{aligned}$$
(17)

Let X be an E-valued discrete random variable with \(\mathbb {P}\{X=t_k\}=\lambda _k\) for all \(k\in \{1,\ldots ,m\}\) and consider \(X_1,\ldots ,X_d\) i.i.d. copies of X. Then, conditions (17) ensure that X is well defined and \(\mathbb {E}[X]=z\). Therefore, applying the Rademacher type condition (16) to the centered random variables \(\{X_s-z\}_{s=1}^d\) and normalizing, we get

$$\begin{aligned} \mathbb {E}\Big \Vert \frac{1}{d} \sum _{s=1}^d X_s - z\Big \Vert _E^p \le \frac{(2T_p(E))^p}{d^{p-1}} \ \mathbb {E}\Vert X-z\Vert _E^p. \end{aligned}$$
(18)

Since X takes values in \(\mathcal {T}\), if \(\mathcal {T} \subseteq R{{\textbf {B}}}_E\) then \(\Vert X-z\Vert _E\le 2R\) almost surely (as also \(z\in \textrm{conv}(\mathcal {T})\subseteq R{{\textbf {B}}}_E\)), and we deduce that there exist \(x_1,\ldots ,x_d\in \mathcal {T}\) such that

$$\begin{aligned} \Big \Vert \frac{1}{d}\sum _{s=1}^d x_s - z\Big \Vert _E \le \frac{4RT_p(E)}{d^{1-1/p}}. \end{aligned}$$
(19)
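In the Hilbertian case (where \(T_2(E)=1\)), the sampling argument behind (19) is easy to simulate. The sketch below (all names, dimensions, and constants are ours) draws d i.i.d. samples from the distribution \(\lambda \) and compares the empirical error to the \(4R/\sqrt{d}\) bound; with a fixed seed the run is reproducible, and in general the bound is only guaranteed to hold with positive probability.

```python
import numpy as np

rng = np.random.default_rng(1)

# a bounded set T of m points in R^50 and a point z in conv(T)
m, dim = 200, 50
T = rng.standard_normal((m, dim))
R = np.linalg.norm(T, axis=1).max()           # T is contained in R * B_E
lam = rng.random(m)
lam /= lam.sum()                              # convex weights lambda_k
z = lam @ T                                   # z = sum_k lambda_k t_k

# Maurey sampling: draw d indices i.i.d. with P{X = t_k} = lambda_k and
# average; in a Hilbert space T_2(E) = 1, so (19) predicts error <= 4R/sqrt(d)
errors = {}
for d in (10, 1000):
    idx = rng.choice(m, size=d, p=lam)
    errors[d] = np.linalg.norm(T[idx].mean(axis=0) - z)
```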

While the above argument is probabilistic, recent works have focused on derandomizing Maurey’s sampling lemma for smaller classes of Banach spaces, thus constructing deterministic algorithms which output the empirical approximation \(\tfrac{x_1+\ldots +x_d}{d}\) of z. The first result in this direction is due to Barman [7], who treated the case that E is an \(L_r(\mu )\)-space for \(r\in (1,\infty )\). His result was recently generalized by Ivanov [18], who devised a greedy algorithm that constructs the desired empirical mean in an arbitrary p-uniformly smooth space.
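In a Hilbert space, a classical greedy scheme of Maurey–Jones–Barron type already derandomizes the sampling step: at step k, keep the running mean with weight \(1-1/k\) and adjoin the point of \(\mathcal {T}\) that most decreases the distance to z. The sketch below is a naive stand-in for (and much cruder than) the algorithms of Barman [7] and Ivanov [18]; all names are ours.

```python
import numpy as np

def greedy_maurey(T, z, d):
    """Greedy empirical approximation of z in conv(T): at step k the mean
    becomes ((k-1)*mean + t)/k for the t in T minimizing the distance to z.
    An induction (Maurey-Jones-Barron) gives ||mean_k - z|| <= B / sqrt(k)
    with B = max_t ||t - z||."""
    T = np.asarray(T, dtype=float)
    total = np.zeros(T.shape[1])
    picks = []
    for k in range(1, d + 1):
        candidates = (total[None, :] + T) / k          # ((k-1)*mean + t) / k
        best = int(np.argmin(np.linalg.norm(candidates - z[None, :], axis=1)))
        picks.append(best)
        total += T[best]
    return total / d, picks

rng = np.random.default_rng(2)
T = rng.standard_normal((100, 20))
lam = rng.random(100)
lam /= lam.sum()
z = lam @ T                                            # a point of conv(T)
approx, _ = greedy_maurey(T, z, 200)
err = np.linalg.norm(approx - z)
B = np.linalg.norm(T - z, axis=1).max()
```

Unlike the random sampling above, this procedure is fully deterministic, and the \(B/\sqrt{d}\) guarantee holds on every run rather than merely with positive probability.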

2.2 Dimension Reduction in \(L_p(\mu )\) for Uniformly Bounded Vectors

With Maurey’s empirical method at hand, we are ready to proceed to the first part of the proof of Theorem 1.2, namely the \(\varepsilon \)-isometric dimension reduction property of \(L_p(\mu )\) under the strong assumption that the given point set consists of functions which are bounded in \(L_\infty (\mu )\).

Proposition 2.1

Fix \(p\in [1,\infty )\), \(n\in \mathbb {N}\) and let \(\{x_i\}_{i=1}^n\) be a family of vectors in \(L_p(\mu )\) for some probability measure \(\mu \). Denote by \(L=\max _{i\in \{1,\ldots ,n\}} \Vert x_i\Vert _{L_\infty (\mu )}\in [0,\infty ]\). Then for every \(\varepsilon \in (0,1)\), there exist \(d\in \mathbb {N}\) with \(d\le \tfrac{32e^2(2L)^{2p}\log n}{\varepsilon ^2}\) and \(y_1,\ldots ,y_n\in \ell _p^d\) such that

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \qquad \Vert x_i{-}x_j\Vert ^p_{L_p(\mu )}{-} \varepsilon \le \Vert y_i{-}y_j\Vert _{\ell _p^d}^p \le \Vert x_i{-}x_j\Vert ^p_{L_p(\mu )}+\varepsilon . \end{aligned}$$
(20)

Proof

We shall identify \(\ell _\infty ^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\) with the vector space of all symmetric \(n\times n\) real matrices with 0 on the diagonal equipped with the supremum norm. Consider the set

$$\begin{aligned} \mathcal {C}_p = \big \{ \big ( \Vert z_i-z_j\Vert _{L_p(\rho )}^p\big )_{i,j=1,\ldots ,n}: \ \rho \text{ is } \text{ a } \text{ probability } \text{ measure } \text{ and } z_1,\ldots ,z_n\in L_p(\rho )\big \} \subseteq \ell _\infty ^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }. \end{aligned}$$
(21)

It is obvious that \(\mathcal {C}_p\) is a cone in the sense that \(\mathcal {C}_p = \lambda \mathcal {C}_p\) for every \(\lambda >0\); moreover, \(\mathcal {C}_p\) is convex. To see this, consider \(A,B\in \mathcal {C}_p\), probability spaces \((\Omega _1,\rho _1), (\Omega _2,\rho _2)\), and vectors \(\{z_i\}_{i=1}^n, \{w_i\}_{i=1}^n\) in \(L_p(\rho _1)\) and \(L_p(\rho _2)\) respectively such that

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \quad A_{ij} = \Vert z_i-z_j\Vert _{L_p(\rho _1)}^p \ \ \text{ and } \ \ B_{ij} = \Vert w_i-w_j\Vert _{L_p(\rho _2)}^p. \end{aligned}$$
(22)

Fix \(\lambda \in (0,1)\) and consider the disjoint union \(\Omega _1\sqcup \Omega _2\) of \(\Omega _1\) and \(\Omega _2\) equipped with the probability measure \(\rho (\lambda ) = \lambda \rho _1+(1-\lambda )\rho _2\). Then, by (22) the functions \(\zeta _i:\Omega _1\sqcup \Omega _2\rightarrow \mathbb {R}\) given by \(\zeta _i|_{\Omega _1} = z_i\) and \(\zeta _i|_{\Omega _2}=w_i\), where \(i\in \{1,\ldots ,n\}\), belong to \(L_p(\rho (\lambda ))\) and satisfy the conditions

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \Vert \zeta _i-\zeta _j\Vert _{L_p(\rho (\lambda ))}^p= & {} \lambda \Vert z_i-z_j\Vert _{L_p(\rho _1)}^p + (1-\lambda ) \Vert w_i-w_j\Vert _{L_p(\rho _2)}^p\nonumber \\ {}= & {} \lambda A_{ij} + (1-\lambda ) B_{ij}, \end{aligned}$$
(23)

which ensure that \(\lambda A+(1-\lambda )B\in \mathcal {C}_p\), making \(\mathcal {C}_p\) a convex cone. Consider the embedding \(\mathcal {M}:L_p(\mu )^n\rightarrow \mathcal {C}_p\) mapping a vector \(z=(z_1,\ldots ,z_n)\) to the corresponding distance matrix, i.e.

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \qquad \mathcal {M}(z)_{ij} = \Vert z_i-z_j\Vert _{L_p(\mu )}^p. \end{aligned}$$
(24)

By Ball’s isometric embedding theorem [6], \(x_1,\ldots ,x_n\) have isometric images in \(\ell _p^N\) with \(N=\left( {\begin{array}{c}n\\ 2\end{array}}\right) +1\). Without loss of generality we will thus assume that the given points \(x_1,\ldots ,x_n\in L_p(\mu )\) are simple functions (that is, each of them takes only finitely many values) with \(\Vert x_i\Vert _{L_\infty (\mu )} \le L\). Let \(\{S_1,\ldots ,S_m\}\) be a partition of the underlying measure space such that each function \(x_i\) is constant on each \(S_k\) and suppose that \(x_i|_{S_k} = a(i,k) \in [-L,L]\) for \(i\in \{1,\ldots ,n\}\) and \(k\in \{1,\ldots ,m\}\). Then, for every \(i,j\in \{1,\ldots ,n\}\), we have

$$\begin{aligned} \mathcal {M}(x)_{ij}= & {} \sum _{k=1}^m \int _{S_k} |x_i-x_j|^p \,\mathop {}\!\textrm{d}\mu = \sum _{k=1}^m \mu (S_k) \cdot \big |a(i,k)-a(j,k)\big |^p \nonumber \\= & {} \sum _{k=1}^m \mu (S_k) \ \mathcal {M}\big (y(k)\big )_{ij}, \end{aligned}$$
(25)

where \(y(k) {\mathop {=}\limits ^{\textrm{def}}}(a(1,k),\ldots ,a(n,k))\in L_p(\mu )^n\) is a vector whose components are constant functions. As \(\mu \) is a probability measure and \(\{S_1,\ldots ,S_m\}\) is a partition, identity (25) implies that

$$\begin{aligned} \mathcal {M}(x) \in \textrm{conv} \big \{ \mathcal {M}\big ( y(k)\big ): \ k\in \{1,\ldots ,m\}\big \} \subseteq \ell _\infty ^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }.\quad \end{aligned}$$
(26)

Observe that since \(a(i,k)\in [-L,L]\) for every \(i\in \{1,\ldots ,n\}\) and \(k\in \{1,\ldots ,m\}\), we have

$$\begin{aligned} {\forall }k\in \{1,\ldots ,m\}, \qquad \big \Vert \mathcal {M}\big (y(k)\big )\big \Vert _{\ell _\infty ^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }} {=} \max _{i,j\in \{1,\ldots ,n\}} \big |a(i,k){-}a(j,k)\big |^p \le (2L)^p.\qquad \end{aligned}$$
(27)

Moreover, \(\ell _\infty ^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\) is e-isomorphic to \(\ell _{p_n}^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\) where \(p_n=\log \left( {\begin{array}{c}n\\ 2\end{array}}\right) \). It is well-known (see [28, Chap. 9]) that \(T_2(\ell _p) \le \sqrt{p-1}\) for every \(p\ge 2\) and thus

$$\begin{aligned} T_2\big (\ell _\infty ^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) } \big ) \le e\sqrt{p_n-1} < \sqrt{2e^2\log n}. \end{aligned}$$
(28)

Applying Maurey’s sampling lemma (Sect. 2.1) while taking into account (27) and (28), we deduce that for every \(d\ge 1\) there exist \(k_1,\ldots ,k_d\in \{1,\ldots ,m\}\) such that

$$\begin{aligned} \Big \Vert \frac{1}{d} \sum _{s=1}^d \mathcal {M}\big ( y(k_s)\big ) - \mathcal {M}(x)\Big \Vert _{\ell _\infty ^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }} \le \frac{2^{p+\frac{5}{2}}eL^p\sqrt{\log n}}{\sqrt{d}}. \end{aligned}$$
(29)

Therefore, if \(\varepsilon \in (0,1)\) is such that \(d\ge \tfrac{32e^2 (2L)^{2p}\log n}{\varepsilon ^2}\) we then have

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\},\qquad \Big | \frac{1}{d} \sum _{s=1}^d \big |a(i,k_s)-a(j,k_s)\big |^p - \Vert x_i-x_j\Vert _{L_p(\mu )}^p \Big | \le \varepsilon .\qquad \end{aligned}$$
(30)

Finally, consider for each \(i\in \{1,\ldots ,n\}\) a vector \(y_i=(y_i(1),\ldots ,y_i(d))\in \ell _p^d\) given by

$$\begin{aligned} {\forall }s\in \{1,\ldots ,d\}, \qquad y_i(s) = \frac{a(i,k_s)}{d^{1/p}} \end{aligned}$$
(31)

and notice that (30) can be equivalently rewritten as

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \quad \Vert x_i-x_j\Vert ^p_{L_p(\mu )}- \varepsilon \le \Vert y_i-y_j\Vert _{\ell _p^d}^p \le \Vert x_i-x_j\Vert ^p_{L_p(\mu )}+\varepsilon , \nonumber \\ \end{aligned}$$
(32)

concluding the proof of the proposition. \(\square \)

Remark 2.2

It is worth emphasizing that the coordinates of the vectors \(y_1,\ldots ,y_n\) produced in Proposition 2.1 consist (up to rescaling) of values of the functions \(x_1,\ldots ,x_n\). Such low-dimensional embeddings via sampling are a central object of study in approximation theory, see e.g. the recent survey [25] and the references therein.
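The sampling mechanism underlying Proposition 2.1 is concrete enough to simulate. For points in \(L_p^N\) the partition \(\{S_1,\ldots ,S_m\}\) can be taken to be the individual coordinates, each of measure 1/N, so Maurey's method amounts to sampling d coordinates according to the weights \(\mu (S_k)\) and rescaling as in (31). The following sketch (our own variable names; d is simply taken large rather than optimized as in the proposition) checks the additive guarantee (20) empirically.

```python
import numpy as np

rng = np.random.default_rng(3)

# n points in L_p^N (normalized counting measure), uniformly bounded by L
n, N, p, L = 30, 500, 1.5, 1.0
x = rng.uniform(-L, L, size=(n, N))

def dist_p(u, v):
    # ||u - v||_{L_p^N}^p under the normalized counting measure
    return np.mean(np.abs(u - v) ** p)

# the partition {S_1, ..., S_m} is the set of coordinates, mu(S_k) = 1/N;
# Maurey sampling draws d coordinates i.i.d. and rescales them as in (31)
d = 4000
ks = rng.integers(0, N, size=d)
y = x[:, ks] / d ** (1 / p)                   # y_i(s) = a(i, k_s) / d^{1/p}

# largest additive error in the p-th powers, as in (20)
err = max(
    abs(np.sum(np.abs(y[i] - y[j]) ** p) - dist_p(x[i], x[j]))
    for i in range(n) for j in range(i + 1, n)
)
```

As Remark 2.2 notes, the low-dimensional coordinates are (up to rescaling) sampled values of the original functions.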

The additive version of the Johnson–Lindenstrauss lemma, first observed in [31] as a consequence of a deep matrix deviation inequality (see also [44, Chap. 9]), asserts that for every n-point subset \(\mathcal {X}=\{x_1,\ldots ,x_n\}\) of a Hilbert space \(\mathcal {H}\) and every \(\varepsilon \in (0,1)\), there exist \(d\le \tfrac{C w(\mathcal {X})^2}{\varepsilon ^2}\) and points \(y_1,\ldots ,y_n\in \ell _2^d\) such that

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \qquad \Vert x_i-x_j\Vert _{\mathcal {H}}-\varepsilon \le \Vert y_i-y_j\Vert _{\ell _2^d} \le \Vert x_i-x_j\Vert _\mathcal {H}+\varepsilon , \end{aligned}$$
(33)

where \(w(\mathcal {X})\) is the mean width of \(\mathcal {X}\). We will now observe that the spherical symmetry of \({\textbf {B}}_{\ell _2}\) allows us to deduce a similar conclusion for points in \({\textbf {B}}_{\mathcal {H}}\) by removing the \(L_\infty \)-boundedness assumption from Proposition 2.1 when \(p=2\). We shall use the standard notation \(L_p^N\) for the space \(L_p(\mu _N)\) where \(\mu _N\) is the normalized counting measure on the finite set \(\{1,\ldots ,N\}\), that is

$$\begin{aligned} {\forall }a=(a_1,\ldots ,a_N)\in \mathbb {R}^N, \qquad \Vert a\Vert _{L_p^N} {\mathop {=}\limits ^{\textrm{def}}}\Big (\frac{1}{N}\sum _{i=1}^N |a_i|^p\Big )^{1/p}. \end{aligned}$$
(34)

Observe that for \(0<p<q\le \infty \), we have \({\textbf {B}}_{L_q^N} \subseteq {\textbf {B}}_{L_p^N}\).

Corollary 2.3

There exists a universal constant \(C\in (0,\infty )\) such that the following statement holds. Fix \(n\in \mathbb {N}\) and let \(\{x_i\}_{i=1}^n\) be a family of vectors in \({\textbf {B}}_{\mathcal {H}}\) for some Hilbert space \(\mathcal {H}\). Then for every \(\varepsilon \in (0,1)\), there exist \(d\in \mathbb {N}\) with \(d\le \tfrac{C(\log n)^3}{\varepsilon ^4}\) and points \(y_1,\ldots ,y_n\in \ell _2^d\) such that

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\}, \quad \Vert x_i-x_j\Vert _{\mathcal {H}}- \varepsilon \le \Vert y_i-y_j\Vert _{\ell _2^d} \le \Vert x_i-x_j\Vert _{\mathcal {H}}+\varepsilon . \end{aligned}$$
(35)

Before proceeding to the derivation of (35), we emphasize that since the given points \(\{x_i\}_{i=1}^n\) belong to \({\textbf {B}}_\mathcal {H}\), Corollary 2.3 is formally weaker than the Johnson–Lindenstrauss lemma. However, we include it here since it differs from [21] in that the low-dimensional point set \(\{y_i\}_{i=1}^n\) is not obtained as the image of \(\{x_i\}_{i=1}^n\) under a typical low-rank matrix from a specific ensemble.

Proof of Corollary 2.3

Since any n-point subset \(\{x_1,\ldots ,x_n\}\) of \(\mathcal {H}\) embeds linearly and isometrically in \(L_2^n\), we assume that \(x_1,\ldots ,x_n\in {\textbf {B}}_{L_2^{n}}\). We will need the following claim.

Claim. Suppose that \(X_1,\ldots ,X_n\) are (not necessarily independent) random vectors, each uniformly distributed on the unit sphere \(\mathbb {S}^{n-1}\) of \(L_2^{n}\). Then, for some universal constant \(S\in (0,\infty )\),

$$\begin{aligned} \mathbb {E} \big [ \max _{i\in \{1,\ldots ,n\}} \Vert X_i\Vert _{L_\infty ^{n}} \big ] \le S \sqrt{\log n}. \end{aligned}$$
(36)

Proof of the Claim

By a standard estimate of Schechtman and Zinn [43, Thm. 3], for a uniformly distributed random vector X on the unit sphere \(\mathbb {S}^{n-1}\) of \(L_2^{n}\), we have

$$\begin{aligned} {\forall }t\ge \gamma _1\sqrt{\log n}, \qquad \mathbb {P}\big \{ \Vert X\Vert _{L_\infty ^{n}} > t\big \} \le e^{-\gamma _2 t^2} \end{aligned}$$
(37)

for some absolute constants \(\gamma _1,\gamma _2\in (0,\infty )\). Let \(W{\mathop {=}\limits ^{\textrm{def}}}\max _{i\in \{1,\ldots ,n\}} \Vert X_i\Vert _{L_\infty ^{n}}\) and notice that

$$\begin{aligned} \forall \; K\in (\gamma _1,\infty ), \quad \mathbb {E}[W] = \int _0^\infty \mathbb {P}\{W{>}t\} \,\mathop {}\!\textrm{d}t {\le } K\sqrt{\log n} {+} \int _{K\sqrt{\log n}}^\infty \mathbb {P}\{W{>}t\} \,\mathop {}\!\textrm{d}t.\qquad \end{aligned}$$
(38)

By the union bound, we have

$$\begin{aligned} {\forall }t>0, \qquad \mathbb {P}\{W>t\} \le \sum _{i=1}^n \mathbb {P}\big \{\Vert X_i\Vert _{L_\infty ^{n}}>t\big \} = n\, \mathbb {P}\big \{\Vert X_1\Vert _{L_\infty ^{n}}>t\big \}. \end{aligned}$$
(39)

Combining (38) and (39), we therefore get

$$\begin{aligned} \begin{aligned} \mathbb {E}[W]&\le K\sqrt{\log n} +n \int _{K\sqrt{\log n}}^\infty \mathbb {P}\big \{\Vert X_1\Vert _{L_\infty ^{n}}>t\big \}\,\mathop {}\!\textrm{d}t {\mathop {\le }\limits ^{(37)}} K\sqrt{\log n} + n \int _{K\sqrt{\log n}}^\infty e^{-\gamma _2t^2} \,\mathop {}\!\textrm{d}t \\ &= K\sqrt{\log n} +n\sqrt{\log n} \int _{K}^\infty n^{-\gamma _2u^2} \,\mathop {}\!\textrm{d}u = K\sqrt{\log n} + \sqrt{\log n} \int _K^\infty n^{1-\gamma _2u^2} \,\mathop {}\!\textrm{d}u. \end{aligned} \end{aligned}$$
(40)

Choosing \(K>\gamma _1\) such that \(K^2\gamma _2>1\), the exponent \(1-\gamma _2 u^2\) in the last integrand is negative for every \(u\ge K\), so that \(n^{1-\gamma _2 u^2}\le 2\cdot 2^{-\gamma _2 u^2}\) for every \(n\ge 2\). Thus

$$\begin{aligned} \mathbb {E}[W]\le K\sqrt{\log n}+2\sqrt{\log n} \int _K^\infty 2^{-\gamma _2 u^2}\,\mathop {}\!\textrm{d}u \le S\sqrt{\log n} \end{aligned}$$
(41)

for a large enough constant \(S\in (0,\infty )\) and the claim follows.
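The \(\sqrt{\log n}\) growth in (36) is also easy to observe numerically. The following sketch (purely illustrative, not part of the proof) samples uniform points on the unit sphere of \(L_2^{n}\), where we take \(\Vert a\Vert _{L_2^n}=(n^{-1}\sum _i a_i^2)^{1/2}\) as in the normalized setting of [43], and checks that the ratio \(\mathbb {E}[W]/\sqrt{\log n}\) stays bounded as \(n\) grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_max_linf(n, trials=50):
    # Each normalized Gaussian row is uniform on the unit sphere of L_2^n,
    # where ||a||_{L_2^n} = (n^{-1} sum_i a_i^2)^{1/2}.
    vals = np.empty(trials)
    for t in range(trials):
        g = rng.standard_normal((n, n))
        x = g / np.sqrt((g ** 2).mean(axis=1, keepdims=True))
        vals[t] = np.abs(x).max()  # max_i ||X_i||_{L_infty^n}
    return vals.mean()

# The ratio E[W] / sqrt(log n) should remain bounded by a constant S.
ratios = [expected_max_linf(n) / np.sqrt(np.log(n)) for n in (64, 256, 512)]
```

In practice the ratio hovers around \(2\), consistent with a modest universal constant \(S\).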

Now let \(U \in \mathcal {O}(n)\) be a uniformly chosen random rotation of \(\mathbb {R}^{n}\) and write \(\hat{x}_i =\tfrac{x_i}{\Vert x_i\Vert _{L_2^{n}}}\) for \(i\in \{1,\ldots ,n\}\). Each \(U\hat{x}_i\) is uniformly distributed on the unit sphere \(\mathbb {S}^{n-1}\) of \(L_2^{n}\), so the Claim applies to the (dependent) vectors \(U\hat{x}_1,\ldots ,U\hat{x}_n\); since moreover \(\Vert x_i\Vert _{L_2^{n}}\le 1\) for every \(i\), we obtain the estimate

$$\begin{aligned} \mathbb {E}\big [ \max _{i\in \{1,\ldots ,n\}} \Vert Ux_i\Vert _{L_\infty ^{n}}\big ] \le \mathbb {E}\big [ \max _{i\in \{1,\ldots ,n\}} \Vert U\hat{x}_i\Vert _{L_\infty ^{n}}\big ] \le S\sqrt{\log n}. \end{aligned}$$
(42)

Therefore, by (42) and Proposition 2.1 there exist a constant \(C\in (0,\infty )\) and a rotation \(U\in \mathcal {O}(n)\) such that for every \(\varepsilon \in (0,1)\) there exist \(d\le \tfrac{C(\log n)^3}{\varepsilon ^4}\) and points \(y_1,\ldots ,y_n\in \ell _2^d\) for which

$$\begin{aligned} {\forall }i,j\in \{1,\ldots ,n\},\qquad \Vert Ux_i-Ux_j\Vert ^2_{L_2^{n}}- \varepsilon ^2 \le \Vert y_i-y_j\Vert _{\ell _2^d}^2 \le \Vert Ux_i-Ux_j\Vert ^2_{L_2^{n}}+\varepsilon ^2. \end{aligned}$$
(43)

Since \(\Vert Ua-Ub\Vert _{L_2^{n}} = \Vert a-b\Vert _{L_2^{n}}\) for every \(a,b\in L_2^n\), the conclusion follows from the elementary inequality \(|\alpha -\beta | \le \sqrt{|\alpha ^2-\beta ^2|}\), valid for all \(\alpha ,\beta \in (0,\infty )\) since \(|\alpha -\beta |^2\le |\alpha -\beta |(\alpha +\beta )=|\alpha ^2-\beta ^2|\). \(\square \)

Remark 2.4

Fix \(p\in [1,\infty )\). The isometric embedding theorem of Ball [6] asserts that any n-point subset of \(\ell _p\) admits an isometric embedding into \(\ell _p^N\) where \(N=\left( {\begin{array}{c}n\\ 2\end{array}}\right) +1\). Suppose, more generally, that \(n,N\in \mathbb {N}\) are such that N is polynomial in n. Considerations in the spirit of the proof of Corollary 2.3 (e.g. relying on [43]) then show that if \(x_1,\ldots ,x_n\) are independent uniformly random points in \({\textbf {B}}_{L_p^N}\), then the random set \(\{x_1,\ldots ,x_n\}\) is \(O((\log n)^{1/p})\)-incompressible. In other words, incompressibility is a generic property of random n-point subsets of \({\textbf {B}}_{L_p^N}\). On the other hand, a typical n-point subset of \({\textbf {B}}_{L_p^N}\) is known to be approximately a simplex due to work of Arias-de-Reyna, Ball, and Villa [4] and so, in particular, it can be bi-Lipschitzly embedded in \(O(\log n)\) dimensions.

2.3 Factorization and Proof of Theorem 1.2

Observe that Proposition 2.1 is rather non-canonical as the conclusion depends on the pairwise distances between the points \(\{x_i\}_{i=1}^n\) in \(L_p(\mu )\) whereas the bound on the dimension depends on \(L=\max _i \Vert x_i\Vert _{L_\infty (\mu )}\). In order to deduce Theorem 1.2 from this (a priori weaker) statement we shall leverage the fact that Proposition 2.1 holds for any probability measure \(\mu \) by optimizing this parameter L over all lattice-isomorphic images of \(\{x_i\}_{i=1}^n\). The optimal such change of measure which allows us to replace L by \(\Vert \max _i |x_i|\Vert _{L_p(\mu )}\) is a special case of a classical factorization theorem of Maurey (see [34] or [23, Thm. 5] for the general statement), whose short proof we include for completeness.

Proposition 2.5

Fix \(n\in \mathbb {N}\), \(p\in (0,\infty )\), and a probability space \((\Omega ,\mu )\). For all points \(x_1,\ldots ,x_n\in L_p(\mu )\), there exists a nonnegative density function \(f:\Omega \rightarrow \mathbb {R}_+\), supported on the support of \(\max _i|x_i|\), such that if \(\nu \) is the probability measure on \(\Omega \) given by \(\tfrac{\mathop {}\!\textrm{d}\nu }{\mathop {}\!\textrm{d}\mu }=f\), then

$$\begin{aligned} \max _{i\in \{1,\ldots ,n\}}\big \Vert x_i f^{-1/p}\big \Vert _{L_\infty (\nu )} \le \big \Vert \max _{i\in \{1,\ldots ,n\}} |x_i|\big \Vert _{L_p(\mu )}. \end{aligned}$$
(44)

Proof

Let \(V=\textrm{supp}(\max _i |x_i|)\subseteq \Omega \) and define the change of measure f as

$$\begin{aligned} {\forall }\omega \in \Omega , \quad f(\omega ) {\mathop {=}\limits ^{\textrm{def}}}\frac{\max _{i\in \{1,\ldots ,n\}}|x_i(\omega )|^p}{\int _\Omega \max _{i\in \{1,\ldots ,n\}}|x_i(\theta )|^p\,\mathop {}\!\textrm{d}\mu (\theta )}. \end{aligned}$$
(45)

Then, (44) is elementary to check. \(\square \)
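For completeness, the elementary verification reads as follows: for \(\mu \)-almost every \(\omega \in V\) and each \(i\in \{1,\ldots ,n\}\),

```latex
\begin{aligned}
|x_i(\omega )|\, f(\omega )^{-1/p}
&= |x_i(\omega )|\cdot
   \frac{\big ( \int _\Omega \max _{k}|x_k(\theta )|^p \,\mathrm{d}\mu (\theta )\big )^{1/p}}
        {\max _{k}|x_k(\omega )|} \\
&\le \Big ( \int _\Omega \max _{k}|x_k(\theta )|^p \,\mathrm{d}\mu (\theta )\Big )^{1/p}
 = \big \Vert \max _{k} |x_k| \big \Vert _{L_p(\mu )},
\end{aligned}
```

since \(|x_i(\omega )|\le \max _k |x_k(\omega )|\). Taking the essential supremum over \(\omega \in V\) (note that \(\nu \) and \(\mu \) have the same null subsets of \(V\), as \(f>0\) on \(V\)) gives (44).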

We are now ready to complete the proof of Theorem 1.2.

Proof of Theorem 1.2

Fix a K-incompressible family of vectors \(x_1,\ldots ,x_n\in L_p(\Omega ,\mu )\) and let \(V=\textrm{supp}( \max _i |x_i|)\subseteq \Omega \). Denote by \(f:\Omega \rightarrow \mathbb {R}_+\) the change of density from Proposition 2.5. If \(\tfrac{\mathop {}\!\textrm{d}\nu }{\mathop {}\!\textrm{d}\mu }=f\), then the linear operator \(T:L_p(V,\mu )\rightarrow L_p(\Omega ,\nu )\) given by \(Tg = f^{-1/p}g\) is (trivially) a linear isometry. Therefore, Proposition 2.1 and (44) show that there exist \(d\in \mathbb {N}\) with \(d\le \tfrac{32e^2(2K)^{2p}\log n}{\varepsilon ^2}\) and points \(y_1,\ldots ,y_n\in \ell _p^d\) such that the condition

$$\begin{aligned} \Vert x_i-x_j\Vert ^p_{L_p(\mu )}- \varepsilon= & {} \Vert Tx_i-Tx_j\Vert ^p_{L_p(\nu )}- \varepsilon \le \Vert y_i-y_j\Vert _{\ell _p^d}^p \nonumber \\\le & {} \Vert Tx_i-Tx_j\Vert ^p_{L_p(\nu )} + \varepsilon = \Vert x_i-x_j\Vert ^p_{L_p(\mu )}+\varepsilon , \end{aligned}$$
(46)

is satisfied for every \(i,j\in \{1,\ldots ,n\}\). This concludes the proof of Theorem 1.2. \(\square \)

Remark 2.6

A careful inspection of the proof of Theorem 1.2 reveals that the low-dimensional points \(\{y_i\}_{i=1}^n\) can be obtained as images of the given points \(\{x_i\}_{i=1}^n\) under a linear transformation. Indeed, starting from a K-incompressible family of points \(\{x_i\}_{i=1}^n\) in \(L_p(\Omega ,\mu )\), we use Proposition 2.5 to find a change of measure \(T:L_p(V,\mu )\rightarrow L_p(\Omega ,\nu )\) such that \(\{Tx_i\}_{i=1}^n\) satisfy the stronger assumption of Proposition 2.1. Then, for some \(d\in \mathbb {N}\) with \(d\le \tfrac{32e^2(2K)^{2p}\log n}{\varepsilon ^2}\) we find pairwise disjoint measurable subsets \(S_1,\ldots ,S_d\) of \(\Omega \), each with positive measure, such that if \(S:L_p(\Omega ,\nu )\rightarrow \ell _p^d\) is the linear map

$$\begin{aligned} {\forall }z\in L_p(\Omega ,\nu ), \quad Sz {\mathop {=}\limits ^{\textrm{def}}}\frac{1}{d^{1/p}}\Big ( \frac{1}{\mu (S_1)} \int _{S_1} z \,\mathop {}\!\textrm{d}\nu , \ldots , \frac{1}{\mu (S_d)} \int _{S_d} z\,\mathop {}\!\textrm{d}\nu \Big ) \in \ell _p^d, \end{aligned}$$
(47)

then the points \(\{y_i\}_{i=1}^n = \{(S\circ T)x_i\}_{i=1}^n\subseteq \ell _p^d\) satisfy the desired conclusion (6).
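To make (47) concrete, here is a minimal discrete sketch: we take \(\Omega =\{0,\ldots ,m-1\}\) with atomic measures \(\mu ,\nu \) given by weight vectors and blocks \(S_1,\ldots ,S_d\) given by disjoint index sets. The finite setting and all names are our own illustration, not the construction of Proposition 2.1; the point is merely that \(S\) is an explicit linear map.

```python
import numpy as np

def block_average_map(z, blocks, mu, nu, p):
    """Discrete version of the map S in (47): Omega = {0,...,m-1},
    mu[k] and nu[k] are the masses of atom k, and `blocks` lists the
    d pairwise disjoint index sets S_1,...,S_d."""
    d = len(blocks)
    averages = [(z[b] * nu[b]).sum() / mu[b].sum() for b in blocks]
    return np.array(averages) / d ** (1.0 / p)

rng = np.random.default_rng(1)
m, p = 12, 1.5
mu = np.full(m, 1.0 / m)              # uniform base measure
nu = rng.random(m); nu /= nu.sum()    # a change of measure
blocks = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]

# Linearity check: S(2z - 3w) = 2 Sz - 3 Sw.
z, w = rng.standard_normal(m), rng.standard_normal(m)
lhs = block_average_map(2 * z - 3 * w, blocks, mu, nu, p)
rhs = 2 * block_average_map(z, blocks, mu, nu, p) - 3 * block_average_map(w, blocks, mu, nu, p)
```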

We conclude this section by observing that the argument leading to Theorem 1.2 is constructive.

Corollary 2.7

In the setting of Theorem 1.2, there exists a greedy algorithm which receives as input the high-dimensional points \(\{x_i\}_{i=1}^n\) and produces as output the low-dimensional points \(\{y_i\}_{i=1}^n\).

Proof

As the density (45) is explicitly defined, the linear operator \(T:L_p(V,\mu )\rightarrow L_p(\Omega ,\nu )\) can also be efficiently constructed. On the other hand, in order to construct the operator S defined by (47) one needs to find the corresponding partition \(\{S_1,\ldots ,S_d\}\) and this was achieved in Proposition 2.1 via an application of Maurey’s sampling lemma to the cone \(\mathcal {C}_p \subseteq \ell _\infty ^{N}\) where \(N=\left( {\begin{array}{c}n\\ 2\end{array}}\right) \). As \(\ell _\infty ^{N}\) is e-isomorphic to the 2-uniformly smooth space \(\ell _{\log N}^{N}\), Ivanov’s result from [18] implies that the construction can be implemented by a greedy algorithm. \(\square \)

Analysis of the algorithm. The only nontrivial algorithmic task in our dimensionality reduction result is the implementation of Maurey’s approximate Carathéodory theorem. In the special case of \(\ell _p\) spaces, various constructive proofs of Maurey’s lemma are known [7, 11, 35], each of which allows for an analysis of the algorithm’s running time. Assume that the initial points \(x_1,\ldots ,x_n\) lie in \({\textbf {B}}_{\ell _p^m}\) for some finite \(m\in \mathbb {N}\). Implementing, for instance, the mirror descent algorithm of [35, Thm. 3.5] on the convex hull of \(\mathcal {M}(y(1)),\ldots ,\mathcal {M}(y(m))\) appearing in the proof of Theorem 1.2, the corresponding indices \(k_1,\ldots ,k_d\) can be produced in time \(O(m n^2 \log n /\varepsilon ^2)\). Therefore, assuming that the points \(x_1,\ldots ,x_n\) a priori lie in a \(\textrm{poly}(n)\)-dimensional space (as is reasonable by Ball’s embedding theorem), the output points \(y_1,\ldots ,y_n\) can be constructed in time \(\textrm{poly}(n,1/\varepsilon )\).
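The mirror-descent routine of [35] is deterministic; for intuition, the original probabilistic form of Maurey’s lemma is already easy to state and test. The sketch below (in \(\ell _2\), with illustrative names of our own) approximates a point \(x\) of a convex hull by an empirical average of \(d\) i.i.d. sampled vertices, with expected squared error at most \(\max _k\Vert v_k\Vert _2^2/d\); the sampled indices play the role of the indices \(k_1,\ldots ,k_d\) above.

```python
import numpy as np

rng = np.random.default_rng(2)

def maurey_sample(vertices, lam, d):
    """Probabilistic Maurey lemma: approximate x = sum_k lam[k]*vertices[k]
    by the average of d vertices sampled i.i.d. with probabilities lam."""
    idx = rng.choice(len(lam), size=d, p=lam)
    return vertices[idx].mean(axis=0), idx

m, dim = 50, 30
vertices = rng.standard_normal((m, dim))
vertices /= np.linalg.norm(vertices, axis=1, keepdims=True)  # ||v_k||_2 = 1
lam = rng.random(m); lam /= lam.sum()
x = lam @ vertices

# E||x - x_d||_2^2 <= 1/d, so the mean squared error should shrink with d.
errs = {d: np.mean([np.linalg.norm(x - maurey_sample(vertices, lam, d)[0]) ** 2
                    for _ in range(200)])
        for d in (10, 100)}
```

The derandomized algorithms of [7, 11, 35] produce such indices deterministically, which is what makes the overall construction greedy.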

3 Proof of Theorem 1.6

In this section we prove Theorem 1.6. The constructed subset of \({\textbf {B}}_{\ell _p}\) which does not embed linearly into \(\ell _p^d\) for small d is a slight modification of the one considered in [29].

Proof of Theorem 1.6

Fix \(m\in \mathbb {N}\) and denote by \(\{w_i\}_{i=1}^{2^m}\) the rows of the \(2^m\times 2^m\) Walsh matrix and by \(\{e_i\}_{i=1}^{2^m}\) the coordinate basis vectors of \(\mathbb {R}^{2^m}\). Consider the n-point set

$$\begin{aligned} \mathcal {S}_{n,p} = \{0\} \cup \{e_1,\ldots ,e_{2^m}\} \cup \big \{ \tfrac{w_1}{2^{m/p}}, \ldots , \tfrac{w_{2^m}}{2^{m/p}}\big \} \subseteq {\textbf {B}}_{\ell _p^{2^m}} \end{aligned}$$
(48)

where \(n=2^{m+1}+1\) and suppose that \(T:\ell _p^{2^m} \rightarrow \ell _p^d\) is a linear operator such that

$$\begin{aligned} {\forall }x,y\in \mathcal {S}_{n,p}, \qquad \omega \big (\Vert x-y\Vert _{\ell ^{2^m}_p}\big ) \le \Vert Tx-Ty\Vert _{\ell _p^d} \le \Omega \big (\Vert x-y\Vert _{\ell ^{2^m}_p}\big ). \end{aligned}$$
(49)

Assume first that \(1\le p<2\). If we write \(w_i = \sum _{j=1}^{2^m} w_i(j) e_j\) then by orthogonality of \(\{w_i\}_{i=1}^{2^m}\),

$$\begin{aligned} \sum _{i=1}^{2^m} \Vert Tw_i\Vert _{\ell _2^d}^2= & {} \sum _{i=1}^{2^m} \Big \Vert \sum _{j=1}^{2^m} w_i(j) Te_j\Big \Vert _{\ell _2^d}^2 = \sum _{j,k=1}^{2^m} \langle w_j, w_k\rangle \langle Te_j, Te_k\rangle \nonumber \\ {}= & {} 2^{m} \sum _{j=1}^{2^m} \Vert Te_j\Vert _{\ell _2^d}^2. \end{aligned}$$
(50)

By assumption (49) on T, we have

$$\begin{aligned} {\forall }j\in \{1,\ldots ,2^m\}, \qquad \Vert Te_j\Vert _{\ell _2^d}^2 \le \Vert Te_j\Vert _{\ell _p^d}^2 \le \Omega (1)^2 \end{aligned}$$
(51)

and

$$\begin{aligned} {\forall }j\in \{1,\ldots ,2^m\}, \qquad \Vert Tw_j\Vert _{\ell _2^d}^2 {\ge } 2^{\frac{2m}{p}} d^{-\frac{2-p}{p}}\big \Vert T\big (\tfrac{w_j}{2^{m/p}} \big )\big \Vert _{\ell _p^d}^2 {\ge } 2^{\frac{2m}{p}} d^{-\frac{2-p}{p}} \omega (1)^2.\qquad \quad \end{aligned}$$
(52)

Combining (50), (51), and (52) we deduce that

$$\begin{aligned} 2^{m(1+\frac{2}{p})} d^{-\frac{2-p}{p}} \omega (1)^2 \le 4^m\Omega (1)^2, \end{aligned}$$
(53)

which is equivalent to \(d\ge \left( \tfrac{\omega (1)}{\Omega (1)}\right) ^\frac{2p}{2-p} 2^m = \left( \tfrac{\omega (1)}{\Omega (1)}\right) ^\frac{2p}{|p-2|} \cdot \tfrac{n-1}{2}\). The case \(p>2\) is treated similarly. \(\square \)
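The identity (50) holds for an arbitrary linear map \(T\), since it only uses the orthogonality \(\sum _i w_i(j)w_i(k)=2^m\delta _{jk}\) of the Walsh matrix. A quick numerical confirmation (with a random \(T\) chosen purely for illustration, not one satisfying (49)):

```python
import numpy as np

def walsh(m):
    """Sylvester construction of the 2^m x 2^m Walsh-Hadamard matrix."""
    H = np.array([[1.0]])
    for _ in range(m):
        H = np.block([[H, H], [H, -H]])
    return H

rng = np.random.default_rng(3)
m, d = 4, 7
W = walsh(m)                          # rows w_1, ..., w_{2^m}
T = rng.standard_normal((d, 2 ** m))  # an arbitrary linear map R^{2^m} -> R^d

lhs = sum(np.linalg.norm(T @ w) ** 2 for w in W)  # sum_i ||T w_i||_2^2
rhs = 2 ** m * np.linalg.norm(T, "fro") ** 2      # 2^m * sum_j ||T e_j||_2^2
```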

Remark 3.1

The point set \(\mathcal {S}_{n,p}\) considered in the proof of Theorem 1.6 for \(p\ne 2\) is \(O(n^{1/p})\)-incompressible, yet it does not admit a linear \(\tfrac{1}{2}\)-isometric embedding into fewer than \(\Omega (n)\) dimensions. This shows that the dimension of the linear embedding exhibited in Theorem 1.2 must, up to lower-order terms, be at least of order \(K^p\). This should be compared with the \(O(K^{2p}\log n)\) upper bound of Theorem 1.2.