# Dimension Reduction for Finite Trees in \(\varvec{\ell _1}\)

## Abstract

We show that every \(n\)-point tree metric admits a \((1+\varepsilon )\)-embedding into \(\ell _1^{C(\varepsilon ) \log n}\), for every \(\varepsilon > 0\), where \(C(\varepsilon ) \le O\big ((\frac{1}{\varepsilon })^4 \log \frac{1}{\varepsilon }\big )\). This matches the natural volume lower bound up to a factor depending only on \(\varepsilon \). Previously, it was unknown whether even complete binary trees on \(n\) nodes could be embedded in \(\ell _1^{O(\log n)}\) with \(O(1)\) distortion. For complete \(d\)-ary trees, our construction achieves \(C(\varepsilon ) \le O\big (\frac{1}{\varepsilon ^2}\big )\).

## Keywords

Dimension reduction · Metric embeddings · Bi-Lipschitz distortion

## 1 Introduction

A *finite tree metric* is a metric space \((V, d_T)\) arising from a connected tree \(T=(V,E)\) with positive edge weights, where \(d_T\) denotes the induced shortest-path metric.

Given two metric spaces \((X,d_X)\) and \((Y,d_Y)\) and a mapping \(f : X \rightarrow Y\), one defines the *Lipschitz constant of* \(f\) by

$$\begin{aligned} \Vert f\Vert _{\mathrm {Lip}} = \sup _{x \ne y \in X} \frac{d_Y(f(x),f(y))}{d_X(x,y)}, \end{aligned}$$

and an *\(L\)-Lipschitz* map is one for which \(\Vert f\Vert _{\mathrm {Lip}} \le L\). One defines the *distortion* of the mapping \(f\) to be \(\mathrm {dist}(f) = \Vert f\Vert _{\mathrm {Lip}} \cdot \Vert f^{-1}\Vert _{\mathrm {Lip}}\), where the distortion is understood to be infinite when \(f\) is not injective. We say that \((X,d_X)\) \(D\)*-embeds* into \((Y,d_Y)\) if there is a mapping \(f : X \rightarrow Y\) with \(\mathrm {dist}(f) \le D\).

Using the notation \(\ell _1^k\) for the space \({\mathbb {R}}^k\) equipped with the \(\Vert \cdot \Vert _1\) norm, we study the following question: How large must \(k=k(n,\varepsilon )\) be so that every \(n\)-point tree metric \((1+\varepsilon )\)-embeds into \(\ell _1^k\)?

### 1.1 Dimension Reduction in \(\ell _1\)

A seminal result of Johnson and Lindenstrauss [8] implies that for every \(\varepsilon > 0\), every \(n\)-point subset \(X \subseteq \ell _2\) admits a \((1+\varepsilon )\)-distortion embedding into \(\ell _2^k\), with \(k = O\big (\frac{\log n}{\varepsilon ^2}\big )\). On the other hand, the known upper bounds for \(\ell _1\) are much weaker. Talagrand [19], following earlier results of Bourgain–Lindenstrauss–Milman [3] and Schechtman [17], showed that every \(n\)-dimensional subspace \(X \subseteq \ell _1\) (and, in particular, every \(n\)-point subset) admits a \((1+\varepsilon )\)-embedding into \(\ell _1^k\), with \(k=O\big (\frac{n \log n}{\varepsilon ^2}\big )\). For \(n\)-point subsets, this was very recently improved to \(k=O(n/\varepsilon ^2)\) by Newman and Rabinovich [15], using the spectral sparsification techniques of Batson et al. [4].

On the other hand, Brinkman and Charikar [2] showed that there exist \(n\)-point subsets \(X \subseteq \ell _1\) such that any \(D\)-embedding of \(X\) into \(\ell _1^k\) requires \(k \ge n^{\Omega (1/D^2)}\) (see also [10] for a simpler proof). Thus the exponential dimension reduction achievable in the \(\ell _2\) case cannot be matched for the \(\ell _1\) norm. More recently, it has been shown by Andoni et al. [1] that there exist \(n\)-point subsets such that any \((1+\varepsilon )\)-embedding requires dimension at least \(n^{1-O(1/\log (\varepsilon ^{-1}))}\). Regev [16] has given an elegant proof of both these lower bounds based on information theoretic arguments.

One can still ask about the possibility of more substantial dimension reduction for certain finite subsets of \(\ell _1\). Such a study was undertaken by Charikar and Sahai [5]. In particular, it is an elementary exercise to verify that every finite tree metric embeds isometrically into \(\ell _1\); thus the \(\ell _1\) dimension reduction question for trees becomes a prominent example of this type. It was shown^{1} [5] that for every \(\varepsilon > 0\), every \(n\)-point tree metric \((1+\varepsilon )\)-embeds into \(\ell _1^k\) with \(k = O\big (\frac{\log ^2 n}{\varepsilon ^2}\big )\). It is quite natural to ask whether the dependence on \(n\) can be reduced to the natural volume lower bound of \(\Omega (\log n)\). Indeed, it is Question 3.6 in the list “Open problems on embeddings of finite metric spaces” maintained by Matoušek [13], asked by Gupta et al.^{2} As noted there, the question was, surprisingly, even open for the complete binary tree on \(n\) vertices. The present paper resolves this question, achieving the volume lower bound for all finite trees.
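To make the elementary exercise concrete, here is a minimal Python sketch of the standard isometric embedding: reserve one coordinate per edge, and send each vertex \(v\) to the vector recording the weights of the edges on its root path \(P_v\). The data layout and helper names are our illustrative choices, not from the paper.

```python
def l1_embed_tree(parent, weight, root):
    """Isometric embedding of a rooted weighted tree into l_1:
    one coordinate per edge e = (v, parent[v]), set to w(e) exactly
    when e lies on the root-to-v path.  Then the l_1 distance between
    images equals the tree distance."""
    vertices = set(parent) | {root}
    index = {v: i for i, v in enumerate(parent)}   # one coordinate per edge
    emb = {}
    for v in vertices:
        vec = [0.0] * len(parent)
        u = v
        while u != root:                           # walk up to the root
            vec[index[u]] = weight[u]
            u = parent[u]
        emb[v] = vec
    return emb

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def tree_dist(parent, weight, root, u, v):
    """Tree distance computed through the lowest common ancestor."""
    def path_up(x):
        d, seen = 0.0, {x: 0.0}
        while x != root:
            d += weight[x]
            x = parent[x]
            seen[x] = d
        return seen
    du, dv = path_up(u), path_up(v)
    return min(du[w] + dv[w] for w in du if w in dv)
```

On any tree, `l1(emb[u], emb[v])` agrees with `tree_dist(...)` for every pair, since the coordinates that differ are exactly the edges on the path between \(u\) and \(v\).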

**Theorem 1.1**

For every \(\varepsilon > 0\) and \(n \in \{1,2,3,\ldots \}\), the following holds. Every \(n\)-point tree metric admits a \((1+\varepsilon )\)-embedding into \(\ell _1^k\) with \(k = O\big ((\frac{1}{\varepsilon })^4 \log \frac{1}{\varepsilon } \log n\big )\). If the tree is a complete \(d\)-ary tree of some height, the bound improves to \(k = O\big ((\frac{1}{\varepsilon })^2 \log n\big )\).

The proof for the general case is presented in Sect. 3.1. The special case of complete \(d\)-ary trees is addressed in Sect. 2. We remark that the proof also yields a randomized polynomial-time algorithm to construct the embedding.

By simple volume arguments, the \(\Theta (\log n)\) factor is necessary. Regarding the dependence on \(\varepsilon \), it is known [9] that for complete binary trees, one must have \(k \ge \Omega \big (\frac{\log n}{\varepsilon ^2 \log (1/\varepsilon )}\big )\), showing that, for this special case, Theorem 1.1 is tight up to a \(\log (1/\varepsilon )\) factor.

### 1.2 Notation

For a graph \(G=(V,E)\), we use the notations \(V(G)\) and \(E(G)\) to denote the vertex and edge sets of \(G\), respectively. For a connected, rooted tree \(T=(V,E)\) and \(x,y \in V\), we use the notation \(P_{xy}\) for the unique path between \(x\) and \(y\) in \(T\), and \(P_{x}\) for \(P_{rx}\), where \(r\) is the root of \(T\).

We use \({\mathbb {N}}\) for the set of positive integers \(\{1,2,3,\ldots \}\). For \(k \in {\mathbb {N}}\), we write \([k] = \{1,2,\ldots ,k\}\). We also use the asymptotic notation \(A \lesssim B\) to denote that \(A = O(B)\), and \(A \asymp B\) to denote the conjunction of \(A \lesssim B\) and \(B \lesssim A\).

### 1.3 Proof Outline and Related Work

**Re-randomization**. Consider an unweighted, complete binary tree of height \(h\). Denote the tree by \(T_h = (V_h, E_h)\), let \(n=2^{h+1}-1\) be the number of vertices, and let \(r\) denote the root of the tree. Let \(\kappa \in {\mathbb {N}}\) be some constant which we will choose momentarily. If we assign to every edge \(e \in E_h\) a label \(\lambda (e) \in \{0,1\}^{\kappa }\), then there is a natural mapping \(\tau _{\lambda } : V_h \rightarrow \{0,1\}^{\kappa h}\) given by

$$\begin{aligned} \tau _{\lambda }(v) = \big (\lambda (e_1), \lambda (e_2), \ldots , \lambda (e_{d_{T_h}(r,v)}), 0, \ldots , 0\big ), \end{aligned}$$(2)

where \(e_1, e_2, \ldots , e_{d_{T_h}(r,v)}\) are the edges of \(P_v\) ordered from the root to \(v\), and the vector is padded with zeros to length \(\kappa h\).

If we choose the label map \(\lambda : E_h \rightarrow \{0,1\}^{\kappa }\) uniformly at random, the probability for the embedding \(\tau _{\lambda }\) specified in (2) to have \(O(1)\) distortion is at most exponentially small in \(n\). In fact, the probability for \(\tau _{\lambda }\) to be injective is already this small. This is because for two nodes \(u,v \in V_h\) which are the children of the same node \(w\), there is \(\Omega (1)\) probability that \(\tau _{\lambda }(u)=\tau _{\lambda }(v)\), and there are \(\Omega (n)\) such independent events. In Sect. 2, we show that a judicious application of the Lovász Local Lemma [6] can be used to show that \(\tau _{\lambda }\) has \(O(1)\) distortion with non-zero probability. In fact, we show that this approach can handle arbitrary \(k\)-ary complete trees, with distortion \(1+\varepsilon \). Unknown to us at the time of discovery, a closely related construction occurs in the context of tree codes for interactive communication [18].
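The failure of the purely random labeling can be seen in a short sketch of the map (2). The variable names and the encoding of vertices as binary strings are our illustrative choices: \(\tau _{\lambda }\) concatenates the \(\kappa\)-bit edge labels along the root path, zero-padded to \(\kappa h\) coordinates, and two siblings collide exactly when their edge labels agree, which happens with probability \(2^{-\kappa }\) independently for each of the \(\Omega (n)\) sibling pairs.

```python
import random

def tau(labels, v, kappa, h):
    """The map (2): concatenate edge labels along the root path of v,
    then pad with zeros to length kappa * h.  A vertex is a binary
    string; labels[prefix] is the label of the edge above that vertex."""
    vec = []
    for i in range(1, len(v) + 1):
        vec.extend(labels[v[:i]])
    vec.extend([0] * (kappa * (h - len(v))))
    return vec

def random_labels(kappa, h):
    """Assign a uniformly random label in {0,1}^kappa to every edge of
    the complete binary tree of height h."""
    labels = {}
    def rec(prefix):
        if len(prefix) == h:
            return
        for b in '01':
            child = prefix + b
            labels[child] = [random.randint(0, 1) for _ in range(kappa)]
            rec(child)
    rec('')
    return labels
```

Each edge contributes at most \(\kappa\) ones, so for an ancestor pair the \(\ell _1\) distance between images is at most \(\kappa\) times the tree distance; the hard part, as discussed above, is ruling out contractions for all pairs simultaneously.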

Unfortunately, the use of the Local Lemma does not extend well to the more difficult setting of arbitrary trees. For the general case, we employ an idea of Schulman [18] based on *re-randomization*. To see the idea in our simple setting, consider \(T_h\) to be composed of a root \(r\), under which lie two copies of \(T_{h-1}\), which we call A and B, having roots \(r_\mathrm{A}\) and \(r_\mathrm{B}\), respectively.

The idea is to assume that, inductively, we already have a labeling \(\lambda _{h-1} : E_{h-1} \rightarrow \{0,1\}^{\kappa (h-1)}\) such that the corresponding map \(\tau _{\lambda _{h-1}}\) has \(O(1)\) distortion on \(T_{h-1}\). We will then construct a random labeling \(\lambda _h : E_h \rightarrow \{0,1\}^{\kappa }\) by using \(\lambda _{h-1}\) on the A-side, and \(\pi (\lambda _{h-1})\) on the B-side, where \(\pi \) randomly alters the labeling in such a way that \(\tau _{\pi (\lambda _{h-1})}\) is simply \(\tau _{\lambda _{h-1}}\) composed with a random isometry of \(\ell _1^{\kappa (h-1)}\). We will then argue that with positive probability (over the choice of \(\pi \)), \(\tau _{\lambda _h}\) has \(O(1)\) distortion.

Now consider some pair \(x \in V(A)\) and \(y \in V(B)\). It is simple to argue that it suffices to bound the distortion for pairs with \(m=d_{T_h}(r,x)=d_{T_h}(r,y)\) for \(m \in \{1,2,\ldots ,h\}\), so we will assume that \(x,y\) have the same height in \(T_h\).

This illustrates how re-randomization (applying a distribution over random isometries to one side of a tree) can be used to achieve \(O(1)\) distortion for embedding \(T_h\) into \(\ell _1^{O(h)}\). Unfortunately, the arguments become significantly more delicate when we handle less uniform trees. The full-blown re-randomization argument occurs in Sect. 5.

**Scale Selection**. The first step beyond complete binary trees would be in passing to complete \(d\)-ary trees for \(d \ge 3\). The same construction as above works, but now one has to choose \(\kappa \asymp \log d\). Unfortunately, if the degrees of our tree are not uniform, we have to adopt a significantly more delicate strategy. It is natural to choose a single number \(\kappa (e) \in {\mathbb {N}}\) for every edge \(e \in E\), and then put \(\lambda (e) \in \frac{1}{\kappa (e)} \{0,1\}^{\kappa (e)}\) (this ensures that the analogue of the embedding \(\tau _{\lambda }\) specified in (2) is 1-Lipschitz).

In fact, there are examples which show that it is impossible to choose \(\kappa (u,v)\) to depend only on the geometry of the subtree rooted at \(u\). These “scale selector” values have to look at the global geometry, and in particular have to encode the volume growth of the tree at many scales simultaneously. Our eventual scale selector is fairly sophisticated and impossible to describe without delving significantly into the details of the proof. For our purposes, we need to consider more general embeddings of type (1). In particular, the coordinates of our labels \(\lambda (e) \in {\mathbb {R}}^k\) will take a range of different values, not simply a single value as for complete trees.

We do try to maintain one important, related invariant: If \(P_v\) is the sequence of edges from the root to some vertex \(v\), then ideally for every coordinate \(i \in \{1,2,\ldots ,k\}\) and every value \(j \in {\mathbb {Z}}\), there will be at most one \(e \in P_v\) for which \(\lambda (e)_i \in [2^j, 2^{j+1})\). Thus instead of every coordinate being “touched” at most once on the path from the root to \(v\), every coordinate is touched at most once *at every scale* along every such path. This ensures that various scales do not interact. For technical reasons, this property is not maintained exactly, but analogous concepts arise frequently in the proof.
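As a small illustration of this invariant (the list-of-labels representation of a root path is ours, not from the paper), one can check a candidate root path as follows: every positive coordinate value is bucketed into its dyadic scale \([2^j, 2^{j+1})\), and no (coordinate, scale) bucket may be hit twice.

```python
import math

def scales_ok(path_labels):
    """Check the invariant: along a root path (labels listed root-first),
    each coordinate i is 'touched' at most once per dyadic scale
    [2^j, 2^{j+1}), i.e. no (coordinate, scale) pair repeats."""
    seen = set()
    for lab in path_labels:
        for i, x in enumerate(lab):
            if x <= 0:
                continue
            j = math.floor(math.log2(x))   # dyadic scale of this entry
            if (i, j) in seen:
                return False
            seen.add((i, j))
    return True
```

For instance, touching coordinate 0 with the values 1.0 and 2.0 is fine (scales \(j=0\) and \(j=1\)), while 1.0 and 1.5 both land in \([1,2)\) and violate the invariant.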

The restricted class of embeddings we use, along with a discussion of the invariants we maintain, are introduced in Sect. 3.2. The actual scale selectors are defined in Sect. 4.

**Controlling the Topology**. One of the properties that we used above for complete \(d\)-ary trees is that the depth of such a tree is \(O(\log _d n)\), where \(n\) is the number of nodes in the tree. This allowed us to concatenate vectors down a root–leaf path without exceeding our desired \(O(\log n)\) dimension bound. Of course, for general trees, no similar property need hold. However, there is still a bound on the *topological* depth of any \(n\)-node tree.

To explain this, let \(T=(V,E)\) be a tree with root \(r\), and define a *monotone coloring of* \(T\) to be a mapping \(\chi : E \rightarrow {\mathbb {N}}\) such that for every \(c \in {\mathbb {N}}\), the color class \(\chi ^{-1}(c)\) is a connected subset of some root–leaf path. Such colorings were used in previous works on embedding trees into Hilbert spaces [7, 11, 12], as well as for previous low-dimensional embeddings into \(\ell _1\) [5]. The following lemma is well-known and elementary.

**Lemma 1.2**

Every connected \(n\)-vertex rooted tree \(T\) admits a monotone coloring such that every root–leaf path in \(T\) contains at most \(1+\log _2 n\) colors.

*Proof*

For an edge \(e \in E(T)\), let \(\ell (e)\) denote the number of leaves beneath \(e\) in \(T\) (including, possibly, an endpoint of \(e\)). Letting \(\ell (T) = \max _{e \in E} \ell (e)\), we will prove that for \(\ell (T) \ge 1\), there exists a monotone coloring with at most \(1+\log _2 (\ell (T)) \le 1+\log _2 n\) colors on any root–leaf path.

Suppose that \(r\) is the root of \(T\). For an edge \(e\), let \(T_e\) be the subtree beneath \(e\), including the edge \(e\) itself. If \(r\) is the endpoint of edges \(e_1, e_2, \ldots , e_k\), we may color the edges of \(T_{e_1}, T_{e_2}, \ldots , T_{e_k}\) separately, since any monotone path is contained completely within exactly one of these subtrees. Thus we may assume that \(r\) is the endpoint of only one edge \(e_1\), and then \(\ell (T)=\ell (e_1)\).

Choose a leaf \(x\) in \(T\) such that each connected component \(T'\) of \(T \setminus E(P_{rx})\) has \(\ell (T') \le \ell (e_1)/2\) (this is easy to do by, e.g., ordering the leaves from left to right in a planar drawing of \(T\)). Color the edges \(E(P_{rx})\) with color 1, and inductively color each non-trivial connected component \(T'\) with disjoint sets of colors from \({\mathbb {N}} \setminus \{1\}\). By induction, the maximum number of colors appearing on a root–leaf path in \(T\) is at most \(1+\big (1+\log _2(\ell (e_1)/2)\big ) = 1+\log _2(\ell (T))\), completing the proof. \(\square \)
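One concrete way to realize such a coloring in code is heavy-path decomposition, a standard construction closely related to (but not verbatim) the recursive splitting in the proof above: the edge to the child whose subtree has the most leaves continues its parent's color, and every other edge starts a fresh color. The number of leaves at least halves along each fresh-colored edge of a root–leaf path, giving the same \(1+\log _2\) bound. All names here are our illustrative choices.

```python
def monotone_coloring(children, root):
    """Heavy-path monotone coloring.  The edge (v, child) keeps v's
    color when child heads the child subtree with the most leaves, and
    starts a fresh color otherwise; each color class is then a
    connected piece of a single root-leaf path."""
    leaves = {}
    def count(v):
        kids = children.get(v, [])
        leaves[v] = sum(count(k) for k in kids) if kids else 1
        return leaves[v]
    count(root)
    color = {}          # keyed by the child endpoint of each edge
    fresh = [0]
    def paint(v, c):
        kids = children.get(v, [])
        if not kids:
            return
        heavy = max(kids, key=lambda k: leaves[k])
        for k in kids:
            if k is heavy:
                color[k] = c          # continue the current color
            else:
                fresh[0] += 1         # light edge: start a fresh color
                color[k] = fresh[0]
            paint(k, color[k])
    paint(root, 0)
    return color
```

On the complete binary tree of height 5 (32 leaves), every root–leaf path meets at most \(1+\log _2 32 = 6\) colors.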

Instead of dealing directly with edges in our actual embedding, we will deal with color classes. This poses a number of difficulties, and one major difficulty involving vertices which occur in the middle of such classes. For dealing with these vertices, we will first preprocess our tree by embedding it into a product of a small number of new trees, each of which admits colorings of a special type. This is carried out in Sect. 3.1.

## 2 Warm-Up: Embedding Complete \(k\)-ary Trees

We first prove our main result for the special case of complete \(k\)-ary trees, with an improved dependence on \(\varepsilon \). The main novelty is our use of the Lovász Local Lemma to analyze a simple random embedding of such trees into \(\ell _1\). The proof illustrates the tradeoff between concentration and the sizes of the sets \(\{ \{u,v\} \subseteq V : d_T(u,v)=j \}\) for each \(j=1,2,\ldots \).

**Theorem 2.1**

Let \(T_{k,h}\) be the unweighted, complete \(k\)-ary tree of height \(h\). For every \(\varepsilon > 0\), there exists a \((1+\varepsilon )\)-embedding of \(T_{k,h}\) into \(\ell _1^{O((h \log k)/\varepsilon ^2)}\).

In the next section, we introduce our random embedding and analyze the success probability for a single pair of vertices based on their distance. Then in Sect. 2.2, we show that with non-zero probability, the construction succeeds for all vertices. In the coming sections and later, in the proof of our main theorem, we will employ the following concentration inequality [14].

**Theorem 2.2**

### 2.1 A Single Event

**Observation 2.3**

For any \(v\in V\) and \(u\in V(P_v)\), we have \(d_T(u,v)= \Vert g(u)-g(v)\Vert _1\).

For \(m,n\in {\mathbb {N}}\), and \(A\in {\mathbb {R}}^{m\times n}\), we use the notation \(A[i] \in {\mathbb {R}}^n\) to refer to the \(i\)th row of \(A\). We now bound the probability that a given pair of vertices experiences a large contraction.

**Lemma 2.4**

*Proof*

### 2.2 The Local Lemma Argument

We first give the statement of the Lovász Local Lemma [6] and then use it in conjunction with Lemma 2.4 to complete the proof of Theorem 2.1.

**Theorem 2.5**

*Proof of Theorem 2.1*

## 3 Colors and Scales

In the present section, we develop some tools for our eventual embedding. The proof of our main theorem appears in the next section, but relies on a key theorem which is only proved in Sect. 5.

### 3.1 Monotone Colorings

A *monotone coloring* is a mapping \(\chi : E \rightarrow {\mathbb {N}}\) such that each color class \(\chi ^{-1}(c) = \{ e \in E : \chi (e) = c \}\) is a connected subset of some root–leaf path. For a set of edges \(S \subseteq E\), we write \(\chi (S)\) for the set of colors occurring in \(S\). We define the *multiplicity of* \(\chi \) by

$$\begin{aligned} M(\chi ) = \max _{v \in V} |\chi (E(P_v))|, \end{aligned}$$

the maximum number of distinct colors appearing on any root–leaf path.

**Theorem 3.1**

The problem one now confronts is whether the loss in the \(\rho _{\chi }(x,y; \delta )\) term can be tolerated. In general, we do not have a way to do this, so we first embed our tree into a product of a small number of trees in a way that allows us to control the corresponding \(\rho \)-terms.

**Lemma 3.2**

- (a) We have $$\begin{aligned} \frac{1}{k} \sum _{i=1}^k d_{T_i}(f_i(x),f_i(y)) \ge (1-\varepsilon )\,d_T(x,y). \end{aligned}$$(20)
- (b) For all \(i \in [k]\), we have $$\begin{aligned} d_{T_i}(f_i(x),f_i(y)) \le (1+\varepsilon )\,d_T(x,y). \end{aligned}$$(21)
- (c) There exists a number \(j \in [k]\) such that $$\begin{aligned} \varepsilon \,d_T(x,y)\ge \frac{2^{-(k+1)}}{k} \mathop {\sum _{i=1}^k}_{i \ne j} \rho _{\chi _i}(f_i(x),f_i(y);2^{-(k+1)}). \end{aligned}$$(22)

Using Lemma 3.2 in conjunction with Theorem 3.1, we can now prove the main theorem (Theorem 1.1).

*Proof of Theorem 1.1*

Let \(\varepsilon > 0\) be given, and let \(T=(V,E)\) be an \(n\)-vertex metric tree. Let \(\chi : E \rightarrow {\mathbb {N}}\) be a monotone coloring with \(M(\chi ) \le O(\log n)\), which exists by Lemma 1.2. Apply Lemma 3.2 to obtain metric trees \(T_1, \ldots , T_k\) with corresponding monotone colorings \(\chi _1, \ldots , \chi _k\) and mappings \(f_i : V \rightarrow V(T_i)\). Observe that \(M(\chi _i) \le O(\log n)\) for each \(i \in [k]\).

First, observe that each \(F_i\) is \(1\)-Lipschitz (Theorem 3.1). In conjunction with condition (b) of Lemma 3.2 which says that \(\Vert f_i\Vert _{\mathrm {Lip}} \le 1+\varepsilon \) for each \(i \in [k]\), we have \(\Vert F\Vert _{\mathrm {Lip}} \le 1+\varepsilon \).

We now move on to the proof of Lemma 3.2. We begin by proving an analogous statement for the half line \([0,\infty )\). An \({\mathbb {R}}\) *-star* is a metric space formed as follows: Given a sequence \(\{a_i\}_{i=1}^{\infty }\) of positive numbers, one takes the disjoint union of the intervals \(\{[0,a_1], [0,a_2], \ldots \}\), and then identifies the 0 point in each, which is canonically called the *root of the* \({\mathbb {R}}\) *-star.* An \({\mathbb {R}}\)-star \(S\) carries the natural induced length metric \(d_S\). We refer to the associated intervals as *branches*, and the *length of a branch* is the associated number \(a_i\). Finally, if \(S\) is an \({\mathbb {R}}\)-star, and \(x \in S \setminus \{0\}\), we use \(\ell (x)\) to denote the length of the branch containing \(x\). We put \(\ell (0)=0\).
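Concretely, in the induced length metric two points on the same branch are at distance \(|x-y|\) along the branch, and points on different branches are joined through the root. A tiny sketch (the pair representation of points is our choice):

```python
def star_dist(p, q):
    """Length metric on an R-star.  A point is (branch_index, d) with
    d >= 0 its distance from the root; the root is any point (i, 0)."""
    (bi, di), (bj, dj) = p, q
    if bi == bj:
        return abs(di - dj)    # same branch: distance along the branch
    return di + dj             # different branches: go through the root
```

Note that the two representations of the root, \((i,0)\) and \((i',0)\), are at distance 0, so the metric is well defined on the quotient.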

**Lemma 3.3**

- (i)
For each \(i \in [k], f_i(0)\) is the root of \(S_i\).

- (ii)
For all \(x,y \in [0,\infty )\), \(\frac{1}{k} \sum _{i=1}^k d_{S_i}(f_i(x),f_i(y)) \ge (1-\frac{7}{k}) |x-y|.\)

- (iii)
For each \(i \in [k], f_i\) is \((1+2^{-k+1})\)-Lipschitz.

- (iv)
For \(x \in [0,\infty )\), we have \(\ell (f_i(x)) \le 2^{k-1} x.\)

- (v)For \(x\in [0,\infty )\), there are at most two values of \(i \in [k]\) such that$$\begin{aligned} d_{S_i}(f_i(0),f_i(x)) \le 2^{-k} \,\ell (f_i(x)). \end{aligned}$$
- (vi)For all \(x,y \in [0,\infty )\), there is at most one value of \(i \in [k]\) such that \(f_i(x)\) and \(f_i(y)\) are in different branches of \(S_i\) and$$\begin{aligned} 2^{-k} \big (\ell (f_i(x)) + \ell (f_i(y))\big ) > 2\, |x-y|. \end{aligned}$$

*Proof*

Assume that \(k \ge 2\). We first construct \({\mathbb {R}}\)-stars \(S_1, \ldots , S_k\). We will index the branches of each star by \({\mathbb {Z}}\). For \(i \in [k], S_i\) is a star whose \(j\)th branch, for \(j \in {\mathbb {Z}}\), has length \(2^{i-1+k(j+1)}\). We will use the notation \((i,j,d)\) to denote the point at distance \(d\) from the root on the \(j\)th branch of \(S_i\). Observe that \((i,j,0)\) and \((i,j',0)\) describe the same point (the root of \(S_i\)) for all \(j,j' \in {\mathbb {Z}}\).

To verify condition (v), note that for \(x\in [0,\infty )\), the inequality \(d_{S_i}(f_i(x),f_i(0))\le x/2\) can only hold for \(i {\text { mod }}k \in \{\lfloor \log _2 x\rfloor , \lfloor \log _2 x \rfloor + 1\}\), hence condition (iv) implies condition (v).

Finally, we move onto the proof of Lemma 3.2.

*Proof of Lemma 3.2*

Applying the inductive hypothesis to \(T'\) and \(\chi |_{E(T')}\) yields metric trees \(T_1', T_2', \ldots ,\) \( T_k'\) with colorings \(\chi _i' : E(T_i') \rightarrow {\mathbb {N}}\) and mappings \(f'_i : V(T') \rightarrow V(T_i')\).

### 3.2 Multi-scale Embeddings

We now present the basics of our multi-scale embedding approach. The next lemma is devoted to combining scales together without using too many dimensions, while controlling the distortion of the resulting map.

**Lemma 3.4**

*Proof*

In Sect. 5, we will require the following straightforward corollary.

**Corollary 3.5**

## 4 Scale Assignment

Let \(T=(V,E)\) be a metric tree with root \(r \in V\), equipped with a monotone coloring \(\chi : E \rightarrow {\mathbb {N}}\). We will now describe a way of assigning “scales” to the vertices of \(T\). These scale values will be used in Sect. 5 to guide our eventual embedding. The scales of a vertex will describe, roughly, the subset and magnitude of coordinates that should differ between the vertex and its neighbors. First, we fix some notation.

For every \(c \in \chi (E)\), we use \(\gamma _c\) to denote the path in \(T\) colored \(c\), and we use \(v_c\) to denote the vertex of \(\gamma _c\) which is closest to the root. We will also use the notation \(T(c)\) to denote the subtree of \(T\) under the color \(c\); formally, \(T(c)\) is the induced (rooted) subtree on \(\{v_c\} \cup V(T_u)\) where \(u \in V\) is the child of \(v_c\) such that \(\chi (v_c,u)=c\), and \(T_u\) is the subtree rooted at \(u\).

We will write \(p(v)\) for the parent of a vertex \(v \in V\), and \(p(r)=r\). Furthermore, we define the “parent color” of a color class by \(\rho (c) = \chi (v_c, p(v_c))\) with the convention that \(\chi (r,r)=c_0\), where \(c_0 \in {\mathbb {N}} \setminus \chi (E)\) is some fixed element. Finally, we put \(T(c_0)=T\).
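As a small sketch of this bookkeeping (the data layout and function names are our illustrative choices), \(v_c\) can be computed as the parent of the highest child endpoint of a \(c\)-colored edge, and \(\rho (c)\) read off from the edge above it, with the dummy color \(c_0\) for classes whose top vertex is the root:

```python
def color_structure(parent, chi, root, c0):
    """Given a monotone coloring chi keyed by the child endpoint of each
    edge, compute v_c (the vertex of the class c closest to the root)
    and the parent color rho(c) = chi(v_c, p(v_c)), using c0 when
    v_c is the root (the convention chi(r, p(r)) = c0)."""
    def depth(v):
        d = 0
        while v != root:
            v = parent[v]
            d += 1
        return d
    top = {}                                  # highest child endpoint per color
    for v, c in chi.items():
        if c not in top or depth(v) < depth(top[c]):
            top[c] = v
    v_c = {c: parent[t] for c, t in top.items()}
    rho = {c: (chi[v_c[c]] if v_c[c] != root else c0) for c in v_c}
    return v_c, rho
```

On a toy tree with two color classes stacked on top of each other, the lower class's parent color is the upper class, and the upper class's parent color is \(c_0\).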

### 4.1 Scale Selectors

**Observation 4.1**

For \(v\in V\) and \(k\ge \big \lfloor \log _2\big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \), if \(\tau _k(v)=0\) then for all \(i\ge k, \tau _i(v)=0\).

Comparing part (A) of (34) for \(\tau _i(v)\) and \(\tau _{i+1}(v)\) also allows us to observe the following.

**Observation 4.2**

For \(v\in V\) and \(k\ge \big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \), if part (A) in (34) for \(\tau _k(v)\) is less than or equal to part (B) then for all \(i> k, \tau _i(v)=0\).

### 4.2 Properties of the Scale Selector Maps

We now prove some key properties of the maps \(\kappa , \varphi \), and \(\{\tau _i\}\).

**Lemma 4.3**

For every vertex \(v\in V\) with \(c={\chi }(v,p(v))\), the following holds. For all \({i}\in {\mathbb {Z}}\) with \({{d_T(v,v_c)}\over \kappa (c)} \le 2^{i-1}\), we have \(\tau _i(v)=0.\)

*Proof*

If \(\tau _k(v)=0\), then by Observation 4.1, for all \(i\ge k, \tau _i(v)=0\).

The next lemma shows how the values \(\{\tau _i(v)\}\) track the distance from \(v_c\) to \(v\).

**Lemma 4.4**

*Proof*

The following lemma shows that for any color \(c\in \chi (E)\) the value of \(\tau _i\) does not decrease as we move further from \(v_c\) in \(\gamma _c\).

**Lemma 4.5**

*Proof*

For \(i>k\), by Observation 4.2 we have \(\tau _i(w)=0\). Therefore, for \(i>k\), we have \(\tau _i(u)\ge \tau _i(w)\). We now use induction on \(i\) to show that for \(i< k, \tau _i(u)=\tau _i(w)\), and for \(i=k, \tau _k(u)\ge \tau _k(w)\). Recall that, for \(i<\big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \), we have \(\tau _i(w)=\tau _i(u)=0\), which gives us the base case of the induction.

The next lemma bounds the distance between two vertices in the graph based on \(\{\tau _i\}\).

**Lemma 4.6**

*Proof*

The next lemma and the following two corollaries bound the number of colors \(c\) in the tree which have a small value of \(\varphi (c)\).

**Lemma 4.7**

*Proof*

We start the proof by comparing the size of the subtrees \(T(c')\) and \(T(c)\) for \(c'\in {\chi }(E(T(c)))\).

The following two corollaries are immediate from Lemma 4.7.

**Corollary 4.8**

**Corollary 4.9**

The next lemma is similar to Lemma 4.6. The assumption is more general, and the conclusion is correspondingly weaker. This result is used primarily to enable the proof of Lemma 4.11.

**Lemma 4.10**

*Proof*

In Sect. 5, we give the description of our embedding and analyze its distortion. In the analysis, for a given pair of vertices \(x,y\in V\), we divide the path between \(x\) and \(y\) into subpaths; for each subpath, we either show through a concentration of measure argument that its contribution to the distance between \(x\) and \(y\) in the embedding is “large,” or we use the following lemma to show that the length of the subpath is “small” compared to the distance between \(x\) and \(y\). The complete argument is somewhat more delicate; the details of how Lemma 4.11 is used appear in the proof of Lemma 5.15.

**Lemma 4.11**

*Proof*

Let \(r'=v_c\), and let \(c_1,\ldots , c_m\) be the colors that appear on the path \(P_{vr'}\), in order from \(v\) to \(r'\), and put \(c_{m+1}={\chi }(r',p(r'))\). We define \(y_0=v\), and for \(i\in [m], y_i=v_{c_i}\). Note that \(\{y_0,\ldots , y_m\}=\{v\}\cup \{v_a:a\in \chi (E(P_{v \,v_c}))\}\), and for \(i\le m, \chi (y_i,p(y_i))=c_{i+1}\). We give a constructive proof for the lemma.

For \(i\in {\mathbb {N}}\), we construct a sequence \((a_i,b_i)\in {{\mathbb {N}}}\times {\mathbb {N}}\), the idea being that \(P_{y_{a_i},y_{b_i}}\) is a nonempty subpath of \(P_{vr'}\) such that for different values of \(i\), these subpaths are edge disjoint. At each step of the construction, either we can use \((a_i,b_i)\) to find \(u\) and \(u'\) satisfying the properties of this lemma, or we find \((a_{i+1},b_{i+1})\) such that \(b_{i+1}<b_i\). Since the \(b_i\) strictly decrease, the process terminates, which guarantees that we eventually find \(u\) and \(u'\) satisfying the conditions of this lemma.

- (i)
\(\varphi (c_{b_i+1})-\varphi (c_{a_i+1})\ge \varphi (c_{a_i+1} )-\varphi (\chi (r',p(r')))\);

- (ii)
\(d_T(y_{b_i},v) \ge \varepsilon d_T(y_{b_i},y_{a_i})\);

- (iii)
\(a_i>b_i\).

**Case I**: \(\varphi (c_{j+2})-\varphi (c_{b_i+1})\ge 2(\varphi (c_{b_i+1})-\varphi (c_{a_{i}+1})).\)

**Case II**: \(\varphi (c_{j+2})-\varphi (c_{b_i+1})< 2(\varphi (c_{b_i+1})-\varphi (c_{a_{i}+1}))\) and \(\varphi (c_{j+1})-\varphi (c_{b_i+1})\ge 6(\varphi (c_{b_i+1})-\varphi (c_{a_i+1}))\).

**Case III**: \(\varphi (c_{j+1})-\varphi (c_{b_i+1})< 6(\varphi (c_{b_i+1})-\varphi (c_{a_i+1}))\).

## 5 The Embedding

We now present a proof of Theorem 3.1, thereby completing the proof of Theorem 1.1. We first introduce a random embedding of the tree \(T\) into \(\ell _1\), and then show that, for a suitable choice of parameters, with non-zero probability our construction satisfies the conditions of the theorem.

**Notation**: We use the notations and definitions introduced in Sect. 4. Moreover, in this section, for \(c\in \chi (E)\cup \{{\chi }(r,p(r))\}\), we use \(\rho ^{-1}(c)\) to denote the set of colors \(c'\in \chi (E)\) such that \(\rho (c')=c\), i.e. the colors of the “children” of \(c\). For \(m,n\in {\mathbb {N}}\), and \(A\in {\mathbb {R}}^{m\times n}\), we use the notation \(A[i]\) to refer to the \(i\)th row of \(A\) and \(A[i,j]\) to refer to the \(j\)th element in the \(i\)th row.

### 5.1 The Construction

Now, we present some key properties of the map \(\Delta _i(v)\). The following two observations follow immediately from the definitions.

**Observation 5.1**

For \(v\in V\) and \(i\in {\mathbb {Z}}\), each row in \(\Delta _i(v)\) has at most one non-zero coordinate.

**Observation 5.2**

Proofs of the next four lemmas will be presented in Sect. 5.2.

**Lemma 5.3**

For \(v\in V\), there is at most one \(i\in {\mathbb {Z}}\) and at most one pair \((j,k)\in [m]\times [t]\) such that \(\Delta _i(v)[j,k]\notin \{0,{2^i\over t^2}\}\).

**Lemma 5.4**

**Lemma 5.5**

**Lemma 5.6**

For \(c\in \chi (E), u,w\in V(\gamma _c)\setminus \{v_c\}, i> j\) and \(k \in [m]\), if both \(\Vert \Delta _{i}(u)[k]-\Delta _i(w)[k]\Vert _1\ne 0\), and \(\Vert \Delta _j(u)[k]-\Delta _j(w)[k]\Vert _1\ne 0\), then \(d_T(u,w)\ge {2^{j-1}}\).

**Re-randomization**. For \(t\in {\mathbb {N}}\), let \(\pi _t:{\mathbb {R}}^t\rightarrow {\mathbb {R}}^t\) be a random mapping obtained by uniformly permuting the coordinates in \({\mathbb {R}}^t\). Let \(\{\sigma _i\}_{i \in [m]}\) be a sequence of i.i.d. random variables with the same distribution as \(\pi _t\). We define the random variable \(\pi _{t,m}:{\mathbb {R}}^{m\times t} \rightarrow {\mathbb {R}}^{m\times t}\) as follows:
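Since each \(\sigma _i\) only permutes coordinates within its own row, \(\pi _{t,m}\) is a random isometry of \(\ell _1^{m\times t}\): row by row, the multiset of coordinate differences is unchanged. A hedged Python sketch (names and the list-of-lists matrix representation are ours):

```python
import random

def pi_tm(m, t, seed=None):
    """Sample the random map pi_{t,m}: one independent uniform
    coordinate permutation per row of an m-by-t matrix.  Applying the
    same sampled map to two matrices preserves their l_1 distance."""
    rng = random.Random(seed)
    perms = [rng.sample(range(t), t) for _ in range(m)]   # sigma_1..sigma_m
    def apply(A):
        return [[A[i][perms[i][j]] for j in range(t)] for i in range(m)]
    return apply

def l1_dist(A, B):
    return sum(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
```

This is the key property used in the construction: composing an embedding with a sampled \(\pi _{t,m}\) re-randomizes it without changing any pairwise \(\ell _1\) distances.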

**The Construction**. We now use re-randomization to construct our final embedding. For \(c\in \chi (E)\), and \(i\in {\mathbb {Z}}\), the map \(f_{i,c}: V(T({c}))\rightarrow {\mathbb {R}}^{m\times t}\) will represent an embedding of the subtree \(T({c})\) at scale \(2^i/t^2\). Recall that

**Lemma 5.7**

We will prove Lemma 5.7 in Sect. 5.3. We first make two observations, and then use them to prove Theorem 3.1. Our first observation is immediate from Observations 5.1 and 5.2, since in the third case of (53), by Observation 5.2, \(\Delta _i(v_{c'})\) and \(\Pi _{i,c'}( f_{i,c'}(x))\) must be supported on disjoint sets of rows.

**Observation 5.8**

For any \(v\in V\) and for any row \(j\in [m]\), there is at most one non-zero coordinate in \(f_i(v)[j]\).

Observation 5.2 and Lemma 5.5 also imply the following.

**Observation 5.9**

Using these, together with Corollary 3.5, we now prove Theorem 3.1.

*Proof of Theorem 3.1*

We will now split the rest of the proof into two cases.

**Case 1**: \(\tau _{m_w-1}(u)=0.\)

**Case 2**: \(\tau _{m_w-1}(u)\ne 0.\)

Suppose that there exists \(k\in [m]\) such that \(\Vert (g_i(u)-g_i(w))[k]\Vert _1\ne 0\). Now, we divide the proof into two cases again.

**Case 2.1**: There exists a \(j<i\) such that \(\Vert (g_j(x)\!-\!g_j(u))[k]\Vert _1\!+\!\Vert (g_j(y)-g_j(w))[k]\Vert _1\ne 0.\)

**Case 2.2**: \(\Vert (g_j(x)-g_j(u))[k]\Vert _1+\Vert (g_j(y)-g_j(w))[k]\Vert _1= 0\) for all \(j<i\).

### 5.2 Properties of the \(\Delta _i\) Maps

We now present proofs of Lemmas 5.3–5.6.

*Proof of Lemma 5.3*

For a fixed \(i\in {\mathbb {Z}}\), by (50) there is at most one element in \(\Delta _i(v)\) that takes a value outside \(\{0,{2^i\over t^2}\}\).

*Proof of Lemma 5.4*

For \(i< \big \lfloor \log _2\big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \) we have \(\Vert \Delta _i(u)\Vert _1=\Vert \Delta _i(w)\Vert _1=0\).

Let \(\nu \) be the minimum integer greater than \(\big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor -1\) such that part (A) of (34) for \(\tau _\nu (w)\) is less than or equal to part (B). This \(\nu \) exists since, by (35), part (B) of (34) is always positive, while by Lemma 4.3, part (A) of (34) must be zero for some \(\nu \in \mathbb {Z}\) large enough. First we analyze the case when \(i<\nu \).

*Proof of Lemma 5.5*

*Proof of Lemma 5.6*

### 5.3 The Probabilistic Analysis

We are thus left to prove Lemma 5.7. For \(c\in \chi (E)\), we analyze the embedding for \(T(c)\) by going through all \(c'\in \chi (E(T(c)))\) one by one in increasing order of \(\varphi (c')\). Our first lemma bounds the probability of a bad event, i.e. of a subpath not contributing enough to the distance in the embedding.

**Lemma 5.10**

*Proof*

**The \(\Gamma _a\) Mappings**. Before proving Lemma 5.7, we need some more definitions. For a color \(a\in \chi (E)\), we define a map \(\Gamma _{a}:{V(T(a))}\rightarrow V(T(a))\) based on Lemma 5.10. For \(u\in V(\gamma _a)\), we put \(\Gamma _a(u)=u\). For all other vertices \(u\in V(T(a))\setminus V(\gamma _a)\), there exists a unique color \(b\in \rho ^{-1}(a)\) such that \(u\in V(T(b))\). We define \(\Gamma _{a}(u)\) as the vertex \(w \in V(P_{u v_b})\) which is closest to the root among those vertices satisfying the following condition: for all \(v \in V(P_{u w}) \setminus \{w\}\) and \(k \in {\mathbb {Z}}\), \(\tau _k(v)\ne 0\) implies

**Lemma 5.11**

Consider any \(a\in \chi (E)\) and \(u\in V(T(a))\) such that \(\Gamma _{a}(u)\ne u\). Then we have \(\Gamma _{a}(u)=v_{c}\) for some \(c\in \chi (E(P_{u v_a}))\setminus \{a\}\).

*Proof*

Let \(w\in V(P_{u\,\Gamma _a(u)})\) be such that \(\Gamma _a(u)=p(w)\). The vertex \(w\) always exists because \(\Gamma _a(u)\in V(P_u)\setminus \{u\}\). If \(\chi (w,\Gamma _a(u))\ne \chi (\Gamma _a(u),p(\Gamma _a(u)))\) then \(\Gamma _a(u)\) is \(v_{c}\) for some \(c\in \chi (E(P_{u\, v_a}))\setminus \{a\}\).

**Lemma 5.12**

Suppose that \(a\in \chi (E)\) and \(u\in V(T(a))\). For any \(w\in V(P_{u\,\Gamma _{a}(u)})\) such that \({\chi }(u,p(u))={\chi }(w,p(w))\) we have \(\Gamma _{a}(w)\in V(P_{u\,\Gamma _{a}(u)})\).

*Proof*

**Defining Representatives for \(\gamma _c\)**. Now, for each \(c\in \chi (E)\), we define a small set of representatives for vertices in \(\gamma _c\). Later, we use these sets to bound the contraction of pairs of vertices that have one endpoint in \(\gamma _c\).

**Lemma 5.13**

*Proof*

The following lemma, in conjunction with Lemma 5.13, reduces the number of vertices in \(V(\gamma _c)\) that we need to analyze using Lemma 5.10.

**Lemma 5.14**

Let \((X,d)\) be a pseudometric, and let \(f:V\rightarrow X\) be a \(1\)-Lipschitz map. For \(x,y\in V\), and \(x',y'\in V(P_{xy})\) and \(h\ge 0\), if \(d(f(x),f(y))\ge d_T(x,y)-h\) then \(d(f(x'),f(y'))\ge d_T(x',y')-h\).

*Proof*
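Since \(f\) is \(1\)-Lipschitz and \(d_T\) is additive along the path \(P_{xy}\), the claimed bound is the expected triangle-inequality estimate. A minimal sketch, assuming (without loss of generality) that \(x,x',y',y\) appear in this order along \(P_{xy}\):

```latex
\begin{align*}
d(f(x'),f(y'))
  &\ge d(f(x),f(y)) - d(f(x),f(x')) - d(f(y'),f(y))
     && \text{(triangle inequality)}\\
  &\ge \bigl(d_T(x,y)-h\bigr) - d_T(x,x') - d_T(y',y)
     && \text{($f$ is $1$-Lipschitz)}\\
  &= d_T(x',y') - h
     && \text{($d_T$ additive along $P_{xy}$)}.
\end{align*}
```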

The following lemma constitutes the inductive step of the proof of Lemma 5.7.

**Lemma 5.15**

*Proof*

Write \(\chi (E(T(c))) = \{c_1, c_2, \ldots , c_n\}\), where the colors are ordered so that \(\varphi (c_j) \le \varphi (c_{j+1})\) for \(j=1,2,\ldots ,n-1\). Let \(\varepsilon _1=24 \varepsilon \), where the constant \(24\) comes from Lemma 5.10, and let \(\varepsilon _2=2 C' \varepsilon \), where \(C'\) is the constant from Lemma 4.11.

**Proof of** (90). Suppose that \(X_1,\ldots ,X_{i-1}\) and \(Y_i\) hold. We will show that \(X_i\) holds as well. First note that for all vertices \(x,y\in V(\gamma _{c_i})\), by Lemma 5.5 and the definition of \(f_{k,c_i}\) (53), we have

**Case I**: \(x\in V(\gamma _{c_i})\) with \(x\ne \Gamma _c(x)\), and \(y\in V(\gamma _{c_j})\) for some \(j< i\), and \(\mu (x,y)=\varphi (c)\).

Since \(z \in R_c(c_i)\), by definition we have \(\Gamma _c(z)\ne z\), therefore by Lemma 5.11, \(\Gamma _c(z)=v_{c'}\) for some color \(c'\in \chi (E(P_{z\,v_c}))\setminus \{c\}\). The function \(\varphi \) is non-decreasing along any root–leaf path, hence \(\chi (\Gamma _c(z),p(\Gamma _c(z)))= c_{\ell }\) for some \(\ell <i\).

**Case II**: \(x\in V(\gamma _{c_i})\) with \(x= \Gamma _c(x)\), and \(y\in V(\gamma _{c_j})\) for some \(j<i\), and \(\mu (x,y)= \varphi (c)\).

**Proof of** (91). We prove this inequality by first bounding the probability that (92) holds for a fixed \(x\) and all \(y\in V(\gamma _{c_j})\) (for a fixed \(j\in \{1,\ldots , i-1\}\)) with \(\mu (x,y)=\varphi (c)\). Then we use a union bound to complete the proof.
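The second step is the standard union bound. Writing \(B_j\) (our notation, not the paper's) for the event that (92) fails for the fixed \(x\) and some \(y\in V(\gamma _{c_j})\) with \(\mu (x,y)=\varphi (c)\), it suffices to bound each \(\Pr [B_j]\) separately, since

```latex
\Pr\Bigl[\,\bigcup_{j=1}^{i-1} B_j\Bigr] \;\le\; \sum_{j=1}^{i-1} \Pr[B_j].
```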

Finally, we present the proof of Lemma 5.7.

*Proof of Lemma 5.7*

## Footnotes

- 1.
The original bound proved in [5] grew like \(\log ^3 n\), but this was improved using an observation of A. Gupta.

- 2.
Asked at the DIMACS Workshop on Discrete Metric Spaces and their Algorithmic Applications (2003). The question was certainly known to others before 2003, and was asked of the first-named author by Assaf Naor earlier that year.

## Notes

### Acknowledgments

This research was partially supported by NSF Grants CCF-0644037, CCF-0915251, and a Sloan Research Fellowship. A significant portion of this work was completed during a visit of the authors to the Institut Henri Poincaré.

## References

- 1. Andoni, A., Charikar, M., Neiman, O., Nguyen, H.L.: Near linear lower bound for dimension reduction in \(\ell _1\). In: FOCS, pp. 315–323 (2011)
- 2. Brinkman, B., Charikar, M.: On the impossibility of dimension reduction in \(\ell _1\). J. ACM **52**(5), 766–788 (2005)
- 3. Bourgain, J., Lindenstrauss, J., Milman, V.: Approximation of zonoids by zonotopes. Acta Math. **162**(1–2), 73–141 (1989)
- 4. Batson, J.D., Spielman, D.A., Srivastava, N.: Twice-Ramanujan sparsifiers. SIAM J. Comput. **41**(6), 1704–1721 (2012)
- 5. Charikar, M., Sahai, A.: Dimension reduction in the \(\ell _1\) norm. In: FOCS, pp. 551–560 (2002)
- 6. Erdős, P., Lovász, L.: Problems and results on 3-chromatic hypergraphs and some related questions. In: Infinite and Finite Sets (Colloq., Keszthely, 1973; dedicated to P. Erdős on his 60th birthday), Vol. II, pp. 609–627. Colloq. Math. Soc. János Bolyai, vol. 10. North-Holland, Amsterdam (1975)
- 7. Gupta, A., Krauthgamer, R., Lee, J.R.: Bounded geometries, fractals, and low-distortion embeddings. In: FOCS, pp. 534–543 (2003)
- 8. Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. In: Conference in Modern Analysis and Probability (New Haven, CT, 1982), Contemporary Mathematics, vol. 26, pp. 189–206. American Mathematical Society, Providence (1984)
- 9. Lee, J.R., de Mesmay, A., Moharrami, M.: Dimension reduction for finite trees in \(\ell _1\). In: SODA, pp. 43–50 (2012)
- 10. Lee, J.R., Naor, A.: Embedding the diamond graph in \(L_p\) and dimension reduction in \(L_1\). Geom. Funct. Anal. **14**(4), 745–747 (2004)
- 11. Lee, J.R., Naor, A., Peres, Y.: Trees and Markov convexity. Geom. Funct. Anal. **18**(5), 1609–1659 (2009)
- 12. Matoušek, J.: On embedding trees into uniformly convex Banach spaces. Israel J. Math. **114**, 221–237 (1999)
- 13. Matoušek, J.: Open problems on embeddings of finite metric spaces. http://kam.mff.cuni.cz/matousek/haifaop.ps (2002)
- 14. McDiarmid, C.: Concentration. In: Habib, M., McDiarmid, C., Ramirez-Alfonsin, J., Reed, B. (eds.) Probabilistic Methods for Algorithmic Discrete Mathematics, Algorithms and Combinatorics, vol. 16, pp. 195–248. Springer, Berlin (1998)
- 15. Newman, I., Rabinovich, Y.: On cut dimension of \(\ell _1\) metrics and volumes, and related sparsification techniques. arXiv:1002.3541 (2010)
- 16. Regev, O.: Entropy-based bounds on dimension reduction in \(L_1\). Israel J. Math. arXiv:1108.1283 (2011)
- 17. Schechtman, G.: More on embedding subspaces of \(L_p\) in \(l^{n}_{r}\). Compos. Math. **61**(2), 159–169 (1987)
- 18. Schulman, L.J.: Coding for interactive communication. IEEE Trans. Inform. Theory **42**(6, part 1), 1745–1756 (1996)
- 19. Talagrand, M.: Embedding subspaces of \(L_1\) into \(l^N_1\). Proc. Am. Math. Soc. **108**(2), 363–369 (1990)