1 Introduction

Let \(T=(V,E)\) be a finite, connected, undirected tree, equipped with a length function on edges, \(\mathsf{len} : E \rightarrow [0,\infty )\). This induces a shortest-path pseudometric,

$$\begin{aligned} d_T(u,v) = \text {length of the shortest }{u {--}v} \text { path in }T. \end{aligned}$$

(This is a pseudometric because we may have \(d_T(u, v)=0\) even for distinct \(u, v \in V\).) Such a metric space \((V,d_T)\) is called a finite tree metric.
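As a concrete illustration of the definition, the following Python sketch computes \(d_T\) on a small weighted tree by a depth-first traversal from every vertex. The dictionary-based representation, the name `tree_metric`, and the example tree are assumptions made only for this illustration.

```python
from collections import defaultdict

def tree_metric(edges):
    """All-pairs path lengths in a tree given as {(u, v): length}."""
    adj = defaultdict(list)
    for (u, v), length in edges.items():
        adj[u].append((v, length))
        adj[v].append((u, length))
    dist = {}
    for source in list(adj):
        dist[(source, source)] = 0.0
        stack = [(source, None)]
        while stack:
            node, parent = stack.pop()
            for neighbor, length in adj[node]:
                if neighbor != parent:
                    # In a tree the path is unique, so one visit suffices.
                    dist[(source, neighbor)] = dist[(source, node)] + length
                    stack.append((neighbor, node))
    return dist

# Hypothetical example: a path a-b-c plus a zero-length edge b-z.
EXAMPLE_EDGES = {("a", "b"): 1.0, ("b", "c"): 2.0, ("b", "z"): 0.0}
```

In this example the distinct vertices `b` and `z` are at distance zero, which is exactly why \(d_T\) is only a pseudometric in general.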

Given two metric spaces \((X,d_X)\) and \((Y,d_Y)\), and a mapping \(f : X \rightarrow Y\), we define the Lipschitz constant of \(f\) by

$$\begin{aligned} \Vert f\Vert _{\mathrm {Lip}} = \sup _{x \ne y \in X} \frac{d_Y(f(x),f(y))}{d_X(x,y)}. \end{aligned}$$

An \(L\)-Lipschitz map is one for which \(\Vert f\Vert _{\mathrm {Lip}} \le L\). One defines the distortion of the mapping \(f\) to be \(\mathsf{dist}(f) = \Vert f\Vert _{\mathrm {Lip}} \cdot \Vert f^{-1}\Vert _{\mathrm {Lip}}\), where the distortion is understood to be infinite when \(f\) is not injective. We say that \((X,d_X)\) \(D\)-embeds into \((Y,d_Y)\) if there is a mapping \(f : X \rightarrow Y\) with \(\mathsf{dist}(f) \le D\).
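For finite spaces the distortion can be evaluated directly from the definition. The sketch below is one way to do so in Python; the function names are hypothetical, and the convention of skipping pairs that are at distance zero on both sides is an assumption made here to accommodate pseudometrics.

```python
import math
from itertools import combinations

def l1(u, v):
    """The l1 distance between two vectors of equal length."""
    return sum(abs(p - q) for p, q in zip(u, v))

def distortion(points, d_src, d_dst, f):
    """Product of the Lipschitz constants of f and of its inverse on f(points)."""
    expansion = 0.0
    contraction = 0.0
    for x, y in combinations(points, 2):
        a = d_src(x, y)
        b = d_dst(f(x), f(y))
        if a == 0 and b == 0:
            continue  # assumption: such a pair imposes no constraint
        if a == 0 or b == 0:
            return math.inf  # one of the two ratios is unbounded
        expansion = max(expansion, b / a)
        contraction = max(contraction, a / b)
    return expansion * contraction
```

For instance, mapping the points \(0,1,2\) of the line to \(x \mapsto x^2\) stretches the pair \(\{1,2\}\) by a factor of 3 and contracts no pair, so the distortion is 3.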

Using the notation \(\ell _1^k\) for the space \({\mathbb {R}}^k\) equipped with the \(\Vert \cdot \Vert _1\) norm, we study the following question: How large must \(k=k(n,\varepsilon )\) be so that every \(n\)-point tree metric \((1+\varepsilon )\)-embeds into \(\ell _1^k\)?

1.1 Dimension Reduction in \(\ell _1\)

A seminal result of Johnson and Lindenstrauss [8] implies that for every \(\varepsilon > 0\), every \(n\)-point subset \(X \subseteq \ell _2\) admits a \((1+\varepsilon )\)-distortion embedding into \(\ell _2^k\), with \(k = O\big (\frac{\log n}{\varepsilon ^2}\big )\). On the other hand, the known upper bounds for \(\ell _1\) are much weaker. Talagrand [19], following earlier results of Bourgain–Lindenstrauss–Milman [3] and Schechtman [17], showed that every \(n\)-dimensional subspace \(X \subseteq \ell _1\) (and, in particular, every \(n\)-point subset) admits a \((1+\varepsilon )\)-embedding into \(\ell _1^k\), with \(k=O\big (\frac{n \log n}{\varepsilon ^2}\big )\). For \(n\)-point subsets, this was very recently improved to \(k=O(n/\varepsilon ^2)\) by Newman and Rabinovich [15], using the spectral sparsification techniques of Batson et al. [4].

On the other hand, Brinkman and Charikar [2] showed that there exist \(n\)-point subsets \(X \subseteq \ell _1\) such that any \(D\)-embedding of \(X\) into \(\ell _1^k\) requires \(k \ge n^{\Omega (1/D^2)}\) (see also [10] for a simpler proof). Thus the exponential dimension reduction achievable in the \(\ell _2\) case cannot be matched for the \(\ell _1\) norm. More recently, it has been shown by Andoni et al. [1] that there exist \(n\)-point subsets such that any \((1+\varepsilon )\)-embedding requires dimension at least \(n^{1-O(1/\log (\varepsilon ^{-1}))}\). Regev [16] has given an elegant proof of both these lower bounds based on information theoretic arguments.

One can still ask about the possibility of more substantial dimension reduction for certain finite subsets of \(\ell _1\). Such a study was undertaken by Charikar and Sahai [5]. In particular, it is an elementary exercise to verify that every finite tree metric embeds isometrically into \(\ell _1\), thus the \(\ell _1\) dimension reduction question for trees becomes a prominent example of this type. It was shown [5] that for every \(\varepsilon > 0\), every \(n\)-point tree metric \((1+\varepsilon )\)-embeds into \(\ell _1^k\) with \(k = O\big (\frac{\log ^2 n}{\varepsilon ^2}\big )\). It is quite natural to ask whether the dependence on \(n\) can be reduced to the natural volume lower bound of \(\Omega (\log n)\). Indeed, it is Question 3.6 in the list “Open problems on embeddings of finite metric spaces” maintained by Matoušek [13], asked by Gupta et al. As noted there, the question was, surprisingly, even open for the complete binary tree on \(n\) vertices. The present paper resolves this question, achieving the volume lower bound for all finite trees.

Theorem 1.1

For every \(\varepsilon > 0\) and \(n \in \{1,2,3,\ldots \}\), the following holds. Every \(n\)-point tree metric admits a \((1+\varepsilon )\)-embedding into \(\ell _1^k\) with \(k = O\big ((\frac{1}{\varepsilon })^4 \log \frac{1}{\varepsilon } \log n\big )\). If the tree is a complete \(d\)-ary tree of some height, the bound improves to \(k = O\big ((\frac{1}{\varepsilon })^2 \log n\big )\).

The proof for the general case is presented in Sect. 3.1. The special case of complete \(d\)-ary trees is addressed in Sect. 2. We remark that the proof also yields a randomized polynomial-time algorithm to construct the embedding.

By simple volume arguments, the \(\Theta (\log n)\) factor is necessary. Regarding the dependence on \(\varepsilon \), it is known [9] that for complete binary trees, one must have \(k \ge \Omega \big (\frac{\log n}{\varepsilon ^2 \log (1/\varepsilon )}\big )\), showing that, for this special case, Theorem 1.1 is tight up to a \(\log (1/\varepsilon )\) factor.

1.2 Notation

For a graph \(G=(V,E)\), we use the notations \(V(G)\) and \(E(G)\) to denote the vertex and edge sets of \(G\), respectively. For a connected, rooted tree \(T=(V,E)\) and \(x,y \in V\), we use the notation \(P_{xy}\) for the unique path between \(x\) and \(y\) in \(T\), and \(P_{x}\) for \(P_{rx}\), where \(r\) is the root of \(T\).

We use \({\mathbb {N}}\) for the set of positive integers \(\{1,2,3,\ldots \}\). For \(k \in {\mathbb {N}}\), we write \([k] = \{1,2,\ldots ,k\}\). We also use the asymptotic notation \(A \lesssim B\) to denote that \(A = O(B)\), and \(A \asymp B\) to denote the conjunction of \(A \lesssim B\) and \(B \lesssim A\).

1.3 Proof Outline and Related Work

We first discuss the form that all our embeddings will take. Let \(T=(V,E)\) be a finite, connected tree, and fix a root \(r \in V\). For each \(v \in V\), recall that \(P_v\) denotes the unique simple path from \(r\) to \(v\). Given a labeling of edges by vectors \(\lambda : E \rightarrow {\mathbb {R}}^k\), we can define \(\varphi : V \rightarrow {\mathbb {R}}^k\) by

$$\begin{aligned} \varphi (v) = \sum _{e \in E(P_v)} \lambda (e). \end{aligned}$$
(1)

The difficulty now lies in choosing an appropriate labeling \(\lambda \). An easy observation is that if we have \(\Vert \lambda (e)\Vert _1 = \mathsf{len}(e)\) for all \(e \in E\) and the set \(\{\lambda (e)\}_{e \in E}\) is orthogonal, then \(\varphi \) is an isometry. Of course, our goal is to use many fewer than \(|E|\) dimensions for the embedding. We next illustrate a major probabilistic technique employed in our approach.
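The path-sum form (1) is easy to experiment with. The following Python sketch builds the map \(\varphi \) from a labeling and instantiates the isometric case with one coordinate per edge, so that the labels have pairwise disjoint supports. The parent-map representation and all function names are assumptions made for this illustration only.

```python
def l1(u, v):
    """The l1 distance between two vectors of equal length."""
    return sum(abs(p - q) for p, q in zip(u, v))

def path_sum_embedding(parent, labels):
    """phi(v) = sum of labels[(u, w)] over the edges (u, w) on the root-to-v path."""
    dim = len(next(iter(labels.values())))
    cache = {}

    def phi(v):
        if v not in cache:
            if v not in parent:  # v is the root
                cache[v] = (0.0,) * dim
            else:
                u = parent[v]
                cache[v] = tuple(a + b for a, b in zip(phi(u), labels[(u, v)]))
        return cache[v]

    vertices = set(parent) | set(parent.values())
    return {v: phi(v) for v in vertices}

def one_coordinate_per_edge(parent, length):
    """Labels with disjoint supports and l1 norm equal to the edge length."""
    edges = [(u, v) for v, u in parent.items()]
    labels = {}
    for i, e in enumerate(edges):
        vec = [0.0] * len(edges)
        vec[i] = length[e]
        labels[e] = tuple(vec)
    return labels
```

With these labels the \(\ell _1\) distance between \(\varphi (u)\) and \(\varphi (v)\) is the total length of the edges on exactly one of the two root paths, which is \(d_T(u,v)\); the price is that the dimension equals \(|E|\).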

Re-randomization. Consider an unweighted, complete binary tree of height \(h\). Denote the tree by \(T_h = (V_h, E_h)\), let \(n=2^{h+1}-1\) be the number of vertices, and let \(r\) denote the root of the tree. Let \(\kappa \in {\mathbb {N}}\) be some constant which we will choose momentarily. If we assign to every edge \(e \in E_h\) a label \(\lambda (e) \in \{0,1\}^{\kappa }\), then there is a natural mapping \(\tau _{\lambda } : V_h \rightarrow \{0,1\}^{\kappa h}\) given by

$$\begin{aligned} \tau _{\lambda }(v) = (\lambda (e_1), \lambda (e_2), \ldots , \lambda (e_k), 0, 0, \ldots , 0), \end{aligned}$$
(2)

where \(E(P_v) = \{e_1, e_2, \ldots , e_k\},\) and the edges are labeled in order from the root to \(v\). Note that the preceding definition falls into the framework of (1), by extending each \(\lambda (e)\) to a \((\kappa h)\)-dimensional vector padded with zeros, but the specification here will be easier to work with presently.

If we choose the label map \(\lambda : E_h \rightarrow \{0,1\}^{\kappa }\) uniformly at random, the probability for the embedding \(\tau _{\lambda }\) specified in (2) to have \(O(1)\) distortion is at most exponentially small in \(n\). In fact, the probability for \(\tau _{\lambda }\) to be injective is already this small. This is because for two nodes \(u,v \in V_h\) which are the children of the same node \(w\), there is \(\Omega (1)\) probability that \(\tau _{\lambda }(u)=\tau _{\lambda }(v)\), and there are \(\Omega (n)\) such independent events. In Sect. 2, we show that a judicious application of the Lovász Local Lemma [6] can be used to show that \(\tau _{\lambda }\) has \(O(1)\) distortion with non-zero probability. In fact, we show that this approach can handle arbitrary \(k\)-ary complete trees, with distortion \(1+\varepsilon \). Unknown to us at the time of discovery, a closely related construction occurs in the context of tree codes for interactive communication [18].

Unfortunately, the use of the Local Lemma does not extend well to the more difficult setting of arbitrary trees. For the general case, we employ an idea of Schulman [18] based on re-randomization. To see the idea in our simple setting, consider \(T_h\) to be composed of a root \(r\), under which lie two copies of \(T_{h-1}\), which we call A and B, having roots \(r_\mathrm{A}\) and \(r_\mathrm{B}\), respectively.

The idea is to assume that, inductively, we already have a labeling \(\lambda _{h-1} : E_{h-1} \rightarrow \{0,1\}^{\kappa }\) such that the corresponding map \(\tau _{\lambda _{h-1}}\) has \(O(1)\) distortion on \(T_{h-1}\). We will then construct a random labeling \(\lambda _h : E_h \rightarrow \{0,1\}^{\kappa }\) by using \(\lambda _{h-1}\) on the A-side, and \(\pi (\lambda _{h-1})\) on the B-side, where \(\pi \) randomly alters the labeling in such a way that \(\tau _{\pi (\lambda _{h-1})}\) is simply \(\tau _{\lambda _{h-1}}\) composed with a random isometry of \(\ell _1^{\kappa (h-1)}\). We will then argue that with positive probability (over the choice of \(\pi \)), \(\tau _{\lambda _h}\) has \(O(1)\) distortion.

Let \(\pi _1, \pi _2, \ldots , \pi _{h-1} : \{0,1\}^{\kappa } \rightarrow \{0,1\}^{\kappa }\) be i.i.d. random mappings, where the distribution of \(\pi _1\) is specified by

$$\begin{aligned} \pi _1(x_1, x_2, \ldots , x_{\kappa }) = \left( \rho _1(x_1), \rho _2(x_2), \ldots , \rho _{\kappa }(x_{\kappa })\right) , \end{aligned}$$

where each \(\rho _i\) is an independent uniformly random involution \(\{0,1\} \rightarrow \{0,1\}\). To every edge \(e \in E_{h-1}\), we can assign a height \(\alpha (e) \in \{1,2,\ldots ,h-1\}\) which is its distance to the root. From a labeling \(\lambda : E_{h-1} \rightarrow \{0,1\}^{\kappa }\), we define a random labeling \(\pi (\lambda ) : E_{h-1} \rightarrow \{0,1\}^{\kappa }\) by

$$\begin{aligned} \pi (\lambda )(e) = \pi _{\alpha (e)}\big (\lambda (e)\big ). \end{aligned}$$

By a mild abuse of notation, we will consider \(\pi (\lambda ) : E(B) \rightarrow \{0,1\}^{\kappa }\).

Finally, given a labeling \(\lambda _{h-1} : E_{h-1} \rightarrow \{0,1\}^{\kappa }\), we construct a random labeling \(\lambda _h : E_h \rightarrow \{0,1\}^{\kappa }\) as follows:

$$\begin{aligned} \lambda _h(e) = \left\{ \begin{array}{l@{\quad }l} (0,0,\ldots ,0) &{} e=(r,r_A), \\ (1,1,\ldots ,1) &{} e=(r,r_B), \\ \lambda _{h-1}(e) &{} e \in E(A), \\ \pi (\lambda _{h-1})(e) &{} e \in E(B). \end{array}\right. \end{aligned}$$

By construction, the mappings \(\tau _{\lambda _h}|_{V(A) \cup \{r\}}\) and \(\tau _{\lambda _h}|_{V(B) \cup \{r\}}\) have the same distortion as \(\tau _{\lambda _{h-1}}\). In particular, it is easy to check that \(\tau _{\pi (\lambda _{h-1})}\) is simply \(\tau _{\lambda _{h-1}}\) composed with an isometry of \(\{0,1\}^{\kappa (h-1)}\).
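The basic fact behind this step is that applying an involution of \(\{0,1\}\) in each coordinate preserves Hamming distances. There are exactly two involutions of \(\{0,1\}\), the identity and the swap, so a uniformly random one can be encoded as XOR with a uniform bit. The Python sketch below uses that encoding; the function names are hypothetical, and it only illustrates the isometry on bit strings of equal length, not the full labeling \(\pi (\lambda )\).

```python
import random

def random_involution_mask(kappa, rng):
    """One uniformly random involution of {0,1} per coordinate, as an XOR mask."""
    return tuple(rng.randrange(2) for _ in range(kappa))

def apply_mask(mask, x):
    """Apply the coordinate-wise involutions encoded by mask to the bit string x."""
    return tuple(m ^ b for m, b in zip(mask, x))

def hamming(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))
```

Since both strings are XORed with the same mask, a coordinate differs after the map exactly when it differed before, and applying the same mask twice returns the original string.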

Now consider some pair \(x \in V(A)\) and \(y \in V(B)\). It is simple to argue that it suffices to bound the distortion for pairs with \(m=d_{T_h}(r,x)=d_{T_h}(r,y)\) for \(m \in \{1,2,\ldots ,h\}\), so we will assume that \(x,y\) have the same height in \(T_h\).

Observe that \(\tau _{\lambda _h}(x)\) is fixed with respect to the randomness in \(\pi \), thus if we write \(v = \tau _{\lambda _h}(x) - \tau _{\lambda _h}(y)\), where subtraction is taken coordinate-wise, modulo 2, then \(v\) has the form

$$\begin{aligned} v \equiv \big (\underbrace{1,1,\ldots ,1}_{\kappa }, b_1, b_2, \ldots , b_{\kappa (m-1)}\big )\, \end{aligned}$$

where the \(\{b_i\}\) are i.i.d. uniform over \(\{0,1\}\). It is thus an easy consequence of Chernoff bounds that, with probability at least \(1- e^{-m\kappa /8}\), we have

$$\begin{aligned} \Vert \tau _{\lambda _h}(x)-\tau _{\lambda _h}(y)\Vert _1 = \Vert v\Vert _1 \ge \frac{\kappa \cdot d_{T_h}(x,y)}{4}. \end{aligned}$$

Also, clearly \(\Vert \tau _{\lambda _h}\Vert _{\mathrm {Lip}} \le \kappa \).

On the other hand, the number of pairs \(x \in V(A), y \in V(B)\) with \(m=d_{T_h}(r,x)=d_{T_h}(r,y)\) is \(2^{2(m-1)}\), thus taking a union bound, we have

$$\begin{aligned} \mathbb {P}\big ({ {{\mathsf {dist}}}}(\tau _{\lambda _h}) > \max \{4, { {{\mathsf {dist}}}}(\tau _{\lambda _{h-1}})\}\big ) \le \sum _{m=1}^{h} 2^{2(m-1)} e^{-m\kappa /8}, \end{aligned}$$

and the latter bound is strictly less than 1 for some \(\kappa = O(1)\), showing the existence of a good map \(\tau _{\lambda _h}\).
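To make the choice of \(\kappa \) concrete, the right-hand side of the union bound can be evaluated numerically. The sketch below takes the constants exactly as they appear in the display above; the function names are hypothetical. It also uses the closed form of the geometric series, \(\sum _{m \ge 1} 4^{m-1} r^m = r/(1-4r)\) with \(r = e^{-\kappa /8}\), to obtain a bound that is uniform in \(h\).

```python
import math

def union_bound(kappa, h):
    """Sum over m = 1..h of 4^(m-1) * exp(-m * kappa / 8)."""
    return sum(4 ** (m - 1) * math.exp(-m * kappa / 8) for m in range(1, h + 1))

def geometric_limit(kappa):
    """Limit of union_bound(kappa, h) as h grows; infinite if the series diverges."""
    r = math.exp(-kappa / 8)
    return math.inf if 4 * r >= 1 else r / (1 - 4 * r)

def smallest_uniform_kappa():
    """Smallest integer kappa whose bound is below 1 for every height h."""
    kappa = 1
    while geometric_limit(kappa) >= 1:
        kappa += 1
    return kappa
```

The limit is below 1 exactly when \(e^{-\kappa /8} < 1/5\), that is, when \(\kappa > 8 \ln 5 \approx 12.9\), so under these constants \(\kappa = 13\) suffices for all heights simultaneously.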

This illustrates how re-randomization (applying a distribution over random isometries to one side of a tree) can be used to achieve \(O(1)\) distortion for embedding \(T_h\) into \(\ell _1^{O(h)}\). Unfortunately, the arguments become significantly more delicate when we handle less uniform trees. The full-blown re-randomization argument occurs in Sect. 5.

Scale Selection. The first step beyond complete binary trees would be in passing to complete \(d\)-ary trees for \(d \ge 3\). The same construction as above works, but now one has to choose \(\kappa \asymp \log d\). Unfortunately, if the degrees of our tree are not uniform, we have to adopt a significantly more delicate strategy. It is natural to choose a single number \(\kappa (e) \in {\mathbb {N}}\) for every edge \(e \in E\), and then put \(\lambda (e) \in \frac{1}{\kappa (e)} \{0,1\}^{\kappa (e)}\) (this ensures that the analogue of the embedding \(\tau _{\lambda }\) specified in (2) is 1-Lipschitz).

Observing the case of \(d\)-ary trees, one might be tempted to put

$$\begin{aligned} \kappa (e) = \Big \lceil \log \frac{|T_u|}{|T_v|}\Big \rceil , \end{aligned}$$

where \(e=(u,v)\) is directed away from the root, and we use \(T_v\) to denote the subtree rooted at \(v\). However, if one simply takes a complete binary tree on \(2^h\) nodes, and then connects a star of degree \(2^h\) to every vertex, we have \(\kappa (e) \asymp h\) for every edge, and thus the dimension becomes \(O(h^2)\) instead of \(O(h)\) as desired.

In fact, there are examples which show that it is impossible to choose \(\kappa (u,v)\) to depend only on the geometry of the subtree rooted at \(u\). These “scale selector” values have to look at the global geometry, and in particular have to encode the volume growth of the tree at many scales simultaneously. Our eventual scale selector is fairly sophisticated and impossible to describe without delving significantly into the details of the proof. For our purposes, we need to consider more general embeddings of type (1). In particular, the coordinates of our labels \(\lambda (e) \in {\mathbb {R}}^k\) will take a range of different values, not simply a single value as for complete trees.

We do try to maintain one important, related invariant: If \(P_v\) is the sequence of edges from the root to some vertex \(v\), then ideally for every coordinate \(i \in \{1,2,\ldots ,k\}\) and every value \(j \in {\mathbb {Z}}\), there will be at most one \(e \in P_v\) for which \(\lambda (e)_i \in [2^j, 2^{j+1})\). Thus instead of every coordinate being “touched” at most once on the path from the root to \(v\), every coordinate is touched at most once at every scale along every such path. This ensures that various scales do not interact. For technical reasons, this property is not maintained exactly, but analogous concepts arise frequently in the proof.

The restricted class of embeddings we use, along with a discussion of the invariants we maintain, are introduced in Sect. 3.2. The actual scale selectors are defined in Sect. 4.

Controlling the Topology. One of the properties that we used above for complete \(d\)-ary trees is that the depth of such a tree is \(O(\log _d n)\), where \(n\) is the number of nodes in the tree. This allowed us to concatenate vectors down a root–leaf path without exceeding our desired \(O(\log n)\) dimension bound. Of course, for general trees, no similar property need hold. However, there is still a bound on the topological depth of any \(n\)-node tree.

To explain this, let \(T=(V,E)\) be a tree with root \(r\), and define a monotone coloring of \(T\) to be a mapping \(\chi : E \rightarrow {\mathbb {N}}\) such that for every \(c \in {\mathbb {N}}\), the color class \(\chi ^{-1}(c)\) is a connected subset of some root–leaf path. Such colorings were used in previous works on embedding trees into Hilbert spaces [7, 11, 12], as well as for previous low-dimensional embeddings into \(\ell _1\) [5]. The following lemma is well-known and elementary.

Lemma 1.2

Every connected \(n\)-vertex rooted tree \(T\) admits a monotone coloring such that every root–leaf path in \(T\) contains at most \(1+\log _2 n\) colors.

Proof

For an edge \(e \in E(T)\), let \(\ell (e)\) denote the number of leaves beneath \(e\) in \(T\) (including, possibly, an endpoint of \(e\)). Letting \(\ell (T) = \max _{e \in E} \ell (e)\), we will prove that for \(\ell (T) \ge 1\), there exists a monotone coloring with at most \(1+\log _2 (\ell (T)) \le 1+\log _2 n\) colors on any root–leaf path.

Suppose that \(r\) is the root of \(T\). For an edge \(e\), let \(T_e\) be the subtree beneath \(e\), including the edge \(e\) itself. If \(r\) is the endpoint of edges \(e_1, e_2, \ldots , e_k\), we may color the edges of \(T_{e_1}, T_{e_2}, \ldots , T_{e_k}\) separately, since any monotone path is contained completely within exactly one of these subtrees. Thus we may assume that \(r\) is the endpoint of only one edge \(e_1\), and then \(\ell (T)=\ell (e_1)\).

Choose a leaf \(x\) in \(T\) such that each connected component \(T'\) of \(T \setminus E(P_{rx})\) has \(\ell (T') \le \ell (e_1)/2\) (this is easy to do by, e.g., ordering the leaves from left to right in a planar drawing of \(T\)). Color the edges \(E(P_{rx})\) with color 1, and inductively color each non-trivial connected component \(T'\) with disjoint sets of colors from \({\mathbb {N}} \setminus \{1\}\). By induction, the maximum number of colors appearing on a root–leaf path in \(T\) is at most \(1+\big (1+\log _2(\ell (e_1)/2)\big ) = 1+\log _2(\ell (T))\), completing the proof. \(\square \)
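A coloring with the guarantee of Lemma 1.2 can also be produced greedily. The Python sketch below uses the standard heavy-path idea keyed on leaf counts: below every vertex, the child with the most leaves inherits the color of the incoming edge and every other child edge starts a new color. This is in the spirit of the proof above but is a swapped-in construction, not necessarily the same coloring, and the child-list representation and function names are assumptions.

```python
import math

def monotone_coloring(children, root):
    """Return {(u, v): color}; each color class is a downward path."""
    order, stack = [], [root]
    while stack:  # preorder: every vertex appears after its parent
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    leaves = {}
    for v in reversed(order):  # children are handled before their parent
        kids = children.get(v, [])
        leaves[v] = sum(leaves[c] for c in kids) if kids else 1
    coloring, next_color = {}, 1
    incoming = {root: None}  # color of the edge entering each vertex
    for v in order:
        kids = children.get(v, [])
        if not kids:
            continue
        heavy = max(kids, key=lambda c: leaves[c])
        for c in kids:
            if c == heavy and incoming[v] is not None:
                color = incoming[v]  # extend the current class downward
            else:
                color = next_color  # start a new color class
                next_color += 1
            coloring[(v, c)] = color
            incoming[c] = color
    return coloring

def max_colors_on_root_path(children, root, coloring):
    """Largest number of distinct colors seen on any root-to-vertex path."""
    best, stack = 0, [(root, frozenset())]
    while stack:
        v, seen = stack.pop()
        best = max(best, len(seen))
        for c in children.get(v, []):
            stack.append((c, seen | {coloring[(v, c)]}))
    return best

def color_bound(n):
    """The bound 1 + log2(n) from Lemma 1.2."""
    return 1 + math.log2(n)
```

Every edge that starts a new color below the first level leads to a child with at most half of its parent's leaves, so at most \(\log _2 n\) such edges lie on any root–leaf path, which gives the bound \(1+\log _2 n\) for this scheme as well.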

Instead of dealing directly with edges in our actual embedding, we will deal with color classes. This poses a number of difficulties, and one major difficulty involves vertices which occur in the middle of such classes. For dealing with these vertices, we will first preprocess our tree by embedding it into a product of a small number of new trees, each of which admits colorings of a special type. This is carried out in Sect. 3.1.

2 Warm-Up: Embedding Complete \(k\)-ary Trees

We first prove our main result for the special case of complete \(k\)-ary trees, with an improved dependence on \(\varepsilon \). The main novelty is our use of the Lovász Local Lemma to analyze a simple random embedding of such trees into \(\ell _1\). The proof illustrates the tradeoff between concentration and the sizes of the sets \(\{ \{u,v\} \subseteq V : d_T(u,v)=j \}\) for each \(j=1,2,\ldots \).

Theorem 2.1

Let \(T_{k,h}\) be the unweighted, complete \(k\)-ary tree of height \(h\). For every \(\varepsilon > 0\), there exists a \((1+\varepsilon )\)-embedding of \(T_{k,h}\) into \(\ell _1^{O((h \log k)/\varepsilon ^2)}\).

In the next section, we introduce our random embedding and analyze the success probability for a single pair of vertices based on their distance. Then in Sect. 2.2, we show that with non-zero probability, the construction succeeds for all vertices. In the coming sections and later, in the proof of our main theorem, we will employ the following concentration inequality [14].

Theorem 2.2

Let \(M\) be a non-negative number, and \(X_i\, (1\le i\le n)\) be independent random variables satisfying \(X_i\le {\mathbb {E}}(X_i)+M\) for \(1\le i\le n\). Consider the sum \(X=\sum _{i=1}^n X_i\) with expectation \({\mathbb {E}}(X)=\sum _{i=1}^n {\mathbb {E}}(X_i)\) and \(\mathrm {Var}(X)=\sum _{i=1}^n \mathrm {Var}(X_i)\). Then we have

$$\begin{aligned} \mathbb {P}(X- \mathbb {E}(X)\ge \lambda )\le \exp \Big ({-\lambda ^2\over 2(\mathrm {Var}(X)+M\lambda /3) }\Big ). \end{aligned}$$
(3)

2.1 A Single Event

Fix \(k,h \in {\mathbb {N}}\) and \(\varepsilon > 0\). Write \(T=(V,E)\) for the tree \(T_{k,h}\) with root \(r \in V\), and let \(d_T\) be the unweighted shortest-path metric on \(T\). Additionally, we define

$$\begin{aligned} t={\Big \lceil {1\over \varepsilon }\Big \rceil } \end{aligned}$$
(4)

and

$$\begin{aligned} m=t \lceil {\log k}\rceil . \end{aligned}$$
(5)

Let \(\{\vec {v}(1),\ldots , {\vec {v}}(t)\}\) be the standard basis for \({\mathbb {R}}^t\). Let \(b_1, b_2, \ldots , b_m\) be chosen i.i.d. uniformly over \(\{1,2,\ldots ,t\}\). For the edges \(e \in E\), we choose i.i.d. random labels \(\lambda (e) \in {\mathbb {R}}^{m \times t}\), each of which has the distribution of the random vector (represented in matrix notation),

$$\begin{aligned} {1\over m} \left( \begin{array}{c} {\vec {v}}(b_1)\\ \vdots \\ {\vec {v}}(b_{m}) \end{array} \right) . \end{aligned}$$
(6)

Note that for every \(e\in E\), we have \(\Vert \lambda (e)\Vert _1=1\). We now define a random mapping \(g:V\rightarrow {\mathbb {R}}^{m(h-1) \times t}\) as follows: We put \(g(r)=0\), and otherwise

$$\begin{aligned} g(v)=\left( {\begin{array}{c} \lambda (e_1)\\ \vdots \\ \lambda (e_{j})\\ 0\\ \vdots \\ 0 \end{array}} \right) , \end{aligned}$$
(7)

where \(e_1, e_2, \ldots , e_{j}\) is the sequence of edges encountered on the path from the root to \(v\). It is straightforward to check that \(g\) is \(1\)-Lipschitz. The next observation is also immediate from the definition of \(g\).

Observation 2.3

For any \(v\in V\) and \(u\in V(P_v)\), we have \(d_T(u,v)= \Vert g(u)-g(v)\Vert _1\).
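The random map \(g\) is simple to simulate. In the Python sketch below, a vertex of the complete \(k\)-ary tree is encoded as the tuple of child indices on its root path, a label is stored as the \(m\) basis indices \(b_1,\ldots ,b_m\), and one block of \(m\) rows is allotted to every level of the tree. The encoding, the function names, and the number of blocks are assumptions made for this illustration.

```python
import random
from itertools import product

def random_labels(k, h, m, t, rng):
    """One label per non-root vertex v, for the edge entering v: m indices in range(t)."""
    labels = {}
    for depth in range(1, h + 1):
        for v in product(range(k), repeat=depth):
            labels[v] = tuple(rng.randrange(t) for _ in range(m))
    return labels

def g_distance(u, v, labels, m):
    """l1 distance between g(u) and g(v); every row is (1/m) times a basis vector."""
    total = 0.0
    for i in range(1, max(len(u), len(v)) + 1):
        a = labels[u[:i]] if i <= len(u) else None
        b = labels[v[:i]] if i <= len(v) else None
        if a is None or b is None:
            total += 1.0  # one block is zero and the other has l1 norm 1
        else:
            total += sum(2.0 / m for p, q in zip(a, b) if p != q)
    return total

def tree_distance(u, v):
    """Unweighted tree distance via the longest common prefix."""
    common = 0
    while common < min(len(u), len(v)) and u[common] == v[common]:
        common += 1
    return (len(u) - common) + (len(v) - common)
```

On any sample one can check that `g_distance` never exceeds `tree_distance`, and that the two agree whenever one vertex is an ancestor of the other, as in Observation 2.3. Contraction can only come from levels where the two root paths use different edges whose labels happen to share rows.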

For \(m,n\in {\mathbb {N}}\), and \(A\in {\mathbb {R}}^{m\times n}\), we use the notation \(A[i] \in {\mathbb {R}}^n\) to refer to the \(i\)th row of \(A\). We now bound the probability that a given pair of vertices experiences a large contraction.

Lemma 2.4

For \(C\ge 10\), and \(x,y\in V\),

$$\begin{aligned} \mathbb {P}[ \Vert g(x)-g(y)\Vert _1\le (1-C\varepsilon )d_T(x,y)]\le k^{-Cd_T(x,y)/2}. \end{aligned}$$
(8)

Proof

Fix \(x,y \in V\), and let \(r'\) denote their lowest common ancestor. We define a family of random variables \(\{X_{ij}\}_{i\in [h-1], j \in [m]}\) by setting \(\ell _{ij} = (i-1)m + j\), and then

$$\begin{aligned} X_{ij}&= \Vert g(x)[\ell _{ij}]-g(r')[\ell _{ij}]\Vert _1+\Vert g(y)[\ell _{ij}]-g(r') [\ell _{ij}]\Vert _1 \nonumber \\&\quad -\Vert g(x)[\ell _{ij}]-g(y)[\ell _{ij}]\Vert _1. \end{aligned}$$
(9)

Observe that if \(i \le d_T(r,r')\) then \(X_{ij}=0\) for all \(j \in [m]\) since all three terms in (9) are zero. Furthermore, if \(i \ge \min (d_T(r,x), d_T(r,y))+1\), then again \(X_{ij}=0\) for all \(j \in [m]\), since in this case one of the first two terms of (9) is zero, and the other is equal to the last. Thus if

$$\begin{aligned} R = [h-1] \cap [d_T(r,r')+1, \min (d_T(r,x),d_T(r,y))], \end{aligned}$$

then \(i \notin R \implies X_{ij} = 0\) for all \(j \in [m]\), and additionally we have the estimate

$$\begin{aligned} |R| = \min (d_T(r,x),d_T(r,y))-d_T(r,r') \le \frac{d_T(x,y)}{2}. \end{aligned}$$
(10)

Now, using the definition of \(g\) in (7), we can write

$$\begin{aligned}&\Vert g(x)-g(y)\Vert _1\\&\quad =\sum _{i \in [h-1], j \in [m]} \big (\Vert g(x)[\ell _{ij}]-g(r')[\ell _{ij}]\Vert _1+\Vert g(y)[\ell _{ij}] -g(r')[\ell _{ij}]\Vert _1-X_{ij}\big )\\&\quad =\Vert g(x)-g(r')\Vert _1+\Vert g(y)-g(r')\Vert _1-\sum _{i \in [h-1], j \in [m]} X_{ij}\\&\quad {\mathop {=}\limits ^{(2.3)}} d_T(x,r')+d_T(y,r')-\sum _{i \in [h-1], j \in [m]} X_{ij}\\&\quad =d_T(x,y)-\sum _{i \in [h-1], j \in [m]} X_{ij}. \end{aligned}$$

We will prove the lemma by arguing that

$$\begin{aligned} \mathbb {P}\Big [\sum _{i \in [h-1], j \in [m]} X_{ij}\ge C\varepsilon d_T(x,y)\Big ]\le k^{-Cd_T(x,y)/2}. \end{aligned}$$

We start the proof by first bounding the maximum of the \(X_{ij}\) variables. Since, for every \(\ell \), we have

$$\begin{aligned} \Vert g(x)[\ell ]-g(r')[\ell ]\Vert _1,\,\Vert g(y)[\ell ]-g(r')[\ell ]\Vert _1 \in \big \{0,\frac{1}{m}\big \}, \end{aligned}$$

we conclude that

$$\begin{aligned} \max \big \{ X_{ij}:i\in [h-1], j \in [m] \big \}\le {2\over m}. \end{aligned}$$
(11)

For \(i\in R\) and \(j \in [m]\), using (6) and (7), we see that \((g(x)[\ell _{ij}]-g(r')[\ell _{ij}])={1\over m}{\vec {v}}(\alpha )\) and \(g(y)[\ell _{ij}]-g(r')[\ell _{ij}]={1\over m}{\vec {v}}(\beta )\), where \(\alpha \) and \(\beta \) are i.i.d. uniform over \(\{1,\ldots , t\}\). Hence, for \(i\in R\) and \(j \in [m]\), we have

$$\begin{aligned} \mathbb {P}[X_{ij}\ne 0]={1\over t}. \end{aligned}$$

We can thus bound the expected value and variance of \(X_{ij}\) for \(i\in R\) and \(j \in [m]\) using (11),

$$\begin{aligned} {\mathbb {E}}[X_{ij}]\le {2\over tm}\, \end{aligned}$$
(12)

and

$$\begin{aligned} \mathrm {Var}(X_{ij})\le {4\over tm^2}. \end{aligned}$$
(13)
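These two bounds can be checked exactly. For \(i \in R\), the variable \(X_{ij}\) equals \(2/m\) when \(\alpha = \beta \) (the two rows coincide, so the last term of (9) vanishes) and equals \(0\) otherwise, so it is a scaled Bernoulli variable with success probability \(1/t\). The Python sketch below computes its moments with exact rational arithmetic; the function name is hypothetical.

```python
from fractions import Fraction

def x_moments(t, m):
    """Exact mean and variance of X = 2/m with probability 1/t, and 0 otherwise."""
    p = Fraction(1, t)
    value = Fraction(2, m)
    mean = p * value
    variance = p * value ** 2 - mean ** 2
    return mean, variance
```

The mean is exactly \(2/(tm)\), and the variance is \(\frac{4}{tm^2}\big (1-\frac{1}{t}\big )\), which is at most the bound in (13).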

Using (10), we have

$$\begin{aligned} \sum _{i=1}^{h-1} \sum _{j=1}^m {\mathbb {E}}[X_{ij}]=\sum _{i\in R} \sum _{j \in [m]} {\mathbb {E}}[X_{ij}] \mathop {\le }\limits ^{(12)}\sum _{i\in R}{2\over t} \mathop {\le }\limits ^{(10)} {d_T(x,y)\over t} \end{aligned}$$
(14)

and

$$\begin{aligned} \sum _{i=1}^{h-1} \sum _{j=1}^{m} \mathrm {Var}(X_{ij})&=\sum _{i\in R} \sum _{j \in [m]} \mathrm {Var}(X_{ij}) {\mathop {\le }\limits ^{(13)}}\sum _{i\in R}{4\over tm} \mathop {\le }\limits ^{(10)} {2\,d_T(x,y)\over tm}. \end{aligned}$$
(15)

We now apply Theorem 2.2 to complete the proof:

$$\begin{aligned}&{\mathbb {P}}\Big [\sum _{i \in [h-1],j \in [m]} X_{ij} \ge C\big ({d_T(x,y)\over t}\big )\Big ]\\&\qquad ={\mathbb {P}}\Big [\sum _{i \in [h-1], j \in [m]} X_{ij}-{d_T(x,y)\over t}\ge (C-1)\big ({d_T(x,y)\over t}\big )\Big ]\\&\qquad {\mathop {\le }\limits ^{(14)}} {\mathbb {P}}\Big (\sum \limits _{i \in [h-1], j \in [m]} {X_{ij} }-\mathbb E \Big [\sum \limits _{i \in [h-1], j\in [m]} X_{ij}\Big ] \ge (C-1)\big (\frac{d_T(x,y)}{t}\big )\Big )\\&\qquad \le \exp \Big ({-((C-1)d_T(x,y)/t)^2\over 2\big (\sum _{i \in [h-1], j \in [m]} \mathrm {Var}(X_{ij})+ (C-1)(d_T(x,y)/t) ({2\over m})/3\big )}\Big )\\&\qquad {\mathop {\le }\limits ^{(15)}} \exp \Big ({-((C-1)d_T(x,y)/t)^2\over 2\big ({2\,d_T(x,y)/ (tm)}+ (C-1)(d_T(x,y)/t) ({2\over m})/3\big )}\Big )\\&\qquad =\exp \Big ({-(C-1)^2\over 4\big (1+ (C-1)/3\big )}\cdot \frac{m}{t}\cdot d_T(x,y) \Big ). \end{aligned}$$

An elementary calculation shows that for \(C\ge 10\), we have \({(C-1)^2 \over {4(1+(C-1)/3)}}\ge {C\over 2}.\) Hence,

$$\begin{aligned}&{\mathbb {P}}\Big [\sum _{i \in [h-1],j \in [m]} X_{ij} \ge C\varepsilon {d_T(x,y)}\Big ]\\&\quad {\mathop {\le }\limits ^{(4)}} {\mathbb {P}}\Big [\sum _{i \in [h-1],j \in [m]} X_{ij} \ge C\big ({d_T(x,y)\over t}\big )\Big ]\le \exp \Big (-{Cm\over 2t}d_T(x,y)\Big )\\&\quad {\mathop {\le }\limits ^{(5)}}\; k^{-Cd_T(x,y)/2}\, \end{aligned}$$

completing the proof.\(\square \)
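The elementary inequality used at the end of the proof can be verified directly. Clearing denominators, \(\frac{(C-1)^2}{4(1+(C-1)/3)} \ge \frac{C}{2}\) is equivalent to \(C^2 - 10C + 3 \ge 0\), which holds for \(C \ge 5+\sqrt{22} \approx 9.69\). The Python sketch below checks this with exact rational arithmetic on a range of integer values; the function name is hypothetical.

```python
from fractions import Fraction

def bernstein_exponent(C):
    """The coefficient (C-1)^2 / (4 * (1 + (C-1)/3)) multiplying (m/t) * d_T(x, y)."""
    C = Fraction(C)
    return (C - 1) ** 2 / (4 * (1 + (C - 1) / 3))
```

At \(C=10\) the coefficient is \(81/16\), just above \(5\), while at \(C=9\) it is \(48/11 < 9/2\), so the threshold \(C \ge 10\) cannot be lowered to \(9\) with this argument.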

2.2 The Local Lemma Argument

We first give the statement of the Lovász Local Lemma [6] and then use it in conjunction with Lemma 2.4 to complete the proof of Theorem 2.1.

Theorem 2.5

Let \({\mathcal {A}}\) be a finite set of events in some probability space. For \(A \in {\mathcal {A}}\), let \(\Gamma (A) \subseteq {\mathcal {A}}\) be such that \(A\) is independent from the collection of events \({\mathcal {A}} \setminus (\{A\} \cup \Gamma (A))\). If there exists an assignment \(x : {\mathcal {A}} \rightarrow (0,1)\) such that for all \(A \in {\mathcal {A}}\), we have

$$\begin{aligned} \mathbb {P}(A) \le x(A) \prod _{B \in \Gamma (A)} (1-x(B)), \end{aligned}$$

then the probability that none of the events in \({\mathcal {A}}\) occur is at least \(\prod _{A \in {\mathcal {A}}} (1-x(A)) > 0\).

Proof of Theorem 2.1

We may assume that \(k \ge 2\). We will use Theorem 2.5 and Lemma 2.4 to show that with non-zero probability the following inequality holds for all \(u,v\in V\)

$$\begin{aligned} \Vert g(u)-g(v)\Vert _1\ge (1-14\varepsilon )\,d_T(u,v). \end{aligned}$$

For \(u,v\in V\), let \({\mathcal {E}}_{uv}\) be the event \(\big \{ \Vert g(u)-g(v)\Vert _1 \le (1-{14\varepsilon })\,d_T(u,v)\big \}\). Now, for \(u,v\in V\), define

$$\begin{aligned} x_{uv}=k^{-3 d_T(u,v)}. \end{aligned}$$

Observe that for vertices \(u,v \in V\) and a subset \(V' \subseteq V\), the event \({\mathcal {E}}_{uv}\) is mutually independent of the family \(\{{\mathcal {E}}_{u'v'} : u',v' \in V' \}\) whenever the induced subgraph of \(T\) spanned by \(V'\) contains no edges from \(P_{uv}\). Thus using Theorem 2.5, it is sufficient to show that for all \(u,v\in V\),

$$\begin{aligned} \mathbb {P}({\mathcal {E}}_{uv}) \le x_{uv} \mathop {\prod _{s,t\in V :}}_{E(P_{st})\cap E(P_{uv})\ne \emptyset } (1-x_{st}). \end{aligned}$$
(16)

Indeed, this will complete the proof of Theorem 2.1.

To this end, fix \(u,v \in V\). For \(e\in E\) and \(i\in {\mathbb {N}}\), we define the set

$$\begin{aligned} S_{e,i}=\big \{(s,t): s,t\in V, d_T(s,t)=i, \text { and } e \in E(P_{st})\big \}. \end{aligned}$$

Since \(T\) is a \(k\)-ary tree,

$$\begin{aligned} |S_{e,i}|\le \sum _{j=1}^{i} k^{j-1}\cdot k^{i-j}= i\cdot k^{i-1}\le k^{2i}. \end{aligned}$$
(17)

Thus we can write

$$\begin{aligned}&x_{uv}\mathop {\prod _{s,t\in V :}}_{E(P_{st})\cap E(P_{uv})\ne \emptyset } (1-x_{st})\\&= x_{uv} \prod _{e \in E(P_{uv})} \prod _{i\in {\mathbb {N}}} \prod _{(s,t) \in S_{e,i}} \big (1- x_{st}\big )= k^{-3d_T(u,v)} \prod _{e \in E(P_{uv})} \prod _{i\in {\mathbb {N}}} \prod _{(s,t) \in S_{e,i}} \big (1- k^{- 3i}\big )\\&\mathop {\,\ge }\limits ^{(17)} k^{-3d_T(u,v)} \prod _{e \in E(P_{uv})} \prod _{i\in {\mathbb {N}}} \big (1- k^{ - 3i}\big )^{k^{2i}}\ge k^{-3d_T(u,v)} \prod _{e \in E(P_{uv})} \prod _{i\in {\mathbb {N}}} \big (1- k^{2i}(k^{ - 3i})\big )\\&= k^{-3d_T(u,v)} \prod _{e \in E(P_{uv})} \prod _{i\in {\mathbb {N}}} \big (1- \frac{1}{k^{i}}\big ). \end{aligned}$$

For \(x\in [0,\frac{1}{2}]\), we have \(e^{-2x}\le 1-x\), and since \(k \ge 2\), we have \(k^{-i} \le {1\over 2}\) for all \(i\in {\mathbb {N}}\), hence

$$\begin{aligned}&x_{uv}\mathop {\prod _{s,t\in V :}}_{E(P_{st})\cap E(P_{uv})\ne \emptyset } (1-x_{st})\\&\quad \ge k^{-3d_T(u,v)} \prod _{e \in E(P_{uv})} \prod _{i\in {\mathbb {N}}} \exp \big ({- 2\over k^{i}}\big ) = k^{-3d_T(u,v)} \prod _{e \in E(P_{uv})} \exp \big (- 2\sum _{i\in {\mathbb {N}}} {1\over k^i}\big )\\&\quad = k^{-3d_T(u,v)} \prod _{e \in E(P_{uv})} \exp \big ({- 2/k\over 1-1/k}\big ) \ge k^{-3d_T(u,v)} \prod _{e \in E(P_{uv})} \exp \big ({- 4 \over k}\big )\\&\quad = k^{-3d_T(u,v)} \exp \big ({- 4\,d_T(u,v) \over k}\big ). \end{aligned}$$

Since \(k\ge 2\), we conclude that

$$\begin{aligned} x_{uv}\mathop {\prod _{s,t\in V :}}_{E(P_{st})\cap E(P_{uv})\ne \emptyset } (1-x_{st}) \ge k^{- 7d_T(u,v)}. \end{aligned}$$

On the other hand, Lemma 2.4 applied with \(C=14\) gives

$$\begin{aligned} \mathbb {P}[\Vert g(u)-g(v)\Vert _1\le (1-14\varepsilon )d_T(u,v)]\le k^{-7d_T(u,v)}, \end{aligned}$$

yielding (16), and completing the proof. \(\square \)
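Two numerical facts carry the computation above: the counting bound \(i\cdot k^{i-1} \le k^{2i}\) from (17), and the absorption of the factor \(\exp (-4\,d_T(u,v)/k)\) into a power of \(k\). Writing \(\exp (-4d/k) = k^{-4d/(k \ln k)}\), the second fact amounts to \(3 + \frac{4}{k \ln k} \le 7\) for \(k \ge 2\). The Python sketch below checks both on a range of parameters; the function names are hypothetical.

```python
import math

def pairs_through_edge_bound(k, i):
    """The bound i * k^(i-1) on the size of S_{e,i} from (17)."""
    return i * k ** (i - 1)

def lll_exponent(k):
    """The value c for which k^(-3d) * exp(-4d/k) equals k^(-c*d)."""
    return 3 + 4 / (k * math.log(k))
```

The exponent is largest at \(k=2\), where it is roughly \(5.9\), so the constant \(7\) is not tight; it is chosen to match \(C/2\) in Lemma 2.4 with \(C=14\).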

3 Colors and Scales

In the present section, we develop some tools for our eventual embedding. The proof of our main theorem appears in the next section, but relies on a key theorem which is only proved in Sect. 5.

3.1 Monotone Colorings

Let \(T=(V,E)\) be a metric tree rooted at a vertex \(r \in V\). Recall that such a tree \(T\) is equipped with a length \(\mathsf{len} : E \rightarrow [0,\infty )\). We extend this to subsets of edges \(S \subseteq E\) via \(\mathsf{len}(S) = \sum _{e \in S} \mathsf{len}(e)\). We recall that a monotone coloring is a mapping \(\chi : E \rightarrow {\mathbb {N}}\) such that each color class \(\chi ^{-1}(c) = \{ e \in E : \chi (e) = c \}\) is a connected subset of some root–leaf path. For a set of edges \(S \subseteq E\), we write \(\chi (S)\) for the set of colors occurring in \(S\). We define the multiplicity of \(\chi \) by

$$\begin{aligned} M(\chi ) = \max _{v \in V} |\chi (P_v)|. \end{aligned}$$

Given such a coloring \(\chi \) and \(c \in {\mathbb {N}}\), we define

$$\begin{aligned} {\mathsf{{len}}_{\chi }(c)} = \mathsf{{len}}(\chi ^{-1}(c)), \end{aligned}$$

and \({\mathsf{{len}}_{\chi }(S)} = \sum _{c \in S} {\mathsf{{len}}_{\chi }(c)}\) if \(S \subseteq {\mathbb {N}}\).

For every \(\delta \in [0,1]\) and \(x,y \in V\), we define the set of colors

$$\begin{aligned} C_{\chi }(x,y; \delta ) = \big \{ c : \mathsf{{len}}(P_{xy} \cap \chi ^{-1}(c)) \le \delta \cdot \mathsf{{len}}_{\chi }(c) \big \} \cap (\chi (P_x) \triangle \chi (P_y)). \end{aligned}$$

This is the set of colors \(c\) which occur in only one of \(P_x\) and \(P_y\), and for which the contribution to \(P_{xy}\) is significantly smaller than \(\mathsf{{len}}_{\chi }(c)\). We also put

$$\begin{aligned} \rho _{\chi }(x,y;\delta ) = \mathsf{{len}}_{\chi }(C_{\chi }(x,y;\delta )). \end{aligned}$$
(18)
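To make the definitions of \(M(\chi )\), \(\mathsf{{len}}_{\chi }\), \(C_{\chi }(x,y;\delta )\), and \(\rho _{\chi }(x,y;\delta )\) concrete, here is a minimal Python sketch on a hypothetical five-vertex tree. The tree, its lengths, its coloring, and all function names are assumptions made for illustration only; each edge is named by its child endpoint.

```python
from fractions import Fraction

# Hypothetical toy tree rooted at 0; each edge is named by its child endpoint.
parent = {1: 0, 2: 1, 3: 1, 4: 0}
length = {1: 1, 2: 2, 3: 4, 4: 1}   # len(e)
color = {1: 1, 2: 1, 3: 2, 4: 3}    # chi(e); every color class lies on a root-leaf path

def root_path(v):
    """Edges of P_v, the path from v to the root."""
    edges = set()
    while v in parent:
        edges.add(v)
        v = parent[v]
    return edges

def colors_on(edges):
    """chi(S) for a set of edges S."""
    return {color[e] for e in edges}

def multiplicity():
    """M(chi) = max over vertices v of |chi(P_v)|."""
    return max(len(colors_on(root_path(v))) for v in [0, *parent])

def len_chi(c):
    """len_chi(c): total length of the color class of c."""
    return sum(length[e] for e in length if color[e] == c)

def short_colors(x, y, delta):
    """C_chi(x, y; delta)."""
    px, py = root_path(x), root_path(y)
    pxy = px ^ py   # in a tree, E(P_xy) is the symmetric difference of E(P_x) and E(P_y)
    result = set()
    for c in colors_on(px) ^ colors_on(py):
        inside = sum(length[e] for e in pxy if color[e] == c)
        if inside <= delta * len_chi(c):
            result.add(c)
    return result

def rho(x, y, delta):
    """rho_chi(x, y; delta) = len_chi(C_chi(x, y; delta))."""
    return sum(len_chi(c) for c in short_colors(x, y, delta))

print(multiplicity(), short_colors(1, 4, Fraction(1, 2)), rho(1, 4, Fraction(1, 2)))
```

In this toy example, color 1 has total length 3 but contributes only length 1 to \(P_{xy}\) for \(x=1\), \(y=4\), so it is counted by \(\rho _{\chi }\) when \(\delta = \frac{1}{2}\) and not when \(\delta = \frac{1}{4}\).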

We now state a key theorem that will be proved in Sect. 5.

Theorem 3.1

For every \(\varepsilon , \delta > 0\), there is a value \(C(\varepsilon ,\delta ) = O\big ((\frac{1}{\varepsilon } +\log \log \frac{1}{\delta })^3 \log \frac{1}{\varepsilon }\big )\) such that the following holds. For any metric tree \(T=(V,E)\) and any monotone coloring \(\chi : E \rightarrow {\mathbb {N}}\), there exists a mapping \(F : V \rightarrow \ell _1^{C(\varepsilon ,\delta ) (\log n + M(\chi ))}\) such that for all \(x,y \in V\),

$$\begin{aligned} (1 - \varepsilon ) \,d_T(x,y) - \delta \, \rho _{\chi }(x,y;\delta ) \le \Vert F(x)-F(y)\Vert _1 \le d_T(x,y). \end{aligned}$$
(19)

The problem one now confronts is whether the loss in the \(\rho _{\chi }(x,y; \delta )\) term can be tolerated. In general, we do not have a way to do this, so we first embed our tree into a product of a small number of trees in a way that allows us to control the corresponding \(\rho \)-terms.

Lemma 3.2

For every \(\varepsilon \in (0,1)\), there is a number \(k \asymp \frac{1}{\varepsilon }\) such that the following holds. For every metric tree \(T=(V,E)\) and monotone coloring \(\chi : E \rightarrow {\mathbb {N}}\), there exist \(k\) metric trees \(T_1, T_2, \ldots , T_k\) with monotone colorings \(\{\chi _i : E(T_i) \rightarrow {\mathbb {N}}\}_{i=1}^k\) and mappings \(\{f_i : V \rightarrow V(T_i)\}_{i=1}^k\) such that \(M(\chi _i) \le M(\chi )\), and \(|V(T_i)|\le |V|\) for all \(i \in [k]\), and the following conditions hold for all \(x,y \in V\):

  (a) We have

    $$\begin{aligned} \frac{1}{k} \sum _{i=1}^k d_{T_i}(f_i(x),f_i(y)) \ge (1-\varepsilon )\,d_T(x,y). \end{aligned}$$
    (20)

  (b) For all \(i \in [k]\), we have

    $$\begin{aligned} d_{T_i}(f_i(x),f_i(y)) \le (1+\varepsilon )\,d_T(x,y). \end{aligned}$$
    (21)

  (c) There exists a number \(j \in [k]\) such that

    $$\begin{aligned} \varepsilon \,d_T(x,y)\ge \frac{2^{-(k+1)}}{k} \mathop {\sum _{i=1}^k}_{i \ne j} \rho _{\chi _i}(f_i(x),f_i(y);2^{-(k+1)}). \end{aligned}$$
    (22)

Using Lemma 3.2 in conjunction with Theorem 3.1, we can now prove the main theorem (Theorem 1.1).

Proof of Theorem 1.1

Let \(\varepsilon > 0\) be given, and let \(T=(V,E)\) be an \(n\)-vertex metric tree. Let \(\chi : E \rightarrow {\mathbb {N}}\) be a monotone coloring with \(M(\chi ) \le O(\log n)\), which exists by Lemma 1.2. Apply Lemma 3.2 to obtain metric trees \(T_1, \ldots , T_k\) with corresponding monotone colorings \(\chi _1, \ldots , \chi _k\) and mappings \(f_i : V \rightarrow V(T_i)\). Observe that \(M(\chi _i) \le O(\log n)\) for each \(i \in [k]\).

Let \(F_i : V(T_i) \rightarrow \ell _1^{C(\varepsilon ) \log n}\) be the mapping obtained by applying Theorem 3.1 to \(T_i\) and \(\chi _i\), for each \(i \in [k]\), with \(\delta = 2^{-(k+1)}\), where \(C(\varepsilon ) = O\big (\frac{1}{\varepsilon ^3} (\log \frac{1}{\varepsilon })\big )\). Finally, we put

$$\begin{aligned} F = \frac{1}{k} \big ((F_1 \circ f_1) \oplus (F_2 \circ f_2) \oplus \cdots \oplus (F_k \circ f_k)\big ) \end{aligned}$$

so that \(F : V \rightarrow \ell _1^{O((\frac{1}{\varepsilon })^4 \log \frac{1}{\varepsilon } \cdot \log n)}\). We will prove that \(F\) is a \((1+O(\varepsilon ))\)-embedding, completing the proof.

First, observe that each \(F_i\) is \(1\)-Lipschitz (Theorem 3.1). In conjunction with condition (b) of Lemma 3.2 which says that \(\Vert f_i\Vert _{\mathrm {Lip}} \le 1+\varepsilon \) for each \(i \in [k]\), we have \(\Vert F\Vert _{\mathrm {Lip}} \le 1+\varepsilon \).

For the other side, fix \(x,y \in V\) and let \(j \in [k]\) be the number guaranteed in condition (c) of Lemma 3.2. Then we have

$$\begin{aligned}&\Vert F(x)-F(y)\Vert _1\\&\quad = \frac{1}{k} \sum _{i=1}^k \Vert (F_i \circ f_i)(x)-(F_i \circ f_i)(y)\Vert _1 \\&\quad {\mathop {\,\ge }\limits ^{(19)}}\; \frac{1}{k} \sum _{i \ne j} \big ((1-\varepsilon )\,d_{T_i}(f_i(x),f_i(y)) -2^{-(k+1)} \rho _{\chi _i}(f_i(x),f_i(y); 2^{-(k+1)})\big ) \\&\quad {\mathop {\ge }\limits ^{(22)}} \Big (\frac{1}{k} \sum _{i \ne j} (1-\varepsilon )\,d_{T_i}(f_i(x),f_i(y))\Big ) - \varepsilon \, d_T(x,y) \\&\quad \ge \Big (\frac{1}{k} \sum _{i=1}^k (1-\varepsilon )\,d_{T_i}(f_i(x),f_i(y))\Big )- \frac{1}{k} \, d_{T_j}(f_j(x), f_j(y)) - \varepsilon \, d_T(x,y) \\&\quad \!\! {\mathop {\ge }\limits ^{(21)}} \Big (\frac{1}{k} \sum _{i=1}^k (1-\varepsilon )\,d_{T_i}(f_i(x),f_i(y))\Big ) - \frac{1+\varepsilon }{k} \, d_T(x,y) - \varepsilon \, d_T(x,y) \\&\quad {\mathop {\ge }\limits ^{(20)}} (1-\varepsilon )^2 \,d_T(x,y) - \frac{1+\varepsilon }{k} \, d_T(x,y) - \varepsilon \, d_T(x,y) \\&\quad \ge (1-O(\varepsilon )) \,d_T(x,y), \end{aligned}$$

where in the final line we have used \(k \asymp \frac{1}{\varepsilon }\), completing the proof. \(\square \)
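To make the last step explicit, suppose for concreteness that \(\varepsilon \le 1\) and \(k \ge \frac{1}{\varepsilon }\); this normalization is an assumption made here only to fix constants, since the proof uses just \(k \asymp \frac{1}{\varepsilon }\). Then \(\frac{1+\varepsilon }{k} \le (1+\varepsilon )\varepsilon \le 2\varepsilon \), and \((1-\varepsilon )^2 \ge 1-2\varepsilon \), so

$$\begin{aligned} (1-\varepsilon )^2 - \frac{1+\varepsilon }{k} - \varepsilon \ge (1-2\varepsilon ) - 2\varepsilon - \varepsilon = 1-5\varepsilon . \end{aligned}$$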

We now move on to the proof of Lemma 3.2. We begin by proving an analogous statement for the half line \([0,\infty )\). An \({\mathbb {R}}\) -star is a metric space formed as follows: Given a sequence \(\{a_i\}_{i=1}^{\infty }\) of positive numbers, one takes the disjoint union of the intervals \(\{[0,a_1], [0,a_2], \ldots \}\), and then identifies the 0 point in each, which is canonically called the root of the \({\mathbb {R}}\) -star. An \({\mathbb {R}}\)-star \(S\) carries the natural induced length metric \(d_S\). We refer to the associated intervals as branches, and the length of a branch is the associated number \(a_i\). Finally, if \(S\) is an \({\mathbb {R}}\)-star, and \(x \in S \setminus \{0\}\), we use \(\ell (x)\) to denote the length of the branch containing \(x\). We put \(\ell (0)=0\).

Lemma 3.3

For every \(k \in {\mathbb {N}}\) with \(k \ge 2\), there exist \({\mathbb {R}}\)-stars \(S_1, \ldots , S_k\) with mappings

$$\begin{aligned} f_i : [0,\infty ) \rightarrow S_i \end{aligned}$$

such that the following conditions hold:

  (i) For each \(i \in [k], f_i(0)\) is the root of \(S_i\).

  (ii) For all \(x,y \in [0,\infty )\), \(\frac{1}{k} \sum _{i=1}^k d_{S_i}(f_i(x),f_i(y)) \ge (1-\frac{7}{k}) |x-y|.\)

  (iii) For each \(i \in [k], f_i\) is \((1+2^{-k+1})\)-Lipschitz.

  (iv) For \(x \in [0,\infty )\), we have \(\ell (f_i(x)) \le 2^{k-1} x.\)

  (v) For \(x\in [0,\infty )\), there are at most two values of \(i \in [k]\) such that

    $$\begin{aligned} d_{S_i}(f_i(0),f_i(x)) \le 2^{-k} \,\ell (f_i(x)). \end{aligned}$$

  (vi) For all \(x,y \in [0,\infty )\), there is at most one value of \(i \in [k]\) such that \(f_i(x)\) and \(f_i(y)\) are in different branches of \(S_i\) and

    $$\begin{aligned} 2^{-k} \big (\ell (f_i(x)) + \ell (f_i(y))\big ) > 2\, |x-y|. \end{aligned}$$

Proof

Assume that \(k \ge 2\). We first construct \({\mathbb {R}}\)-stars \(S_1, \ldots , S_k\). We will index the branches of each star by \({\mathbb {Z}}\). For \(i \in [k], S_i\) is a star whose \(j\)th branch, for \(j \in {\mathbb {Z}}\), has length \(2^{i-1+k(j+1)}\). We will use the notation \((i,j,d)\) to denote the point at distance \(d\) from the root on the \(j\)th branch of \(S_i\). Observe that \((i,j,0)\) and \((i,j',0)\) describe the same point (the root of \(S_i\)) for all \(j,j' \in {\mathbb {Z}}\).

Now, we define for every \(i \in [k]\), a function \(f_i : [0,\infty ) \rightarrow S_i\) as follows:

$$\begin{aligned} f_i(x)=\left\{ \begin{array}{ll} \big (i,j,(x-2^{i+kj})/(1-2^{1-k})\big ) &{} \text { for } 2^{-i} x\in [2^{kj},2^{k(j+1)-1}),\\ \big (i,j,2^{i+k(j+1)}-x\big )&{}\text { for } 2^{-i} x \in [2^{k(j+1)-1},2^{k(j+1)}). \end{array} \right. \end{aligned}$$

Condition (i) is immediate. It is also straightforward to verify that

$$\begin{aligned} \Vert f_i\Vert _{\mathrm {Lip}} \le (1-2^{1-k})^{-1} \le 1+2^{-k+1}, \end{aligned}$$
(23)

yielding condition (iii).

Toward verifying condition (ii), observe that for every \(x \in [0,\infty )\) and \(l\in \{0,1, \ldots , k-2\}\) we have

$$\begin{aligned} d_{S_i}(f_i(x),f_i(0))\ge {(x-2^{\lfloor \log _2 x\rfloor -l})/ (1-2^{1-k})}\ge x-2^{\lfloor \log _2 x\rfloor -l}, \end{aligned}$$

when \(i = (\lfloor \log _2 x\rfloor -l) {\text { mod }}k\). Using this, we can write

$$\begin{aligned} \sum _{i=1}^{k} d_{S_i}(f_i(x),f_i(0))&\ge \sum _{l=\lfloor \log _2 x\rfloor -k+2}^{\lfloor \log _2 x \rfloor } x-2^l = (k-1)x-\sum _{l=\lfloor \log _2 x \rfloor -k+2}^{\lfloor \log _2 x \rfloor } 2^{l}\nonumber \\&\ge (k-1)x-2^{\lfloor \log _2 x \rfloor +1}\ge (k-3)x. \end{aligned}$$
(24)

Now fix \(x,y \in [0,\infty )\) with \(x \le y\). If \(x\le y/2\), then we can use the triangle inequality, together with (23) and (24) to write

$$\begin{aligned}&\frac{1}{k} \sum _{i=1}^k d_{S_i}(f_i(x),f_i(y)) \\&\ge \frac{1}{k} \sum _{i=1}^k \big ( d_{S_i}(f_i(y),f_i(0)) - d_{S_i}(f_i(x),f_i(0))\big ) \ge (1-3/k)y-(1+2^{1-k})x\\&\ge (1-3/k)y-(1+1/k)x \ge (1-7/k)(y-x)+ 4y/k-8x/k\\&\ge (1-7/k)(y-x). \end{aligned}$$

In the case that \({y\over 2} \le x\le y\), for \(l\in \{0,1,\ldots , k-3\}\), we have

$$\begin{aligned} d_{S_i}(f_i(x),f_i(y))\ge (y-x)/(1-2^{1-k})\ge y-x \end{aligned}$$

when \(i=(\lfloor \log _2 x\rfloor -l) {\text { mod }}k\). From this, we conclude that

$$\begin{aligned} {1\over k}\sum _{i=1}^{k} d_{S_i}(f_i(x),f_i(y))&\ge {1\over k}\sum _{l=0}^{k-3}(y-x)\ge { k-2 \over k}(y-x), \end{aligned}$$
(25)

yielding condition (ii).

It is also straightforward to check that

$$\begin{aligned} \ell (f_i(x))\le 2^{\lfloor \log _2 x\rfloor +k-1}\le 2^{k-1}x, \end{aligned}$$

which verifies condition (iv).

To verify condition (v), note that for \(x\in [0,\infty )\), the inequality \(d_{S_i}(f_i(x),f_i(0))\le x/2\) can only hold for \(i {\text { mod }}k \in \{\lfloor \log _2 x\rfloor , \lfloor \log _2 x \rfloor + 1\}\), hence condition (iv) implies condition (v).

Finally we verify condition (vi). We divide the problem into two cases. If \(x<y/2\), then by condition (iv),

$$\begin{aligned} \ell (f_i(x)) + \ell (f_i(y)) \le 2^{k-1} (x+y)\le 2^{k-1} (2y)\le 2^{k+1}(y-x). \end{aligned}$$

In the case that \(y/2< x\le y, f_i(x)\) and \(f_i(y)\) can be mapped to different branches of \(S_i\) only for \(i \equiv \lfloor \log _2 y\rfloor \,({\text { mod }}k)\), yielding condition (vi). \(\square \)
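The maps \(f_i\) are easy to experiment with. The following Python sketch implements the piecewise definition above in exact rational arithmetic; the function names and the representation of a point of \(S_i\) as a pair (branch index, distance from the root) are our own conventions, chosen only for illustration. It can be used to check, on a grid, condition (iv) and the first inequality of (23).

```python
from fractions import Fraction

def star_map(i, k, x):
    """Return (j, d): f_i(x) lies on branch j of S_i at distance d from the root."""
    x = Fraction(x)
    if x == 0:
        return (0, Fraction(0))          # the root; the branch label is irrelevant
    t = x / Fraction(2) ** i
    j = 0                                # find j with 2^{kj} <= t < 2^{k(j+1)}
    while Fraction(2) ** (k * j) > t:
        j -= 1
    while Fraction(2) ** (k * (j + 1)) <= t:
        j += 1
    if t < Fraction(2) ** (k * (j + 1) - 1):
        # outward piece: stretched by the factor 1 / (1 - 2^{1-k})
        d = (x - Fraction(2) ** (i + k * j)) / (1 - Fraction(2) ** (1 - k))
    else:
        # inward piece: return to the root at unit speed
        d = Fraction(2) ** (i + k * (j + 1)) - x
    return (j, d)

def branch_length(i, k, j):
    """Length of the j-th branch of S_i."""
    return Fraction(2) ** (i - 1 + k * (j + 1))

def star_dist(p, q):
    """Distance in the star between two points given as (branch, distance)."""
    (j1, d1), (j2, d2) = p, q
    # if either point is the root (d = 0), both formulas agree
    return abs(d1 - d2) if j1 == j2 else d1 + d2

print(star_map(1, 3, 5), branch_length(1, 3, 0))
```

Exact fractions are used so that the breakpoints \(2^{i+kj}\) are handled without rounding issues.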

Finally, we move on to the proof of Lemma 3.2.

Proof of Lemma 3.2

We put \(k = \lceil 7/\varepsilon \rceil \) and prove the following stronger statement by induction on \(|V|\): There exist metric trees \(T_1, T_2, \ldots , T_k\) and monotone colorings \(\chi _i : E(T_i) \rightarrow {\mathbb {N}}\), along with mappings \(f_i : V \rightarrow V(T_i)\) satisfying the conditions of the lemma. Furthermore, each coloring \(\chi _i\) satisfies the stronger condition for all \(v \in V\),

$$\begin{aligned} |\chi _i(P_{f_i(v)})| \le |\chi (P_v)|. \end{aligned}$$
(26)

The statement is trivial for the tree containing only a single vertex. Now suppose that we have a tree \(T\) and coloring \(\chi : E \rightarrow {\mathbb {N}}\). Since \(T\) is connected, it is easy to see that there exists a color class \(c \in \chi (E)\) with the following property. Let \(\gamma _c\) be the path whose edges are colored \(c\), and let \(v_c\) be the vertex of \(\gamma _c\) closest to the root. Then the induced tree \(T'\) on the vertex set \((V \setminus V(\gamma _c)) \cup \{v_c\}\) is connected.

Applying the inductive hypothesis to \(T'\) and \(\chi |_{E(T')}\) yields metric trees \(T_1', T_2', \ldots ,\) \( T_k'\) with colorings \(\chi _i' : E(T_i') \rightarrow {\mathbb {N}}\) and mappings \(f'_i : V(T') \rightarrow V(T_i')\).

Now, let \(S_1, \ldots , S_k\) and \(\{ g_i : [0,\infty ) \rightarrow S_i\}\) be the \({\mathbb {R}}\)-stars and mappings guaranteed by Lemma 3.3. For each \(i \in [k]\), let \(S_i'\) be the induced subgraph of \(S_i\) on the set \(\{g_i(d_T(v,v_c)) : v \in V(\gamma _c)\}\), and make \(S_i'\) into a metric tree rooted at \(g_i(0)\), with the length structure inherited from \(S_i\). We now construct \(T_i\) by attaching \(S'_i\) to \(T'_i\) with the root of \(S'_i\) identified with the node \(f_i'(v_c)\). The coloring \(\chi _i'\) is extended to \(T_i\) by assigning to each root–leaf path in \(S'_i\) a new color. Finally, we specify functions \(f_i : V \rightarrow V(T_i)\) via

$$\begin{aligned} f_i(v) = \left\{ \begin{array}{l@{\quad }l} f'_i(v), &{} v \in V(T'), \\ g_i(d_T(v_c, v)), &{} v \in V \setminus V(T'). \end{array}\right. \end{aligned}$$

It is straightforward to verify that (26) holds for the colorings \(\{\chi _i\}\) and every vertex \(v \in V\). In addition, using the inductive hypothesis, we have \(|V(T_i)| \le |V|\) and \(M(\chi _i) \le M(\chi )\) for every \(i \in [k]\), with the latter condition following immediately from (26) and the structure of the mappings \(\{f_i\}\).

We now verify that conditions (a), (b), and (c) hold. For \(x,y \in V(T')\), the induction hypothesis guarantees all three conditions. If both \(x,y \in V(\gamma _c) \setminus \{v_c\}\), then conditions (a) and (b) follow directly from conditions (ii) and (iii) of Lemma 3.3 applied to the maps \(\{g_i\}\). To verify condition (c), let \(j \in [k]\) be the single bad index from (vi). We have for all \(i\ne j\),

$$\begin{aligned} \rho _{\chi _i}(f_i(x), f_i(y); 2^{-(k+1)}) \le 2^{k+1}d_T(x,y). \end{aligned}$$

Since there are at most two colors on the path between \(x\) and \(y\) in any \(T_i\), by condition (v) of Lemma 3.3, there are at most four values of \(i \in [k] \setminus \{j\}\) such that

$$\begin{aligned} \rho _{\chi _i}(f_i(x), f_i(y); 2^{-(k+1)})\ne 0, \end{aligned}$$

hence

$$\begin{aligned} {1\over k}\sum _{i \ne j} \rho _{\chi _i}(f_i(x), f_i(y); 2^{-(k+1)}) \le {4\cdot 2^{k+1}\over k}\,d_T(x,y)\le \varepsilon 2^{k+1}d_T(x,y). \end{aligned}$$

Since \(\Vert f_i\Vert _{\mathrm {Lip}}\) is determined on edges \((x,y) \in E\), and each such edge has \(x,y \in V(\gamma _c)\) or \(x,y \in V(T')\), we have already verified condition (b) for all \(i \in [k]\) and \(x,y \in V\). Finally, we verify (a) and (c) for pairs with \(x\in V(T')\) and \(y\in V(\gamma _c)\). We can check condition (a) using the previous two cases,

$$\begin{aligned} \frac{1}{k} \sum _{i=1}^k d_{T_i}(f_i(x),f_i(y))&= \frac{1}{k} \sum _{i=1}^k \big (d_{T_i}(f_i(x),f_i(v_c))+d_{T_i}(f_i(y),f_i(v_c))\big )\\&\ge (1-\varepsilon )d_T(y,v_c)+(1-\varepsilon )d_T(x,v_c)\ge (1-\varepsilon )d_T(x,y). \end{aligned}$$

Towards verifying condition (c), note that by condition (v) from Lemma 3.3, there are at most two values of \(i\), such that

$$\begin{aligned}&\rho _{\chi _i}(f_i(x),f_i(y);2^{-(k+1)})- \rho _{\chi _i}(f_i(x),f_i(v_c);2^{-(k+1)})\\&\qquad =\rho _{\chi _i} (f_i(y),f_i(v_c);2^{-(k+1)})\ne 0. \end{aligned}$$

By the induction hypothesis, there exists a number \(j\in [k]\) such that

$$\begin{aligned} \varepsilon \, d_T(x,v_c)\ge \frac{2^{-(k+1)}}{k} \mathop {\sum }_{i \ne j} \rho _{\chi _i}(f_i(v_c),f_i(x);2^{-(k+1)}). \end{aligned}$$

Now we use condition (iv) from Lemma 3.3 to conclude

$$\begin{aligned}&\frac{2^{-(k+1)}}{k} \mathop {\sum }_{i \ne j} \rho _{\chi _i}(f_i(x),f_i(y);2^{-(k+1)})\\&\quad \le \frac{2^{-(k+1)}}{k} \mathop {\sum }_{i \ne j}\big ( \rho _{\chi _i}(f_i(x),f_i(v_c);2^{-(k+1)})+\rho _{\chi _i}(f_i(y), f_i(v_c);2^{-(k+1)})\big ) \\&\quad \le \varepsilon d_T(x,v_c)+2 \big (\frac{2^{-(k+1)}}{k}\big )\,( 2^{k-1} d_T(y,v_c))\le \varepsilon \,d_T(x,v_c)+\varepsilon \,d_T(v_c,y)\\&\quad = \varepsilon \,d_T(x,y), \end{aligned}$$

completing the proof. \(\square \)
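For the reader's convenience, we record the arithmetic behind the two numerical steps in the preceding proof. Since \(k = \lceil 7/\varepsilon \rceil \ge 7/\varepsilon \), we have

$$\begin{aligned} \frac{4}{k} \le \frac{4\varepsilon }{7} \le \varepsilon \qquad \text {and}\qquad 2 \cdot \frac{2^{-(k+1)}}{k} \cdot 2^{k-1} = \frac{1}{2k} \le \frac{\varepsilon }{14} \le \varepsilon . \end{aligned}$$

The same choice of \(k\) gives \(\frac{7}{k} \le \varepsilon \), which is how condition (ii) of Lemma 3.3 yields condition (a).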

3.2 Multi-scale Embeddings

We now present the basics of our multi-scale embedding approach. The next lemma is devoted to combining scales together without using too many dimensions, while controlling the distortion of the resulting map.

Lemma 3.4

For every \(\varepsilon \in (0,1)\), the following holds. Let \((X,d)\) be an arbitrary metric space, and consider a family of functions \(\{f_i : X \rightarrow [0,1]\}_{i \in {\mathbb {Z}}}\) such that for all \(x,y \in X\), we have

$$\begin{aligned} \sum _{i \in {\mathbb {Z}}} 2^i |f_i(x)-f_i(y)| < \infty . \end{aligned}$$
(27)

Then there is a mapping \(F : X \rightarrow \ell _1^{2+\lceil \log \frac{1}{\varepsilon }\rceil }\) such that for all \(x,y \in X\),

$$\begin{aligned} (1-\varepsilon )\sum _{i\in {\mathbb {Z}}}2^{i} |f_i(x)-f_i(y)|- 2\,\zeta (x,y){\le } \Vert F(x)-F(y)\Vert _1 {\le } \sum _{i\in {\mathbb {Z}}} 2^{i}|f_i(x)-f_i(y)|, \end{aligned}$$

where

$$\begin{aligned} \zeta (x,y)=\sum _{\begin{array}{c} i:\exists j<i \\ f_{j}(x)-f_{j}(y)\ne 0 \end{array}}2^i (|f_{i}(x)-f_{i}(y)|-\lfloor | f_{i}(x)-f_{i}(y)|\rfloor ). \end{aligned}$$

Proof

Let \(k=2+\lceil \log 1/\varepsilon \rceil \), and fix some \(x_0 \in X\). For \(i \in [k]\), define \(F_i : X \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} F_i(x)=\sum _{j\in {\mathbb {Z}}} 2^{jk+i} (f_{jk+i}(x)-f_{jk+i}(x_0)). \end{aligned}$$
(28)

It is easy to see that (27) implies absolute convergence of the preceding sum. We will consider the map \(F = F_1 \oplus F_2 \oplus \cdots \oplus F_{k} : X \rightarrow \ell _1^k\). It is straightforward to verify that for every \(x,y \in X\),

$$\begin{aligned} \Vert F(x)-F(y)\Vert _1\le \sum _{i\in {\mathbb {Z}}} 2^i |f_i(x)-f_i(y)|. \end{aligned}$$

Now, for \(i \in [k]\), define

$$\begin{aligned} \zeta _i(x,y)=\sum _{\begin{array}{c} j:\exists \ell <j \\ f_{\ell k+i}(x)\!-\!f_{\ell k+i}(y)\ne 0 \end{array}}2^{jk+i}(|f_{jk+i}(x)\!-\!f_{jk+i}(y)|\!-\!\lfloor |f_{jk+i}(x)-f_{jk+i}(y)|\rfloor ). \end{aligned}$$

One can easily check that \(\sum _{i=1}^k \zeta _i(x,y)\le \zeta (x,y)\), thus showing the following for \(i \in [k]\) will complete our proof of the lemma,

$$\begin{aligned} |F_i(x)-F_i(y)| \ge (1-\varepsilon )\sum _{j\in {\mathbb {Z}}} (2^{jk+i} |f_{jk+i}(x)-f_{jk+i}(y)|)-2\zeta _i(x,y). \end{aligned}$$
(29)

Toward this end, fix \(i \in [k]\) and \(x,y {\in } X\). Let \(S = \{ j {\in } {\mathbb {Z}} : |f_{jk+i}(x)-f_{jk+i}(y)| {=} 1\}\), and \(T = \{ j \in {\mathbb {Z}} : 0 < |f_{jk+i}(x)-f_{jk+i}(y)| < 1\}\). Clearly we then have

$$\begin{aligned}&|F_i(x)-F_i(y)|\\&\quad =\big |\sum _{j\in S}2^{jk+i}(f_{jk+i}(x)-f_{jk+i}(y))+\sum _{j\in T}2^{jk+i}(f_{jk+i}(x)-f_{jk+i}(y))\big |. \end{aligned}$$

If \(S \cup T=\emptyset \), then (29) is immediate. Now, suppose that \(S \ne \emptyset \), and let \(c = i + k \cdot \max (S)\). Observe that \(\max (S)\) exists by (27).

We then have

$$\begin{aligned}&\sum _{j\in {\mathbb {Z}}} 2^{jk+i} |f_{jk+i}(x)-f_{jk+i}(y)|\\&\qquad \le 2^{c}+\mathop {\sum _{j\in S \cup T}}_{j<\max S} 2^{kj+i}+\mathop {\sum _{j\in T}}_{j > \max S} 2^{kj+i}|f_{kj+i}(x)-f_{kj+i}(y)|\\&\qquad \le 2^{c}+\sum _{j<\max S} 2^{kj+i}+\zeta _i(x,y)\le 2^{c}+2\cdot 2^{k(\max S-1)+i}+\zeta _i(x,y)\\&\qquad \le 2^{c}(1+2^{1-k})+\zeta _i(x,y)\le (1+\varepsilon /2)2^{c}+\zeta _i(x,y). \end{aligned}$$

On the other hand,

$$\begin{aligned} |F_{i}(x)-F_{i}(y)|&= \big |\sum _{j\in {\mathbb {Z}}} 2^{kj+i}( f_{jk+i}(x)-f_{jk+i}(y))\big |\\&\ge 2^{c}-\mathop {\sum _{j\in S\cup T}}_{j<\max S} 2^{kj+i}-\mathop {\sum _{j\in T}}_{j > \max S} 2^{kj+i}|f_{kj+i}(x)-f_{kj+i}(y)|\\&\ge 2^{c}-\sum _{j<\max S} 2^{kj+i}-\zeta _i(x,y)\ge 2^{c}-2\cdot 2^{k(\max S-1)+i}-\zeta _i(x,y)\\&\ge 2^{c}(1-2^{1-k})-\zeta _i(x,y)\ge (1-\varepsilon /2)2^{c}-\zeta _i(x,y). \end{aligned}$$

Therefore,

$$\begin{aligned}&(1-\varepsilon )\sum _{j\in {\mathbb {Z}}} 2^{kj+i}|f_{jk+i}(x)-f_{jk+i}(y)|\\&\quad \le (1-\varepsilon )((1+\varepsilon /2)2^c+\zeta _i(x,y))\le {(1-\varepsilon /2)2^c+\zeta _i(x,y)}\\&\quad \le |F_{i}(x)-F_{i}(y)|+2\zeta _i(x,y), \end{aligned}$$

completing the verification of (29) in the case when \(S \ne \emptyset \).

In the remaining case when \(S=\emptyset \) and \(T \ne \emptyset \), if the set \(T\) does not have a minimum element, then

$$\begin{aligned} \sum _{j\in T} 2^{kj+i}|f_{kj+i}(x)-f_{kj+i}(y)| = \zeta _i(x,y), \end{aligned}$$

making (29) vacuous since the right-hand side is non-positive.

Otherwise, let \(\ell =\min (T)\), and write

$$\begin{aligned}&|F_i(x)-F_i(y)|\\&\quad = \big |\sum _{j\in T} 2^{kj+i} (f_{kj+i}(x)-f_{kj+i}(y))\big |\\&\quad \ge 2^{\ell k+i}|f_{\ell k+i}(x)-f_{\ell k+i}(y)| -\big |\sum _{j\in T,j>\ell } 2^{kj+i} (f_{kj+i}(x)-f_{kj+i}(y))\big |\\&\quad \ge 2^{\ell k+i} |f_{\ell k+i}(x)-f_{\ell k+i}(y)|-\zeta _i(x,y)\\&\quad =\sum _{j\in {\mathbb {Z}}} 2^{kj+i} |f_{kj+i}(x)-f_{kj+i}(y)|-2\,\zeta _i(x,y). \end{aligned}$$

This completes the proof. \(\square \)
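The construction (28) simply groups the scales by their residue modulo \(k\). The following Python sketch illustrates this for functions taking values in \(\{0,1\}\), in which case \(\zeta \equiv 0\) and the lemma reads \((1-\varepsilon )\sum _i 2^i|f_i(x)-f_i(y)| \le \Vert F(x)-F(y)\Vert _1 \le \sum _i 2^i|f_i(x)-f_i(y)|\). The random test data, the finite range of scales, the function names, and the reading of \(\log \) as \(\log _2\) are all assumptions made for the purpose of illustration.

```python
import math
import random
from fractions import Fraction

def combine_scales(fx, fx0, k):
    """Coordinates F_1..F_k of (28); fx and fx0 map a scale s to f_s(x) and f_s(x_0)."""
    F = [Fraction(0)] * k
    for s, value in fx.items():
        # scale s = jk + i contributes to coordinate i; (s - 1) % k is its 0-based index
        F[(s - 1) % k] += Fraction(2) ** s * (value - fx0[s])
    return F

def l1(u, v):
    """The l_1 distance between two coordinate lists."""
    return sum(abs(a - b) for a, b in zip(u, v))

def weighted_total(fx, fy):
    """sum_s 2^s |f_s(x) - f_s(y)|."""
    return sum(Fraction(2) ** s * abs(fx[s] - fy[s]) for s in fx)

eps = Fraction(1, 4)
k = 2 + math.ceil(math.log2(1 / eps))     # assumption: log is taken base 2
rng = random.Random(0)
scales = range(-6, 11)                    # finitely many scales, so (27) holds trivially
fx0 = {s: 0 for s in scales}              # values at the base point x_0
fx = {s: rng.randint(0, 1) for s in scales}
fy = {s: rng.randint(0, 1) for s in scales}
norm = l1(combine_scales(fx, fx0, k), combine_scales(fy, fx0, k))
print(norm, weighted_total(fx, fy))
```

Within each residue class the largest differing scale dominates the sum of all smaller ones by a factor of at least \(2^{k-1}\), which is why so few coordinates suffice.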

In Sect. 5, we will require the following straightforward corollary.

Corollary 3.5

For every \(\varepsilon \in (0,1)\) and \(m \in {\mathbb {N}}\), the following holds. Let \((X,d)\) be a metric space, and suppose we have a family of functions \(\{f_i:X\rightarrow [0,1]^m\}_{i \in {\mathbb {Z}}}\) such that for all \(x,y\in X\),

$$\begin{aligned} \sum _{i \in {\mathbb {Z}}} 2^i \Vert f_i(x)-f_i(y)\Vert _1 < \infty . \end{aligned}$$

Then there exists a mapping \(F : X \rightarrow \ell _1^{m(2+\lceil \log \frac{1}{\varepsilon } \rceil )}\) such that for all \(x,y\in X\),

$$\begin{aligned} (1-\varepsilon )\sum _{i \in {\mathbb {Z}}} \big (2^{i} \Vert f_i(x)-f_i(y)\Vert _1\big )- 2\,\zeta (x,y)\le \Vert F(x)-F(y)\Vert _1\le \sum _{i \in {\mathbb {Z}}} 2^{i}\Vert f_i(x)-f_i(y)\Vert _1, \end{aligned}$$

where

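$$\begin{aligned} \zeta (x,y)=\sum _{k=1}^m \sum _{\begin{array}{c} i:\exists j<i \\ f_{j}(x)_k\ne f_{j}(y)_k \end{array}}2^i \big (|f_{i}(x)_k-f_{i}(y)_k|-\lfloor | f_{i}(x)_k-f_{i}(y)_k|\rfloor \big ), \end{aligned}$$
(30)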

and we have used the notation \(x_k\) for the \(k\)-th coordinate of \(x \in {\mathbb {R}}^m\).

4 Scale Assignment

Let \(T=(V,E)\) be a metric tree with root \(r \in V\), equipped with a monotone coloring \(\chi : E \rightarrow {\mathbb {N}}\). We will now describe a way of assigning “scales” to the vertices of \(T\). These scale values will be used in Sect. 5 to guide our eventual embedding. The scales of a vertex will describe, roughly, the subset and magnitude of coordinates that should differ between the vertex and its neighbors. First, we fix some notation.

For every \(c \in \chi (E)\), we use \(\gamma _c\) to denote the path in \(T\) colored \(c\), and we use \(v_c\) to denote the vertex of \(\gamma _c\) which is closest to the root. We will also use the notation \(T(c)\) to denote the subtree of \(T\) under the color \(c\); formally, \(T(c)\) is the induced (rooted) subtree on \(\{v_c\} \cup V(T_u)\) where \(u \in V\) is the child of \(v_c\) such that \(\chi (v_c,u)=c\), and \(T_u\) is the subtree rooted at \(u\).

We will write \(p(v)\) for the parent of a vertex \(v \in V\), and \(p(r)=r\). Furthermore, we define the “parent color” of a color class by \(\rho (c) = \chi (v_c, p(v_c))\) with the convention that \(\chi (r,r)=c_0\), where \(c_0 \in {\mathbb {N}} \setminus \chi (E)\) is some fixed element. Finally, we put \(T(c_0)=T\).

4.1 Scale Selectors

We start by defining a function \(\kappa :\chi (E) \cup \{c_0\} \rightarrow {\mathbb {N}}\) which describes the “branching factor” for each color class,

$$\begin{aligned} \kappa (c)=\Big \lfloor \log _2 {|E(T(\rho (c)))|\over |E(T(c))|}\Big \rfloor +1. \end{aligned}$$
(31)

Moreover, we define \(\varphi : \chi (E) \cup \{c_0\} \rightarrow {\mathbb {N}} \cup \{0\}\) inductively by setting \(\varphi (c_0)=0\), and

$$\begin{aligned} \varphi (c)=\kappa (c)+\varphi (\rho (c)) \end{aligned}$$
(32)

for \(c\in \chi (E)\).

Observe that for every color \(c \in \chi (E)\), we have

$$\begin{aligned} \varphi (c)&= \sum _{c'\in \chi (E(P_{v_c}))\cup \{c\}}\quad \kappa (c') \le \sum _{c'\in \chi (E(P_{v_c}))\cup \{c\}} \Big (1+\log _2 {|E(T(\rho (c')))|\over |E(T(c'))|}\Big )\nonumber \\&\le M(\chi )+\log _2 |E| \end{aligned}$$
(33)
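As an illustration of (31), (32), and (33), the following Python sketch computes \(\kappa \) and \(\varphi \) on a hypothetical five-vertex tree. The tree, its coloring, and the function names are assumptions made only for this example; each edge is named by its child endpoint, and the integer 0 plays the role of the extra color \(c_0\).

```python
import math

# Hypothetical toy tree rooted at 0; each edge is named by its child endpoint.
parent = {1: 0, 2: 1, 3: 1, 4: 0}
color = {1: 1, 2: 1, 3: 2, 4: 3}    # chi, a monotone coloring
C0 = 0                               # the extra color c_0, not in chi(E)

def subtree_edges(u):
    """Number of edges in the subtree T_u rooted at u."""
    return sum(1 + subtree_edges(w) for w in parent if parent[w] == u)

def top_edge(c):
    """The edge of color c closest to the root; its parent endpoint is v_c."""
    return next(e for e in color if color[e] == c and color.get(parent[e]) != c)

def size(c):
    """|E(T(c))|, with T(c_0) = T."""
    return len(parent) if c == C0 else 1 + subtree_edges(top_edge(c))

def parent_color(c):
    """rho(c) = chi(v_c, p(v_c)), equal to c_0 when v_c is the root."""
    return color.get(parent[top_edge(c)], C0)

def floor_log2_ratio(a, b):
    """floor(log2(a / b)) for integers a >= b >= 1, computed exactly."""
    t = 0
    while b * 2 ** (t + 1) <= a:
        t += 1
    return t

def kappa(c):
    return floor_log2_ratio(size(parent_color(c)), size(c)) + 1

def phi(c):
    return 0 if c == C0 else kappa(c) + phi(parent_color(c))

def multiplicity():
    """M(chi): the largest number of colors on a root-to-vertex path."""
    def colors(v):
        return set() if v not in parent else {color[v]} | colors(parent[v])
    return max(len(colors(v)) for v in parent)

for c in (1, 2, 3):
    print(c, kappa(c), phi(c), multiplicity() + math.log2(len(parent)))
```

A color whose subtree is small relative to that of its parent color receives a large value of \(\kappa \), while the telescoping sum in (33) keeps every \(\varphi (c)\) at most \(M(\chi )+\log _2 |E|\).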

Next, we use \(\varphi \) to inductively define our scale selectors. Let

$$\begin{aligned} m(T)=\mathrm{min}{\big \{\mathsf{{len}}(e): e\in E \text { and } \mathsf{{len}}(e)>0\big \}}. \end{aligned}$$

We now define a family of functions \(\{\tau _i : V \rightarrow {\mathbb {N}} \cup \{0\}\}_{i \in {\mathbb {Z}}}\).

For \(v\in V\), let \(c=\chi (v,p(v))\), and put

$$\begin{aligned} \tau _i(v)=0 \quad \text {for } i<\Big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\Big \rfloor , \end{aligned}$$

and otherwise,

$$\begin{aligned}&\!\!\!\tau _i(v) \nonumber \\&=\min \Big (\underbrace{\Big \lceil {d_T(v,v_c)-\min \big (d_T(v,v_c), \sum _{j=-\infty }^{i{-}1} 2^{j}\tau _j(v)\big )\over 2^i}\Big \rceil }_{(\mathrm{A})}, \underbrace{\varphi (c)-\sum _{c'\in {\chi }(E(P_{v}))} \tau _i(v_{c'}) }_{(\mathrm{B})}\Big ).\nonumber \\ \end{aligned}$$
(34)

The value of \(\tau _i(v)\) will be used in Sect. 5 to determine how many coordinates of magnitude \(\asymp 2^i\) change as the embedding proceeds from \(v_c\) to \(v\). In this definition, we try to cover the distance from root to \(v\) with the smallest scales possible while satisfying the inequality

$$\begin{aligned} \varphi (c)\ge \tau _i(v)+\sum _{c'\in \chi (E(P_{v}))} \tau _i(v_{c'}). \end{aligned}$$

For \(v \in V\setminus \{r\}\), let \(c={\chi }(v,p(v))\). For each \(i \in {\mathbb {Z}}\), part (B) of (34) for \(\tau _i(v_c)\) implies that

$$\begin{aligned} \tau _i(v_c)\le \varphi (\rho (c))-\sum _{c'\in {\chi }(E(P_{v_c}))} \tau _i(v_{c'}). \end{aligned}$$

Hence,

$$\begin{aligned}&\varphi (c)-\sum _{c'\in {\chi }(E(P_{v}))} \tau _i(v_{c'})\nonumber \\&\quad = \varphi (c)-\tau _i(v_c)-\sum _{c'\in \chi (E(P_{v_c}))} \tau _i(v_{c'})\ge \varphi (c)-\varphi (\rho (c))= \kappa (c)\ge 1 .\qquad \end{aligned}$$
(35)

Therefore, part (B) of (34) is always positive, so if \(\tau _k(v)=0\) for some \(k\ge \big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \), then \(\tau _k(v)\) is defined by part (A) of (34). Hence \(\sum _{j=-\infty }^{k-1} 2^{j}\tau _j(v) \ge d_T(v,v_c)\) and the following observation is immediate.

Observation 4.1

For \(v\in V\) and \(k\ge \big \lfloor \log _2\big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \), if \(\tau _k(v)=0\) then for all \(i\ge k, \tau _i(v)=0\).

Comparing part (A) of (34) for \(\tau _i(v)\) and \(\tau _{i+1}(v)\) also allows us to observe the following.

Observation 4.2

For \(v\in V\) and \(k\ge \big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \), if part (A) in (34) for \(\tau _k(v)\) is less than or equal to part (B) then for all \(i> k, \tau _i(v)=0\).
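The recursive definition (34) can be traced on a small example. The following Python sketch evaluates \(\tau _i(v)\) on a hypothetical five-vertex tree. The tree, its lengths and coloring, and the function names are assumptions made only for illustration; the vertices \(v_c\) and the values \(\varphi (c)\) and \(M(\chi )\) for this tree are entered by hand rather than computed. The sum \(\sum _i 2^i \tau _i(v)\) is accumulated until the first vanishing scale, which is justified by Observation 4.1.

```python
import math
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy tree rooted at 0; each edge is named by its child endpoint.
parent = {1: 0, 2: 1, 3: 1, 4: 0}
length = {1: 1, 2: 2, 3: 4, 4: 1}
color = {1: 1, 2: 1, 3: 2, 4: 3}
top = {1: 0, 2: 1, 3: 0}            # v_c for each color c (entered by hand)
phi = {1: 1, 2: 3, 3: 3}            # phi(c) from (31)-(32) for this tree (entered by hand)
M_CHI = 2                            # M(chi) for this coloring (entered by hand)
m_T = min(l for l in length.values() if l > 0)
I0 = math.floor(math.log2(m_T / (M_CHI + math.log2(len(parent)))))  # first active scale

def dist_up(v, a):
    """d_T(v, a) for an ancestor a of v."""
    total = 0
    while v != a:
        total += length[v]
        v = parent[v]
    return total

def path_colors(v):
    """chi(E(P_v)): the colors on the path from v to the root."""
    colors = set()
    while v in parent:
        colors.add(color[v])
        v = parent[v]
    return colors

@lru_cache(maxsize=None)
def tau(i, v):
    """tau_i(v) as in (34); the root and all scales below I0 get the value 0."""
    if i < I0 or v not in parent:
        return 0
    c = color[v]
    d = dist_up(v, top[c])
    covered = sum(Fraction(2) ** j * tau(j, v) for j in range(I0, i))
    part_a = math.ceil((d - min(d, covered)) / Fraction(2) ** i)
    part_b = phi[c] - sum(tau(i, top[cp]) for cp in path_colors(v))
    return min(part_a, part_b)

def scale_sum(v):
    """sum_i 2^i tau_i(v), stopping at the first zero scale (Observation 4.1)."""
    total, i = Fraction(0), I0
    while tau(i, v) != 0:
        total += Fraction(2) ** i * tau(i, v)
        i += 1
    return total

for v in parent:
    print(v, dist_up(v, top[color[v]]), scale_sum(v))
```

For each vertex, the printed sum can be compared with the distance \(d_T(v,v_c)\) printed next to it.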

4.2 Properties of the Scale Selector Maps

We now prove some key properties of the maps \(\kappa , \varphi \), and \(\{\tau _i\}\).

Lemma 4.3

For every vertex \(v\in V\) with \(c={\chi }(v,p(v))\), the following holds. For all \({i}\in {\mathbb {Z}}\) with \({{d_T(v,v_c)}\over \kappa (c)} \le 2^{i-1}\), we have \(\tau _i(v)=0.\)

Proof

If \(d_T(v,v_c)=0\), the lemma is vacuous. Suppose now that \(d_T(v,v_c)>0\), and let \(k=\lceil \log _2\big ({\frac{d_T(v,v_c)}{\kappa (c)}}\big )\rceil .\) We have \(d_T(v,v_c)\ge m(T)\) and \(\kappa (c)\le \log _2 |E|+1\), therefore

$$\begin{aligned} k\ge \Big \lfloor \log _2 \Big (\frac{m(T)}{M(\chi )+\log _2 |E|}\Big )\Big \rfloor . \end{aligned}$$

It follows that for \(i \ge k, \tau _i(v)\) is given by (34).

If \(\tau _k(v)=0\), then by Observation 4.1, for all \(i\ge k, \tau _i(v)=0\).

On the other hand if \(\tau _k(v)\ne 0\) then either it is determined by part (B) of (34), in which case

$$\begin{aligned} \tau _k(v)&= \varphi (c)-\sum _{c'\in {\chi }(E(P_{v}))} \tau _k(v_{c'})= \varphi (c)-\tau _k(v_c)-\sum _{c'\in \chi (E(P_{v_c}))} \tau _k(v_{c'})\\&\ge \varphi (c)-\varphi (\rho (c)) = \kappa (c), \end{aligned}$$

implying that

$$\begin{aligned} \sum _{j=-\infty }^{k} 2^j\tau _j(v)\ge \kappa (c)2^k\ge d_T(v,v_c). \end{aligned}$$

Examining part (A) of (34), we see that \(\tau _{k+1}(v)=0\), and by Observation 4.1, \(\tau _i(v)=0\) for \(i > k\). Alternately, \(\tau _k(v)\) is determined by part (A) of (34), and by Observation 4.2 \(\tau _{i}(v)=0\) for \(i>k\), completing the proof. \(\square \)

The next lemma shows how the values \(\{\tau _i(v)\}\) track the distance from \(v_c\) to \(v\).

Lemma 4.4

For any vertex \(v\in V\) with \(c={\chi }(v,p(v))\), we have

$$\begin{aligned} d_T(v,v_c)\le \sum _{i=-\infty }^\infty 2^i\tau _i(v) \le 3\,d_T(v,v_c). \end{aligned}$$

Proof

If \(d_T(v,v_c)=0\), the lemma is vacuous. Suppose now that \(d_T(v,v_c)>0\), and let

$$\begin{aligned} k=\max \{i:\tau _i(v)\ne 0\}. \end{aligned}$$

By Lemma 4.3, the maximum exists.

We have \(\tau _{k+1}(v)=0\), and thus inequality (35) implies that part (A) of (34) specifies \(\tau _{k+1}(v)\), yielding

$$\begin{aligned} d_T(v,v_c)\le \sum _{i=-\infty }^{k} 2^i\tau _i(v) = \sum _{i=-\infty }^{\infty } 2^i\tau _i(v). \end{aligned}$$

On the other hand, since \(\tau _k(v)>0\), we must have \(d_T(v,v_c)>\sum _{i=-\infty }^{k-1} 2^i\tau _i(v),\) and Lemma 4.3 implies that \(2^k< 2\,d_T(v,v_c),\) hence,

$$\begin{aligned} \sum _{i=-\infty }^k 2^i\tau _i(v)&\le \sum _{i=-\infty }^{k-1} 2^i\tau _i(v)+2^k\Big \lceil \frac{d_T(v,v_c)-\sum _{i=-\infty }^{k-1} 2^i\tau _i(v)}{2^k}\Big \rceil \\&< \sum _{i=-\infty }^{k-1} 2^i\tau _i(v) +2^k\Big ( {d_T(v,v_c)-\sum _{i=-\infty }^{k-1} 2^i\tau _i(v)\over 2^k}+1\Big )\\&= \sum _{i=-\infty }^{k-1} 2^i\tau _i(v) +2^k+\Big (d_T(v,v_c)-\sum _{i=-\infty }^{k-1} 2^i\tau _i(v)\Big )\\&\le d_T(v,v_c)+2^k< 3\,d_T(v,v_c). \end{aligned}$$

\(\square \)

The following lemma shows that for any color \(c\in \chi (E)\) the value of \(\tau _i\) does not decrease as we move further from \(v_c\) in \(\gamma _c\).

Lemma 4.5

Let \(u,w\in V\) be such that \(c={\chi }(w,p(w))={\chi }(u,p(u))\), and \(d_T(w,v_c)\le d_T(u,v_c)\). Then for all \(i\in {\mathbb {Z}}\), we have

$$\begin{aligned} \tau _i(w)\le \tau _i(u). \end{aligned}$$

Proof

First let \(k\) be the smallest integer for which

$$\begin{aligned} \Big \lceil \frac{d_T(w,v_c)-\min \big (d_T(w,v_c), \sum _{j=-\infty }^{k-1} 2^{j}\tau _j(w)\big )}{2^k}\Big \rceil \le \varphi (c)-\sum _{c'\in {\chi }(E(P_{w}))} \tau _k(v_{c'}). \end{aligned}$$

This \(k\) exists since, by (35), the right-hand side is always positive, while by Lemma 4.3, the left-hand side must be zero for some \(k \in {\mathbb {Z}}\) large enough.

For \(i>k\), by Observation 4.2, we have \(\tau _i(w)=0\). Therefore, for \(i>k\), we have \(\tau _i(u)\ge \tau _i(w)\). We now use induction on \(i\) to show that for \(i< k, \tau _i(u)=\tau _i(w)\), and for \(i=k, \tau _k(u)\ge \tau _k(w)\). Recall that, for \(i<\big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \), we have \(\tau _i(w)=\tau _i(u)=0\), which gives us the base case of the induction.

Now, by definition of \(k\), part (B) of (34) for \(\tau _{k-1}(w)\) is an integer strictly less than part (A), hence

$$\begin{aligned} \sum _{j=-\infty }^{k-1} 2^{j}\tau _j(w)&= 2^{k-1}\tau _{k-1}(w) + \sum _{j=-\infty }^{k-2} 2^{j}\tau _j(w) \nonumber \\&\le 2^{k-1}\Big (\Big \lceil \frac{d_T(w,v_c)- \sum _{j=-\infty }^{k-2} 2^{j}\tau _j(w)}{2^{k-1}}\Big \rceil -1\Big ) + \sum _{j=-\infty }^{k-2} 2^{j}\tau _j(w) \nonumber \\&< 2^{k-1}\Big (\frac{d_T(w,v_c)- \sum _{j=-\infty }^{k-2} 2^{j}\tau _j(w)}{2^{k-1}}\Big ) + \sum _{j=-\infty }^{k-2} 2^{j}\tau _j(w) \nonumber \\&\le d_T(w,v_c). \end{aligned}$$
(36)

For \(\big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \le i\le k\), by (36), and as \(d_T(u,v_c) \ge d_T(w,v_c)\), we have

$$\begin{aligned}&\min \Big (d_T(w,v_c), \sum _{j=-\infty }^{i-1} 2^{j}\tau _j(w)\Big )\nonumber \\&\qquad = \sum _{j=-\infty }^{i-1} 2^{j}\tau _j(w)= \min \Big (d_T(u,v_c), \sum _{j=-\infty }^{i-1} 2^{j}\tau _j(w)\Big ). \end{aligned}$$
(37)

By our induction hypothesis, \(\tau _j(w)=\tau _j(u)\) for all \(j<i\), so using (37) we can write

$$\begin{aligned}&d_T(w,v_c)-\min \Big (d_T(w,v_c), \sum _{j=-\infty }^{i-1} 2^{j}\tau _j(w)\Big )\nonumber \\&\quad \le d_T(u,v_c)-\min \Big (d_T(u,v_c), \sum _{j=-\infty }^{i-1} 2^{j}\tau _j(u)\Big ). \end{aligned}$$
(38)

Since \({\chi }(w,p(w))={\chi }(u,p(u))\), for all \(i\in {\mathbb {Z}}\) part (B) of (34) is identical for \(\tau _i(u)\) and \(\tau _i(w)\). Therefore, using (38), and the definition of \(k\), for all \(\big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \le i<k\), part (B) of (34) specifies \(\tau _i(u)\) and \(\tau _i(w)\), hence

$$\begin{aligned} \tau _i(u)=\tau _i(w)=\varphi (c)-\sum _{c'\in {\chi }(E(P_{w}))} \tau _i(v_{c'}). \end{aligned}$$

For the case that \(i=k\), part (B) of (34) is identical for \(\tau _k(u)\) and \(\tau _k(w)\), and inequality (38) implies that part (A) of (34) for \(\tau _k(u)\) is at least as large as part (A) of (34) for \(\tau _k(w)\), completing the proof. \(\square \)

The next lemma gives a lower bound on the distance between two vertices of the same color in terms of \(\{\tau _i\}\).

Lemma 4.6

Let \(k> \big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \) be an integer. For any two vertices \(w\) and \(u\) such that \(\tau _k(u)\ne 0, \tau _{k-1}(w)=0\) and \({\chi }(w,p(w))={\chi }(u,p(u))\), we have

$$\begin{aligned} d_T(u,w)> 2^{k-1}. \end{aligned}$$

Proof

By Observation 4.1, \(\tau _k(w)=0\). Letting \(c={\chi }(u,p(u))\), by Lemma 4.5 we have \(d_T(v_c,u)\ge d_T(v_c,w)\). Using Lemma 4.5 again, we can conclude that for all \(i\in {\mathbb {Z}}, \tau _i(u)\ge \tau _i(w)\). Since \(\tau _{k-1}(w)=0\), inequality (35) implies that part (A) of (34) specifies \(\tau _{k-1}(w)\). Therefore,

$$\begin{aligned} d_T(w,v_c)&\le \sum _{i=-\infty }^{k-2} 2^{i}\tau _i(w)\le \sum _{i=-\infty }^{k-2} 2^{i}\tau _i(u) \nonumber \\&= \Big (\sum _{i=-\infty }^{k-1} 2^{i}\tau _i(u)\Big )- 2^{k-1}\tau _{k-1}(u). \end{aligned}$$
(39)

Since \(\tau _k(u)>0 \), using part \((A)\) of (34), we can write

$$\begin{aligned} d_T(u,v_c)> \sum _{i=-\infty }^{k-1} 2^{i}\tau _i(u). \end{aligned}$$
(40)

Observation 4.1 implies that \(\tau _{k-1}(u)\ne 0\), thus \(\tau _{k-1}(u) \ge 1\), and using (39) and (40), we have

$$\begin{aligned} d_T(w,u)= d_T(u,v_c)-d_T(w,v_c)>2^{k-1 }, \end{aligned}$$

completing the proof. \(\square \)
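To spell out the final step, here is a short derivation that uses only (39), (40), and \(\tau _{k-1}(u)\ge 1\). Subtracting the upper bound (39) on \(d_T(w,v_c)\) from the lower bound (40) on \(d_T(u,v_c)\) cancels the common sum:

$$\begin{aligned} d_T(u,v_c)-d_T(w,v_c)&> \sum _{i=-\infty }^{k-1} 2^{i}\tau _i(u)-\Big (\sum _{i=-\infty }^{k-1} 2^{i}\tau _i(u)- 2^{k-1}\tau _{k-1}(u)\Big )\\&= 2^{k-1}\tau _{k-1}(u)\ge 2^{k-1}. \end{aligned}$$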

The next lemma and the following two corollaries bound the number of colors \(c\) in the tree which have a small value of \(\varphi (c)\).

Lemma 4.7

For any \(k\in {\mathbb {N}}\cup \{0\}\), and any color \(c\in \chi (E)\), we have

$$\begin{aligned} \#\big \{c'\in \chi (E(T(c))): \varphi (c')-\varphi (c)=k\big \} \le 2^{k}. \end{aligned}$$

Proof

We start the proof by comparing the size of the subtrees \(T(c')\) and \(T(c)\) for \(c'\in {\chi }(E(T(c)))\).

For a given color \(c'\in {\chi }(E(T(c)))\), we define the sequence \(\{c_i\}_{i\in {\mathbb {N}}}\) as follows. We put \(c_1=c'\) and for \(i>1\) we put \(c_i=\rho (c_{i-1})\). Suppose now that \(c_m=c\). Then we have

$$\begin{aligned} \varphi (c_1)-\varphi (c_m)= \sum _{i=1}^{m-1} \kappa (c_i)\ge \sum _{i=1}^{m-1}\log _2\Big ( {|E(T(c_{i+1}))|\over |E(T(c_{i}))|}\Big ) \ge \log _2\Big ( {|E(T(c))|\over |E(T(c'))|}\Big ). \end{aligned}$$
(41)

This inequality implies that

$$\begin{aligned} |E(T(c))|\le 2^{\varphi (c')-\varphi (c)}|E(T(c'))|. \end{aligned}$$

It is easy to check that for distinct colors \(a,b\in \chi (E(T(c)))\) such that \(\varphi (a)=\varphi (b)\), the subtrees \(T(a)\) and \(T(b)\) are edge disjoint. Therefore, for \(k \in {\mathbb {N}}\cup \{0\}\), summing over all the colors \(c'\) such that \(\varphi (c')-\varphi (c)=k\) gives

$$\begin{aligned}&\#\{c'\in \chi (E(T(c))): \varphi (c')-\varphi (c) = k\}\,\\&\quad \le \!\!\sum _{\begin{array}{c} c' \in \chi (E(T(c)))\\ \varphi (c')-\varphi (c)=k \end{array}} \,{2^k\,|E(T(c'))|\over |E(T(c))|}=2^k\!\!\!\!\sum _{\begin{array}{c} c' \in \chi (E(T(c)))\\ \varphi (c')-\varphi (c)=k \end{array}} \,{|E(T(c'))|\over |E(T(c))|} \le 2^k. \end{aligned}$$

\(\square \)

The following two corollaries are immediate from Lemma 4.7.

Corollary 4.8

For any \(k\in {\mathbb {N}}\), and any color \(c\in \chi (E)\), we have

$$\begin{aligned} \#\{c'\in \chi (E(T(c))): \varphi (c')-\varphi (c)\le k\} < 2^{k+1}. \end{aligned}$$
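As a sketch of why this is immediate, one can sum the bound of Lemma 4.7 over the possible values of the difference. This assumes, as the statement of Lemma 4.7 suggests, that \(\varphi (c')-\varphi (c)\) is a non-negative integer for every \(c'\in \chi (E(T(c)))\):

$$\begin{aligned} \#\{c'\in \chi (E(T(c))): \varphi (c')-\varphi (c)\le k\}&= \sum _{j=0}^{k}\#\{c'\in \chi (E(T(c))): \varphi (c')-\varphi (c)=j\}\\&\le \sum _{j=0}^{k} 2^{j}=2^{k+1}-1<2^{k+1}. \end{aligned}$$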

Corollary 4.9

For any color \(c\in \chi (E)\), and constant \(C\ge 2\), we have

$$\begin{aligned} \sum _{c'\in \chi (E(T(c)))\setminus \{c\}} 2^{-C(\varphi (c')-\varphi (c))}< 2^{2-C}. \end{aligned}$$
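A sketch of how this bound can be obtained from Lemma 4.7, under the assumption (not restated in this section) that \(\varphi (c')-\varphi (c)\) is a positive integer for every \(c'\in \chi (E(T(c)))\setminus \{c\}\). Group the colors by the value \(j=\varphi (c')-\varphi (c)\), and let \(K\) denote the largest value attained, which is finite because the tree is finite. Then

$$\begin{aligned} \sum _{c'\in \chi (E(T(c)))\setminus \{c\}} 2^{-C(\varphi (c')-\varphi (c))}&\le \sum _{j=1}^{K} 2^{j}\cdot 2^{-Cj} < \sum _{j=1}^{\infty } 2^{(1-C)j}\\&= {2^{1-C}\over 1-2^{1-C}}\le 2\cdot 2^{1-C}=2^{2-C}, \end{aligned}$$

where the last inequality uses \(2^{1-C}\le \frac{1}{2}\) for \(C\ge 2\).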

The next lemma is similar to Lemma 4.6. The assumption is more general, and the conclusion is correspondingly weaker. This result is used primarily to enable the proof of Lemma 4.11.

Lemma 4.10

Let \(u\in V\) and \(w\in V(P_u)\) be such that \(\varphi (\chi (u,p(u)))>\varphi (\chi (w,p(w)))\). For all vertices \(x\in V(T_u)\), and \(k\in {\mathbb {Z}}\) with

$$\begin{aligned} 2^{k} \ge \Big ({6\,d_T(x,w)\over \varphi ({\chi }(u,p(u)))-\varphi ({\chi }(w,p(w)))}\Big ), \end{aligned}$$
(42)

we have \(\tau _k(x)=0\).

Proof

In the case that \(d_T(x,w)=0\), this lemma is vacuous. Suppose now that \(d_T(x,w)>0\). Let \(c_1,\ldots , c_m\) be the set of colors that appear on the path \(P_{x\,p(w)}\), in order from \(x\) to \(p(w)\), and for \(i\in [m]\), let \(y_i=v_{c_i}\). We prove this lemma by showing that if

$$\begin{aligned} k\ge \log _2 \Big ({6\,d_T(x,w)\over \varphi ({\chi }(u,p(u)))-\varphi ({\chi }(w,p(w)))}\Big ), \end{aligned}$$
(43)

then part \((A)\) of (34) for \(\tau _{k}(x)\) is zero. First note that \(\varphi ({\chi }(u,p(u)))-\varphi ({\chi }(w,p(w)))\!\le \! M(\chi )+\log _2 |E|\) and \(d_T(x,w){\ge } m(T)\), hence (43) implies

$$\begin{aligned} k \ge \Big \lfloor \log _2 \Big (\frac{m(T)}{M(\chi )+\log _2 |E|}\Big )\Big \rfloor . \end{aligned}$$

By Lemma 4.4, we have

$$\begin{aligned} \sum _{i=1}^{m-2} 2^{k-1}\tau _{k-1}(y_{i})\le \sum _{i=1}^{m-2} \sum _{j=-\infty }^{\infty } 2^{j}\tau _j(y_{i})\le \sum _{i=1}^{m-2} 3\,d_T(y_{i},y_{{i+1}})= 3\,d_T(y_{1},y_{{m-1}}). \end{aligned}$$
(44)

Now, using (42) gives

$$\begin{aligned} \varphi (c_1)-\varphi (c_{m})&\ge \varphi ({\chi }(u,p(u)))-\varphi ({\chi }(w,p(w)))\nonumber \\&\ge {6\,d_T(x,w)\over 2^{k}}\ge {6\,d_T(x,y_{{m-1}})\over 2^{k}}. \end{aligned}$$
(45)

Using the above inequality and (44), we can write

$$\begin{aligned} d_T(x,y_{1})&= d_T(x,y_{m-1})-d_T(y_{1},y_{{m-1}})\\&\le {2^{k-1}\over 3}\Big ( \varphi (c_1)-\varphi (c_{m})-\sum _{i=1}^{m-2} \tau _{k-1}(y_{i})\Big ). \end{aligned}$$
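In more detail, as a sketch of the intermediate step: (45) rearranges to the upper bound \(d_T(x,y_{m-1})\le {2^{k}\over 6}\big (\varphi (c_1)-\varphi (c_m)\big )\), while (44) rearranges to the lower bound \(d_T(y_1,y_{m-1})\ge {2^{k-1}\over 3}\sum _{i=1}^{m-2}\tau _{k-1}(y_i)\). Since \({2^{k}\over 6}={2^{k-1}\over 3}\), subtracting gives

$$\begin{aligned} d_T(x,y_{m-1})-d_T(y_{1},y_{m-1})\le {2^{k-1}\over 3}\big (\varphi (c_1)-\varphi (c_{m})\big )-{2^{k-1}\over 3}\sum _{i=1}^{m-2} \tau _{k-1}(y_{i}). \end{aligned}$$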

First, note that \(c_m=\chi (y_{m-1},p(y_{m-1}))\). Now, we use part (B) of (34) for \(\tau _{k-1}(y_{m-1})\) to write

$$\begin{aligned} d_T(x,y_{1})&\le {2^{k-1}\over 3}\Big (\varphi (c_1)\!-\!\big (\tau _{k-1}(y_{m-1})\!+\!\sum _{c'\in \chi (E(P_{y_{{m-1}}}))} \tau _{k-1}(v_{c'})\big )-\sum _{i=1}^{m-2} \tau _{k-1}(y_{i})\Big )\nonumber \\&\le {2^{k-1}\over 3}\Big ( \varphi (c_1)-\sum _{c'\in {\chi }(E(P_{x}))} \tau _{k-1}(v_{c'})\Big )\nonumber \\&\le {2^{k-1}}\Big ( \varphi ({\chi }({x},p(x)))-\sum _{c'\in {\chi }(E(P_{x}))} \tau _{k-1}(v_{c'})\Big ). \end{aligned}$$
(46)

Therefore, either part (A) of (34) specifies \(\tau _{k-1}(x)\), in which case by Observation 4.2, \(\tau _{i}(x)=0\) for \(i\ge k\), or part (B) of (34) specifies \(\tau _{k-1}(x)\), in which case by (46) we have

$$\begin{aligned} \tau _{k-1}(x)2^{k-1}\ge d_T(x,y_{1}), \end{aligned}$$

and part (A) of (34) is zero for \(i\ge k\). \(\square \)

In Sect. 5, we describe our embedding and analyze its distortion. In the analysis, for a given pair of vertices \(x,y\in V\), we divide the path between \(x\) and \(y\) into subpaths. For each subpath, we show that either its contribution to the distance between \(x\) and \(y\) in the embedding is “large,” via a concentration of measure argument, or, using the following lemma, its length is “small” compared to the distance between \(x\) and \(y\). The complete argument is somewhat more delicate; the details of how Lemma 4.11 is used can be found in the proof of Lemma 5.15.

Lemma 4.11

There exists a constant \(C > 0\) such that the following holds. For any \(c \in \chi (E)\) and \(v\in V(T({c}))\) with \(v\ne v_c\) and for any \(\varepsilon \in (0,\frac{1}{2}]\), there are vertices \(u,u' \in V\) with \(u \ne u'\) and \(d_T(u,v) \le \varepsilon \,d_T(u,u')\), and such that

$$\begin{aligned} u,u'&\in \{v_a:a\in \chi (E(P_{v \,v_c}))\}\cup \{v\}. \end{aligned}$$

Furthermore, for all vertices \(x\in V(P_{u'u})\setminus \{u'\}\), for all \(k \in {\mathbb {Z}}\),

$$\begin{aligned} \tau _k(x) \ne 0 \implies 2^k < \Big ({C{d_T(u,u')} \over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (\chi (v_c,p(v_c))))}\Big ). \end{aligned}$$

Proof

Let \(r'=v_c\), and let \(c_1,\ldots , c_m\) be the set of colors that appear on the path \(P_{vr'}\) in order from \(v\) to \(r'\), and put \(c_{m+1}={\chi }(r',p(r'))\). We define \(y_0=v\), and for \(i\in [m], y_i=v_{c_i}\). Note that \(\{y_0,\ldots , y_m\}=\{v\}\cup \{v_a:a\in \chi (E(P_{v \,v_c}))\}\), and for \(i\le m, \chi (y_i,p(y_i))=c_{i+1}\). We give a constructive proof for the lemma.

For \(i\in {\mathbb {N}}\), we construct a sequence \((a_i,b_i)\in {{\mathbb {N}}}\times {\mathbb {N}}\), the idea being that \(P_{y_{a_i},y_{b_i}}\) is a nonempty subpath of \(P_{vr'}\) and that, for different values of \(i\), these subpaths are edge disjoint. At each step of the construction, either we can use \((a_i,b_i)\) to find \(u\) and \(u'\) that satisfy the properties of this lemma, or we find \((a_{i+1},b_{i+1})\) such that \(b_{i+1}<b_i\). The last condition guarantees that the construction terminates, so we can always find \(u\) and \(u'\) that satisfy the conditions of this lemma.

We start with \(a_1=m\) and \(b_1=m-1\). If \(d_T(v,y_{b_1})\le \varepsilon d_T(y_{a_1},y_{b_1})\) then

$$\begin{aligned} \Big ({2{d_T(y_m,y_{m-1})} \over \varphi (\chi (y_{m-1},p(y_{m-1})))-\varphi (\chi (r',p(r')))}\Big ) = {{2d_T(y_{a_1},y_{b_1})}\over \kappa (c)} \end{aligned}$$

and by Lemma 4.3 the assignment \(u'=y_{a_1}\) and \(u=y_{b_1}\) satisfies the conditions of this lemma if \(C \ge 1\). Otherwise, for \(i \ge 1\), we choose \((a_{i+1},b_{i+1})\) based on \((a_i,b_i)\), and construct the rest of the sequence preserving the following three properties:

(i) \(\varphi (c_{b_i+1})-\varphi (c_{a_i+1})\ge \varphi (c_{a_i+1} )-\varphi (\chi (r',p(r')))\);

(ii) \(d_T(y_{b_i},v) \ge \varepsilon d_T(y_{b_i},y_{a_i})\);

(iii) \(a_i>b_i\).

Let \({j}\in \{0,\ldots , m\}\) be the maximum integer such that \(\varepsilon d_T(y_{j},y_{b_i})\ge d_T(v,y_{j})\). Note that \(j<b_i\), and the maximum always exists because \(y_0 = v\). We will now split the proof into three cases.

Case I: \(\varphi (c_{j+2})-\varphi (c_{b_i+1})\ge 2(\varphi (c_{b_i+1})-\varphi (c_{a_{i}+1})).\)

In this case by condition (iii), \(\varphi (c_{b_i+1})-\varphi (c_{a_{i}+1})>0\). Hence \(j+1<b_i\), and we can preserve conditions (i), (ii) and (iii) with

$$\begin{aligned} (a_{i+1},b_{i+1})=(b_i, {j+1}). \end{aligned}$$

Case II: \(\varphi (c_{j+2})-\varphi (c_{b_i+1})< 2(\varphi (c_{b_i+1})-\varphi (c_{a_{i}+1}))\) and \(\varphi (c_{j+1})-\varphi (c_{b_i+1})\ge 6(\varphi (c_{b_i+1})-\varphi (c_{a_i+1}))\).

In this case by (32) we have

$$\begin{aligned} \kappa (c_{j+1})= \varphi (c_{j+1})-\varphi (c_{{j+2}})=(\varphi (c_{j+1}) -\varphi (c_{b_i+1}))-(\varphi (c_{j+2})-\varphi (c_{b_i+1})). \end{aligned}$$

Using the conditions of this case, we write

$$\begin{aligned} \kappa (c_{j+1})&= (\varphi (c_{j+1})-\varphi (c_{b_i+1})) -(\varphi (c_{j+2})-\varphi (c_{b_i+1}))\\&\ge 6(\varphi (c_{b_i+1})-\varphi (c_{a_{i}+1})) -(\varphi (c_{j+2})-\varphi (c_{b_i+1}))\\&= \big (2(\varphi (c_{b_i\!+\!1})\!-\!\varphi (c_{a_{i}+1})) \!+\!4(\varphi (c_{b_i+1})\!-\!\varphi (c_{a_i+1}))\big )-\big (\varphi (c_{j+2})\!-\!\varphi (c_{b_i+1})\big )\\&> \big (2(\varphi (c_{b_i+1})\!-\!\varphi (c_{a_{i}+1})) \!+\!2(\varphi (c_{j+2})\!-\!\varphi (c_{b_i+1}))\big )\!-\!\big (\varphi (c_{j+2})\!-\!\varphi (c_{b_i+1})\big ), \end{aligned}$$

and by condition (i),

$$\begin{aligned} \kappa (c_{j+1})&> \big (\big (\varphi (c_{b_i+1})-\varphi (c_{a_{i}+1})\big ) +\big (\varphi (c_{a_i+1})-\varphi (\chi (r',p(r')))\big )\nonumber \\&\quad +2(\varphi (c_{j+2})-\varphi (c_{b_i+1}))\big )-\big (\varphi (c_{j+2})-\varphi (c_{b_i+1})\big )\nonumber \\&= \varphi (c_{j+2})-\varphi (\chi (r',p(r'))). \end{aligned}$$
(47)

Thus if \( d_T(y_{{j+1}},v) \ge \varepsilon \, d_T(y_j,y_{{j+1}})\), then \((a_{i+1},b_{i+1})=({j+1},j)\) satisfies condition (i) by (47), and it is also easy to verify that it satisfies conditions (ii) and (iii). If \( d_T(y_{{j+1}},v) < \varepsilon \, d_T(y_{j},y_{{j+1}})\), then by (32),

$$\begin{aligned} \varphi (\chi (y_{j},p(y_{j})))=\varphi (c_{j+1}) =\kappa (c_{j+1})+\varphi (c_{j+2}) \end{aligned}$$

and by (47),

$$\begin{aligned}&\Big ({2{d_T(y_j,y_{j+1})} \over (\varphi (\chi (y_{j},p(y_{j})))-\varphi (\chi (r',p(r'))))}\Big )\\&\qquad = \Big ({2{d_T(y_j,y_{j+1})} \over \kappa (c_{j+1})+\varphi (c_{j+2})-\varphi (\chi (r',p(r')))}\Big )> {{d_T(y_{j},y_{{j+1}})}\over \kappa (c_{j+1})}. \end{aligned}$$

Hence Lemma 4.3 implies that the assignment \(u'=y_{{j+1}}\) and \(u=y_{j}\) satisfies the conditions of this lemma if \(C \ge 2\).

Case III: \(\varphi (c_{j+1})-\varphi (c_{b_i+1})< 6(\varphi (c_{b_i+1})-\varphi (c_{a_i+1}))\).

In this case we use Lemma 4.10 to show that the assignment \(u=y_{j}\) and \(u'=y_{b_i}\) satisfies the conditions of the lemma. We have

$$\begin{aligned}&\varphi (\chi (y_j,p(y_j)))-\varphi (\chi (r',p(r')))\\&\quad = \varphi (c_{j+1})-\varphi (\chi (r',p(r')))\\&\quad = (\varphi (c_{j+1})-\varphi (c_{b_i+1}))+(\varphi (c_{b_i+1}) -\varphi (c_{a_i+1}))+(\varphi (c_{a_i+1})-\varphi (\chi (r',p(r'))))\\&\quad < 6(\varphi (c_{b_i+1})-\varphi (c_{a_i+1}))+(\varphi (c_{b_i+1}) -\varphi (c_{a_i+1}))+(\varphi (c_{a_i+1})-\varphi (\chi (r',p(r')))), \end{aligned}$$

and by condition (i),

$$\begin{aligned} \varphi (\chi (y_j,p(y_j)))-\varphi (\chi (r',p(r'))) < 8(\varphi (c_{b_i+1}) -\varphi (c_{a_i+1})). \end{aligned}$$

Condition (ii) and the definition of \(y_j\) imply that

$$\begin{aligned} d_T(y_{j},y_{b_i})\ge {(1-\varepsilon )d_T(v,y_{b_i})}\ge \varepsilon {(1-\varepsilon )d_T(y_{a_i},y_{b_i})}\ge {\varepsilon \over 2}\,d_T(y_{a_i},y_{b_i}). \end{aligned}$$

Hence,

$$\begin{aligned} \Big ({6({2\over \varepsilon }){d_T(y_j,y_{b_{i}})} \over {1\over 8}(\varphi (\chi (y_{j},p(y_{j})))-\varphi (\chi (r',p(r'))))}\Big )\ge \Big ({6d_T(y_{b_i},y_{a_i})\over \varphi (c_{b_i+1})-\varphi (c_{a_i+1})}\Big ), \end{aligned}$$

and by applying Lemma 4.10 with \(u=y_{b_i}\) and \(w=y_{a_i}\), we can conclude that the assignment \(u=y_{j}\) and \(u'=y_{b_i}\) satisfies the conditions of this lemma with \(C=96\). \(\square \)

5 The Embedding

We now present a proof of Theorem 3.1, thereby completing the proof of Theorem 1.1. We first introduce a random embedding of the tree \(T\) into \(\ell _1\), and then show that, for a suitable choice of parameters, with non-zero probability our construction satisfies the conditions of the theorem.

Notation: We use the notations and definitions introduced in Sect. 4. Moreover, in this section, for \(c\in \chi (E)\cup \{{\chi }(r,p(r))\}\), we use \(\rho ^{-1}(c)\) to denote the set of colors \(c'\in \chi (E)\) such that \(\rho (c')=c\), i.e. the colors of the “children” of \(c\). For \(m,n\in {\mathbb {N}}\), and \(A\in {\mathbb {R}}^{m\times n}\), we use the notation \(A[i]\) to refer to the \(i\)th row of \(A\) and \(A[i,j]\) to refer to the \(j\)th element in the \(i\)th row.

5.1 The Construction

Fix \(\delta , \varepsilon \in (0,{1\over 2}]\), and let

$$\begin{aligned} t=\lceil \varepsilon ^{-1}+\log \lceil \log _2 1/\delta \rceil \rceil \end{aligned}$$
(48)

and

$$\begin{aligned} m=\lceil t^2( M(\chi )+\log _2 |E|) \rceil . \end{aligned}$$
(49)

(See Lemma 5.15 for the relation between \(\varepsilon \) and \(\delta \), and the parameters of Theorem 3.1.) For \(i\in {\mathbb {Z}}\), we first define the map \(\Delta _{i}:V\rightarrow {\mathbb {R}}^{m \times t}\), and then we use it to construct our final embedding.

For a vertex \(v\in V\) and \(c={\chi }(v,p(v))\), let \(\alpha =\sum _{c'\in {\chi }(E(P_{v}))} t^2\tau _i(v_{c'})\), and

$$\begin{aligned} \beta =\alpha +\min \Big (t^2\tau _i(v),\big \lfloor {d_T(v_c,v)-\sum _{\ell =-\infty }^{i-1} 2^\ell \tau _\ell (v)\over 2^i/{t^2}} \big \rfloor \Big ). \end{aligned}$$

Note that \(\beta \le m\) since

$$\begin{aligned} \tau _i(v) + \sum _{c' \in \chi (E(P_v))} \tau _i(v_{c'}) \le \varphi (c) \le M(\chi ) + \log _2 |E|. \end{aligned}$$
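Spelled out, with \(\alpha \) and \(\beta \) as defined above: the minimum in the definition of \(\beta \) is at most \(t^2\tau _i(v)\), so the displayed inequality and the choice of \(m\) in (49) give

$$\begin{aligned} \beta \le \alpha +t^2\tau _i(v)&= t^2\Big (\tau _i(v)+\sum _{c'\in \chi (E(P_v))} \tau _i(v_{c'})\Big )\\&\le t^2\varphi (c)\le t^2\big (M(\chi )+\log _2 |E|\big )\le m. \end{aligned}$$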

For \(j \in [m]\), we define

$$\begin{aligned} \Delta _{i}(v)[j]= \left\{ \begin{array}{l} \big ({2^i\over {t^2}},{0,0\ldots , 0}\big )\quad \text { if } \alpha <{j} \le \beta ,\\ \big (d_T(v_c,v)-\big (\big (\sum _{\ell =-\infty }^{i-1} 2^\ell \tau _\ell (v)\big )+(\beta -\alpha ){2^i\over {t^2}}\big ),{0,0\ldots , 0}\big )\\ \quad \text { if } j=\beta +1 \text { and } \beta -\alpha < t^2\tau _i(v),\\ ({0,0\ldots , 0}) \quad \text { otherwise.} \end{array} \right. \end{aligned}$$
(50)

Observe that the scale selector \(\tau _i\) chooses the scales in this definition, and that \(\Delta _i(v)=0\) whenever \(\tau _i(v)=0\), for \(v\in V\) and \(i\in {\mathbb {Z}}\). Also note that the second case in the definition only occurs when \(\tau _i(v)\) is specified by part (A) of (34), and in that case \(\sum _{\ell \le i}2^\ell \tau _\ell (v) > d_T(v,v_c)\).

Now, we present some key properties of the map \(\Delta _i(v)\). The following two observations follow immediately from the definitions.

Observation 5.1

For \(v\in V\) and \(i\in {\mathbb {Z}}\), each row in \(\Delta _i(v)\) has at most one non-zero coordinate.

Observation 5.2

For \(v\in V\) and \(i\in {\mathbb {Z}}\), let \(\alpha =\sum _{c'\in {\chi }(E(P_{v}))} t^2\tau _i(v_{c'})\). For \(j\notin (\alpha ,\alpha +t^2\tau _i(v)]\), we have

$$\begin{aligned} \Delta _{i}(v)[j]=({0,\ldots ,0}). \end{aligned}$$

Proofs of the next four lemmas will be presented in Sect. 5.2.

Lemma 5.3

For \(v\in V\), there is at most one \(i\in {\mathbb {Z}}\) and at most one pair \((j,k)\in [m]\times [t]\) such that \(\Delta _i(v)[j,k]\notin \{0,{2^i\over t^2}\}\).

Lemma 5.4

Let \(c\in \chi (E)\), and \(u,w\in V(\gamma _c)\backslash \{v_c\}\) be such that \(d_T(w,v_c)\le d_T(u,v_c)\). For all \(i\in {\mathbb {Z}}\) and \((j,k)\in [m]\times [t]\), we have

$$\begin{aligned} \Delta _i(w)[j,k]\le \Delta _i(u)[j,k]. \end{aligned}$$

Lemma 5.5

For \(c\in \chi (E)\), and \(u,w\in V(\gamma _c)\setminus \{v_c\}\), we have

$$\begin{aligned} d_T(w,u)=\sum _{i\in {\mathbb {Z}}} \Vert \Delta _i(u)-\Delta _i(w)\Vert _1 \end{aligned}$$
(51)

and

$$\begin{aligned} d_T(v_c,u)=\sum _{i\in {\mathbb {Z}}} \Vert \Delta _i(u)\Vert _1. \end{aligned}$$
(52)

Lemma 5.6

For \(c\in \chi (E), u,w\in V(\gamma _c)\setminus \{v_c\}, i> j\) and \(k \in [m]\), if both \(\Vert \Delta _{i}(u)[k]-\Delta _i(w)[k]\Vert _1\ne 0\), and \(\Vert \Delta _j(u)[k]-\Delta _j(w)[k]\Vert _1\ne 0\), then \(d_T(u,w)\ge {2^{j-1}}\).

Re-randomization. For \(t\in {\mathbb {N}}\), let \(\pi _t:{\mathbb {R}}^t\rightarrow {\mathbb {R}}^t\) be a random mapping obtained by uniformly permuting the coordinates in \({\mathbb {R}}^t\). Let \(\{\sigma _i\}_{i \in [m]}\) be a sequence of i.i.d. random variables with the same distribution as \(\pi _t\). We define the random variable \(\pi _{t,m}:{\mathbb {R}}^{m\times t} \rightarrow {\mathbb {R}}^{m\times t}\) as follows:

$$\begin{aligned} \pi _{t,m}\left( \begin{array}{c} r_1\\ \vdots \\ r_m \end{array} \right) = \left( \begin{array}{c} \sigma _1(r_1)\\ \vdots \\ \sigma _m(r_m) \end{array} \right) .\quad \end{aligned}$$
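As a small illustration, take the hypothetical values \(t=3\) and \(m=2\), arbitrary reals \(a,b\), and suppose the sampled permutations happen to be \(\sigma _1(x_1,x_2,x_3)=(x_3,x_1,x_2)\) and \(\sigma _2(x_1,x_2,x_3)=(x_2,x_1,x_3)\). Then

$$\begin{aligned} \pi _{3,2}\left( \begin{array}{ccc} a &{} 0 &{} 0\\ 0 &{} b &{} 0 \end{array} \right) = \left( \begin{array}{ccc} 0 &{} a &{} 0\\ b &{} 0 &{} 0 \end{array} \right) . \end{aligned}$$

Each row keeps its \(\ell _1\) norm and its number of non-zero coordinates; only the positions of the entries within the row change.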

The Construction. We now use re-randomization to construct our final embedding. For \(c\in \chi (E)\), and \(i\in {\mathbb {Z}}\), the map \(f_{i,c}: V(T({c}))\rightarrow {\mathbb {R}}^{m\times t}\) will represent an embedding of the subtree \(T({c})\) at scale \(2^i/t^2\). Recall that

$$\begin{aligned} V(T(c))=V(\gamma _c)\cup \Big (\bigcup _{c'\in \rho ^{-1}(c)} V(T(c'))\setminus \{v_{c'}\} \Big ). \end{aligned}$$

Let \(\{\Pi _{i,c'} : i \in {\mathbb {Z}}, c' \in \rho ^{-1}(c) \}\) be a sequence of i.i.d. random variables which each have the distribution of \(\pi _{t,m}\). We define \(f_{i,c}:V(T({c}))\rightarrow {\mathbb {R}}^{m\times t}\) as follows:

$$\begin{aligned} f_{i,c}(x)= \left\{ \begin{array}{ll} 0 &{} \text { if } x=v_c,\\ \Delta _i(x) &{} \text { if } x\in V(\gamma _c)\setminus \{v_c\},\\ {\Delta _i{(v_{c'})}}+\Pi _{i,c'}( f_{i,c'}(x)) &{} \text { if } x\in V(T({c'}))\setminus \{v_{c'}\} \text { for some } c'\in \rho ^{-1}(c). \end{array} \right. \end{aligned}$$
(53)

Re-randomization permutes the elements within each row, and the permutations are independent for different subtrees, scales, and rows. Finally, we define \(f_i=f_{i,c_0}\), where \(c_0=\chi (r,p(r))\). We use the following lemma to prove Theorem 3.1.

Lemma 5.7

There exists a universal constant \(C\) such that the following holds with non-zero probability: For all \(x,y \in V\),

$$\begin{aligned} (1 -C\varepsilon ) \,d_T(x,y) - \delta \, \rho _{\chi }(x,y;\delta ) \le \sum _{i\in {\mathbb {Z}}}\Vert f_i(x)-f_i(y)\Vert _1 \le d_T(x,y). \end{aligned}$$
(54)

We will prove Lemma 5.7 in Sect. 5.3. We first make two observations, and then use them to prove Theorem 3.1. Our first observation is immediate from Observations 5.1 and 5.2, since in the third case of (53), by Observation 5.2, \(\Delta _i(v_{c'})\) and \(\Pi _{i,c'}( f_{i,c'}(x))\) must be supported on disjoint sets of rows.

Observation 5.8

For any \(v\in V\) and for any row \(j\in [m]\), there is at most one non-zero coordinate in \(f_i(v)[j]\).

Observation 5.2 and Lemma 5.5 also imply the following.

Observation 5.9

For any \(v\in V\) and \(u\in P_v\), we have

$$\begin{aligned} d_T(u,v)=\sum _{i\in {\mathbb {Z}}} \Vert f_i(u)-f_i(v)\Vert _1. \end{aligned}$$

Using these, together with Corollary 3.5, we now prove Theorem 3.1.

Proof of Theorem 3.1

By Lemma 5.7, there exists a choice of mappings \(\{g_i\}_{i\in {\mathbb {Z}}}\) such that for all \(x,y \in V\),

$$\begin{aligned} d_T(x,y)\ge \sum _{i \in {\mathbb {Z}}} \Vert {g_i(x)-g_i(y)}\Vert _1\ge (1-O(\varepsilon ))d_T(x,y)-\delta \rho _{{\chi }}(x,y;\delta ). \end{aligned}$$

We will apply Corollary 3.5 to the family given by \(\big \{f_i = {t^2g_i\over 2^i}\big \}_{i\in {\mathbb {Z}}}\) to arrive at an embedding \(F : V \rightarrow \ell _1^{tm({2+\lceil \log {1\over \varepsilon }\rceil })}\) such that \(G = F/t^2\) satisfies

$$\begin{aligned} d_T(x,y)\ge \Vert {G(x)-G(y)}\Vert _1\ge (1-O(\varepsilon ))d_T(x,y)-\delta \rho _{{\chi }}(x,y;\delta ). \end{aligned}$$
(55)

Observe that the codomain of \(f_i\) is \({\mathbb {R}}^{m\times t}\), where \(mt=\Theta \big ((\frac{1}{\varepsilon }+\log \log (\frac{1}{\delta }))^{3}\log n\big ),\) and the codomain of \(G\) is \({\mathbb {R}}^d\), where \(d={\Theta \big ( \log {1\over \varepsilon }(\frac{1}{\varepsilon }+\log \log (\frac{1}{\delta }))^{3}\log n\big )}\).
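A rough accounting of these dimensions, assuming (as in the setting of Theorem 3.1, which is not restated here) that \(M(\chi )=O(\log n)\), and using \(|E|=n-1\) with \(n\ge 3\): by (48) and (49),

$$\begin{aligned} mt = t\,\big \lceil t^2\big (M(\chi )+\log _2 |E|\big )\big \rceil \le t^3\big (M(\chi )+\log _2 |E|\big )+t = O\big (t^3\log n\big ),\qquad t=O\Big (\frac{1}{\varepsilon }+\log \log \frac{1}{\delta }\Big ). \end{aligned}$$

The target dimension \(tm\big (2+\lceil \log {1\over \varepsilon }\rceil \big )\) of \(F\) then accounts for the additional \(\log {1\over \varepsilon }\) factor in \(d\).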

To achieve (55), we need only show that for every \(x,y \in V\), we have \(\frac{\zeta (x,y)}{t^2} \lesssim \varepsilon d_T(x,y)\), where \(\zeta (x,y)\) is defined in (30). Recalling this definition, we now restate \(\zeta \) in terms of our explicit family \(\big \{f_i = {t^2 g_i \over 2^i}\big \}_{i \in {\mathbb {Z}}}\). We have

$$\begin{aligned} \frac{\zeta (x,y)}{t^2}=\sum _{(k_1,k_2)\in [m]\times [t]}\sum _{\begin{array}{c} i:\exists j<i\\ g_{j}(x)[k_1,k_2]\ne g_{j}(y)[k_1,k_2] \end{array}} h_i(x,y;k_1,k_2)\,, \end{aligned}$$
(56)

where

$$\begin{aligned}&h_i(x,y;k_1,k_2)\\&\quad ={2^i\over t^2}\Big ({t^2\over 2^i}\big |g_{i}(x)[k_1,k_2]-g_{i}(y)[k_1,k_2]\big |\!-\!\big \lfloor \big |{t^2\over 2^i}g_{i}(x)[k_1,k_2]\!-\!{t^2\over 2^i}g_{i}(y)[k_1,k_2]\big |\big \rfloor \Big ).\qquad \end{aligned}$$

Fix \(x,y \in V\). For \(c\in \chi (E(P_{xy}))\), let \(\lambda _{c}\) be the induced subgraph on \(V(P_{xy})\cap V(\gamma _{c})\), i.e. the subpath of \(P_{xy}\) where all edges are colored by color \(c\). We have

$$\begin{aligned} d_T(x,y)=\sum _{c\in \chi (E(P_{xy}))}\mathsf{{len}}(E(\lambda _c)). \end{aligned}$$
(57)

If we look at a single term in (56), we have

$$\begin{aligned} h_i(x,y;k_1,k_2)< {2^i\over t^2}. \end{aligned}$$
(58)

For \(u,v\in P_{xy}\), let

$$\begin{aligned} S_i(u,v)&= \big \{(k_1,k_2)\in [m]\times [t]:h_i(u,v;k_1,k_2)\ne 0 \text { and }\\&\quad \exists j <i: g_{j}(x)[k_1,k_2]\ne g_{j}(y)[k_1,k_2]\big \}. \end{aligned}$$

Now, notice that if \(\frac{t^2}{2^i} (g_i(x)[k_1,k_2]-g_i(y)[k_1,k_2])\) is fractional, then there must exist a subpath \(\lambda _c\), for a color \(c\in \chi (E(P_{xy}))\), with endpoints \(u_c\) and \(w_c\) such that \(\frac{t^2}{2^i} (g_i(u_c)[k_1,k_2]-g_i(w_c)[k_1,k_2])\) is fractional too. Hence we have

$$\begin{aligned} \zeta (x,y)<\sum _{c\in \chi (E(P_{xy}))} \sum _{i\in {\mathbb {Z}}}{2^i|S_i(u_c,w_c)|\over t^2}. \end{aligned}$$

We call \(\sum _{i\in {\mathbb {Z}}} {2^i|S_i(u_c,w_c)|\over t^2}\) the contribution of \(\lambda _c\) for each color \(c\in \chi (E(P_{xy}))\).

We divide the analysis of the paths \(\lambda _c\) for \({c\in \chi (E(P_{xy}))}\) into two cases. For \(c\in \chi (E(P_x))\triangle \chi (E(P_y))\), the vertex \(v_{c}\) is one endpoint of the path \(\lambda _{c}\). Let \(u_{c}\) be the other. By Lemma 5.3, there is at most one \(i\in {\mathbb {Z}}\) and \((k_1,k_2)\in [m]\times [t]\) such that \(h_i(u_c,v_c;k_1,k_2)\ne 0,\) and

$$\begin{aligned} \big |\bigcup _{i\in {\mathbb {Z}}} S_i(u_c,v_c)\big |\le 1. \end{aligned}$$

By Lemma 4.3, for all \({i}\in {\mathbb {Z}}\) with \({{d_T(u_{c},v_{c})}} \le 2^{i-1}\), we have \(\tau _i(u_{c})=0\) and

$$\begin{aligned} \Vert \Delta _i(u_{c})\Vert _1=\Vert g_i(u_{c})-g_i(v_{c})\Vert _1=0. \end{aligned}$$
(59)

For \(i<1+\log _2({{d_T(u_{c},v_{c})}} )\), by (58) and Lemma 5.3 we can bound the contribution of \(\lambda _c\) to \(\zeta (x,y)\) by

$$\begin{aligned} \sum _{j\in {\mathbb {Z}}} {2^j|S_j(u_c,v_c)|\over t^2}< {2^i\over t^2}< {2d_T(u_c,v_c)\over t^2}\le \varepsilon d_T(u_c,v_c). \end{aligned}$$
(60)
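The last inequality in (60) uses only the choice of \(t\) in (48) and the assumption \(\varepsilon \le \frac{1}{2}\). Since \(\delta \le \frac{1}{2}\) makes the logarithmic term in (48) non-negative, we have \(t\ge \varepsilon ^{-1}\), and hence

$$\begin{aligned} {2\over t^2}\le 2\varepsilon ^2=(2\varepsilon )\,\varepsilon \le \varepsilon . \end{aligned}$$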

Note that there is at most one color in \(\chi (E(P_{xy}))\setminus (\chi (E(P_x))\triangle \chi (E(P_y)))\). If no such color exists, then by (60),

$$\begin{aligned} \zeta (x,y)< \sum _{c\in \chi (E(P_{xy}))}\varepsilon \,\mathsf{{len}}(E(\lambda _c)) {\mathop {\le }\limits ^{(57)}} \varepsilon d_T(x,y). \end{aligned}$$

Suppose now that \(\{c\}=\chi (E(P_{xy}))\setminus (\chi (E(P_x))\triangle \chi (E(P_y)))\). Let \(u,w \in V(\lambda _c)\) be the closest vertices to \(x\) and \(y\), respectively. For \(i\in {\mathbb {Z}}\) we will show that if \(h_i(u,w;k_1,k_2)\ne 0\), then either \(d_T(x,y)\ge 2^{i-2},\) or for all \(j<i\), we have \((g_j(x)-g_j(y))[k_1,k_2]=0\). Then, by Lemma 5.3, there are at most two elements in \(g_i(u)-g_i(w)\) that are not in \(\{0,{2^i\over t^2},-{2^i\over t^2}\}\), therefore we can conclude

$$\begin{aligned} \zeta (x,y)&< \sum _{i\in {\mathbb {Z}}}{2^i|S_i(u,w)|\over t^2}+\sum _{c\in \chi (E(P_x))\triangle \chi (E(P_y))}\sum _{i\in {\mathbb {Z}}} {2^i|S_i(u_c,v_c)|\over t^2}\\&{\mathop {\le }\limits ^{(57)}} 4 \varepsilon d_T(x,y)+\sum _{c\in \chi (E(P_x))\triangle \chi (E(P_y))}\varepsilon \,\mathsf{{len}}(E(\lambda _c))\\&\le 5\varepsilon d_T(x,y). \end{aligned}$$

Without loss of generality suppose that \(d_T(u,v_c)\le d_T(w,v_c)\). If \(d_T(w,v_c)=0\) then the contribution of \(\lambda _c\) to \(\zeta (x,y)\) is zero. Suppose now that \(d_T(w,v_c)>0\), and let \(m_w=\max \{i:\tau _i(w)\ne 0\}\). By Lemma 4.3 the maximum always exists.

We will now split the rest of the proof into two cases.

Case 1: \(\tau _{m_w-1}(u)=0.\)

In this case by Lemma 4.6 we have \(d_T(u,w)> 2^{m_w-1}\). For \((k_1,k_2)\in [m]\times [t]\), if \(h_i(u,w;k_1,k_2)\ne 0\) then by (50), \(i\le m_w\) and

$$\begin{aligned} {2^{i}\over t^2} \le {2^{m_w}\over t^2} < {2d_T(u,w)\over t^2}\le {2d_T(x,y)\over t^2} \le \varepsilon d_T(x,y). \end{aligned}$$

Case 2: \(\tau _{m_w-1}(u)\ne 0.\)

Let \(m_u=\max \{i:\tau _i(u)\ne 0\}\). By Lemma 4.5 and as \(\tau _{m_w-1}(u)\ne 0\), we have \( m_u\le m_w\le m_u+1. \) Observation 4.2 implies that for all \(j<m_u\),

$$\begin{aligned} \tau _j(u)+\sum _{c'\in \chi (E(P_u))}\tau _j(v_{c'})=\varphi (c). \end{aligned}$$

We have \(m_w\ge m_u\), and by Observation 4.2,

$$\begin{aligned} \tau _j(w)+\sum _{c'\in \chi (E(P_w))}\tau _j(v_{c'})=\tau _j(u)+\sum _{c'\in \chi (E(P_u))}\tau _j(v_{c'})=\varphi (c). \end{aligned}$$
(61)

Therefore, by Observation 5.2 for \(j<m_u\) and \(k\in [t^2\varphi (c)]\),

$$\begin{aligned} \Vert (g_j(x)-g_j(u))[k]\Vert _1=\Vert (g_j(y)-g_j(w))[k]\Vert _1=0, \end{aligned}$$
(62)

and by Observation 5.2 and part (B) of (34), for all \(i\in {\mathbb {Z}}\), all the non-zero elements of \(g_i(u)-g_i(w)\) are in the first \(t^2\varphi (c)\) rows.

Suppose that there exists \(k\in [m]\) such that \(\Vert (g_i(u)-g_i(w))[k]\Vert _1\ne 0\). Now, we divide the proof into two cases again.

Case 2.1: There exists a \(j<i\) such that \(\Vert (g_j(x)\!-\!g_j(u))[k]\Vert _1\!+\!\Vert (g_j(y)-g_j(w))[k]\Vert _1\ne 0.\)

In this case, there must exist some \(c'\in \chi (E(P_x))\triangle \chi (E(P_y))\) such that

$$\begin{aligned} \Vert (g_j(v_{c'})-g_j(u_{c'}))[k]\Vert _1\ne 0. \end{aligned}$$

By (53) and (50), we have \(\tau _j(u_{c'})\ne 0\). Inequality (62) implies \(j\ge m_u\), and finally by Lemma 4.3,

$$\begin{aligned} d_T(x,y)\ge d_T(u_{c'},v_{c'})> 2^{j-1}\ge 2^{m_u-1} \ge 2^{m_w-2}\ge 2^{i-2}. \end{aligned}$$
(63)

Case 2.2: \(\Vert (g_j(x)-g_j(u))[k]\Vert _1+\Vert (g_j(y)-g_j(w))[k]\Vert _1= 0\) for all \(j<i\).

In this case, either for all \(j<i, \Vert g_j(x)[k]-g_j(y)[k]\Vert _1=0\) which implies that for \(k'\in [t], (k,k')\notin S_i(u,w)\), or \(\Vert g_j(u)[k]-g_j(w)[k]\Vert _1\ne 0\) for some \(j<i\). If \(\Vert g_j(u)[k]-g_j(w)[k]\Vert _1\ne 0\) for some \(j<i\) then by Lemma 5.6,

$$\begin{aligned} d_T(x,y)\ge d_T(u,w)\ge 2^{m_u-1}\ge 2^{m_w-2}\ge 2^{i-2}. \end{aligned}$$
(64)

For \(i>m_w\) we have \(\Vert g_i(u)-g_i(w)\Vert _1=0\), therefore in both cases if \(h_i(x,y;k_1,k_2)\ne 0\) either for all \(j<i, \Vert g_j(x)[k]-g_j(y)[k]\Vert _1=0\) or

$$\begin{aligned} {2^i\over t^2}\le {4d_T(x,y)\over t^2}\le 2\varepsilon d_T(x,y). \end{aligned}$$

\(\square \)

5.2 Properties of the \(\Delta _i\) Maps

We now present proofs of Lemmas 5.3–5.6.

Proof of Lemma 5.3

For a fixed \(i\in {\mathbb {Z}}\), by (50) there is at most one element in \(\Delta _i(v)\) that takes a value outside \(\{0,{2^i\over t^2}\}\).

We prove this lemma by showing that if for some \(i\in {\mathbb {Z}}\), and \((j,k)\in [m]\times [t]\),

$$\begin{aligned} \Delta _i(v)[j,k]\notin \big \{0,{2^i\over t^2}\big \}, \end{aligned}$$

then for all \(i'>i\) and \((j',k')\in [m]\times [t]\), we have \(\Delta _{i'}(v)[j',k']=0\). Let \(c={\chi }(v,p(v))\). Using (50), we can conclude that

$$\begin{aligned} t^2\tau _i(v)>\Big \lfloor {d_T(v_c,v)-\sum _{\ell =-\infty }^{i-1} 2^\ell \tau _\ell (v)\over 2^i/{t^2}}\Big \rfloor . \end{aligned}$$

Since the left-hand side is an integer,

$$\begin{aligned} t^2\tau _i(v)\ge {d_T(v_c,v)-\sum _{\ell =-\infty }^{i-1} 2^\ell \tau _\ell (v)\over 2^i/{t^2}} \end{aligned}$$

and

$$\begin{aligned} \sum _{\ell \le i}2^\ell \tau _\ell (v)&= 2^i\tau _i(v)+\sum _{\ell < i}2^\ell \tau _\ell (v)\\&\ge 2^i\Big ({d_T(v_c,v)-\sum _{\ell <i} 2^\ell \tau _\ell (v)\over 2^i}\Big )+\sum _{\ell < i}2^\ell \tau _\ell (v)\ge d_T(v_c,v). \end{aligned}$$

By part (A) of (34), for \(i'>i\) we have \(\tau _{i'}(v)=0\), thus \(\Vert \Delta _{i'}(v)\Vert _1=0\) and the proof is complete. \(\square \)

Proof of Lemma 5.4

For \(i< \big \lfloor \log _2\big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor \) we have \(\Vert \Delta _i(u)\Vert _1=\Vert \Delta _i(w)\Vert _1=0\).

Let \(\nu \) be the minimum integer greater than \(\big \lfloor \log _2 \big (\frac{m(T)}{M(\chi )+\log _2 |E|}\big )\big \rfloor -1\) such that part (A) of (34) for \(\tau _\nu (w)\) is less than or equal to part (B). This \(\nu \) exists since, by (35), part (B) of (34) is always positive, while by Lemma 4.3, part (A) of (34) must be zero for some \(\nu \in \mathbb {Z}\) large enough. First we analyze the case when \(i<\nu \).

Observation 4.2 implies that part (B) of (34) specifies the value of \(\tau _i(w)\). By Lemma 4.5 \(\tau _i(u)\ge \tau _i(w)\), but the part (B) for \(\tau _i(u)\) is the same as for \(\tau _i(w)\), so we must have \( \tau _i(u)=\tau _i(w),\) and the same reasoning holds for \(\tau _{\ell }(w)\) for \(\ell <i\). Using this and the fact that part (A) does not define \(\tau _i(w)\), we have

$$\begin{aligned} 2^i\tau _i(w)+\sum _{\ell <i}2^\ell \tau _\ell (w)= 2^i\tau _i(u)+\sum _{\ell <i}2^\ell \tau _\ell (u)< d_T(v_c,w) < d_T(v_c,u). \end{aligned}$$

Therefore, the second case in (50) happens neither for \(u\) nor for \(w\), and for \(i<\nu \) we have \(\Delta _i(u)=\Delta _i(w)\).

We now consider the case \(i{=}\nu \). We have already shown that for \(\ell {<}i, \tau _\ell (u){=}\tau _\ell (w),\) and using (50), it is easy to verify that for all \((j,k)\in [m]\times [t]\),

$$\begin{aligned} \Delta _i(u)[j,k]\ge \Delta _i(w)[j,k]. \end{aligned}$$

Finally, in the case that \(i>\nu \), by Observation 4.2, we have \(\tau _i(w)=0\) and \(\Delta _i(w)[j,k]=0\). \(\square \)

Proof of Lemma 5.5

For all \(i\in {\mathbb {Z}}\), recalling the definitions of \(\alpha \) and \(\beta \) in (50) applied to \(\Delta _i(u)\), we have

$$\begin{aligned} \beta -\alpha =\min \Big (t^2\tau _i(u),\Big \lfloor {d_T(v_c,u)-\sum _{\ell =-\infty }^{i-1} 2^\ell \tau _\ell (u)\over 2^i/{t^2}} \Big \rfloor \Big ), \end{aligned}$$

and by definition of \(\Delta _i(u)\) we have

$$\begin{aligned} \Vert \Delta _i(u)\Vert _1=\min \Big ({2^i}\tau _i(u), d_T(u,v_c)-\sum _{j<i} 2^j\tau _j(u)\Big ). \end{aligned}$$

By Lemma 4.4, we have \(\sum _{i\in {\mathbb {Z}}} 2^i\tau _i(u)\ge d_T(u,v_c)\), therefore \(d_T(v_c,u)=\sum _{i\in {\mathbb {Z}}} \Vert \Delta _i(u)\Vert _1.\) The same argument also implies that \(d_T(w,v_c)=\sum _{i\in {\mathbb {Z}}} \Vert \Delta _i(w)\Vert _1\).

Now, suppose that \(d_T(u,v_c)\ge d_T(w,v_c)\). Then Lemma 5.4 implies that

$$\begin{aligned} \sum _{i\in {\mathbb {Z}}}\Vert \Delta _i(u)-\Delta _i(w)\Vert _1=\sum _{i\in {\mathbb {Z}}}\big (\Vert \Delta _i(u)\Vert _1-\Vert \Delta _i(w)\Vert _1\big )=d_T(v_c,u)-d_T(v_c,w)=d_T(w,u). \end{aligned}$$
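
As a sketch of the first equality, assume, as in the use of Lemma 5.4 above, that \(\Delta _i(u)[j,k]\ge \Delta _i(w)[j,k]\ge 0\) for every \(i\in {\mathbb {Z}}\) and \((j,k)\in [m]\times [t]\) (the nonnegativity of the entries is an assumption read off from (50) and is not restated here). Then, for each \(i\),

$$\begin{aligned} \Vert \Delta _i(u)-\Delta _i(w)\Vert _1=\sum _{(j,k)\in [m]\times [t]}\big (\Delta _i(u)[j,k]-\Delta _i(w)[j,k]\big )=\Vert \Delta _i(u)\Vert _1-\Vert \Delta _i(w)\Vert _1, \end{aligned}$$

and summing over \(i\in {\mathbb {Z}}\), together with the two identities above for \(d_T(v_c,u)\) and \(d_T(w,v_c)\), gives \(d_T(v_c,u)-d_T(v_c,w)\).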

\(\square \)

Proof of Lemma 5.6

Without loss of generality suppose that \(d_T(v_c,u)\ge d_T(v_c,w)\). We have

$$\begin{aligned} d_T(u,w)&= \sum _{h\in {\mathbb {Z}}} \Vert \Delta _h(u)-\Delta _h(w)\Vert _1\ge \sum _{h=j}^i \Vert \Delta _h(u)-\Delta _h(w)\Vert _1\nonumber \\&\ge \Vert \Delta _i(u)-\Delta _i(w)\Vert _1+\Vert \Delta _j(u) -\Delta _j(w)\Vert _1. \end{aligned}$$
(65)

By Lemma 4.5 we have \(\tau _j(w)\le \tau _j(u)\). In the definition of \(\tau _j(w)\), if part (B) of (34) is less than part (A), then by (50), for all \(h\) such that

$$\begin{aligned} \sum _{c'\in \chi (E(P_v))} t^2 \tau _j(v_{c'})<h \le t^2{\varphi (c)}, \end{aligned}$$

we have \(\Vert \Delta _{j}(w)[h]\Vert _1={2^j\over t^2}\). By Lemma 5.4 and Observation 5.2 for \(k\in {\mathbb {Z}}, \Delta _{j}(w)=\Delta _{j}(u)\). Hence, part (A) of (34) must specify the value of \(\tau _j(w)\). Observation 4.2 implies that \(\tau _i(w)=0\) and by (50), we have \(\Vert \Delta _i(w)\Vert _1=0\).

By (50), since \(\Vert \Delta _{i}(u)[k]\Vert _1>0\) and \(\alpha \) from (50) is a multiple of \(t^2\), for all \(t^2 \lfloor {k\over t^2}\rfloor <h<k\) we have \(\Vert \Delta _{i}(u)[h]\Vert _1={2^i\over t^2}\). This implies that

$$\begin{aligned} \Vert \Delta _i(u)-\Delta _i(w)\Vert _1\ge {2^i\over t^2} \big (k-1-t^2{\big \lfloor \frac{k}{{t^2}}\big \rfloor }\big )\ge {2^j\over t^2}\big (k-1-t^2{\big \lfloor \frac{k}{{t^2}}\big \rfloor }\big ). \end{aligned}$$

Moreover, \(\Vert \Delta _{j}(w)[k]\Vert _1<{2^j\over t^2}\), and (50) implies that for all \(k<h\le t^2 \lfloor {1+{k\over t^2}}\rfloor \), we have \(\Vert \Delta _{j}(w)[h]\Vert _1=0\). The same argument also shows that

$$\begin{aligned} \Vert \Delta _j(u)-\Delta _j(w)\Vert _1\ge {2^j\over t^2}\big (t^2\big \lfloor 1+{k\over {t^2}}\big \rfloor -k\big ). \end{aligned}$$

Hence by (65),

$$\begin{aligned} d_T(u,w)\ge {t^2-1\over t^2}2^j\ge 2^{j-1}. \end{aligned}$$
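
The last display can be checked by adding the two preceding lower bounds and using \(\lfloor 1+{k\over t^2}\rfloor =1+\lfloor {k\over t^2}\rfloor \). The final comparison with \(2^{j-1}\) is sketched under the assumption, not restated in this section, that \(t\ge 2\):

$$\begin{aligned} {2^j\over t^2}\big (k-1-t^2\big \lfloor \tfrac{k}{t^2}\big \rfloor \big )+{2^j\over t^2}\big (t^2\big \lfloor 1+\tfrac{k}{t^2}\big \rfloor -k\big )&={2^j\over t^2}\big (t^2-1\big ),\\ {t^2-1\over t^2}=1-{1\over t^2}&\ge {3\over 4}\ge {1\over 2}\quad \text {for } t\ge 2. \end{aligned}$$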

\(\square \)

5.3 The Probabilistic Analysis

We are thus left to prove Lemma 5.7. For \(c\in \chi (E)\), we analyze the embedding for \(T(c)\) by going through all \(c'\in \chi (E(T(c)))\) one by one in increasing order of \(\varphi (c')\). Our first lemma bounds the probability of a bad event, i.e. of a subpath not contributing enough to the distance in the embedding.

Lemma 5.10

For any \(C \ge 8\), the following holds. Consider three colors \(a \in \chi (E), b \in \rho ^{-1}(a)\), and \(c \in \chi (E(P_{u\, v_b}))\) for some \(u \in V(T(b))\). Then for every \(w \in V(T(a)) \setminus V(T(b))\), we have

$$\begin{aligned}&\mathbb {P}\Big [ \exists \,{x\in V(P_{w\,v_{a}})} :\sum _{i\in {\mathbb {Z}}}\Vert f_{i,a}(x)-f_{i,a}(u)\Vert _1\le \left( 1-{C \varepsilon }\right) \,d_T(u, v_c) \nonumber \\&\qquad +\sum _{i\in {\mathbb {Z}}}\Vert f_{i,a}(v_c)-f_{i,a}(x)\Vert _1 \mid \{f_{i,c'}\}_{c'\in \rho ^{-1}(a)} \Big ]\nonumber \\&\quad \le {1\over \lceil \log _2 1/\delta \rceil } \exp \big (-(C/(\varepsilon 2^{\beta +2}))\, d_T(u,v_c)\big ), \end{aligned}$$
(66)

where \(\beta =\max \{i: \exists y \in P_{u\,v_c} \backslash \{v_c\}, \tau _i(y)\ne 0\}\). (See Fig. 1 for the positions of the vertices in the tree.)

Fig. 1 Position of vertices corresponding to the statement of Lemma 5.10

Proof

Recall that \({\mathbb {R}}^{m\times t}\) is the codomain of \(f_{i,a}\). For \(i\in {\mathbb {Z}}\), and \(j\in [m]\), and \(z\in V(P_{w\,v_{a}})\), let

$$\begin{aligned} s_{ij}(z)&= \left\| f_{i,a}(z)[j]-f_{i,a}(v_c)[j]\right\| _1+\left\| f_{i,a}(v_c)[j]\right. \\&\quad \left. -f_{i,a}(u)[j]\right\| _1-\left\| f_{i,a}(z)[j]-f_{i,a}(u)[j]\right\| _1. \end{aligned}$$

We have

$$\begin{aligned}&\sum _{i\in {\mathbb {Z}}} \Vert f_{i,a}(u)-f_{i,a}(v_c)\Vert _1+\sum _{i\in {\mathbb {Z}}} \Vert f_{i,a}(v_c)-f_{i,a}(z)\Vert _1\\&\quad = \sum _{i\in {\mathbb {Z}}} \Vert f_{i,a}(z)-f_{i,a}(u)\Vert _1+\sum _{i\in {\mathbb {Z}},j\in [m]} s_{ij}(z). \end{aligned}$$

By Observation 5.9, we have \(d_T(u,v_c)=\sum _{i\in {\mathbb {Z}}} \Vert f_{i,a}(u)-f_{i,a}(v_c)\Vert _1\), therefore

$$\begin{aligned} d_T(u,v_c)-\sum _{i\in {\mathbb {Z}},j\in [m]} s_{ij}(z)\!=\! \sum _{i\in {\mathbb {Z}}} \Vert f_{i,a}(z)-f_{i,a}(u)\Vert _1-\sum _{i\in {\mathbb {Z}}} \Vert f_{i,a}(z)-f_{i,a}(v_c)\Vert _1. \nonumber \\ \end{aligned}$$
(67)

Let \({\mathcal {E}} = \{f_{i,c'} : c' \in \rho ^{-1}(a)\}\). We define \(\mathbb {P}_{{\mathcal {E}}}[\cdot ] = \mathbb {P}[\cdot \mid {\mathcal {E}}].\) To prove the lemma, we bound

$$\begin{aligned} {\mathbb {P}}_{{\mathcal {E}}}\big [\exists \, {x\in V(P_{w\,v_{a}})} : \sum _{i\in {\mathbb {Z}},j\in [m]} s_{ij}(x)\ge C\varepsilon d_T(u,v_c)\big ]. \end{aligned}$$

We start by bounding the maximum of the random variables \(s_{ij}\).

For \(i>\beta \) we have \(\Delta _i(u)=\Delta _i(v_c)\), hence \(f_{i,a}(u)=f_{i,a}(v_c)\). Using the triangle inequality for all \(i\in {\mathbb {Z}}, j\in [m]\) and \(z\in P_{w\,v_a}\),

$$\begin{aligned} s_{ij}(z)\le 2 \Vert f_{i,a}(v_c)[j]-f_{i,a}(u)[j]\Vert _1. \end{aligned}$$
(68)

Hence for all \(i\in {\mathbb {Z}}\) and \(j\in [m]\) by Observation 5.8,

$$\begin{aligned} s_{ij}(z)\le 2 \Vert f_{i,a}(v_c)[j]-f_{i,a}(u)[j]\Vert _1\le {2^{\beta +1}\over t^2}. \end{aligned}$$
(69)

First note that, if \(z\) is on the path between \(v_{b}\) and \(v_a\) then by Observation 5.9, \(s_{ij}(z)=0\). Observation 5.2 and (50) imply that if \(\Vert f_{i,a}(u)[j]-f_{i,a}(v_c)[j]\Vert _1\ne 0\) then \(\Vert f_{i,a}(v_c)[j]\Vert _1= 0\). From this, we can conclude that \(s_{ij}(z)\ne 0\) if and only if there exists a \(k\in [t]\) such that both \(f_{i,a}(u)[j,k]-f_{i,a}(v_c)[j,k]\ne 0\) and \(f_{i,a}(z)[j,k]\ne 0\). Since by Lemma 5.4, for all \(i\in {\mathbb {Z}}, j\in [m]\) and \(k\in [t]\), we have \(f_{i,a}(w)[j,k]\ge f_{i,a}(z)[j,k]\), we conclude that for \(z\in P_{w\,v_{a}}\) if \(s_{ij}(z)\ne 0\) then \(s_{ij}(w)\ne 0\).

Now, for \(i\in {\mathbb {Z}}\) and \(j\in [m]\), we define a random variable

$$\begin{aligned} X_{ij}=\left\{ \begin{array}{l@{\quad }l} 0&{} \text {if } s_{ij}(w)=0,\\ 2\Vert f_{i,a}(u)[j]-f_{i,a}(v_c)[j]\Vert _1&{} \text {if } s_{ij}(w)\ne 0. \end{array}\right. \end{aligned}$$
(70)

Note that since the re-randomization in (53) is performed independently on each row and at each scale, the random variables \(\big \{X_{ij}: i\in {\mathbb {Z}}, j\in [m]\big \}\) are mutually independent. By (68), for all \(z\in P_{w\,v_a}\), we have \(s_{ij}(z)\le X_{ij}\), and thus

$$\begin{aligned}&{\mathbb {P}}_{{\mathcal {E}}}\Big [\exists \, {x\in V(P_{w\,v_{a}})} : \sum _{i\in {\mathbb {Z}},j\in [m]} s_{ij}(x)\ge C\varepsilon d_T(u,v_c)\Big ]\nonumber \\&\qquad \le {\mathbb {P}}_{{\mathcal {E}}}\Big [\sum _{i\in {\mathbb {Z}},j\in [m]} X_{ij}\ge C\varepsilon d_T(u,v_c)\Big ]. \end{aligned}$$
(71)

As before, for \(X_{ij}\) to be non-zero, there must exist \(k\in [t]\) such that \(f_{i,a}(w)[j,k] \ne 0\) and \(f_{i,a}(u)[j,k]-f_{i,a}(v_c)[j,k]\ne 0\). Since \(w\notin V(T(b))\), the re-randomization in (53) together with Observation 5.8 shows that this happens with probability at most \(\frac{1}{t}\); hence for \(j\in [m]\) and \(i\in {\mathbb {Z}}\),

$$\begin{aligned} {\mathbb {P}}_{{\mathcal {E}}}[X_{ij}\ne 0]&= {\mathbb {P}}_{{\mathcal {E}}}\big [\Vert f_{i,a}(w)[j]-f_{i,a}(v_c)[j]\Vert _1 +\Vert f_{i,a}(v_c)[j]\\&-f_{i,a}(u)[j]\Vert _1-\Vert f_{i,a}(w)[j]-f_{i,a}(u)[j]\Vert _1\ne 0\big ]\le {1\over t}. \end{aligned}$$

This yields

$$\begin{aligned} {\mathbb {E}}[X_{ij}\mid {\mathcal {E}}]\le {1\over t}\big (2\Vert f_{i,a}(u)[j]-f_{i,a}(v_c)[j]\Vert _1\big ). \end{aligned}$$
(72)

Now we use (69) to write

$$\begin{aligned} \mathrm {Var}(X_{ij}\mid {\mathcal {E}})\!\le \! {1\over t}{\big (2\Vert f_{i,a}(u)[j]\!-\!f_{i,a}(v_c)[j]\Vert _1\big )^2}\le {2^{\beta +2}\over t^3}\Vert f_{i,a}(u)[j]\!-\!f_{i,a}(v_c)[j]\Vert _1, \end{aligned}$$

and use Observation 5.9 in conjunction with (72) to conclude that

$$\begin{aligned} {\mathbb {E}}\Big [\sum _{i\in {\mathbb {Z}},j\in [m]} X_{ij}\mid {\mathcal {E}}\Big ]\le \sum _{i\in {\mathbb {Z}},j\in [m]} {2\over t}\, \Vert f_{i,a}(v_c)[j]-f_{i,a}(u)[j]\Vert _1={2\over t}\, d_T(v_c,u) \end{aligned}$$
(73)

and

$$\begin{aligned} \sum _{i\in {\mathbb {Z}},j\in [m]}\mathrm {Var}(X_{ij}\mid {\mathcal {E}})\le \sum _{i\in {\mathbb {Z}},j\in [m]}{2^{\beta +2}\over t^3}\Vert f_{i,a}(v_c)[j]-f_{i,a}(u)[j]\Vert _1={2^{\beta +2}\over t^3}d_T(v_c,u). \end{aligned}$$
(74)
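
For completeness, here is a sketch of the per-coordinate variance bound that is summed in (74). It assumes that, conditioned on \({\mathcal {E}}\), the quantity \(a_{ij}=2\Vert f_{i,a}(u)[j]-f_{i,a}(v_c)[j]\Vert _1\) is fixed, so that \(X_{ij}\) takes only the values \(0\) and \(a_{ij}\); the shorthand \(a_{ij}\) is introduced here purely for illustration.

$$\begin{aligned} \mathrm {Var}(X_{ij}\mid {\mathcal {E}})&\le {\mathbb {E}}[X_{ij}^2\mid {\mathcal {E}}]=a_{ij}^2\,{\mathbb {P}}_{{\mathcal {E}}}[X_{ij}\ne 0]\le {a_{ij}^2\over t},\\ a_{ij}^2&\le {2^{\beta +1}\over t^2}\,a_{ij}={2^{\beta +2}\over t^2}\Vert f_{i,a}(u)[j]-f_{i,a}(v_c)[j]\Vert _1\quad \text {by (69)}. \end{aligned}$$

Combining the two lines gives the factor \({2^{\beta +2}\over t^3}\) appearing in (74).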

Define \(M = \max \{ X_{ij} - {\mathbb {E}}[X_{ij} \mid {\mathcal {E}}] : i\in {\mathbb {Z}}, j\in [m] \}.\) We now apply Theorem 2.2 to complete the proof:

$$\begin{aligned}&{\mathbb {P}}_{{\mathcal {E}}}\Big [\sum _{i\in {\mathbb {Z}},j\in [m]} X_{ij}\ge C\big ({d_T(u,v_c)\over t}\big )\Big ]\\&\quad ={\mathbb {P}}_{{\mathcal {E}}}\Big [\sum _{i\in {\mathbb {Z}},j\in [m]} X_{ij}-{2d_T(u,v_c)\over t}\ge (C-2)\big ({d_T(u,v_c)\over t}\big )\Big ]\\&\quad {\mathop {\!\!\!\le }\limits ^{\!\!(73)}}\; {\mathbb {P}}_{{\mathcal {E}}}\Big [\sum _{i\in {\mathbb {Z}},j\in [m]} {X_{ij} }-\mathbb E\Big [\sum _{i\in {\mathbb {Z}},j\in [m]} X_{ij}\mid {\mathcal {E}}\Big ] \ge (C-2)\Big (\frac{d_T(u,v_c)}{t}\Big )\Big ]\\&\quad \le \exp \Big ({-((C-2)d_T(u,v_c)/t)^2\over 2\big (\sum _{i\in {\mathbb {Z}},j\in [m]} \mathrm {Var}(X_{ij}\mid {\mathcal {E}})+ (C-2)(d_T(u,v_c)/t) M/3\big )}\Big ). \end{aligned}$$

Since \({\mathbb {E}}[X_{ij}\mid {\mathcal {E}}]\ge 0\), (69) implies \(M\le {2^{\beta +1}\over t^2}\). Now, we can plug in this bound and (74) to write

$$\begin{aligned}&{\mathbb {P}}_{{\mathcal {E}}}\Big [\sum _{i\in {\mathbb {Z}},j\in [m]} X_{ij}\ge C\Big ({d_T(u,v_c)\over t}\Big )\Big ]\\&\qquad \le \exp \Big ({-((C-2)d_T(u,v_c)/t)^2\over 2\big ({2^{\beta +2}\over t^3}d_T(u,v_c)+ (C-2)(d_T(u,v_c)/t) (2^{\beta +1}/t^2)/3\big )}\Big )\\&\qquad =\exp \Big ({-t(C-2)^2d_T(u,v_c)\over {2\big (2^{\beta +2}+ (C-2)(2^{\beta +1})/3\big )}}\Big )\\&\qquad =\exp \Big ({-(C-2)^2 \over {(C-2)/3+2}}\Big ({td_T(u,v_c)\over 2^{\beta +2}}\Big )\Big ). \end{aligned}$$

An elementary calculation shows that for \(C\ge 8, {(C-2)^2 \over {(C-2)/3+2}}> C,\) hence

$$\begin{aligned}&{\mathbb {P}}_{{\mathcal {E}}}\Big [\sum _{i\in {\mathbb {Z}},j\in [m]} X_{ij}\ge C\Big ({d_T(u,v_c)\over t}\Big )\Big ]\\&< \exp \Big (-(Ct/ 2^{\beta +2})\, d_T(u,v_c)\Big )\\ {}&\mathop {\le }\limits ^{(48)}\exp \Big (-C\Big ({1\over \varepsilon }+\log \Big \lceil \log _2 {1\over \delta }\Big \rceil \Big )\Big ({1\over 2^{\beta +2}}\Big )\, d_T(u,v_c)\Big )\\&= \Big ({1\over \big \lceil \log _2(1/\delta )\big \rceil }\Big )^{{Cd_T(u,v_c)\over 2^{\beta +2}}} \cdot \exp \Big (-C\Big ({1\over \varepsilon }\Big )\Big ({1\over 2^{\beta +2}}\Big )\, d_T(u,v_c)\Big ). \end{aligned}$$
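
The elementary calculation invoked for \(C\ge 8\) can be spelled out as follows (a routine check, included only as a sketch). Since \({(C-2)/3+2}={(C+4)/3}\) is positive, the claim \({(C-2)^2 \over {(C-2)/3+2}}>C\) is equivalent to \(3(C-2)^2>C(C+4)\), and

$$\begin{aligned} 3(C-2)^2-C(C+4)=2C^2-16C+12=2\big (C(C-8)+6\big )\ge 12>0\quad \text {for } C\ge 8. \end{aligned}$$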

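Since there exists a \(y \in P_{u\,v_c} \backslash \{v_c\}\) such that \(\tau _\beta (y) \ne 0\), and \(\kappa (c')\ge 1\) for all \(c'\in \chi (E)\), Lemma 4.3 implies that \(d_T(u,v_c)> 2^{\beta -1}\). For \(C\ge 8\) this gives \({Cd_T(u,v_c)\over 2^{\beta +2}}> {8\cdot 2^{\beta -1}\over 2^{\beta +2}}= 1\), so the factor \(\big ({1\over \lceil \log _2(1/\delta )\rceil }\big )^{Cd_T(u,v_c)/2^{\beta +2}}\) obtained above is at most \({1\over \lceil \log _2(1/\delta )\rceil }\) (as \(\lceil \log _2(1/\delta )\rceil \ge 1\)). Therefore,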

$$\begin{aligned}&\mathbb {P}_{{\mathcal {E}}} \Big [ \exists \,{x\in V(P_{w\,v_{a}})} :\sum _{i\in {\mathbb {Z}}}\Vert f_{i,a}(x)-f_{i,a}(u)\Vert _1\\&\quad \le \left( 1-{C \varepsilon }\right) \,d_T(u, v_c)+\sum _{i\in {\mathbb {Z}}}\Vert f_{i,a}(v_c)-f_{i,a}(x)\Vert _1 \Big ]\\&\quad {\mathop {\le }\limits ^{(67)}} {\mathbb {P}}_{{\mathcal {E}}}\Big [\exists \, {x\in V(P_{w\,v_{a}})} : \sum _{i\in {\mathbb {Z}},j\in [m]} s_{ij}(x)\ge C\varepsilon d_T(u,v_c)\Big ]\\&\quad {\mathop {\le }\limits ^{(71)}} {\mathbb {P}}_{{\mathcal {E}}}\Big [\sum _{i\in {\mathbb {Z}},j\in [m]} X_{ij}\ge C\varepsilon \left( {d_T(u,v_c)}\right) \Big ]\\&\quad \mathop {\le }\limits ^{(48)} {\mathbb {P}}_{{\mathcal {E}}}\Big [\sum _{i\in {\mathbb {Z}},j\in [m]} X_{ij}\ge C\Big ({d_T(u,v_c)\over t}\Big )\Big ]\\&\quad < \Big ({1\over \lceil \log _2(1/\delta )\rceil }\Big )\cdot \exp \Big (-C\big ({1\over \varepsilon 2^{\beta +2}}\big )\, d_T(u,v_c)\Big ), \end{aligned}$$

completing the proof. \(\square \)

The \(\Gamma _a\) Mappings. Before proving Lemma 5.7, we need some more definitions. For a color \(a\in \chi (E)\), we define a map \(\Gamma _{a}:{V(T(a))}\rightarrow V(T(a))\) based on Lemma 5.10. For \(u\in V(\gamma _a)\), we put \(\Gamma _a(u)=u\). For all other vertices \(u\in V(T(a))\setminus V(\gamma _a)\), there exists a unique color \(b\in \rho ^{-1}(a)\) such that \(u\in V(T(b))\). We define \(\Gamma _{a}(u)\) as the vertex \(w \in V(P_{u v_b})\) which is closest to the root among those vertices satisfying the following condition: For all \(v \in V(P_{u w}) \setminus \{w\}\) and \(k \in {\mathbb {Z}}, \tau _k(v)\ne 0\) implies

$$\begin{aligned} 2^k< {d_T(u,w)\over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (a))}. \end{aligned}$$
(75)

Clearly such a vertex exists, because the conditions are vacuously satisfied for \(w=u\). We now prove some properties of the map \(\Gamma _a\).

Lemma 5.11

Consider any \(a\in \chi (E)\) and \(u\in V(T(a))\) such that \(\Gamma _{a}(u)\ne u\). Then we have \(\Gamma _{a}(u)=v_{c}\) for some \(c\in \chi (E(P_{u v_a}))\setminus \{a\}\).

Proof

Let \(w\in V(P_{u\,\Gamma _a(u)})\) be such that \(\Gamma _a(u)=p(w)\). The vertex \(w\) always exists because \(\Gamma _a(u)\in V(P_u)\setminus \{u\}\). If \(\chi (w,\Gamma _a(u))\ne \chi (\Gamma _a(u),p(\Gamma _a(u)))\) then \(\Gamma _a(u)\) is \(v_{c}\) for some \(c\in \chi (E(P_{u\, v_a}))\setminus \{a\}\).

Now, for the sake of contradiction suppose that \(\chi (w,\Gamma _a(u))\!=\! \chi (\Gamma _a(u),p(\Gamma _a(u)))\). In this case, we show that for all \( v \in P_{u\, p(\Gamma _a(u))} \setminus \{p(\Gamma _a(u))\}\), and \(k\in {\mathbb {Z}}, \tau _k(v)\ne 0\) implies

$$\begin{aligned} 2^k< {d_T(u,p(\Gamma _a(u)))\over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (a))}. \end{aligned}$$
(76)

This is a contradiction since, by definition of \(\Gamma _a\), the vertex \(\Gamma _a(u)\) must be the closest vertex to the root satisfying this condition, yet \(p(\Gamma _a(u))\) is closer to the root than \(\Gamma _a(u)\).

Observe that

$$\begin{aligned} V(P_{u\,p(\Gamma _a(u))})\setminus \{p(\Gamma _a(u))\}= V(P_{u\,\Gamma _a(u)}). \end{aligned}$$

We first verify (76) for \(\Gamma _a(u)\) and \(k\in {\mathbb {Z}}\) with \(\tau _k(\Gamma _a(u))\ne 0\). Since \(\Gamma _a(u)\in V(P_u)\), we have

$$\begin{aligned} d_T(u,\Gamma _a(u))\le d_T(u,p(\Gamma _a(u))). \end{aligned}$$
(77)

Recalling that \(p(w)=\Gamma _a(u)\), by Lemma 4.5 for all \(k\in {\mathbb {Z}}, \tau _k(\Gamma _a(u))\le \tau _k(w)\), therefore for all \(k\in {\mathbb {Z}}\), with \(\tau _k(\Gamma _a(u))\ne 0\), we have \(\tau _k(w) \ne 0\) as well, hence (75) implies

$$\begin{aligned} 2^k<{d_T(u,\Gamma _a(u))\over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (a))}{\mathop {\le }\limits ^{(77)}} {d_T(u,p(\Gamma _a(u)))\over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (a))}. \end{aligned}$$
(78)

For all other vertices, \(v \in V(P_{u\Gamma _a(u)})\setminus \{\Gamma _a(u)\}\), and \(k\in {\mathbb {Z}}\) with \(\tau _k(v)\ne 0\) by (75),

$$\begin{aligned} 2^k< {d_T(u,\Gamma _a(u))\over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (a))} {\mathop {\le }\limits ^{(77)}} {d_T(u,p(\Gamma _a(u)))\over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (a))}, \end{aligned}$$
(79)

completing the proof. \(\square \)

Lemma 5.12

Suppose that \(a\in \chi (E)\) and \(u\in V(T(a))\). For any \(w\in V(P_{u\,\Gamma _{a}(u)})\) such that \({\chi }(u,p(u))={\chi }(w,p(w))\) we have \(\Gamma _{a}(w)\in V(P_{u\,\Gamma _{a}(u)})\).

Proof

For the sake of contradiction, suppose that \(\Gamma _{a}(w)\notin V(P_{u\,\Gamma _{a}(u)})\). Since \(w\in V(P_u)\) and \(\Gamma _{a}(w)\notin V(P_{u\,\Gamma _{a}(u)})\), we have \(\Gamma _a(w)\in V(P_{\Gamma _a(u)})\) and

$$\begin{aligned} d_T(u,\Gamma _a(u))\le d_T(u,\Gamma _a(w)). \end{aligned}$$
(80)

Since \(w\in V(P_{u\,\Gamma _{a}(u)})\) by assumption, we have \(V(P_{u\,w})\setminus \{w\} \subseteq V(P_{u\,\Gamma _{a}(u)})\setminus \{\Gamma _a(u)\}\). Thus for all \(v\in V(P_{u\,w})\setminus \{w\}\) and \(k\in {\mathbb {Z}}\) with \(\tau _k(v)\ne 0\), by (75),

$$\begin{aligned} 2^k< {d_T(u,\Gamma _a(u))\over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (a))} {\mathop {\le }\limits ^{(80)}} {d_T(u,\Gamma _a(w))\over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (a))}. \end{aligned}$$
(81)

The fact that \(w\in V(P_{u\,\Gamma _a(u)})\) also implies that \(d_T(w,\Gamma _a(w))\le d_T(u,\Gamma _a(w))\). Therefore, for all vertices \(v\in V(P_{w\,\Gamma _a(w)})\setminus \{\Gamma _a(w)\}\) and \(k\in {\mathbb {Z}}\) with \(\tau _k(v)\ne 0\), by (75),

$$\begin{aligned} 2^k&< {d_T(w,\Gamma _a(w))\over \varepsilon (\varphi (\chi (w,p(w)))-\varphi (a))}\le {d_T(u,\Gamma _a(w))\over \varepsilon (\varphi (\chi (w,p(w)))-\varphi (a))}\nonumber \\&= {d_T(u,\Gamma _a(w))\over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (a))}. \end{aligned}$$
(82)

We have

$$\begin{aligned} V(P_{u\,\Gamma _a(w)})\setminus \{\Gamma _a(w)\}=\big (V(P_{u\,w})\setminus \{w\}\big ) \cup \big (V(P_{w\,\Gamma _a(w)})\setminus \{\Gamma _a(w)\}\big ). \end{aligned}$$

Hence, by (81) and (82), for all \(v\in V(P_{u\,\Gamma _a(w)})\setminus \{\Gamma _a(w)\}\) and \(k\in {\mathbb {Z}}, \tau _k(v)\ne 0\) implies

$$\begin{aligned} 2^k< {d_T(u,\Gamma _a(w))\over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (a))}. \end{aligned}$$
(83)

This contradicts the definition of \(\Gamma _a(u)\), since \(\Gamma _a(u)\) must be the closest vertex to the root satisfying this condition, yet \(\Gamma _a(w)\) is closer to the root than \(\Gamma _a(u)\). \(\square \)

Defining Representatives for \(\gamma _c\). Now, for each \(c\in \chi (E)\), we define a small set of representatives for vertices in \(\gamma _c\). Later, we use these sets to bound the contraction of pairs of vertices that have one endpoint in \(\gamma _c\).

For \(a\in \chi (E)\) and \(c\in \chi (E(T(a)))\setminus \{a\}\), we define the set \(R_{a}(c)\subseteq V(\gamma _c)\), the set of representatives for \(\gamma _c\), as follows:

$$\begin{aligned} R_{a}(c)&= \bigcup _{i=0}^{ \lceil \log _2 {1\over \delta }\rceil -1} \big \{u\in V(\gamma _c):\, u \text { is the furthest vertex} \nonumber \\&\qquad \qquad \qquad \text { from } v_c \text { s.t. } \Gamma _a(u)\ne u \text { and } d_T(u,v_c)\le 2^{-i}\,\mathsf{{len}}(\gamma _c)\big \}. \end{aligned}$$
(84)

The next lemma shows when a vertex has a close representative.

Lemma 5.13

Consider \(a\in \chi (E)\) and \(c\in \chi (E(T(a)))\setminus \{a\}\). For all vertices \(u\in V(\gamma _c)\) with \(\Gamma _{a}(u)\ne u\) there exists a \(w\in R_{a}(c)\) such that

$$\begin{aligned} d_T(u,v_{c})\le d_T(w,v_{c})\le 2\max \big (d_T(u,v_{c}), \delta \, \mathsf{{len}}(\gamma _c)\big ). \end{aligned}$$

Proof

Let \(i\ge 0\) be such that

$$\begin{aligned} \frac{d_T(u,v_{c})}{\mathsf{{len}}(\gamma _c)} \in \big (2^{-i-1},2^{-i}\big ]. \end{aligned}$$

If \(i\le \lceil \log _2 {1\over \delta }\rceil -1\), then (84) implies that either \(u\in R_a(c)\), or there exists a \(w\in R_a(c)\) such that

$$\begin{aligned} d_T(u,v_{c})< d_T(w,v_{c})\le 2^{-i}\,\mathsf{{len}}(\gamma _c)\le 2\,d_T(u,v_{c}). \end{aligned}$$

On the other hand, if \(i> \lceil \log _2 {1\over \delta }\rceil -1\), then (84) implies that either \(u\in R_a(c)\), or that there exists a \(w\in R_a(c)\) such that

$$\begin{aligned} d_T(u,v_{c})< d_T(w,v_{c})\le 2^{-(\lceil \log _2 {1\over \delta }\rceil -1)}\,\mathsf{{len}}(\gamma _c) \le 2\delta \,\mathsf{{len}}(\gamma _c), \end{aligned}$$

completing the proof. \(\square \)
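
The factor \(2\delta \) in the second case comes from a one-line estimate, sketched here under the reading that the smallest scale allowed in (84) is \(2^{-(\lceil \log _2 (1/\delta )\rceil -1)}\,\mathsf{{len}}(\gamma _c)\):

$$\begin{aligned} 2^{-(\lceil \log _2 (1/\delta )\rceil -1)}=2\cdot 2^{-\lceil \log _2 (1/\delta )\rceil }\le 2\cdot 2^{-\log _2 (1/\delta )}=2\delta . \end{aligned}$$

For instance, with \(\delta =1/10\) one has \(\lceil \log _2 10\rceil =4\) and \(2^{-3}=1/8\le 1/5=2\delta \).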

The following lemma, in conjunction with Lemma 5.13, reduces the number of vertices in \(V(\gamma _c)\) that we need to analyze using Lemma 5.10.

Lemma 5.14

Let \((X,d)\) be a pseudometric, and let \(f:V\rightarrow X\) be a \(1\)-Lipschitz map. For \(x,y\in V\), and \(x',y'\in V(P_{xy})\) and \(h\ge 0\), if \(d(f(x),f(y))\ge d_T(x,y)-h\) then \(d(f(x'),f(y'))\ge d_T(x',y')-h\).

Proof

Suppose without loss of generality that \(d_T(x',x)\le d_T(y',x)\). Using the triangle inequality,

$$\begin{aligned} d(f(x'),f(y'))&\ge d(f(x),f(y))- d(f(x),f(x'))- d(f(y),f(y'))\\&\ge (d_T(x,y)-h) - d(f(x),f(x'))- d(f(y),f(y'))\\&\ge d_T(x,y)- d_T(x,x')- d_T(y,y')-h\\&= d_T(x',y')-h. \end{aligned}$$
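
Two facts are used implicitly in this chain, and we record them as a brief sketch. The third inequality uses that \(f\) is \(1\)-Lipschitz, and the final equality uses that, under the without-loss-of-generality assumption, the vertices appear along \(P_{xy}\) in the order \(x, x', y', y\):

$$\begin{aligned} d(f(x),f(x'))\le d_T(x,x'),\qquad d(f(y),f(y'))\le d_T(y,y'),\qquad d_T(x,y)=d_T(x,x')+d_T(x',y')+d_T(y',y). \end{aligned}$$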

\(\square \)

The following lemma constitutes the inductive step of the proof of Lemma 5.7.

Lemma 5.15

There exists a universal constant \(C\) such that for any color \(c\in \chi (E)\cup \{\chi (r,p(r))\}\), the following holds. Suppose that, with non-zero probability, for all \(c'\in \rho ^{-1}(c)\), and for all pairs \(x,y\in V(T(c'))\), we have

$$\begin{aligned} (1 -C\varepsilon ) \,d_T(x,y) - \delta \, \rho _{\chi }(x,y;\delta ) \le \sum _{i\in {\mathbb {Z}}}\Vert f_{i,c'}(x)-f_{i,c'}(y)\Vert _1 \le d_T(x,y).\quad \end{aligned}$$
(85)

Then with non-zero probability for all \(x,y\in V(T(c))\), we have

$$\begin{aligned} (1 -C\varepsilon ) \,d_T(x,y) - \delta \, \rho _{\chi }(x,y;\delta ) \le \sum _{i\in {\mathbb {Z}}}\Vert f_{i,c}(x)-f_{i,c}(y)\Vert _1 \le d_T(x,y).\quad \end{aligned}$$
(86)

Proof

Let \({\mathcal {E}}\) denote the event that, for all \(c'\in \rho ^{-1}(c)\), and all \(x,y \in V(T(c'))\), we have

$$\begin{aligned} d_T(x,y)\ge \sum _{i \in {\mathbb {Z}}} \Vert {f_{i,c'}(x)-f_{i,c'}(y)}\Vert _1\ge (1-C\varepsilon )d_T(x,y)-\delta \rho _{{\chi }}(x,y;\delta ). \end{aligned}$$
(87)

We will prove the lemma by showing that, conditioned on \({\mathcal {E}}\), (86) holds with non-zero probability.

For \(x,y\in V(T(c))\) we define

$$\begin{aligned} \mu (x,y) = \mathrm{max}{\big \{ \varphi (a) : a \in \chi (E) \text { and } x,y \in V(T(a))\big \}}. \end{aligned}$$

Note that since \(x,y\in V(T(c))\), we have

$$\begin{aligned} \mu (x,y)\ge \varphi (c). \end{aligned}$$
(88)

It is easy to see that if \(\mu (x,y)> \varphi (c)\), then \(x,y\in V(T(c'))\) for some \(c'\in \rho ^{-1}(c)\). By construction, if \(c' \in \rho ^{-1}(c)\) and \(x,y \in V(T(c'))\), then

$$\begin{aligned} \Vert f_{i,c}(x)-f_{i,c}(y)\Vert _1 = \Vert f_{i,c'}(x)-f_{i,c'}(y)\Vert _1, \end{aligned}$$

hence \({\mathcal {E}}\) implies that (86) holds for all such pairs. Thus in the remainder of the proof, we need only handle pairs \(x,y \in V(T(c))\) with \(\mu (x,y)=\varphi (c)\).

Write \(\chi (E(T(c))) = \{c_1, c_2, \ldots , c_n\}\), where the colors are ordered so that \(\varphi (c_j) \le \varphi (c_{j+1})\) for \(j=1,2,\ldots ,n-1\). Let \(\varepsilon _1=24 \varepsilon \), where the constant \(24\) comes from Lemma 5.10, and let \(\varepsilon _2=2 \cdot C' \varepsilon \), where \(C'\) is the constant from Lemma 4.11.

For \(i\in [n]\), we define the event \(X_i\) as follows: For all \(j \le i\), all \(x \in V(\gamma _{c_i})\) and \(y \in V(\gamma _{c_j})\) with \(\mu (x,y)=\varphi (c)\), we have

$$\begin{aligned}&\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1\nonumber \\&\qquad \ge d_T(x,y)-\varepsilon _1 d_T(x,y) -\varepsilon _2 d_T(\Gamma _c(x), \Gamma _c(y)) -\delta \rho _\chi (x,y;\delta ). \end{aligned}$$
(89)

For all pairs \(x\in V(\gamma _{c_i})\) and \(y\in V(\gamma _{c_j})\), the event \(X_{\max (i,j)}\) implies

$$\begin{aligned} \sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1\ge d_T(x,y)-(\varepsilon _1+\varepsilon _2)d_T(x,y)-\delta \rho _\chi (x,y;\delta ). \end{aligned}$$

In particular this shows that for \(C=2\cdot C'+24\), if the events \(X_1, X_2,\ldots , X_n\) all occur, then (86) holds for all pairs \(x,y\in V(T(c))\). Hence we are left to show that

$$\begin{aligned} \mathbb {P}[X_1 \wedge \cdots \wedge X_n\mid {\mathcal {E}}] > 0. \end{aligned}$$

To this end, we define new events \(\{ Y_i : i \in [n] \}\) and we show that for every \(i \in [n]\),

$$\begin{aligned} \mathbb {P}_{{\mathcal {E}}}\big [{X_1}\wedge \cdots \wedge {X_{i}} \mid {X_1}\wedge \cdots \wedge {X_{i-1}}\wedge { Y_i}\big ]=1\,, \end{aligned}$$
(90)

and then we bound the probability that \(Y_i\) does not occur by

$$\begin{aligned} \mathbb {P}_{{\mathcal {E}}}\big [\overline{ Y_i}\big ]\le 2^{-3(\varphi (c_i)-\varphi (c))+1}. \end{aligned}$$
(91)

By Lemma 5.5 and the definition of \(f_{k,c}\) (53), we have \(\mathbb {P}_{{\mathcal {E}}}[X_1]=1\). Since for all \(i\in \{2,\ldots n\}, c_i\in \chi (E(T(c)))\setminus \{c\}\), we have

$$\begin{aligned}&\mathbb {P}_{{\mathcal {E}}}[{X_1}\wedge \cdots \wedge {X_n}]\\&\quad \ge 1-\sum _{i=2}^n \mathbb {P}_{{\mathcal {E}}}\big [\overline{Y_i}\big ] {\mathop {\ge }\limits ^{(91)}} 1-\sum _{i=2}^n2^{-3(\varphi (c_i) -\varphi (c))+1}{\mathop {>}\limits ^{(4.9)}} 1-2\cdot 2^{(2-3)}={0}, \end{aligned}$$

which completes the proof once (90) and (91) are established.

For each \(i \in [n]\), we define the event \(Y_i\) as follows: For all \(j < i\), and all vertices \(x \in R_c(c_i)\) and \(y \in V(\gamma _{c_j})\) with \(\mu (x,y)=\varphi (c)\), we have

$$\begin{aligned}&\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1-\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(\Gamma _c(x))-f_{k,c}(y)\Vert _1 \nonumber \\&\qquad \ge \,(1-\varepsilon _1/2)\,d_T(x,\Gamma _c(x)). \end{aligned}$$
(92)

We now complete the proof of Lemma 5.15 by proving (90) and (91).

Proof of (90). Suppose that \(X_1,\ldots ,X_{i-1}\) and \(Y_i\) hold. We will show that \(X_i\) holds as well. First note that for all vertices \(x,y\in V(\gamma _{c_i})\), by Lemma 5.5 and the definition of \(f_{k,c_i}\) (53), we have

$$\begin{aligned} d_T(x,y)=\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c_i}(x)-f_{k,c_i}(y)\Vert _1=\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1, \end{aligned}$$

thus we only need to prove (89) for pairs \(x\in V(\gamma _{c_i})\) and \(y\in V(\gamma _{c_j})\) for \(j< i\) and \(\mu (x,y)=\varphi (c)\). We now divide the pairs with one endpoint in \(\gamma _{c_i}\) into two cases based on \(\Gamma _c\).

Case I: \(x\in V(\gamma _{c_i})\) with \(x\ne \Gamma _c(x)\), and \(y\in V(\gamma _{c_j})\) for some \(j< i\), and \(\mu (x,y)=\varphi (c)\).

In this case, by Lemma 5.13, there exists a vertex \(z\in R_{c}(c_i)\) such that

$$\begin{aligned} d_T(x,v_{c_i}) \le d_T(z,v_{c_i})\le 2\max \big (\delta \,\mathsf{{len}} (E(\gamma _{c_i})),d_T(x,v_{c_i})\big ). \end{aligned}$$

If \(d_T(x,v_{c_i})\le \delta \,\mathsf{{len}}(E(\gamma _{c_i}))\), then by (18), we have \(\mathsf{{len}}(E(\gamma _{c_i})) = \rho _\chi (x,v_{c_i};\delta )\), hence

$$\begin{aligned} d_T(z,\Gamma _c(z))&\le d_T(v_{c_i},\Gamma _c(z))+2\max ( \delta \,\mathsf{{len}}(E(\gamma _{c_i})),d_T(x,v_{c_i}))\nonumber \\&\le d_T(v_{c_i},\Gamma _c(z))+2\max ( \delta \,\rho _\chi (x,v_{c_i};\delta ),d_T(x,v_{c_i}))\nonumber \\&\le d_T(v_{c_i},\Gamma _c(z))+ 2\,\delta \,\rho _\chi (x,v_{c_i};\delta )+2\,d_T(x,v_{c_i})\nonumber \\&\le 2\delta \,\rho _\chi (x,v_{c_i};\delta )+2\,d_T(x,\Gamma _c(z)). \end{aligned}$$
(93)

Since \(z \in R_c(c_i)\), by definition we have \(\Gamma _c(z)\ne z\), therefore by Lemma 5.11, \(\Gamma _c(z)=v_{c'}\) for some color \(c'\in \chi (P_{z\,v_c})\setminus \{c\}\). The function \(\varphi \) is non-decreasing along any root–leaf path, hence \(\chi (\Gamma _c(z),p(\Gamma _c(z)))= c_{\ell }\) for some \(\ell <i\).

We refer to Fig. 2 for the relative position of the vertices referenced in the following inequalities. Using our assumption that \(X_1,\ldots , X_{i-1}\) and \(Y_i\) hold, we can write

$$\begin{aligned}&\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(z)-f_{k,c}(y)\Vert _1 \\&\quad {\mathop {\ge }\limits ^{Y_i}}\,\, d_T(\Gamma _c(z),z)-(\varepsilon _1/2)\,d_T(z,\Gamma _c(z))+\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(\Gamma _c(z))-f_{k,c}(y)\Vert _1\\&\quad {\mathop {\!\!\!\!\!\!\!\!\!\!\!\!\!\ge }\limits ^{X_{\max (\ell ,j)}}}\,\, d_T(\Gamma _c(z),z)\!-\!(\varepsilon _1/2)\,d_T(z,\Gamma _c(z))\!+\!d_T(\Gamma _c(z),y)\\&\qquad -\varepsilon _2\, d_T(\Gamma _c(\Gamma _c(z)), \Gamma _c(y)) -\varepsilon _1\,d_T(\Gamma _c(z),y) -\delta \,\rho _\chi (\Gamma _c(z),y;\delta )\\&\quad \ge \,\, d_T(y,z)-(\varepsilon _1/2)\,d_T(z,\Gamma _c(z)) -\varepsilon _2\,d_T(\Gamma _c(\Gamma _c(z)),\Gamma _c(y))\\&\qquad -\varepsilon _1\,d_T(\Gamma _c(z),y) -\delta \,\rho _\chi (\Gamma _c(z),y;\delta ). \end{aligned}$$

We may assume that \(\varepsilon _1 < 1\), otherwise there is nothing to prove. Using the preceding inequality, and applying Lemma 5.14 on pairs \((z,y)\) and \((x,y)\) implies that

$$\begin{aligned}&\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1\\&\quad \ge d_T(x,y)-(\varepsilon _1/2)\,d_T(z,\Gamma _c(z)) -\varepsilon _2\,d_T(\Gamma _c(\Gamma _c(z)),\Gamma _c(y))\\&\qquad -\varepsilon _1\,d_T(\Gamma _c(z),y) -\delta \,\rho _\chi (\Gamma _c(z),y;\delta )\\&\quad {\mathop {\!\!\ge }\limits ^{\!\!\!(93)}} d_T(x,y)-(\varepsilon _1/2)\big ( 2\,d_T(x,\Gamma _c(z))+2\delta \, \rho _\chi (x,v_{c_i};\delta )\big )\\&\qquad -\varepsilon _2\,d_T(\Gamma _c(\Gamma _c(z)),\Gamma _c(y))-\varepsilon _1\,d_T(\Gamma _c(z),y) -\delta \,\rho _\chi ( \Gamma _c(z),y;\delta ). \end{aligned}$$

where in the last line we have used the fact that \(\varepsilon _1< 1\).

Fig. 2 Position of vertices in the subtree \(T(c)\) for Case I

We have \(\chi (x,p(x))=\chi (z,p(z))=c_i\). Moreover, since \(\Gamma _c(z)\ne z\), using Lemma 5.11 it is easy to check that \(x\in P_{z\,\Gamma _c(z)}\). Therefore, by Lemma 5.12, \( d_T(\Gamma _c(\Gamma _c(z)),y)\le d_T(\Gamma _c(z),y)\le d_T(\Gamma _c(x),y),\) and combining this with the preceding inequality yields

$$\begin{aligned}&\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1 \ge d_T(x,y)-(\varepsilon _1/2)\big ( 2\,d_T(x,\Gamma _c(z))+2\delta \, \rho _\chi (x,v_{c_i};\delta )\big )\\&\qquad -\varepsilon _2\,d_T(\Gamma _c(x),\Gamma _c(y)) -\varepsilon _1\,d_T(\Gamma _c(z),y) -\delta \rho _\chi ( \Gamma _c(z),y;\delta ). \end{aligned}$$

Recall the definition of \(C(x,y;\delta )\) in (18). By Lemma 5.11, \(\Gamma _c(z)=v_{c'}\) for some color \(c'\in \chi (E(P_{z\,v_c}))\setminus \{c\}\), so \(C(\Gamma _c(z),y;\delta )\subseteq C(v_{c_i},y;\delta )\). Hence \(\rho _\chi (v_{c_i},y;\delta )\ge \rho _\chi (\Gamma _c(z),y;\delta )\), and thus

$$\begin{aligned}&\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1\\&\quad \ge d_T(x,y)-(\varepsilon _1/2)\big (2\,d_T (x,\Gamma _c(z))+2\delta \,\rho _\chi (x,v_{c_i};\delta )\big ) \\&\qquad -\varepsilon _2\,d_T(\Gamma _c(x),\Gamma _c(y)) -\varepsilon _1\, d_T(\Gamma _c(z),y) -\delta \,\rho _\chi ( v_{c_i},y;\delta )\\&\quad \ge d_T(x,y)\!-\!\varepsilon _1\,d_T(x,\Gamma _c(z)) \!-\!\varepsilon _2\,d_T(\Gamma _c(x),\Gamma _c(y)) -\varepsilon _1d_T(\Gamma _c(z),y)\\&\qquad -\delta \big (\rho _\chi ( v_{c_i},y;\delta )+\varepsilon _1\rho _\chi ( x,v_{c_i};\delta )\big )\\&\quad \ge d_T(x,y)-\varepsilon _1\,d_T(x,\Gamma _c(z))-\varepsilon _2\,d_T(\Gamma _c(x),\Gamma _c(y)) -\varepsilon _1d_T(\Gamma _c(z),y)\\&\qquad -\delta \big (\rho _\chi ( x,v_{c_i};\delta )+\rho _\chi ( v_{c_i},y;\delta )\big ), \end{aligned}$$

where in the last line we have again used that \(\varepsilon _1 < 1\).

The sets of colors that appear on the paths \(P_{x\,v_{c_i}}\) and \(P_{v_{c_i} y}\) are disjoint. Therefore \(\rho _\chi ( x,y;\delta )=\rho _\chi ( x,v_{c_i};\delta )+\rho _\chi ( v_{c_i},y;\delta )\), and

$$\begin{aligned}&\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1\\&\qquad \ge d_T(x,y)-\varepsilon _1\,d_T(x,\Gamma _c(z))\\&\qquad \qquad -\varepsilon _2\,d_T(\Gamma _c(x),\Gamma _c(y)) -\varepsilon _1\,d_T(\Gamma _c(z),y) -\delta \rho _\chi ( x,y;\delta )\\&\qquad = d_T(x,y) -\varepsilon _1\,d_T(x,y) -\varepsilon _2\, d_T(\Gamma _c(x),\Gamma _c(y)) -\delta \rho _\chi ( x,y;\delta ). \end{aligned}$$

Case II: \(x\in V(\gamma _{c_i})\) with \(x= \Gamma _c(x)\), and \(y\in V(\gamma _{c_j})\) for some \(j<i\), and \(\mu (x,y)= \varphi (c)\).

In this case, we first note that since \(c=c_1\), we have \(x \notin V(\gamma _c)\). Hence we can suppose that \(x\in V(T(c'))\) for some \(c'\in \rho ^{-1}(c)\). Recall that \(\varepsilon _2/2=C'\varepsilon \), where \(C'\) is the constant from Lemma 4.11. By Lemma 4.11 (with \(c'\), \(x\), and \(\varepsilon _2/2\) substituted for \(c\), \(v\), and \(\varepsilon \), respectively), there exist vertices \(u,u'\in \{x\}\cup \{v_a:a\in \chi (E(P_{x\,v_{c'}}))\}\) such that

$$\begin{aligned} d_T(x,u)\le (\varepsilon _2/2) \,d_T(u',u). \end{aligned}$$
(94)

Moreover, for all vertices \(z\in V(P_{u'u})\setminus \{u'\}\) and all \(k \in {\mathbb {Z}}\),

$$\begin{aligned} \tau _k(z) \ne 0 \implies 2^k < \Big ({{d_T(u,u')} \over \varepsilon (\varphi (\chi (u,p(u)))-\varphi (\chi (v_{c'},p(v_{c'}))))}\Big ). \end{aligned}$$

We have \(\chi (v_{c'},p(v_{c'}))=c\), so this condition is exactly condition (75) for \(\Gamma _c(u)\); therefore

$$\begin{aligned} d_T(x,u)\le (\varepsilon _2/2)\, d_T(u',u)\le (\varepsilon _2/2)\, d_T(\Gamma _c(u),u). \end{aligned}$$
(95)

Note that the assumption that \(\Gamma _c(x)=x\) implies that \(u\ne x\) and \(u=v_a\) for some color \(a\in \chi (E(P_{x\,v_{c'}}))\).

We have

$$\begin{aligned}&\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1-\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(u)-f_{k,c}(y)\Vert _1\nonumber \\&\quad \ge -\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(u)\Vert _1{\mathop {=}\limits ^{(5.9)}}-d_T(x,u){\mathop {\ge }\limits ^{(95)}} d_T(x,u)-\varepsilon _2\, d_T(u,\Gamma _c(u))\nonumber \\&\quad \ge d_T(x,u)-\varepsilon _2\, d_T(x,\Gamma _c(u))= d_T(x,u)-\varepsilon _2\, d_T(\Gamma _c(x),\Gamma _c(u)). \end{aligned}$$
(96)
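To make the step labelled (95) in the chain above explicit: it uses (95) only in the rearranged form \(2\,d_T(x,u)\le \varepsilon _2\,d_T(u,\Gamma _c(u))\). A short check, with no assumption beyond (95), reads

$$\begin{aligned} -d_T(x,u) = d_T(x,u)-2\,d_T(x,u) \ge d_T(x,u)-\varepsilon _2\,d_T(u,\Gamma _c(u)). \end{aligned}$$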

Since \(u=v_a\) for some color \(a\in \chi (E(P_{x\,v_{c'}}))\), we have \(\chi (u,p(u))=c_\ell \) for some \(\ell <i\), and \(X_{\max (\ell ,j)}\) implies that

$$\begin{aligned} \sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(u)-f_{k,c}(y)\Vert _1\ge d_T(u,y)- \varepsilon _2\, d_T(\Gamma _c(u),\Gamma _c(y)) -\varepsilon _1\, d_T(u,y) -\delta \,\rho _\chi (u,y;\delta ). \end{aligned}$$

Recall the definition of \(C(x,y;\delta )\) in (18). We have \(u=v_a\) for some color \(a\in \chi (E(P_{x\,v_{c'}}))\), therefore \(C(u,y;\delta )\subseteq C(x,y;\delta )\), and \(\rho _\chi (u,y;\delta )\le \rho _\chi (x,y;\delta )\). Now we can write

$$\begin{aligned}&\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(u)-f_{k,c}(y)\Vert _1\nonumber \\&\quad \ge d_T(u,y)- \varepsilon _2\, d_T(\Gamma _c(u),\Gamma _c(y))-\varepsilon _1\, d_T(u,y) -\delta \, \rho _\chi (x,y;\delta ). \end{aligned}$$
(97)

Adding (96) and (97), we conclude that

$$\begin{aligned}&\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1 \\&\quad \ge d_T(u,y)+d_T(u,x)-\varepsilon _2\,(d_T(\Gamma _c(x),\Gamma _c(u))\\&\qquad +d_T(\Gamma _c(u),\Gamma _c(y)))-\varepsilon _1\,d_T(x,y) -\delta \,\rho _\chi (x,y;\delta )\\&\quad \ge d_T(x,y) -\varepsilon _2 \, d_T(\Gamma _c(x),\Gamma _c(y)) -\varepsilon _1\,d_T(x,y) -\delta \,\rho _\chi (x,y;\delta ), \end{aligned}$$

completing the proof of (90).

Proof of (91). We prove this inequality by first bounding the probability that (92) holds for a fixed \(x\) and all \(y\in V(\gamma _{c_j})\) (for a fixed \(j\in \{1,\ldots , i-1\}\)) with \(\mu (x,y)=\varphi (c)\). Then we use a union bound to complete the proof.

We start the proof by giving some definitions. For a vertex \(x\in R_{c}(c_i)\), let

$$\begin{aligned} S_x=\big \{j\in \{1,\ldots ,i-1\}: \text { there exists a } v\in V(\gamma _{c_j}) \text { such that } \mu (x,v)=\varphi (c)\big \}. \end{aligned}$$

For \(a\in S_x\), we define \(w(x;a)\) as the vertex \(v \in V(\gamma _{c_a})\) which is furthest from the root among those satisfying \(\mu (x,v)=\varphi (c)\). Finally, for \(x\in R_{c}(c_i)\), we put

$$\begin{aligned} \beta _x=\max \big \{k\in {\mathbb {Z}}: \exists z\in P_{x\,\Gamma _c(x)}\setminus \{\Gamma _c(x)\},\,\tau _k(z)\ne 0\big \}. \end{aligned}$$

Inequality (75) implies

$$\begin{aligned} 2^{\beta _x}< {d_T(x,\Gamma _c(x))\over \varepsilon (\varphi (c_i)-\varphi (c))}. \end{aligned}$$
(98)

By definition of \(R_c\), for all elements \(x\in R_c(c_i)\), we have \(\Gamma _c(x)\ne x\). Moreover, by Lemma 5.11, \(\Gamma _c(x)=v_{c'}\) for some \(c'\in \chi (E(P_{x\,v_c}))\setminus \{c\}\). Now, for \(x\in R_c(c_i)\) and \(a\in S_x\) we apply Lemma 5.10 with \(\varepsilon _1/2=12\varepsilon \) to write

$$\begin{aligned}&\mathbb {P}_{{\mathcal {E}}}\Big [\exists y\in P_{w(x;a),v_c}:\sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(x)-f_{k,c}(y)\Vert _1\nonumber \\&\quad \le (1-\varepsilon _1/2)d_T(x,\Gamma _c(x))+ \sum _{k\in {\mathbb {Z}}}\Vert f_{k,c}(y)-f_{k,c}(\Gamma _c(x))\Vert _1\Big ]\nonumber \\&\quad \le {1\over \lceil \log _2 1/\delta \rceil }{\exp \Big (-12\frac{d_T(x,\Gamma _c(x))}{2^{\beta _x+2}\varepsilon }\Big )}\,\nonumber \\&\quad {\mathop {\!\!\le }\limits ^{\!\!(98)}} \frac{\exp (-3(\varphi (c_i)-\varphi (c)))}{\lceil \log _2 1/\delta \rceil }. \end{aligned}$$
(99)
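The last step in (99) can be checked directly from (98). Assuming \(\varphi (c_i)>\varphi (c)\), so that the right-hand side of (98) is finite, rearranging (98) gives \(d_T(x,\Gamma _c(x))/(2^{\beta _x}\varepsilon ) > \varphi (c_i)-\varphi (c)\), and therefore

$$\begin{aligned} 12\,\frac{d_T(x,\Gamma _c(x))}{2^{\beta _x+2}\varepsilon } = 3\,\frac{d_T(x,\Gamma _c(x))}{2^{\beta _x}\varepsilon } > 3\,(\varphi (c_i)-\varphi (c)), \end{aligned}$$

so the exponential factor in the middle line of (99) is at most \(\exp (-3(\varphi (c_i)-\varphi (c)))\).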

Note that, for all \(y\in V(\gamma _{c_a})\) with \(\mu (x,y)=\varphi (c)\), we have \(y\in P_{w(x;a),v_c}\).

By definition of \(R_c(c_i)\), we have \(|R_{c}(c_i)|\le \lceil \log _2\delta ^{-1}\rceil \). We also have \(\varphi (c_j)\le \varphi (c_i)\) for \(j<i\), and by Corollary 4.8, \(|S_x| \le i < 2^{\varphi {(c_i)}-\varphi (c)+1}\). Taking a union bound over all \(x\in R_c(c_i)\) and \(a\in S_x\) yields

$$\begin{aligned} \mathbb {P}_{{\mathcal {E}}}[\overline{Y_i}]&{\mathop {\!\!\!\le }\limits ^{(99)}} \sum _{x\in R_c(c_i)} |S_x|\Big ( {1\over \lceil \log _2\delta ^{-1}\rceil } \exp (-3(\varphi (c_i)-\varphi (c)))\Big )\\&< \big (\lceil \log _2\delta ^{-1}\rceil 2^{\varphi {(c_i)}-\varphi (c)+1}\big )\Big ( {1\over \lceil \log _2\delta ^{-1}\rceil } \exp (-3(\varphi (c_i)-\varphi (c)))\Big )\\&= 2^{\varphi {(c_i)}-\varphi (c)+1} \exp (-3(\varphi (c_i)-\varphi (c))). \end{aligned}$$

Since \(\varphi (c_i)\ge \varphi (c)\), by an elementary calculation we conclude that

$$\begin{aligned} \mathbb {P}_{{\mathcal {E}}}[\overline{Y_i}] < 2\cdot 2^{-3(\varphi (c_i)-\varphi (c))}, \end{aligned}$$

which completes the proof of (91). \(\square \)
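For completeness, the elementary calculation at the end of the proof of (91) can be spelled out as follows. Write \(m=\varphi (c_i)-\varphi (c)\ge 0\) (a shorthand introduced only for this remark). Since \(e^{3}>16=2^{4}\), we have \(e^{-3m}\le 2^{-4m}\), and hence

$$\begin{aligned} 2^{m+1}\exp (-3m) \le 2^{m+1}\cdot 2^{-4m} = 2\cdot 2^{-3m}. \end{aligned}$$

Combined with the strict inequality in the union bound, this gives the stated bound on \(\mathbb {P}_{{\mathcal {E}}}[\overline{Y_i}]\).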

Finally, we present the proof of Lemma 5.7.

Proof of Lemma 5.7

Let \(C\) be the constant from Lemma 5.15. For the sake of contradiction, suppose that

$$\begin{aligned}&\mathbb {P}\Big [\forall x,y \in V,\,\, (1 -C\varepsilon ) \,d_T(x,y) - \delta \, \rho _{\chi }(x,y;\delta )\\&\qquad \le \sum _{i\in {\mathbb {Z}}}\Vert f_i(x)-f_i(y)\Vert _1 \le d_T(x,y)\Big ]=0. \end{aligned}$$

Now let \(c\in \chi (E)\cup \{{\chi }(r,p(r))\}\) be a color with a maximal value of \(\varphi (c)\) such that

$$\begin{aligned}&\mathbb {P}\Big [\forall x,y \in V(T(c)),\,\,(1 -C\varepsilon ) \,d_T(x,y) - \delta \, \rho _{\chi }(x,y;\delta )\nonumber \\&\qquad \le \sum _{i\in {\mathbb {Z}}}\Vert f_{i,c}(x)-f_{i,c}(y)\Vert _1 \le d_T(x,y)\Big ]=0. \end{aligned}$$
(100)

For \(a\in \chi (E)\), we have \(\kappa (a)>0\). Hence, by (32), \(\varphi (c')>\varphi (c)\) for all \(c'\in \rho ^{-1}(c)\), and by maximality of \(c\), for all such \(c'\) we have

$$\begin{aligned}&\mathbb {P}\Big [\forall x,y \in V(T(c')),\,\,(1 -C\varepsilon ) \,d_T(x,y) - \delta \, \rho _{\chi }(x,y;\delta )\\&\qquad \le \sum _{i\in {\mathbb {Z}}}\Vert f_{i,c'}(x)-f_{i,c'}(y)\Vert _1 \le d_T(x,y)\Big ] > 0. \end{aligned}$$

But now applying Lemma 5.15 contradicts (100), completing the proof. \(\square \)