1 Introduction

Recent years have seen a large outpouring of work in analysis, geometry and theoretical computer science on metric space embeddings guaranteed to introduce only small distortion into the distances between pairs of points.

Euclidean space is not only a metric space; it is also equipped with higher-dimensional volumes. General metrics do not carry such structure. However, a general definition of the volume of a set of points in an arbitrary metric space was developed by Feige [10].

In this paper we extend the study of metric embeddings into Euclidean space in two steps: first, we show a robustness property of the general volume definition; then, using this robustness property together with existing metric embedding methods, we exhibit an embedding that guarantees small distortion not only on pairs, but also on the volumes of sets of points. The robustness property (see Theorem 2) is that the minimization over permutations in the volume definition affects it by only a constant factor. This result is of independent interest, as it provides an analysis of the greedy algorithm for a variant of the online Steiner tree problem in which the cost of buying an edge is logarithmic in its length: we show that the greedy algorithm has a constant competitive ratio against the optimum. Our main application of Theorem 2 is an algorithmic embedding (see Theorem 3) with constant average distortion for sets of any fixed size. In fact, our bound on the average distortion scales logarithmically with the size of the set. Moreover, this bound holds even for higher moments of the distortion (the \(\ell _q\)-distortion), while the embedding simultaneously maintains the best possible worst-case distortion bound. Hence our embedding generalizes both [16] and [3] (see related work below).

1.1 Volume in General Metric Spaces

Let \(d_\mathrm{{E}}\) denote Euclidean distance, and let \(\mathrm{affspan}\) denote the affine span of a point set. The \((n-1)\)-dimensional Euclidean volume of the convex hull of points \(X=\{v_1,\ldots ,v_n\} \subseteq \mathbb {R}^d\) is

$$\begin{aligned} {\phi }_\mathrm{E}(X) = \frac{1}{(n-1)!} \prod _{i=2}^n d_\mathrm{E}(v_i,\mathrm{affspan}(v_1,\ldots ,v_{i-1})). \end{aligned}$$

This definition is, of course, independent of the order of the points.
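To make the formula concrete, here is a minimal computational sketch (our illustration, not part of the original text; it assumes the points are affinely independent): the distance from \(v_i\) to \(\mathrm{affspan}(v_1,\ldots ,v_{i-1})\) is the norm of the residual left after orthogonally projecting \(v_i-v_1\) onto the translated linear span.

```python
import numpy as np
from math import factorial

def dist_to_affspan(v, pts):
    """Euclidean distance from v to the affine span of the rows of pts."""
    w = v - pts[0]                      # translate so the span passes through 0
    A = (pts[1:] - pts[0]).T            # columns span the direction space
    if A.shape[1] == 0:
        return np.linalg.norm(w)        # the affine span of one point is the point
    Q, _ = np.linalg.qr(A)              # orthonormal basis for the column space
    return np.linalg.norm(w - Q @ (Q.T @ w))

def phi_E(points):
    """(n-1)-dimensional volume of the convex hull, via the product formula."""
    X = np.asarray(points, dtype=float)
    prod = 1.0
    for i in range(1, len(X)):
        prod *= dist_to_affspan(X[i], X[:i])
    return prod / factorial(len(X) - 1)
```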

1.1.1 Feige’s Notion of Volume

Let \((X,d_X)\) be a finite metric space, \(X=\{v_1, \ldots , v_n\}\). Let \(S_n\) be the symmetric group on \(n\) symbols, and let \(\pi _\mathrm{{P}} \in S_n\) be an order in which the points of \(X\) may be adjoined to a minimum spanning tree by Prim’s algorithm. (Thus \(v_{\pi _\mathrm{{P}}(1)}\) is an arbitrary point, \(v_{\pi _\mathrm{{P}}(2)}\) is the closest point to it, etc.) Feige’s notion of the volume of \(X\) is (we have normalized by a factor of \((n-1)!\)):

$$\begin{aligned} {\phi }_\mathrm{F}(X) =\frac{1}{(n-1)!} \prod _{i=2}^n d_X(v_{\pi _\mathrm{{P}}(i)},\{v_{\pi _\mathrm{{P}}(1)},\ldots ,v_{\pi _\mathrm{{P}}(i-1)}\}). \end{aligned}$$
(1)

\(\pi _\mathrm{{P}}\) minimizes the above expression (1) (see Sect. 2).
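Since \(\pi _\mathrm{{P}}\) is a Prim order, \({\phi }_\mathrm{F}\) can be computed directly from a distance matrix. The sketch below (ours, for illustration; it reuses phi_E from the previous sketch) does exactly that, and the inline example anticipates the thin-triangle phenomenon discussed next.

```python
import numpy as np
from math import factorial

def phi_F(D):
    """Feige volume, Eq. (1): D is an n-by-n matrix of pairwise distances."""
    n = D.shape[0]
    tree, rest = [0], set(range(1, n))
    prod = 1.0
    while rest:
        v = min(rest, key=lambda u: min(D[u][t] for t in tree))  # Prim's rule
        prod *= min(D[v][t] for t in tree)                       # adjoining distance
        tree.append(v)
        rest.remove(v)
    return prod / factorial(n - 1)

# A very thin triangle: phi_E equals the area (~5e-7), while phi_F multiplies
# the two short MST edges (~0.5 each) and divides by 2!, giving ~0.125.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1e-6]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
print(phi_F(D), phi_E(pts))
```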

It should be noted that even if \(X\) is a subset of Euclidean space, \({\phi }_\mathrm{E}\) and \({\phi }_\mathrm{F}\) do not agree. (The latter can be arbitrarily larger than the former; consider, for instance, a very thin triangle.) The actual relationship that Feige found between these notions is nontrivial. Let \(\mathcal {L}_2(X)\) be the set of non-expansive embeddings from \(X\) into Euclidean space. Feige proved the following:

Theorem 1

(Feige) For any \(n\) point metric space \((X,d)\):

$$\begin{aligned} 1 \le \big [ \frac{ {\phi }_{F}(X)}{\sup _{f \in \mathcal {L}_2(X)} {\phi }_{E}(f(X)) }\big ]^{1/(n-1)} \le 2. \end{aligned}$$

Thus, remarkably, \({\phi }_\mathrm{{F}}(X)\) is characterized to within a factor of \(2\) (after normalizing for dimension) by the Euclidean embeddings of \(X\).

1.1.2 Our Work, Part I: Robustness of the Metric Volume

What we show first is that Feige’s definition is insensitive to the minimization over permutations implicit in Eq. (1), so that a generalized version of Theorem 1 can also be obtained.

Theorem 2

There is a constant \(C\) such that for any \(n\)-point metric space \((X,d)\), and with \(\pi _\mathrm{{P}}\) defined as above, and for every \(\pi \in S_n\):

$$\begin{aligned} 1 \le \left( \frac{\prod _{i=2}^n d_X(v_{\pi (i)},\{v_{\pi (1)},\ldots ,v_{\pi (i-1)}\})}{\prod _{i=2}^n d_X(v_{\pi _\mathrm{{P}}(i)},\{v_{\pi _\mathrm{{P}}(1)},\ldots ,v_{\pi _\mathrm{{P}}(i-1)}\})} \right) ^{1/(n-1)} \le C. \end{aligned}$$

An alternative interpretation of this result is as the analysis of an online problem: consider the following variant of the online metric Steiner tree problem [14]. Given a complete weighted graph \((V,E)\), at each time unit \(i\) the adversary outputs a vertex \(v_i \in V\), and an online algorithm can buy edges \(E_i \subseteq E\). At each time unit \(i\), the edges bought \(E_1,\dots ,E_i\) must induce a connected graph on the current set of vertices \(v_1,\dots ,v_i\). The competitive ratio of an online algorithm is the worst-case ratio between the cost of the edges it buys and the cost of the edges bought by the optimal offline algorithm. This problem has been well studied when the cost of buying an edge is proportional to its length: Imase and Waxman proved that the greedy algorithm is \(O(\log n)\)-competitive, and showed that this bound is asymptotically tight. It is natural to consider a variant where the cost of buying is a concave function of the edge length; in this case a better result may be possible. In particular, we analyze the case where this cost function is logarithmic in the edge length. Such a logarithmic cost function may capture economy-of-scale effects, where buying multiplicatively longer edges costs only additively more. In Sect. 2.1, we prove the following corollary of Theorem 2.

Corollary 1

Given a complete weighted graph with arbitrary weights which are at least \(2\), the greedy algorithm is \(O(1)\)-competitive for the Online Metric Steiner Tree with logarithmic edge costs.
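A minimal sketch of the greedy algorithm in question (our rendering; the base-2 logarithm is an arbitrary choice of units, and d is an assumed metric with all distances at least \(2\), so every edge costs at least \(1\)):

```python
import math

def greedy_online_steiner_cost(d, arrivals):
    """Connect each arriving vertex to its nearest predecessor, paying log(weight)."""
    seen, cost = [], 0.0
    for v in arrivals:
        if seen:
            u = min(seen, key=lambda w: d(w, v))   # closest previously arrived vertex
            cost += math.log2(d(u, v))             # logarithmic edge cost
        seen.append(v)
    return cost
```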

1.1.3 Our Work, Part II: Volume Preserving Embeddings

We use Theorem 2 and recent results on metric embeddings [3] to give an algorithm providing a non-contractive embedding into Euclidean space that faithfully preserves volume in the following sense: the embedding obtains simultaneously both \(O(\log k)\) average volume distortion and \(O(\log n)\) worst-case volume distortion for sets of size \(k\).

Given an \(n\)-point metric space \((X,d)\), an injective mapping \(f:X \rightarrow L_2\) is called an embedding. An embedding is \((k-1)\)-dimensional non-contractive if for any \(S \in {X\atopwithdelims ()k}\): \({\phi }_\mathrm{{E}}(f(S))\ge {\phi }_\mathrm{{F}}(S)\).

Let \(f\) be a \((k-1)\)-dimensional non-contractive embedding. For a set \(S \in {X\atopwithdelims ()k}\) define the \((k-1)\)-dimensional distortion of \(S\) under \(f\) as

$$\begin{aligned} \mathrm{dist}_f(S) = \Big [\frac{ {\phi }_\mathrm{E}(f(S))}{{\phi }_\mathrm{F}(S)}\Big ]^{1/(k-1)}. \end{aligned}$$

For \(2\le k \le n\) define the \((k-1)\)-dimensional distortion of \(f\) as

$$\begin{aligned} \mathrm{dist}^{(k-1)}(f)= \max _{S \in {X \atopwithdelims ()k}} \mathrm{dist}_f(S). \end{aligned}$$

More generally, for \(2\le k \le n\) and \(1 \le q \le \infty \), define the \((k-1)\)-dimensional \(\ell _q\)-distortion of \(f\) as

$$\begin{aligned} \mathrm{dist}_q^{(k-1)}(f) = {\mathbb {E}}_{S \sim {X\atopwithdelims ()k}}[\mathrm{dist}_f(S)^q ]^{1/q}, \end{aligned}$$

where the expectation is taken according to the uniform distribution over \({X\atopwithdelims ()k}\). Observe that the \((k-1)\)-dimensional distortion is expressed by \(\mathrm{dist}_\infty ^{(k-1)}(f)\), and the average \((k-1)\)-dimensional distortion by \(\mathrm{dist}_1^{(k-1)}(f)\).
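As an illustration (ours, not part of the original text), the \((k-1)\)-dimensional \(\ell _q\)-distortion of a given embedding can be estimated by Monte Carlo sampling for finite \(q\); the exact definition averages over all of \({X\atopwithdelims ()k}\), which is expensive to enumerate. The sketch reuses the phi_E and phi_F sketches above; d and f are a caller-supplied metric and embedding.

```python
import random
import numpy as np

def estimate_dist_q(X, d, f, k, q, samples=1000):
    """Estimate dist_q^{(k-1)}(f) by sampling k-subsets uniformly at random."""
    total = 0.0
    for _ in range(samples):
        S = random.sample(range(len(X)), k)
        D = np.array([[d(X[a], X[b]) for b in S] for a in S])  # metric restricted to S
        ratio = (phi_E([f(X[a]) for a in S]) / phi_F(D)) ** (1.0 / (k - 1))
        total += ratio ** q
    return (total / samples) ** (1.0 / q)
```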

It is worth noting that Feige’s definition of volume is related to the maximum volume obtained by non-expansive embeddings, while the definitions of average distortion and \(\ell _q\)-distortion use non-contractive embeddings. We note that these definitions are crucial in order to capture the coarse geometric notion described above and to achieve results that significantly beat the usual worst-case lower bounds (which depend on the size of the metric). It is clear that one can modify the definition to allow arbitrary embeddings by defining distortions normalized by their ratio to the largest contraction.

Our main theorem on volume preserving embeddings is:

Theorem 3

For any metric space \((X,d)\) on \(n\) points and any \(2\le k \le n\), there exists a map \(f:X\rightarrow L_2\) such that for any \(1\le q \le \infty ,\, \mathrm{dist}_q^{(k-1)}(f) \in O(\min \{\lceil q/(k-1)\rceil \cdot \log k, \log n\})\). In particular, \(\mathrm{dist}_\infty ^{(k-1)}(f) \in O(\log n)\) and \(\mathrm{dist}_1^{(k-1)}(f) \in O(\log k)\).

On top of the robustness property of the general volume definition given by Theorem 2, the proof of Theorem 3 builds on the embedding techniques developed in [3] (in the context of pairwise distortion), along with combinatorial arguments that enable the stated bounds on the average and \(\ell _q\)-volume distortions.

Our embedding preserves well those sets with typically large distances, and can be viewed within the context of coarse geometry, where we desire a “high level” geometric representation of the space. This follows from a special property formally stated in Lemma 5.

1.2 Related Work

Embeddings of metric spaces have been a central field of research in theoretical computer science in recent years, due to the fact that metric spaces are important objects in the representation of data. A fundamental theorem of Bourgain [5] states that every \(n\)-point metric space \((X,d)\) can be embedded in \(L_2\) with distortion \(O(\log n)\), where the distortion is defined as the worst-case multiplicative factor by which a pair of distances change. Our work extends this result in two aspects: (1) bounding the distortion of sets of arbitrary size, and (2) providing bounds for the \(\ell _q\)-distortion for all \(q \le \infty \).

1.2.1 Volume Preserving Embeddings

Feige [10] introduced volume preserving embeddings while developing an approximation algorithm for the bandwidth problem. He showed that Bourgain’s embedding provides an embedding into Euclidean space with \((k-1)\)-dimensional distortion of \(O(\sqrt{\log n}\cdot \sqrt{\log n+k \log k})\).

Following Feige’s work, some special cases of volume preserving embeddings were studied, where the metric space \(X\) is restricted to a certain class of metric spaces. Rao [21] studied the case where \(X\) is planar or an excluded-minor metric, showing constant \((k-1)\)-dimensional distortion. Gupta [12] showed an improved approximation of the bandwidth for trees and chordal graphs. As the Feige volume does not coincide with the standard Euclidean volume, it is also interesting to study the special case where the metric space is given in Euclidean space. This case was studied by Rao [21], Dunagan and Vempala [8], and by Lee [19]. We note that our work provides the first average distortion and \(\ell _q\)-distortion analysis in the context of this special case as well.

The first improvement on Feige’s volume distortion bounds comes from the work of Rao [21]. As observed by many researchers, Rao’s embedding gives more general results depending on a certain decomposability parameter of the space. This provides a bound on the \((k-1)\)-dimensional distortion of \(O((\log n)^{3/2})\) for all \(k\le n\). This bound was further improved to \(O(\log n)\) in the work of Krauthgamer et al. [16]. Krauthgamer et al. [15] show a matching \(\Omega (\log n)\) lower bound on the \((k-1)\)-dimensional distortion for all \(k <n^{1/3}\).

1.2.2 Average and \(\ell _q\) Distortion

The notions of average distortion and \(\ell _q\)-distortion are tightly related to the notions of partial embeddings and scaling embeddings. A \((1-\varepsilon )\) partial embedding requires distortion at most \(\alpha \) for at least a \((1-\varepsilon )\) fraction of the pairs. A scaling embedding comes with a function \(\alpha :(0,1)\rightarrow \mathbb {R}\), and demands that a \((1-\varepsilon )\) fraction of the pairs have distortion at most \(\alpha (\varepsilon )\), for all \(\varepsilon \in (0,1)\) simultaneously. These notions were introduced by Kleinberg et al. [18], largely motivated by the study of distances in computer networks.

In [1], partial embeddings into \(L_p\) with tight \(O(\log 1/\varepsilon )\) partial distortion were given. The embedding method of [3] provides a scaling embedding with \(O(\log 1/\varepsilon )\) distortion for all values of \(\varepsilon >0\) simultaneously. As a consequence of having a scaling embedding, they show that any metric space can be embedded into \(L_p\) with constant average distortion and, more generally, with \(\ell _q\)-distortion bounded by \(O(q)\), while simultaneously maintaining the best possible worst-case distortion of \(O(\log n)\).

Previous results on average distortion have applications to a variety of approximation problems, including uncapacitated quadratic assignment [3], and in addition have been used in solving graph-theoretic problems [9]. Following [1, 3, 18], related notions have been studied in various contexts [2, 6, 7, 17].

2 Robustness of the Metric Volume

Proof of Theorem 2 For a tree \(T\) on \(n\) vertices \(\{v_1,\ldots ,v_n\}\), let \(\overline{{\phi }}(T)\) be the product of the edge lengths. Because of the matroid exchange property, this product is minimized by an MST. Thus for any metric space on points \(\{v_1,\ldots ,v_n\}\) and any spanning tree \(T\), \({\phi }_{F}(v_1,\ldots ,v_n) \le \overline{{\phi }}(T)/(n-1)!\); the inequality holds with equality precisely for minimum spanning trees.

Definition 1

A forced spanning tree (FST) for a finite metric space is a spanning tree whose vertices can be ordered \(v_1,\ldots ,v_n\) so that for every \(i>1\), \(v_i\) is connected to a vertex that is closest among \(v_1,\ldots ,v_{i-1}\), and to no other among these. (We call such an ordering admissible for the tree.)

An MST is an FST with the additional property that in an admissible ordering \(v_i\) is a closest vertex to \(v_1,\ldots ,v_{i-1}\) among \(v_i,\ldots ,v_n\).

Definition 2

For a tree \(T\) let \(\Delta (T)\) denote its diameter (the largest distance between any two points in the tree). Let the diameter \(\Delta (F)\) of a forest \(F\) with components \(T_1,T_2,\ldots ,T_m\) be \(\Delta (F)=\max _{1\le i\le m}\Delta (T_i)\). For a metric space \((X,d)\) let \(\Delta _k(X)=\min \{\Delta (F)\mid F \text{ is a spanning forest of } X \text{ with } k \text{ connected components}\}\).

Lemma 2

Let \((X,d)\) be a metric space. Let \(k \ge 1\). An FST for \(X\) has at most \(k-1\) edges of length greater than \(\Delta _k(X)\).

Proof

Let \(v_1,\ldots ,v_n\) be an admissible ordering of the vertices of the FST, and assign each edge to its higher-indexed endpoint; since the ordering is admissible, this assignment is injective. The lemma is trivial for \(k=1\). For \(k \ge 2\), cover \(X\) by a spanning forest of \(k\) trees, each of diameter at most \(\Delta _k(X)\). Only the lowest-indexed vertex in each tree can be assigned an edge longer than \(\Delta _k(X)\). (Note that \(v_1\) is assigned no edge, hence the bound of \(k-1\).) \(\square \)

Corollary 3

For any \(n\)-point metric space \((X,d)\) and any FST \(T'\) for \(X\), \( \overline{{\phi }}(T')\le \prod _{k=1}^{n-1}\Delta _k(X). \)

Proof

Order the edges from \(1\) to \(n-1\) by decreasing length. By Lemma 2, the \(k\)th edge is no longer than \(\Delta _k(X)\). \(\square \)

Using Corollary 3, our proof of Theorem 2 reduces to showing that for any MST \(T\) of \(X,\, \prod _{k=1}^{n-1}\Delta _k(X)\le e^{O(n-1)}\overline{{\phi }}(T)\). Specifically we shall show that for any spanning tree \(T\),

$$\begin{aligned} \prod _{k=1}^{n-1}\Delta _k(X)\le \frac{1}{n^2} \Big (\frac{4 \pi ^2}{3} \Big )^{n-1} \overline{{\phi }}(T). \end{aligned}$$

(Observe incidentally that the FST created by the Gonzalez [11] and Hochbaum–Shmoys [13] process has \(\overline{{\phi }}\) at least \(2^{1-n} \prod _{k=1}^{n-1}\Delta _k(X)\).)

The idea is to recursively decompose \(T\) by cutting an edge; letting the two remaining trees be \(T_1\) (with some \(m\) edges) and \(T_2\) (with \(n-2-m\) edges), we shall upper bound \(\prod _1^{n-1} \Delta _k(T)\) in terms of \(\prod _1^{m} \Delta _k(T_1)\) and \(\prod _1^{n-2-m} \Delta _k(T_2)\). More on this after we show how to pick an edge to cut. Recall: \(\sum _{j \ge 1} 1/j^2 = \pi ^2/6\).

Edge selection Find a diametric path \(\gamma \) of \(T\), i.e., a simple path whose length \(|\gamma |\) equals the diameter \(\Delta (T)\). For appropriate \(\ell \ge 2\) let \(u_1,\ldots ,u_\ell \) be the weights of the edges of \(\gamma \) in the order they appear on the path. Select the \(j\)th edge on the path, for some \(1 \le j \le \ell \) for which \(u_j/|\gamma | > 1/(2(\pi ^2/6)\min \{j,\ell +1-j\}^2)\). Such an edge exists, as otherwise \(\sum _1^\ell u_j \le (6/\pi ^2) |\gamma | \sum _1^\ell j^{-2} < |\gamma |\). Without loss of generality \(j \le \ell +1-j\) (otherwise flip the indexing on \(\gamma \)); hence cutting \(u_j\) contributes overhead \(|\gamma | / u_{j} < 2(\pi ^2/6)j^2\) to the product \(\prod _1^{n-1} \Delta _k\), and yields subtrees \(T_1\) and \(T_2\), each containing at least \(j-1\) edges.
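The selection rule is easy to state in code. The sketch below (ours) scans the path for the first index meeting the threshold, which the averaging argument just given guarantees to exist.

```python
import math

def select_cut_edge(u):
    """u: weights u_1,...,u_l along a diametric path (0-indexed list internally).
    Returns the 1-indexed j with u_j/|gamma| > 1/(2*(pi^2/6)*min(j, l+1-j)^2)."""
    total, l = sum(u), len(u)
    for j in range(1, l + 1):
        if u[j - 1] / total > 1.0 / (2 * (math.pi ** 2 / 6) * min(j, l + 1 - j) ** 2):
            return j
    raise AssertionError("unreachable: the averaging argument guarantees such a j")
```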

Think of this recursive process as successively breaking the spanning tree into a finer and finer forest. Note that we haven’t yet specified which tree of the forest is cut, but we have specified which edge in that tree is cut. The order in which trees are chosen to be cut is: \(F_k(T)\) (which has \(k\) components) is defined by (a) \(F_1(T)=T\); (b) for \(1<k<n\), \(F_k(T)\) is obtained from \(F_{k-1}(T)\) by cutting an edge in the tree of greatest diameter.

Note that by definition \(\Delta _k(X)\le \Delta (F_k(T))\).

Induction Now we show that

$$\begin{aligned} \prod _1^{n-1} \Delta (F_k(T)) \le \frac{1}{n^2} \Big (\frac{4 \pi ^2}{3} \Big )^{n-1} \overline{{\phi }}(T) . \end{aligned}$$

It will be convenient to do this by an induction showing that there are constants \(c_1,c_2>0\) such that

$$\begin{aligned} \prod _1^{n-1} \Delta (F_k(T)) \le e^{c_1(n-1)-c_2\log n} \overline{{\phi }}(T), \end{aligned}$$

and finally justify the choices \(c_1=\log (4 \pi ^2/3)\) and \(c_2=2\). As to base cases, \(n=1\) is trivial, and \(n=2\) is assured for any \(c_1 \ge c_2\log 2\), a condition our final choice satisfies.

For \(n>2\) let the children of \(T\) be \(T_1\) and \(T_2\), that is to say, \(F_2(T)=\{T_1,T_2\}\). Let \(m\) and \(n-2-m\) be the numbers of edges in \(T_1\) and \(T_2\) respectively. Observe that with \(j\) as defined above, \(\min \{m,n-2-m\} \ge j-1 \ge 0\).

Examine three sequences of forests: the \(T\) sequence, \(F_1(T),\ldots ,F_{n-1}(T)\); the \(T_1\) sequence, \(F_1(T_1),\ldots ,F_{m}(T_1)\); the \(T_2\) sequence, \(F_1(T_2),\ldots ,F_{n-2-m}(T_2)\).

As indicated earlier, in each forest \(f\) in the \(T\) sequence other than \(F_1(T)\), choose a component \(t\) of greatest diameter, i.e., one for which \(\Delta (t)=\Delta (f)\). (In case of ties some consistent choice must be made within the \(T,T_1\) and \(T_2\) sequences.)

If \(t\) lies within \(T_1\), assign \(f\) to the forest in the \(T_1\) sequence that agrees with \(f\) within \(T_1\). Similarly if \(t\) lies within \(T_2\), assign \(f\) to the appropriate forest in the \(T_2\) sequence. Due to the process defining the forests \(F_k(T)\), this assignment is injective. Moreover, a forest in the \(T\) sequence, and the forest it is assigned to in the \(T_1\) or \(T_2\) sequence, share a common diameter. Hence

$$\begin{aligned} \prod _2^{n-1} \Delta (F_k(T)) = \Big (\prod _1^{m} \Delta (F_k(T_1))\Big ) \Big (\prod _1^{n-2-m} \Delta (F_k(T_2))\Big ). \end{aligned}$$

Therefore

$$\begin{aligned} \prod _1^{n-1} \Delta (F_k(T))&= \Delta (T)\cdot \prod _2^{n-1} \Delta (F_k(T)) \\&= \Delta (T)\cdot \Big (\prod _1^{m} \Delta (F_k(T_1))\Big ) \Big (\prod _1^{n-2-m} \Delta (F_k(T_2))\Big ). \end{aligned}$$

Now by induction

$$\begin{aligned} \prod _1^{n-1} \Delta (F_k(T)) \le \Delta (T)\cdot e^{c_1 m-c_2\log (m+1)} \cdot \overline{{\phi }}(T_1) \cdot e^{c_1(n-2-m)-c_2\log (n-1-m)} \cdot \overline{{\phi }}(T_2). \end{aligned}$$

As \(\overline{{\phi }}(T) = u_j \cdot \overline{{\phi }}(T_1) \overline{{\phi }}(T_2)\) we get

$$\begin{aligned}&\frac{ \prod _1^{n-1} \Delta (F_k(T)) }{\overline{{\phi }}(T)} \\&\quad \le (\Delta (T)/u_{j})\cdot \exp \big \{c_1(n-2)-c_2(\log (m+1) + \log (n-1-m))\big \} \\&\quad \le \exp \big \{\log (2(\pi ^2/6)j^2) + c_1(n-2)-c_2(\log (m+1) + \log (n-1-m))\big \}\\&\quad \le \exp \big \{\log (\pi ^2 j^2/3) + c_1(n-2)-c_2(\log j + \log (n/2))\big \}\\&\quad \le \exp \big \{c_1(n-1) -c_2 \log n -(c_2-2) \log j -(c_1- c_2 \log 2 - \log (\pi ^2 /3)) \big \} \end{aligned}$$

Choose \(c_2 \ge 2\) to take care of the third term in the exponent, and choose \(c_1 \ge \log (\pi ^2/3) + c_2 \log 2 \) to take care of the fourth term in the exponent. (In the bound displayed above, both of these choices have been made with equality.) So

$$\begin{aligned} \cdots \le \exp \left\{ c_1(n-1)-c_2 \log n \right\} . \end{aligned}$$

\(\square \)

2.1 Online Metric Steiner Tree

Here we prove Corollary 1. Recall that in the online metric Steiner tree problem we are given a complete weighted graph \(G=(V,E,w)\), with \(d_G\) the shortest path metric on \(G\) with respect to the weights; the cost of each edge is the logarithm of its weight (we shall assume all weights are at least \(2\), so the cost of every edge is at least \(1\)). Given a sequence \(v_1,\dots ,v_n\) of vertices from \(V\), we must output at every step \(1\le i\le n\) a subgraph \(C_i\) such that \(v_1,\dots ,v_i\) are connected in \(C_i\), and such that \(C_{i-1}\subseteq C_i\) for all \(i\). The cost of the subgraph \(C_i\) is the sum of the costs of the edges in \(C_i\). The greedy algorithm does the following: at every step \(i\ge 2\), add the edge \(\{v_i,v_j\}\) where \(v_j\in \{v_1,\dots ,v_{i-1}\}\) is closest to \(v_i\) among the previous vertices.

First we lower bound the cost of the optimal (offline) algorithm. For each \(i\), the contribution of connecting \(v_i\) to the minimum Steiner tree is at least \(\frac{1}{2}d_G(v_i,V\setminus \{v_i\})\). Since the weights are at least \(2\), and \(ab\ge a+b\) whenever \(a,b\ge 2\), for any path \(u_1,\dots ,u_k\) of length \(\ell \) we have that

$$\begin{aligned} \sum _{j=2}^k\log (d_G(u_{j-1},u_j))\ge \log \Big (\sum _{j=2}^kd_G(u_{j-1},u_j)\Big )\ge \log \ell . \end{aligned}$$

This implies that the cost of connecting each \(v_i\) is at least \(\log \big (\frac{1}{2}d_G(v_i,V\setminus \{v_i\})\big )\), and thus

$$\begin{aligned} \mathrm{cost}(OPT)\ge \sum _{i=2}^n \log \big (\frac{1}{2}d_G(v_i,V\setminus \{v_i\})\big )=\log \Big (\frac{1}{2^{n-1}}\prod _{i=2}^nd_G(v_i,V\setminus \{v_i\})\Big ). \end{aligned}$$

Next we upper bound the cost of the greedy algorithm, which is

$$\begin{aligned} \mathrm{cost}(ALG)\le \sum _{i=2}^n\log \big (d_G(v_i,\{v_1,\dots v_{i-1}\})\big )=\log \Big (\prod _{i=2}^nd_G(v_i,\{v_1,\dots v_{i-1}\})\Big ). \end{aligned}$$

Using Theorem 2 we have that

$$\begin{aligned} \mathrm{cost}(ALG)\le \mathrm{cost}(OPT)+O(n), \end{aligned}$$

and as \(\mathrm{cost}(OPT)\ge n-1\), the greedy algorithm is also an \(O(1)\) multiplicative approximation algorithm.

3 Volume Preserving Embeddings

In this section we prove Theorem 3. The construction is based on the embedding of [3], which gave a general framework for embedding metrics into normed spaces. It was shown in [3] that for every metric space \((X,d)\) on \(n\) points there exists a distribution over maps \(f:X\rightarrow \mathbb {R}\) with the following properties: every map in the support has expansion \(O(\log n)\), and for every pair of points \(x,y\in X\), with probability \(1/2\) the map \(f\) does not contract \(x,y\). Moreover, it was shown that the distortion is scaling: for every \(0<\varepsilon <1\), at least a \(1-\varepsilon \) fraction of the pairs of \(X\) have expansion only \(O(\log (1/\varepsilon ))\). Using this, one can construct an embedding of \(X\) into \(\mathbb {R}^{O(\log n)}\) by taking \(O(\log n)\) independent copies of \(f\) and applying concentration bounds. Having such a scaling distortion implies \(O(1)\) average distortion and an \(O(q)\) bound on the \(\ell _q\)-distortion.

Here we extend this framework to embeddings that preserve the volume of subsets of \(X\) of cardinality \(k\). First we strengthen the analysis of the line embedding of [3], so that the bound on the contraction of the embedding holds for any \(x\in X\) and any affine combination of the images of points in a subset \(S\subset X\) (with constant probability). We then define an appropriate analogous notion of scaling distortion for sets of size \(k\), and show that taking \(O(k\log n)\) independent copies of the random line embedding yields an embedding with the appropriate bounds on the worst-case, average, and, in general, the \((k-1)\)-dimensional \(\ell _q\)-distortion.

3.1 The Embedding

The following is a variation on a lemma from [3], where the bound on the contraction is strengthened to hold for subsets \(S=\{s_0,\dots ,s_{k-1}\}\), rather than just for pairs. More precisely, instead of simply lower bounding the distance between the images of two points of \(X\), we need to lower bound the distance of the image of some point \(s_i\) from any affine combination of the images of \(s_0,\dots ,s_{i-1}\) (conditioned on the values of these images). We can only prove this for a very specific ordering of the points in \(S\), so from now on we shall enforce an ordering on every subset \(S\subseteq X\) that complies with the requirements of the lemma.

Lemma 4

There exists a universal constant \(\hat{C}\) such that for every finite metric space \((X,d)\) on \(n\) points, there exists a distribution \({\mathcal D}\) over functions \(f:X \rightarrow \mathbb {R}\) such that the following holds.

  • For all \(u,v \in X\) and all \(f \in \mathrm{supp}({\mathcal D})\),

    $$\begin{aligned} | f(u) - f(v)| \le \hat{C} \cdot \log \Big ( \frac{n}{|B(u,d(u,v))|} \Big ) \cdot d(u,v). \end{aligned}$$
  • For every subset \(S\subseteq X\) of size \(k\), there exists an ordering \(S=(s_0,\dots ,s_{k-1})\), such that for any \(1\le i\le k-1\), values \(x_0,\dots ,x_{i-1}\in \mathbb {R}\) and coefficients \(\alpha _0,\dots ,\alpha _{i-1}\in \mathbb {R}\) with \(\sum _{j=0}^{i-1}\alpha _j=1\):

    $$\begin{aligned} \mathop {\Pr }\limits _{f\sim {\mathcal D}} \Big [ \Big | f(s_i) \!-\! \sum _{j=0}^{i-1}\alpha _jx_j\Big | \!\ge \! d(s_i,\{s_0,\dots ,s_{i-1}\})/\hat{C}\mid f(s_j) \!=\!x_j \forall ~0\!\le \! j\!\le \! i\!-\!1\Big ] \!\ge \! 1/2. \end{aligned}$$

Let \(D=c\cdot k\ln n\) where \(c\) is a constant to be determined later. Define the embedding \(g:X\rightarrow \mathbb {R}^D\) by

$$\begin{aligned} g=\frac{4\hat{C}}{\sqrt{D}}\bigoplus _{t=1}^Df_t, \end{aligned}$$

where each \(f_t\) is sampled independently according to Lemma 4.
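Concretely, \(g\) is a scaled concatenation of independent one-dimensional coordinates. In the sketch below (ours), sample_f is an assumed sampler for the distribution \({\mathcal D}\) of Lemma 4, and the points of \(X\) are assumed hashable.

```python
import math
import numpy as np

def build_g(X, sample_f, C_hat, c, k):
    """Concatenate D = c*k*ln(n) independent Lemma-4 maps, scaled by 4*C_hat/sqrt(D)."""
    n = len(X)
    D = max(1, round(c * k * math.log(n)))
    fs = [sample_f() for _ in range(D)]                 # independent coordinates f_t
    scale = 4.0 * C_hat / math.sqrt(D)
    return {x: scale * np.array([f(x) for f in fs]) for x in X}
```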

Next, we generalize the notion of scaling distortion to subsets of size \(k\). To this end, for each subset \(S\in {X\atopwithdelims ()k}\) with its ordering \(S=(s_0,\dots ,s_{k-1})\) (the ordering enforced by Lemma 4), define a sequence \((\varepsilon _1,\dots ,\varepsilon _{k-1})\) as follows. For each \(1\le i\le k-1\) let \(0\le j(i)<i\) be such that \(d(s_i,\{s_0,\dots ,s_{i-1}\})=d(s_i,s_{j(i)})\). Let \(\varepsilon _i\) be the value such that \(|B(s_{j(i)},d(s_i,s_{j(i)}))|=\varepsilon _in\). In other words, \(s_i\) is the \(\varepsilon _in\)-th nearest neighbor of the closest point to it in \(\{s_0,\dots ,s_{i-1}\}\).
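In code, the sequence \((\varepsilon _1,\dots ,\varepsilon _{k-1})\) can be read off a distance matrix as follows (our sketch; D is the full n-by-n distance matrix, S is the ordered index list \((s_0,\dots ,s_{k-1})\) supplied by Lemma 4, and closed balls are assumed in the counting):

```python
def epsilons(D, S):
    """Return (eps_1,...,eps_{k-1}) for the ordered subset S of a metric given by D."""
    n = len(D)
    out = []
    for i in range(1, len(S)):
        j = min(range(i), key=lambda t: D[S[i]][S[t]])       # index of s_{j(i)}
        r = D[S[i]][S[j]]                                     # d(s_i, {s_0,...,s_{i-1}})
        ball = sum(1 for u in range(n) if D[S[j]][u] <= r)    # |B(s_{j(i)}, r)|
        out.append(ball / n)                                  # eps_i
    return out
```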

Lemma 5

For any embedding \(g\) in the support of the distribution and any \(S\in {X\atopwithdelims ()k}\),

$$\begin{aligned} \frac{{\phi }_\mathrm{E}(g(S))}{{\phi }_\mathrm{F}(S)}\le \prod _{i=1}^{k-1}O(\log (1/\varepsilon _i))~. \end{aligned}$$

Proof

Fix any \(1\le i\le k-1\), and let \(0\le j(i)\le i-1\) be such that \(d(s_i,s_{j(i)})=d(s_i,\{s_0,\dots ,s_{i-1}\})\). By the definition of \(\varepsilon _i\), \(|B(s_{j(i)},d(s_i,s_{j(i)}))|=\varepsilon _in\). Using the first property of Lemma 4 we have for any \(t\in [D]\),

$$\begin{aligned} |f_t(s_i)-f_t(s_{j(i)})|&\le \hat{C}\cdot \log \Big (\frac{n}{|B(s_{j(i)},d(s_i,s_{j(i)}))|} \Big )\cdot d(s_i,s_{j(i)})\\&= \hat{C}\cdot \log (1/\varepsilon _i)\cdot d(s_i,s_{j(i)}), \end{aligned}$$

thus also

$$\begin{aligned} d_{E}(g(s_i),g(s_{j(i)}))&\le \Big (\frac{(4\hat{C})^2}{D}\sum _{t=1}^D(\hat{C}\cdot \log (1/\varepsilon _i)\cdot d(s_i,s_{j(i)}))^2\Big )^{1/2}\nonumber \\&\le 4\hat{C}^2\log (1/\varepsilon _i)\cdot d(s_i,s_{j(i)}). \end{aligned}$$
(2)

Now Theorem 2 implies that

$$\begin{aligned} \prod _{i=1}^{k-1}d(s_i,s_{j(i)})=\prod _{i=1}^{k-1}d(s_i,\{s_0,\dots ,s_{i-1}\})\le C^{k-1}(k-1)!\cdot {\phi }_\mathrm{{F}}(S), \end{aligned}$$

and we conclude the proof by

$$\begin{aligned} {\phi }_\mathrm{E}(g(S))&= \frac{1}{(k-1)!}\prod _{i=1}^{k-1}d_\mathrm{E}(g(s_i), \mathrm{affspan}(g(s_0),\dots ,g(s_{i-1})))\\&\le \frac{1}{(k-1)!}\prod _{i=1}^{k-1}d_\mathrm{E}(g(s_i),g(s_{j(i)}))\\&\le \frac{1}{(k-1)!}\prod _{i=1}^{k-1}4\hat{C}^2\log (1/\varepsilon _i)\cdot d(s_i,s_{j(i)})\\&\le \prod _{i=1}^{k-1}O\left( \log (1/\varepsilon _i)\right) \cdot {\phi }_\mathrm{F}(S). \end{aligned}$$

\(\square \)

Lemma 6

With probability at least \(1-1/n^k\), the embedding \(g\) is \((k-1)\)-dimensional non-contractive.

Proof

Fix some \(S=(s_0,\dots ,s_{k-1})\) and \(1\le i\le k-1\). Let \(\delta _i=d(s_i,\{s_0,\dots ,s_{i-1}\})\) and \(A_i=\mathrm{affspan}(g(s_0),\ldots ,g(s_{i-1}))\). We would like to give a lower bound on the distance from \(g(s_i)\) to \(A_i\) in terms of \(\delta _i\). The main difficulty is that the nearest point to \(g(s_i)\) in \(A_i\) naturally depends on the value of \(g(s_i)\), thus we cannot use the second property of Lemma 4 directly on the nearest point (we may condition only on \(g(s_j)\) for \(j<i\)). The solution is as follows: rather than showing a lower bound on the distance from \(g(s_i)\) to the closest point in \(A_i\), we will show a lower bound on the distance from \(g(s_i)\) to all the points in a suitable net of \(A_i\).

To this end, let \(0\le j\le i-1\) be such that \(\delta _i=d(s_i,s_j)\), and let \(N_i\) be a \(\delta _i\)-net of \(B(g(s_j),8\hat{C}^2\delta _i\log n)\cap A_i\). Since \(A_i\) is an \((i-1)\)-dimensional affine space, a ball of radius \(2r\) in \(A_i\) can be covered by \(2^{O(i)}\) balls of radius \(r\). Applying this covering repeatedly, we conclude that \(B(g(s_j),8\hat{C}^2\delta _i\log n)\cap A_i\) can be covered by \(2^{O(i\log \log n)}\) balls of radius \(\delta _i/2\); as each such ball contains at most one net point, it follows that \(|N_i|=2^{O(i\log \log n)}< n^k\) (for sufficiently large \(n\)).

Now, if \(b_i\in A_i\) is the closest point to \(g(s_i)\), then using (2) and the fact that \(\varepsilon _i\ge 1/n\),

$$\begin{aligned} d_\mathrm{E}(g(s_j),b_i)&\le d_\mathrm{E}(g(s_i),g(s_j))+d_\mathrm{E}(g(s_i),b_i)\\&\le 2d_\mathrm{E}(g(s_i),g(s_j))\\&\le 2(4\hat{C}^2)\log n\cdot d(s_i,s_j)\\&= 8\hat{C}^2\log n\cdot \delta _i. \end{aligned}$$

This shows that indeed \(b_i\in B(g(s_j),8\hat{C}^2\delta _i\log n)\), so there exists \(a'_i\in N_i\) with

$$\begin{aligned} d_\mathrm{E}(a'_i,b_i)\le \delta _i. \end{aligned}$$
(3)

Next we prove that there is a high probability that \(g(s_i)\) is sufficiently far from all net points. Let \(a_i\in N_i\) be an arbitrary point of the net, and let \(\alpha _0,\dots ,\alpha _{i-1}\) be such that \(\sum _{j=0}^{i-1}\alpha _j=1\) and \(a_i=\sum _{j=0}^{i-1}\alpha _jg(s_j)\). Observe that

$$\begin{aligned} d_\mathrm{E}(g(s_i),a_i)^2=\frac{(4\hat{C})^2}{D}\sum _{t=1}^D \Big (f_t(s_i)-\sum _{j=0}^{i-1}\alpha _jf_t(s_j)\Big )^2 \end{aligned}$$

For each \(t\in [D]\) let \(Z_t\) be an indicator random variable for the event \(|f_t(s_i)-\sum _{j=0}^{i-1}\alpha _jf_t(s_j)|\ge \delta _i/\hat{C}\), and let \(Z=Z(S,i,a_i)=\sum _{t=1}^DZ_t\). Observe that if it is the case that \(Z\ge D/4\), then

$$\begin{aligned} d_\mathrm{E}(g(s_i),a_i)&\ge \Big (\frac{(4\hat{C})^2}{D}\sum _{t~:~ Z_t=1}\Big (f_t(s_i)-\sum _{j=0}^{i-1}\alpha _jf_t(s_j)\Big )^2\Big )^{1/2}\nonumber \\&\ge \Big (\frac{(4\hat{C})^2}{D}\cdot \frac{D}{4}(\delta _i/\hat{C})^2\Big )^{1/2}\nonumber \\&= 2\delta _i. \end{aligned}$$
(4)

By the triangle inequality, (3), and (4) applied to \(a'_i\),

$$\begin{aligned} d_\mathrm{E}(g(s_i),b_i)\ge d_\mathrm{E}(g(s_i),a'_i)-d_\mathrm{E}(a'_i,b_i)\ge 2\delta _i-\delta _i= \delta _i~. \end{aligned}$$

We conclude that

$$\begin{aligned} {\phi }_\mathrm{E}(g(S))&= \frac{1}{(k-1)!}\prod _{i=1}^{k-1}d_\mathrm{E}(g(s_i),b_i)\\&\ge \frac{1}{(k-1)!}\prod _{i=1}^{k-1}\delta _i\\&\ge {\phi }_\mathrm{F}(S). \end{aligned}$$

It remains to show that with probability at least \(1-1/n^k\), none of the bad events \(\{Z(S,i,a_i)<D/4\}_{S,i,a_i}\) happens. For a given \(Z\), by Lemma 4 we have \(\Pr [Z_t=1\mid g(s_0),\dots ,g(s_{i-1})]\ge 1/2\) (because the different coordinates are chosen independently, so for each \(Z_t\), conditioning on \(g\) is the same as conditioning just on \(f_t\)), so that \({\mathbb {E}}[Z]\ge D/2\). The crucial observation is that in the definition of the bad events we fixed only \(g(s_0),\dots ,g(s_{i-1})\) (to determine \(A_i\) and the net), but not \(g(s_i)\). Using a standard Chernoff bound,

$$\begin{aligned} \Pr [Z<D/4]\le \Pr [Z<{\mathbb {E}}[Z]/2]\le e^{-{\mathbb {E}}[Z]/8}\le e^{-D/16}\le n^{-3k}, \end{aligned}$$

where the last inequality holds when \(c=48\), say. Applying the union bound over all \({n\atopwithdelims ()k}\) possible sets \(S\), all \(k\) possible indices \(i\), and all the points \(a_i\in N_i\) (recall that \(|N_i|<n^k\)), we get that some bad event happens with probability at most \(k\cdot n^k\cdot {n\atopwithdelims ()k}/n^{3k}\le 1/n^k\). \(\square \)

By Lemmas 5 and 6, with high probability we obtain an embedding \(g:X\rightarrow \mathbb {R}^D\) such that for every subset \(S\in {X\atopwithdelims ()k}\) with its sequence \((\varepsilon _1,\dots ,\varepsilon _{k-1})\),

$$\begin{aligned} \mathrm{dist}_g(S)\le O\Big (\Big (\prod _{i=1}^{k-1}\log (1/\varepsilon _i)\Big )^{1/(k-1)}\Big ). \end{aligned}$$
(5)

In the following sections we analyze the \(\ell _q\) volume distortion of such an embedding. For simplicity we treat the \(\ell _\infty \) and \(\ell _1\) volume distortions first, before handling the general \((k-1)\)-dimensional \(\ell _q\)-distortion.

3.1.1 Bounding the \((k-1)\)-dimensional distortion

Lemma 7

The worst-case \((k-1)\)-dimensional distortion of \(g\) is \(O(\log n)\), i.e., \(\mathrm{dist}_\infty ^{(k-1)}(g) = O(\log n)\).

Proof

For any set \(S\in {X\atopwithdelims ()k}\) and \(i\in [k-1]\), \(\varepsilon _i\ge 1/n\). So by (5)

$$\begin{aligned} \mathrm{dist}_g(S)\le O\Big (\Big (\prod _{i=1}^{k-1}\log (1/\varepsilon _i)\Big )^{1/(k-1)}\Big )\le O(\log n). \end{aligned}$$

\(\square \)

3.1.2 Bounding the average \((k-1)\)-dimensional distortion

Lemma 8

The average \((k-1)\)-dimensional distortion of \(g\) is \(O(\log k)\), i.e., \(\mathrm{dist}_1^{(k-1)}(g) = O(\log k)\).

Proof

For every set \(S\in {X\atopwithdelims ()k}\) let \(m=m(S)=\min _i\{\varepsilon _in\}\). By (5) there is a universal constant \(C'\) such that the average distortion over all possible \(S\in {X\atopwithdelims ()k}\) can be bounded as follows:

$$\begin{aligned} \frac{\mathrm{dist}_1^{(k-1)}(g)}{C'}&\le \mathbb {E}_{S\in {X\atopwithdelims ()k}}\Big [\Big (\prod _{i=1}^{k-1}\log (1/\varepsilon _i)\Big )^{1/(k-1)}\Big ]\\&\le \mathbb {E}\Big [\Big (\prod _{i=1}^{k-1}\log (n/m)\Big )^{1/(k-1)}\Big ]\\&= \mathbb {E}\left[ \log (n/m)\right] . \end{aligned}$$

In what follows we bound \(\mathbb {E}\left[ \log (n/m)\right] \). First we show that for every \(i\) and every \(t\in [n]\), we have \(\Pr [\varepsilon _i=t/n]\le 2k/n\) (recall that \(g\) is fixed, and the probability is over a uniform choice of the subset \(S\)). This is because, conditioned on any \(s_0,\dots ,s_{i-1}\), for every \(0\le j\le i-1\) the probability that \(s_i\) is the \(t\)-th nearest neighbor of \(s_j\) is at most \(1/(n-i)\); by the union bound, the probability that there exists such a \(j\) is at most \(i/(n-i)\le k/(n-k)\le 2k/n\) (assuming \(k<n/2\), as otherwise the lemma is trivial). It follows by the union bound that for all \(t\in [n]\),

$$\begin{aligned} \Pr [m=t]\le \sum _{i=1}^{k-1}\Pr [\varepsilon _i=t/n]\le 2k^2/n. \end{aligned}$$

Let \(h = \lceil \frac{n}{k^2} \rceil \), so that \(\Pr [m=t]\le 2/h\). Now,

$$\begin{aligned} \mathbb {E}\left[ \log (n/m)\right]&\le \sum _{t=1}^{h}\Pr [m=t]\cdot \log (n/t)+ \Pr [m>h]\cdot \log (n/h)\\&\le \frac{2}{h} \Big (h\log n - \sum _{t=1}^{h}\log t\Big )+\log (k^2)+2. \end{aligned}$$

Note that \(\sum _{t=1}^{h}\log t = \log (h!)\ge h\log (h/e)\), hence

$$\begin{aligned} \mathbb {E}\left[ \log (n/m)\right] \le 2(\log n-\log (n/(e k^2)))+2\log k+2 = O(\log k). \end{aligned}$$

\(\square \)

3.1.3 Bounding the \((k-1)\)-dimensional \(\ell _q\)-distortion

Here we generalize the bounds on the worst-case and average \((k-1)\)-dimensional distortion to the \(\ell _q\) norm of the \((k-1)\)-dimensional distortion for arbitrary \(1\le q\le \infty \), and thus prove Theorem 3. Taking higher norms of the distortion means that we must estimate the probability of a sequence \((\varepsilon _1,\dots ,\varepsilon _{k-1})\) for a random set \(S\) more carefully (unlike the \(q=1\) case, where using only the minimal \(\varepsilon _i\) sufficed).

Lemma 9

For any \(1\le q\le \infty \), \(\mathrm{dist}_q^{(k-1)}(g) = O(\lceil q/(k-1)\rceil \cdot \log k)\).

Proof

For \(\ell \in \{0,1,\dots ,k-1\}\), let \({\mathcal {S}}^{(\ell )}\subseteq {X\atopwithdelims ()k}\) contain all the sets \(S\) that have exactly \(\ell \) values \(\varepsilon _i\) bigger than \(1/k^6\). In what follows we bound \(|{\mathcal {S}}^{(\ell )}|\). There are at most \({k-1\atopwithdelims ()\ell }\le 2^k\) possibilities for choosing the \(\ell \) locations in the sequence \((\varepsilon _1,\dots ,\varepsilon _{k-1})\) whose values are larger than \(1/k^6\); by reordering, assume these are the first \(\ell \) elements of the sequence. How many sets correspond to such sequences? There are at most \({n\atopwithdelims ()\ell +1}\) possibilities for choosing \(s_0\) and the other \(\ell \) points which induce the first \(\ell \) values \(\varepsilon _1,\dots ,\varepsilon _\ell \). As for the other values, observe that for \(i>\ell \) we have \(\varepsilon _i<1/k^6\), which implies that \(s_i\) is one of the \(\varepsilon _in\) nearest neighbors of at least one of the other \(k-1\) points, and thus there are at most \(k\cdot \varepsilon _in\) choices for \(s_i\). Let \(K_\ell =\{6\log k,6\log k+1,\dots ,\log n\}^{k-\ell -1}\); for ease of notation assume \(K_\ell \) is indexed by integers \(\ell <i<k\) (that is, for \(x\in K_\ell \) denote by \(x_{\ell +1}\) the first element of \(x\), and by \(x_{k-1}\) the last one), and fix some \(x\in K_\ell \). Denote by \({\mathcal {S}}^{(\ell )}_x\) the collection of sets in \({\mathcal {S}}^{(\ell )}\) satisfying \(2^{-x_i}<\varepsilon _i\le 2^{-x_i+1}\) for all \(\ell <i<k\). Then

$$\begin{aligned} |{\mathcal {S}}^{(\ell )}_x|\le {n\atopwithdelims ()\ell +1}\cdot 2^k\cdot \prod _{i=\ell +1}^{k-1}k\cdot n/2^{x_i}. \end{aligned}$$
(6)

Let \(C'\) be the constant in Lemma 5, and let \(m= \lceil q/(k-1) \rceil \). First we use (5) and the monotonicity of normalized \(\ell _q\) norms (in \(q\)) to argue that

$$\begin{aligned} \frac{\mathrm{dist}_q^{(k-1)}(g)}{C'}&\le \mathbb {E}_{S\in {X\atopwithdelims ()k}}\Big [\Big (\prod _{i=1}^{k-1}\log (1/\varepsilon _i)\Big )^{q/(k-1)}\Big ]^{1/q}\nonumber \\&\le \mathbb {E}_{S\in {X\atopwithdelims ()k}}\Big [\Big (\prod _{i=1}^{k-1}\log (1/\varepsilon _i)\Big )^m\Big ]^{\frac{1}{m(k-1)}}\nonumber \\&=\Big [{n\atopwithdelims ()k}^{-1}\sum _{\ell =0}^{k-1}\sum _{S\in {\mathcal {S}}^{(\ell )}}\Big (\prod _{i=1}^{k-1}\log (1/\varepsilon _i)\Big )^{m}\Big ]^{\frac{1}{m(k-1)}}\nonumber \\&\le \Big [{n\atopwithdelims ()k}^{-1}\sum _{\ell =0}^{k-1}\sum _{x\in K_\ell }\sum _{S\in {\mathcal {S}}^{(\ell )}_x}\Big ((6\log k)^\ell \prod _{i=\ell +1}^{k-1}\log (1/\varepsilon _i)\Big )^{m}\Big ]^{\frac{1}{m(k-1)}}\nonumber \\&\le \Big [{n\atopwithdelims ()k}^{-1}\sum _{\ell =0}^{k-1}(6\log k)^{\ell m}\sum _{x\in K_\ell }\sum _{S\in {\mathcal {S}}^{(\ell )}_x}\Big (\prod _{i=\ell +1}^{k-1}x_i\Big )^{m}\Big ]^{\frac{1}{m(k-1)}}\nonumber \\&\mathop {\le }\limits ^{(6)}\Big [{n\atopwithdelims ()k}^{-1}\sum _{\ell =0}^{k-1}(6\log k)^{\ell m}\sum _{x\in K_\ell }{n\atopwithdelims ()\ell +1} \cdot 2^k\cdot \prod _{i=\ell +1}^{k-1}\big (k\cdot n/2^{x_i}\big )\prod _{i=\ell +1}^{k-1}x_i^m\Big ]^{\frac{1}{m(k-1)}}\nonumber \\&=\Big [{n\atopwithdelims ()k}^{-1} \cdot 2^k\sum _{\ell =0}^{k-1}{n\atopwithdelims ()\ell +1}(6\log k)^{\ell m} \cdot (kn)^{k-\ell -1}\sum _{x\in K_\ell }\prod _{i=\ell +1}^{k-1}x_i^m/2^{x_i}\Big ]^{\frac{1}{m(k-1)}}.\nonumber \\ \end{aligned}$$
(7)

Next, we focus on the expression \(\sum _{x\in K_\ell }\prod _{i=\ell +1}^{k-1}x_i^m/2^{x_i}\). (For \(\ell =k-1\) this expression is trivially \(1\), so assume \(\ell <k-1\).) Recall that \(K_\ell \) consists of \((k-\ell -1)\)-tuples of elements of \(\{6\log k,\dots ,\log n\}\), so that each \(x_i\ge 6\log k\), and we may bound \(1/2^{x_i}\le 1/2^{3\log k}\cdot 1/2^{x_i/2}=1/k^3\cdot 1/2^{x_i/2}\). Now, rather than a summation of products, we will take a product of summations. That is, we write

$$\begin{aligned} \sum _{x\in K_\ell }\prod _{i=\ell +1}^{k-1}x_i^m/2^{x_i}&\le \frac{1}{k^3}\sum _{x\in K_\ell }\prod _{i=\ell +1}^{k-1}x_i^m/2^{x_i/2}\nonumber \\&= \frac{1}{k^3}\prod _{i=\ell +1}^{k-1}\Big (\sum _{z=6\log k}^{\log n}z^m/2^{z/2}\Big )\nonumber \\&\le \Big (\sum _{z=6\log k}^{\log n}z^m/2^{z/2}\Big )^{k-\ell -1} \end{aligned}$$
(8)

where the equality holds because expanding the product of sums yields exactly one term for each tuple \(x\in K_\ell \), i.e., for each sequence of \(k-\ell -1\) choices of \(z\). Next we bound the summation, with the variable change \(y=z-6\log k\):

$$\begin{aligned} \sum _{z=6\log k}^{\log n}z^m/2^{z/2}&\le \sum _{z=6\log k}^{\infty }z^m/2^{z/2}\\&= \sum _{y=0}^{\infty }(y+6\log k)^m/2^{(y+6\log k)/2}\\&\le \frac{1}{k^3}\sum _{y=0}^{\infty }2^m\big (y^m+(6\log k)^m\big )/2^{y/2}\\&\le \frac{2^{m+2}(6\log k)^m}{k^3}+\frac{2^m}{k^3}\sum _{y=0}^{\infty }y^m/2^{y/2}, \end{aligned}$$

where the middle inequality uses \((y+a)^m\le 2^m(y^m+a^m)\) together with \(2^{(y+6\log k)/2}=2^{y/2}\cdot k^3\), and the last inequality uses \(\sum _{y\ge 0}2^{-y/2}\le 4\). We replace the sum by an integral and calculate

$$\begin{aligned} \sum _{y=0}^{\infty }y^m/2^{y/2}\le \sqrt{2}\int \limits _0^\infty y^m/2^{y/2}dy\le (16m)^m, \end{aligned}$$
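For completeness (this verification is ours, not in the original text), the integral evaluates exactly via the Gamma function:

$$\begin{aligned} \sqrt{2}\int \limits _0^\infty y^m/2^{y/2}\,dy=\sqrt{2}\int \limits _0^\infty y^m e^{-(\ln 2/2)y}\,dy=\frac{\sqrt{2}\,m!}{(\ln 2/2)^{m+1}}\le \sqrt{2}\cdot \frac{2}{\ln 2}\Big (\frac{2m}{\ln 2}\Big )^m\le (16m)^m, \end{aligned}$$

using \(m!\le m^m\), \(2/\ln 2<2.9\), and \(\sqrt{2}\cdot 2.9\cdot (2.9)^m\le 16^m\) for all \(m\ge 1\).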

which yields the following bound on (8)

$$\begin{aligned} \Big (\sum _{z=6\log k}^{\log n}z^m/2^{z/2}\Big )^{k-\ell -1}\le \Big (\frac{2^{m+2}(6\log k)^m}{k^3}+\frac{2^m(16m)^m}{k^3}\Big )^{k-\ell -1} \end{aligned}$$
(9)

Plugging (9) into (7) we get that

$$\begin{aligned}&\frac{\mathrm{dist}_q^{(k-1)}(g)}{C'}\\&\quad \le \Big [{n\atopwithdelims ()k}^{-1} 2^k\sum _{\ell =0}^{k-1}{n\atopwithdelims ()\ell +1}(6\log k)^{\ell m} \cdot (kn)^{k-\ell -1}\Big (\frac{2^{m+2}(6\log k)^m}{k^3}+\frac{2^m(16m)^m}{k^3}\Big )^{k-\ell -1}\Big ]^{\frac{1}{m(k-1)}}\\&\quad \le \Big [{n\atopwithdelims ()k}^{-1} 2^k\sum _{\ell =0}^{k-1}{n\atopwithdelims ()\ell +1}(n/k^2)^{k-\ell -1}\Big ((100\log k)^{m(k-1)}\cdot (100m)^{m(k-1)}\Big )\Big ]^{\frac{1}{m(k-1)}}\\&\quad \le \Big [k\cdot 2^k\Big ((100\log k)^{m(k-1)}\cdot (100m)^{m(k-1)}\Big )\Big ]^{\frac{1}{m(k-1)}}, \end{aligned}$$

where the last inequality uses that, for \(k\ge 2\) and every \(0\le \ell \le k-1\), \({n\atopwithdelims ()\ell +1} (n/k^2)^{k-\ell -1} \le {n\atopwithdelims ()k}\). Note that the above expression is at most \(O(m\log k) = O(\lceil q/(k-1)\rceil \cdot \log k)\), as required. \(\square \)