1 Introduction

Pairwise distances in an undirected, unweighted graph can be computed by performing a graph exploration, such as breadth-first search, from every vertex. This straightforward procedure determines the diameter of a given graph with n vertices and m edges in time O(nm). It is surprisingly difficult to improve upon this idea in general. In fact, Roditty and Vassilevska Williams [17] have shown that an algorithm that can distinguish between diameter 2 and 3 in an undirected sparse graph in subquadratic time would refute the Orthogonal Vectors conjecture.

However, for very sparse graphs, the running time becomes linear even for weighted graphs. For instance, the diameter of a star can be computed by finding the two largest edge weights. The diameter of a tree can be computed in linear time O(n) by a folklore result that traverses the graph twice. In other words, for graphs with vertex cover number 1 or treewidth 1, the running time is O(n).
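The folklore double-sweep argument for trees can be illustrated with a short sketch. This is a minimal Python version, assuming the tree is given as a list of weighted edges (u, v, w) with nonnegative integer w and at least one edge; the helper names `farthest` and `tree_diameter` are illustrative only. The first sweep finds a vertex a that is farthest from an arbitrary start vertex, and the second sweep returns the largest distance from a.

```python
from collections import defaultdict


def farthest(adj, source):
    """Return (vertex, distance) for a vertex farthest from source in a tree."""
    dist = {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:  # in a tree, the first visit fixes the distance
                dist[v] = dist[u] + w
                stack.append(v)
    far = max(dist, key=dist.get)
    return far, dist[far]


def tree_diameter(edges):
    """Double sweep on a tree with nonnegative edge lengths."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    start = next(iter(adj))      # any vertex will do
    a, _ = farthest(adj, start)  # first sweep: an endpoint of a longest path
    _, diam = farthest(adj, a)   # second sweep: the longest distance from it
    return diam
```

Each sweep is a single traversal, so the sketch runs in time O(n) on an n-vertex tree; for a star it amounts to adding the two largest edge weights.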

The present paper investigates how these structural parameters influence the complexity of computing several graph distance measures. These measures are the eccentricity of every vertex (its maximum distance to any other vertex), the diameter and radius of the graph (the maximum and minimum eccentricities), and the Wiener index (the sum of the distances between all pairs of vertices); precise definitions are in Sect. 4.1. Throughout this paper we will write

$$\begin{aligned} B(n,d) = \left( {\begin{array}{c}d+\lceil \log n\rceil \\ d\end{array}}\right) \,. \end{aligned}$$

Theorem 1

The eccentricities, diameter, radius, and Wiener index of a given undirected n-vertex graph G with nonnegative integer weights can be computed in time

  1. \(O(n\cdot B(n,k)\cdot 2^k )\) with \(k={\text {vc}}(G)\), where \({\text {vc}}(G)\) is the vertex cover number of G,

  2. \(O(n\cdot B(n,k)\cdot 2^k \log n)\) with \(k= 5{\text {tw}}(G)+4\), where \({\text {tw}}(G)\) is the treewidth of G.

For every \(\epsilon >0\), the bounds in both cases are

$$\begin{aligned} n^{1+\epsilon }\exp O(k)\,. \end{aligned}$$

Since \({\text {tw}}(G)\le {\text {vc}}(G)\), the treewidth result is in some sense stronger. However, the vertex cover result is slightly faster, already contains the core algorithmic idea, and avoids many distracting technicalities.

Theorem 1 improves the dependency on the treewidth over the running time

$$\begin{aligned} n^{1+\epsilon } \exp O\bigl ({\text {tw}}(G)\log {\text {tw}}(G)\bigr ) \end{aligned}$$

of Abboud, Vassilevska Williams, and Wang [1]. Previously, Cabello and Knauer [7] had shown that for constant treewidth \(k\ge 3\), the diameter (and other distance parameters) can be computed in time \(O(n\log ^{k-1} n)\), where the Landau symbol absorbs the exponential dependency on k as well as the time required for computing a tree decomposition. The bound in Theorem 1 is tight in the following sense. Abboud et al. [1] also showed that under the Strong Exponential Time Hypothesis of Impagliazzo, Paturi, and Zane [13], there can be no algorithm that computes the diameter with running time

$$\begin{aligned} n^{2-\delta }\exp {o(k)}\qquad \text {for any }\delta >0\,, \end{aligned}$$
(1)

for \(k={\text {vc}}(G)\) and (therefore also) \(k={\text {tw}}(G)\). In fact, this holds under the potentially weaker Orthogonal Vectors conjecture, see [20] for an introduction to these arguments. Thus, under this assumption, the dependency on k in Theorem 1 cannot be significantly improved, even if the dependency on n is relaxed from just above linear to just below quadratic. This closes an open question raised in [1].

Our analysis encompasses the Wiener index, an important structural graph parameter left unexplored by [1].

Perhaps surprisingly, the main insight needed to establish Theorem 1 has nothing to do with graph distances or treewidth. Instead, we make—or re-discover—the following observation about the running time of d-dimensional range trees:

Lemma 2

([16]) A d-dimensional range tree over n points supporting orthogonal range queries for the aggregate value over a commutative monoid has query time \(O(2^d B(n,d))\) and can be built in time \(O(nd^2 B(n,d))\).

This is a more careful statement than the standard textbook analysis, which gives the query time as \(O(\log ^d n)\) and the construction time as \(O(n\log ^d n)\). For many values of d, the asymptotic complexities of these bounds agree; in particular, this is true for constant d and for very large d, which are the main regimes of interest in computational geometry. But crucially, B(n,d) is always \(n^{\epsilon } \exp {O(d)}\) for any \(\epsilon > 0\), while \(\log ^d n\) is not.

Using known reductions, this implies that the following multivariate lower bound on orthogonal range searching is tight:

Theorem 3

(Implicit in [1]) A data structure for the orthogonal range query problem in d dimensions for the monoid \(({\mathbf {Z}},\max )\) with construction time \(n\cdot q'(n,d)\) and query time \(q'(n,d)\), where

$$\begin{aligned} q'(n,d) = n^{1-\epsilon } \exp o(d) \end{aligned}$$

for some \(\epsilon > 0\), refutes the Strong Exponential Time Hypothesis.

We observe in the appendix that for unweighted graphs, the vertex cover result can be improved without using the techniques advertised in the present paper.

Theorem 4

The eccentricities, diameter and radius of a given undirected, unweighted n-vertex graph G with vertex cover number k can be computed in time \(O(nk +2^kk^2)\). The Wiener index can be computed in time \(O(nk2^k)\).

Both of these bounds are \(n\exp O(k)\), matching (1). We do not know of a similar simplification for treewidth; the bound in Theorem 1.2, and the full construction behind it, seem to be the best we can do even for unit lengths.

1.1 Related Work

Abboud et al. [1] show that given a graph and a tree decomposition of width k, various graph distances can be computed in time \(O( k^2n\log ^{k-1} n )\). This bound is \(n^{1+\epsilon } \exp O(k\log k)\) for any \(\epsilon >0\). It is known how to compute an approximate tree decomposition with \(k=O({\text {tw}}(G))\) from the input graph G in time \(n\exp O({\text {tw}}(G))\) [6], so from a given graph (without a tree decomposition) the algorithm from [1] works in time \(n^{1+\epsilon } \exp O({\text {tw}}(G)\log {\text {tw}}(G))\), extending the construction of Cabello and Knauer [7] to superconstant treewidth. According to [7], the idea of expressing graph distances as coordinates was first mentioned by Shi [18].

If the diameter in the input graph is constant, the diameter can be computed in time \(n\exp O({\text {tw}}(G))\) [12]. This is tight in both parameters in the sense that [1] rules out the running time (1) even for distinguishing diameter 2 from 3, and every algorithm needs to inspect \(\Omega (n)\) vertices even for treewidth 1. For non-constant diameter \(\Delta\), the bound from [12] deteriorates as \(n\exp O({\text {tw}}(G)\log \Delta )\). However, the construction cannot be used to compute the Wiener index.

The literature on algorithms for graph distance parameters such as diameter or Wiener index is very rich, and we refer to the introduction of [1] for an overview of results directly relating to the present work. A recent paper by Bentert and Nichterlein [2] gives a comprehensive overview of many other parameterisations.

Orthogonal range searching using a multidimensional range tree was first described by Bentley [3], Lueker [15], Willard [19], and Lee and Wong [14], who showed that this data structure supports query time \(O(\log ^d n)\) and construction time \(O(n \log ^{d-1}n)\). Several papers have improved this in various ways by factors logarithmic in n; for instance, Chazelle’s construction [9] achieves query time \(O(\log ^{d-1} n)\). In general, queries that report the points Q within a given range, instead of computing sums or maxima as in the present paper, incur an additional O(|Q|) term in the query time.

1.2 Discussion

In hindsight, the present result is a somewhat undramatic resolution of an open problem that has been viewed as potentially fruitful by many people [1], including the second author of this paper [12]. In particular, the resolution has led neither to an exciting new technique for showing conditional lower bounds of the form \(n^{2-\epsilon } \exp {\omega (k)}\), nor to a clever new algorithm for graph diameter. Instead, our solution follows the ideas of Cabello and Knauer [7] for constant treewidth, much like in [1]. All that was needed was a better understanding of the asymptotics of bivariate functions, rediscovering a 40-year-old analysis of spatial data structures [16] (see the discussion in Sect. 3.3), and using a recent algorithm for approximate tree decompositions [6].

Of course, we can derive some satisfaction from the presentation of asymptotically tight bounds for fundamental graph parameters under a well-studied parameterization. In particular, the surprisingly elegant reductions in [1] cannot be improved. However, as we show in the appendix, when we parameterize by vertex cover number instead of treewidth, we can establish even cleaner and tight bounds without much effort.

Instead, the conceptual value of the present work may be in applying the multivariate perspective to high-dimensional computational geometry, reviving an overlooked analysis for non-constant dimension. To see the difference in perspective, Chazelle’s improvement [9] of d-dimensional range queries from \(\log ^d n\) to \(\log ^{d-1} n\) makes a lot of sense for small d, but from the multivariate point of view, both bounds are \(n^\epsilon \exp \Omega (d\log d)\). The regime in which the multivariate perspective on range trees gives new insight is when d is asymptotically just shy of \(\log n\); see Sect. 2.1.

Table 1 summarises the known bounds for computing the diameter. It remains open to find an algorithm for diameter with running time \(n\exp O({\text {tw}}(G))\) even for unweighted graphs, or an argument that such an algorithm is unlikely to exist under standard hypotheses. This requires a better understanding of the regime \(d=o(\log n)\).

Table 1 Bounds on algorithms for computing the diameter of a graph

2 Preliminaries

2.1 Asymptotics

We summarise the asymptotic relationships between various functions appearing in the present paper:

Lemma 5

$$\begin{aligned} B(n,d) = O(\log ^d n)\,. \end{aligned}$$
(2)

For any \(\epsilon > 0\),

$$\begin{aligned} B(n,d)= & {} n^\epsilon \exp O(d)\,, \end{aligned}$$
(3)
$$\begin{aligned} \log ^d n= & {} n^\epsilon \exp \Omega (d\log d)\,, \end{aligned}$$
(4)
$$\begin{aligned} \log ^d n= & {} n^\epsilon \exp O(d\log d)\,. \end{aligned}$$
(5)

The first expression shows that B(n,d) is always at least as good a bound as \(O(\log ^d n)\). The next two expressions show that from the perspective of parameterised complexity, the two bounds differ asymptotically: B(n,d) depends single-exponentially on d (no matter how small \(\epsilon >0\) is chosen), while \(\log ^d n\) does not (no matter how large \(\epsilon\) is chosen). Our proof in fact establishes the stronger bound \(B(n,d) \le n^\epsilon + \exp O(d)\). Expression (5) shows that (4) is tight, i.e., \(\log ^d n\) is never worse than \(n^\epsilon \exp O(d\log d)\).

Proof

Write \(h=\lceil \log n\rceil\). To see (2), consider first the case where \(d<h\). Using \(\left( {\begin{array}{c}a\\ b\end{array}}\right) \le a^b/b!\) we see that

$$\begin{aligned} \left( {\begin{array}{c}d+h\\ d\end{array}}\right) \le \left( {\begin{array}{c}2h\\ d\end{array}}\right) \le \frac{(2h)^d}{d!} = \frac{2^d}{d!}h^d=O( \log ^d n)\,. \end{aligned}$$
(6)

Next, if \(d\ge h\) then

$$\begin{aligned} \left( {\begin{array}{c}d+h\\ d\end{array}}\right) =\left( {\begin{array}{c}d+h\\ h\end{array}}\right) \le \left( {\begin{array}{c}2d\\ h\end{array}}\right) \le \frac{2^h}{h!} d^h \le d^h\,, \end{aligned}$$

provided \(h\ge 4\). It remains to observe that \(d^h\le h^d=O(\log ^d n)\). Indeed, since the function \(\alpha \mapsto \alpha /\ln \alpha\) is increasing for \(\alpha \ge \mathrm e\), we have \(h/\ln h \le d/\ln d\), which implies \(\exp (h\ln d)\le \exp (d\ln h)\) as needed.

For (3), we let \(\delta = d/h\) and distinguish two cases, depending on whether \(\delta\) is below a suitable constant. First, from Stirling’s formula we know \(\left( {\begin{array}{c}a\\ b\end{array}}\right) \le \big ( \frac{\mathrm {e}a}{b} \big )^b\), so

$$\begin{aligned} \left( {\begin{array}{c}d + h\\ d\end{array}}\right) = \left( {\begin{array}{c}(1+\delta )h\\ \delta h\end{array}}\right) \le \Big (\frac{\mathrm {e} (1+\delta ) h}{\delta h}\Big )^{\delta h} \le \Big (\frac{\mathrm {e}(1+\delta )}{\delta }\Big )^{2\delta \log n} = n^{2\delta \log (\mathrm {e}(1 + \delta )\delta ^{-1})}\,. \end{aligned}$$

Since \(\delta \mapsto 2\delta \log (\mathrm {e}(1 + \delta )\delta ^{-1})\) tends to 0 as \(\delta \rightarrow 0\), there is a constant \(c>0\) depending only on \(\epsilon\) such that this exponent is at most \(\epsilon\) whenever \(\delta \le c\); in this case we obtain \(\left( {\begin{array}{c}d + h\\ d\end{array}}\right) \le n^\epsilon\).

It remains to consider the case that \(\delta \ge c\) for some positive constant c depending only on \(\epsilon\). In this case, we have

$$\begin{aligned} \left( {\begin{array}{c}d+h\\ d\end{array}}\right) \le \left( {\begin{array}{c}(1+1/c)d\\ d\end{array}}\right) < 2^{(1+1/c)d} = \exp O(d)\,. \end{aligned}$$

We turn to (4). Let \(\epsilon >0\) and consider any function g such that for all \(n\ge 1\),

$$\begin{aligned} \log ^d n\le n^\epsilon g(d)\,. \end{aligned}$$

Then \(\log g(d) \ge d\log \log n - \epsilon \log n\). In particular, for \(n=2^d\), we have \(\log g(d) \ge d\log d - \epsilon d = \Omega (d\log d)\), so \(g(d)=\exp \Omega (d\log d)\).

Finally, for (5), we repeat the argument from [1]. If \(d\le \epsilon \log n/\log \log n\) then \(\log ^d n = 2^{d\log \log n} \le n^\epsilon \,.\) In particular, if \(d=o(\log n/\log \log n)\) then \(\log ^d n = n^{o(1)}\). Moreover, for \(d\ge \log ^{1/2} n\) we have \(\log \log n \le 2 \log d\) and thus \(\log ^d n = 2^{d \log \log n} \le 4^{d \log d}\). Since \(\epsilon \log n/\log \log n \ge \log ^{1/2} n\) for all sufficiently large n, these two cases cover every d. \(\square\)

These calculations also show the regimes in which these considerations are at all interesting. For \(d=o(\log n/\log \log n)\) both functions are bounded by \(n^{o(1)}\), and the multivariate perspective gives no insight. For \(d\ge \log n\), both bounds exceed n, and we are better off running n BFSs for computing diameters, or passing through the entire point set for range searching.
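To make the comparison between the two bounds concrete, the following hypothetical helper evaluates B(n,d) next to \(\log ^d n\), assuming base-2 logarithms. It is only a numerical illustration of Lemma 5 and the regimes discussed above, not part of any proof.

```python
from math import ceil, comb, log2


def B(n: int, d: int) -> int:
    """B(n, d) = binom(d + ceil(log2 n), d)."""
    return comb(d + ceil(log2(n)), d)


def log_power(n: int, d: int) -> float:
    """The textbook bound log^d n, with a base-2 logarithm."""
    return log2(n) ** d


if __name__ == "__main__":
    n = 2 ** 20  # so ceil(log2 n) = 20
    for d in (1, 2, 5, 10, 20, 40):
        print(f"d={d:2d}  B(n,d)={B(n, d):.3e}  log^d n={log_power(n, d):.3e}")
```

For \(n=2^{20}\) and \(d=20=\log n\), the sketch gives \(B(n,d)=\binom{40}{20}\approx 1.38\cdot 10^{11}\), whereas \(20^{20}\approx 1.05\cdot 10^{26}\); both exceed \(n\approx 1.05\cdot 10^6\), in line with the remark that the regime \(d\ge \log n\) is uninteresting.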

2.2 Model of Computation

We operate in the word RAM, assuming constant-time arithmetic operations on coordinates and edge lengths, as well as constant-time operations in the monoid supported by our range queries. For ease of presentation, edge lengths are assumed to be nonnegative integers; we could work with nonnegative weights instead [7].

3 Orthogonal Range Queries

3.1 Preliminaries

Let P be a set of d-dimensional points. We will view every point \(p\in P\) as a vector \(p=(p_1,\ldots , p_d)\).

A commutative monoid is a set M with an associative and commutative binary operator \(\oplus\) with identity. The reader is invited to think of M as the integers with \(-\infty\) as identity and \(a\oplus b = \max \{a,b\}\).

Let \(f:P\rightarrow M\) be a function and define for each subset \(Q\subseteq P\)

$$\begin{aligned} f(Q) = \bigoplus \{\, f(q):q\in Q\}\, \end{aligned}$$

with the understanding that \(f(\emptyset )\) is the identity in M. See Fig. 1 for a small example.

Fig. 1

Four points in three dimensions. With the monoid \(({\mathbf {Z}},\max )\) we have \(f(\{p,r,s\}) = 8\)
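Computing f(Q) is simply a fold with the monoid operation, starting from the identity. The sketch below assumes f is given as a dictionary from point labels to values; the name `aggregate` is hypothetical, and the values for p, q, r, s are hypothetical too (the coordinates and values of Fig. 1 are not reproduced here), merely chosen so that \(f(\{p,r,s\})=8\) under \(({\mathbf {Z}},\max )\).

```python
from functools import reduce
from typing import Callable, Hashable, Iterable, Mapping, TypeVar

M = TypeVar("M")


def aggregate(f: Mapping[Hashable, M], Q: Iterable[Hashable],
              op: Callable[[M, M], M], identity: M) -> M:
    """Return f(Q), the monoid sum of f(q) over q in Q; f(empty set) is the identity."""
    return reduce(op, (f[q] for q in Q), identity)


# Hypothetical values for the labels p, q, r, s (not taken from Fig. 1).
f = {"p": 8, "q": 9, "r": 3, "s": 5}

print(aggregate(f, ["p", "r", "s"], max, float("-inf")))  # prints 8
print(aggregate(f, [], max, float("-inf")))               # prints -inf
```

Because the operation is associative and commutative, the order in which Q is enumerated does not matter; the same function computes sums with `op=lambda a, b: a + b` and identity 0.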

3.2 Range Trees

Consider dimension \(i\in \{1,\ldots ,d\}\) and enumerate the points in Q as \(q^{(1)},\ldots ,q^{(r)}\) such that \(q^{(j)}_i \le q^{(j+1)}_i\), for instance by ordering by the ith coordinate and breaking ties lexicographically. Define \({\text {med}}_i (Q)\) to be the median point \(q^{(\lceil r/2\rceil )}\), and similarly \(\min _i(Q) = q^{(1)}\) and \(\max _i(Q)= q^{(r)}\). Set

$$Q_L = \left\{q^{(1)},\ldots , q^{(\lceil r/2\rceil )}\right\},\qquad Q_R = \left\{q^{(1+\lceil r/2\rceil )},\ldots , q^{(r)}\right\}.$$
(7)

For \(i\in \{1,\ldots , d\}\), the range tree \(T_i(Q)\) for Q is a node x with the following associated values:

  • L[x], a reference to range tree \(T_i(Q_L)\), called the left child of x. Only exists if \(|Q|>1\).

  • R[x], a reference to range tree \(T_i(Q_R)\), called the right child of x. Only exists if \(|Q|>1\).

  • D[x], a reference to range tree \(T_{i+1}(Q)\), called the secondary, associate, or higher-dimensional structure. Only exists for \(i<d\).

  • \(l[x]\), the ith coordinate of \(\min _i(Q)\).

  • \(r[x]\), the ith coordinate of \(\max _i(Q)\).

  • \(f[x]= f(Q)\). Only exists for \(i=d\).

Construction

Constructing a range tree for Q is a straightforward recursive procedure:


Algorithm C (Construction). Given an integer \(i\in \{1,\ldots , d\}\) and a list Q of points, this algorithm constructs the range tree \(T_i(Q)\) with root x.

C1:

[Base case \(Q = \{q\}\).] Recursively construct \(D[x]=T_{i+1}(Q)\) if \(i<d\), otherwise set \(f[x]=f(q)\). Set \(l[x] = r[x] = q_i\). Return x.

C2:

[Find median.] Determine \(q={\text {med}}_i (Q)\), and set \(l[x]\) and \(r[x]\) to the ith coordinates of \(\min _i(Q)\) and \(\max _i(Q)\), respectively.

C3:

[Split Q.] Let \(Q_L\) and \(Q_R\) be as given by (7); both are nonempty because \(|Q|\ge 2\).

C4:

[Recurse.] Recursively construct \(L[x]=T_i(Q_L)\) from \(Q_L\). Recursively construct \(R[x]=T_i(Q_R)\) from \(Q_R\). If \(i<d\) then recursively construct \(D[x]=T_{i+1}(Q)\). If \(i=d\) then set \(f[x]= f[L[x]]\oplus f[R[x]]\).

The data structure can be viewed as a collection of binary trees whose nodes x represent various subsets \(P_x\) of the original point set P. In the interest of analysis, we now introduce a scheme for naming the individual nodes x, and thereby also the subsets \(P_x\). Each node x is identified by a string of letters from \(\{\mathrm L, \mathrm R,\mathrm D\}\), and we associate with x a set of points \(P_x\), often called the canonical subset of x, as follows. For the empty string \(\epsilon\) we set \(P_\epsilon = P\). In general, if \(Q=P_x\) then \(P_{x\mathrm L} = Q_L\), \(P_{x\mathrm R} = Q_R\) and \(P_{x\mathrm D} = Q\). The strings over \(\{\mathrm L, \mathrm R,\mathrm D\}\) can be understood as uniquely describing a path through the data structure; for instance, L means ‘go left, i.e., to the left subtree, the one stored at L[x]’ and D means ‘go to the next dimension, i.e., to the subtree stored at D[x].’ The name of a node now describes the unique path that reaches it. Figure 2 shows (part of) the range tree for the points in Fig. 1.

Fig. 2

Part of the range tree for the points from Fig. 1. The label of node x appears in red on the arrow pointing to x. Nodes contain \(l[x]\!\!:\!\!r[x]\). The references L[x] and R[x] appear as children in a binary tree using usual drawing conventions. The reference D[x] appears as a dashed arrow (possibly interrupted); the placement on the page follows no other logic than economy of layout and readability. References D[x] from leaf nodes, such as \(D[\mathrm L\mathrm L]\) leading to node LLD, are not shown; this conceals 12 single-node trees. The ‘3rd-dimensional nodes,’ whose names contain two Ds, show the values f[x] next to the node. To ease comprehension, leaf nodes are decorated with their canonical subset, which is a singleton from \(\{p,q,r,s\}\). The reader can infer the canonical subset for an internal node as the union of leaves of the subtree; for instance, \(P_{\mathrm D\mathrm R} = \{r,s\}\). However, note that these point sets are not explicitly stored in the data structure

Lemma 6

Let \(n=|P|\). Algorithm C computes the d-dimensional range tree for P in time linear in \(nd^2 B(n,d)\).

Proof

We run Algorithm C on input P and \(i=1\).

Disregarding the recursive calls, the running time of Algorithm C on input i and Q is dominated by Steps C2 and C3, i.e., splitting Q into two sets of (almost) equal size. It is known that this task can be performed using O(|Q|) many comparisons [5]. Each (lexicographic) comparison can take d steps. Thus, the running time for constructing \(T_i(Q)\) is linear in d|Q| plus the time spent in recursive calls.

This means that we can bound the running time for constructing \(T_1(P)\) by bounding the sizes of the sets \(P_x\) associated with every node x in the data structure. If X denotes the set of nodes in the data structure, then we want to bound

$$\begin{aligned} \sum _{x\in X} |P_x| = \sum _{x\in X} |\{\, p\in P:p\in P_x\,\}|= \sum _{p\in P} |\{\, x\in X:p\in P_x\,\}| \,. \end{aligned}$$

Thus, we need to determine, for given \(p\in P\), the number of subsets \(P_x\) in which p appears. By construction, there are fewer than d occurrences of D in x. Set \(h= \lceil \log n\rceil\). Every L or R corresponds to cutting the current point set in half, so if x contains more than h occurrences that are either L or R then \(P_x\) is empty. Thus, x has at most \(h + d\) letters. For two different strings x and \(x'\) of the same length that agree on the positions of D, the sets \(P_x\) and \(P_{x'}\) are disjoint, so p appears in at most one of them. We conclude that the number of sets \(P_x\) such that \(p\in P_x\) is bounded by the number of ways to arrange fewer than d many Ds and at most h non-Ds. Using the identity \(\left( {\begin{array}{c}a+0\\ 0\end{array}}\right) + \cdots +\left( {\begin{array}{c}a+b\\ b\end{array}}\right) = \left( {\begin{array}{c}a+b+1\\ b\end{array}}\right)\) repeatedly, we compute this number as

$$\begin{aligned}&\sum _{i=0}^{d-1}\sum _{j=0}^h \left( {\begin{array}{c}i+j\\ j\end{array}}\right) = \sum _{i=0}^{d-1} \left( {\begin{array}{c}i+h+1\\ h\end{array}}\right) = \sum _{i=0}^{d-1} \left( {\begin{array}{c}i+h+1\\ i+1\end{array}}\right) \\&\quad =(-1) + \sum _{i=0}^{d} \left( {\begin{array}{c}i+h\\ i\end{array}}\right) = \left( {\begin{array}{c}h+d+1\\ d\end{array}}\right) -1 = \frac{h+d+1}{h+1}\left( {\begin{array}{c}h+d\\ d\end{array}}\right) -1 \le d\left( {\begin{array}{c}d+h\\ d\end{array}}\right) \,. \end{aligned}$$

The bound follows from aggregating this contribution over all \(p\in P\). In summary, the running time becomes

$$\begin{aligned} \sum _{x\in X} d|P_x| \le d\sum _{p\in P} d\left( {\begin{array}{c}d+h\\ d\end{array}}\right) \le nd^2B(n,d)\,. \end{aligned}$$

\(\square\)

The running time in the above lemma can be improved with some effort to \(O(d^2 n\log n + dnB(n,d))\), but this would not affect our overall results.
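The binomial-coefficient manipulation in the proof of Lemma 6 is easy to sanity-check numerically. The sketch below evaluates the double sum that counts placements of fewer than d letters D among at most h letters L or R, compares it with the closed form \(\binom{h+d+1}{d}-1\), and checks the final bound \(d\binom{d+h}{d}\); the function names are hypothetical, and checking small cases is of course no substitute for the proof.

```python
from math import comb


def double_sum(d: int, h: int) -> int:
    """Sum over i < d letters D and j <= h letters L/R of C(i + j, j)."""
    return sum(comb(i + j, j) for i in range(d) for j in range(h + 1))


def closed_form(d: int, h: int) -> int:
    """The closed form C(h + d + 1, d) - 1 from the proof of Lemma 6."""
    return comb(h + d + 1, d) - 1


def check(max_d: int = 10, max_h: int = 20) -> bool:
    """Verify the identity and the bound d * C(d + h, d) for 1 <= d <= max_d, 0 <= h <= max_h."""
    return all(
        double_sum(d, h) == closed_form(d, h) <= d * comb(d + h, d)
        for d in range(1, max_d + 1)
        for h in range(max_h + 1)
    )


print(check())  # prints True
```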

Search.

In this section, we fix two sequences of integers \(l_1,\ldots , l_d\) and \(r_1,\ldots , r_d\) describing the query box B given by

$$\begin{aligned} B = [l_1,r_1]\times \cdots \times [l_d,r_d]\,. \end{aligned}$$

Algorithm Q (Query). Given an integer \(i\in \{1,\ldots ,d\}\), a query box B as above, and a range tree \(T_i(Q)\) with root x for a set of points Q such that every point \(q\in Q\) satisfies \(l_j\le q_j\le r_j\) for \(j\in \{1,\ldots , i-1\}\), this algorithm returns \(\bigoplus \{\, f(q):q\in Q\cap B\,\}\).

Q1:

[Empty?] If the data structure is empty, or \(l_i> r[x]\), or \(l[x]>r_i\), then return the identity in the underlying monoid M.

Q2:

[Done?] If \(i=d\) and \(l_d\le l[x]\) and \(r[x]\le r_d\) then return f[x].

Q3:

[Next dimension?] If \(i<d\) and \(l_i\le l[x]\) and \(r[x]\le r_i\) then query the range tree at D[x] for dimension \(i+1\). Return the resulting value.

Q4:

[Split.] Query the range tree L[x] for dimension i; the result is a value \(f_L\). Query the range tree R[x] for dimension i; the result is a value \(f_R\). Return \(f_L\oplus f_R\). \(\square\)
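Algorithms C and Q translate almost line by line into code. The following Python sketch is one possible rendering under assumptions of our own: dimensions are 0-based, f is supplied as (point, value) pairs, the monoid is passed as a binary function `op` together with its `identity`, and, for brevity, each call sorts its points instead of using the linear-time median selection of Step C2, so construction is slower than Lemma 6 by a logarithmic factor. The names `Node`, `build`, and `query` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, ...]
Item = Tuple[Point, Any]  # a point together with its value f(point)


@dataclass
class Node:
    lo: int                         # l[x]: smallest i-th coordinate in the canonical subset
    hi: int                         # r[x]: largest i-th coordinate in the canonical subset
    left: Optional["Node"] = None   # L[x]
    right: Optional["Node"] = None  # R[x]
    down: Optional["Node"] = None   # D[x]: tree for the next dimension, same points
    agg: Any = None                 # f[x]: only stored in the last dimension


def build(items: List[Item], i: int, d: int, op: Callable[[Any, Any], Any]) -> Node:
    """Algorithm C for a nonempty list of items; dimension i is 0-based here."""
    # Order by the i-th coordinate, breaking ties lexicographically (Sect. 3.2).
    items = sorted(items, key=lambda it: (it[0][i], it[0]))
    x = Node(lo=items[0][0][i], hi=items[-1][0][i])
    if len(items) > 1:
        half = (len(items) + 1) // 2  # |Q_L| = ceil(r/2), as in (7)
        x.left = build(items[:half], i, d, op)
        x.right = build(items[half:], i, d, op)
    if i < d - 1:
        x.down = build(items, i + 1, d, op)
    elif len(items) == 1:
        x.agg = items[0][1]                  # Step C1 in the last dimension
    else:
        x.agg = op(x.left.agg, x.right.agg)  # Step C4 in the last dimension
    return x


def query(x: Optional[Node], box: Sequence[Tuple[int, int]], i: int, d: int,
          op: Callable[[Any, Any], Any], identity: Any) -> Any:
    """Algorithm Q: box[j] = (l_j, r_j); returns the monoid sum over points in the box."""
    l, r = box[i]
    if x is None or l > x.hi or x.lo > r:  # Q1: empty or disjoint
        return identity
    if l <= x.lo and x.hi <= r:            # i-th range lies inside [l, r]
        if i == d - 1:
            return x.agg                   # Q2
        return query(x.down, box, i + 1, d, op, identity)  # Q3
    # Q4: leaves never get here, since a single coordinate is either inside or outside.
    return op(query(x.left, box, i, d, op, identity),
              query(x.right, box, i, d, op, identity))
```

For instance, `query(build(pts, 0, 3, max), [(0, 4), (2, 9), (0, 10)], 0, 3, max, float("-inf"))` returns the largest value among the points of `pts` in that box; passing addition with identity 0 instead yields sums.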

To prove correctness, we show that this algorithm is correct for each point set \(Q=P_x\).

Lemma 7

Let \(i=D(x)+1\), where D(x) is the number of Ds in x. Assume that every \(p\in P_x\) satisfies \(l_j\le p_j\le r_j\) for all \(j\in \{1,\ldots , i-1\}\). Then the query algorithm on input x and i returns \(f(B\cap P_x)\).

Proof

We use backwards induction on |x|.

If \(|x| = h+d\) then \(P_x\) is the empty set, in which case the algorithm correctly returns the identity in M.

If the algorithm executes Step Q2 then every \(q \in P_x\) lies in B, in which case the algorithm correctly returns \(f[x] = f(P_x)\).

If the algorithm executes Step Q3 then every \(q\in P_x\) satisfies \(l_i\le q_i\le r_i\), so \(P_{x\mathrm D}= P_x\) satisfies the condition in the lemma for \(i+1\). Moreover, \(x\mathrm D\) contains \(D(x)+1=i\) letters D, and D[x] stores the \((i+1)\)th range tree for \(P_{x\mathrm D}\). Thus, by induction the algorithm returns \(f(P_{x\mathrm D}\cap B)\), which equals \(f(P_x\cap B)\) because \(P_{x\mathrm D}= P_x\).

Otherwise, by induction, \(f_L= f(P_{x\mathrm L}\cap B)\) and \(f_R=f(P_{x\mathrm R}\cap B)\). Since \(P_x\) is the disjoint union of \(P_{x\mathrm L}\) and \(P_{x\mathrm R}\), we have \(f(P_x\cap B) = f((P_{x\mathrm L} \cap B) \cup (P_{x\mathrm R} \cap B)) = f_L\oplus f_R\). \(\square\)

Lemma 8

If x is the root of the range tree for P, then on input \(i=1\), x, and B, the query algorithm returns \(f(P\cap B)\) in time linear in \(2^dB(n,d)\).

Proof

Correctness follows from the previous lemma.

For the running time, we first observe that the query algorithm does constant work in each visited node. Thus it suffices to bound the number of visited nodes as

$$\begin{aligned} 2^d\left( {\begin{array}{c}h+d\\ d\end{array}}\right) \qquad (d\ge 1,h\ge 0)\,. \end{aligned}$$
(8)

We will show by induction on d that (8) bounds the number of visited nodes for every call to a d-dimensional range tree for a point set \(P_x\), where \(h=\lceil \log |P_x|\rceil\). In the base case, for \(d=1\), we need to show that the number of visited nodes is at most \(2^1\left( {\begin{array}{c}h+1\\ 1\end{array}}\right) = 2(h+1)\) for any height h. But this is just the standard observation that interval searching amounts to traversing two root–leaf paths, each of which contains at most h non-root nodes, and that the height of a balanced binary search tree on n values is at most \(1 + \lceil \log n\rceil\).

We now carefully consider the case for \(d\ge 2\). The two easy cases are Q1 and Q2, which incur no additional nodes to be visited, so the number of visited nodes is 1, which is bounded by (8). Step Q3 leads to a recursive call for a \((d-1)\)-dimensional range tree over the same point set \(P_{x\mathrm D}=P_x\), and we verify for \(h\ge 1\):

$$\begin{aligned}&1+2^{d-1}\left( {\begin{array}{c}h+d-1\\ d-1\end{array}}\right) \le 2^{d-1}\left( {\begin{array}{c}h+d-1\\ d\end{array}}\right) + 2^{d-1} \left( {\begin{array}{c}h+d-1\\ d-1\end{array}}\right) \\&\quad = 2^{d-1} \left( {\begin{array}{c}h+d\\ d\end{array}}\right) < 2^d\left( {\begin{array}{c}h+d\\ d\end{array}}\right) \,. \end{aligned}$$

The case \(h=0\) (i.e., \(P_x\) is a singleton) is immediate. The interesting case is Step Q4. We need to follow two paths from x to the leaves of the binary tree of x. Consider the leaves l and r in the subtree rooted at x associated with the points \(\min _i(P_x)\) and \(\max _i(P_x)\) as defined in Sect. 3.2. We describe the situation of the path Y from l to x; the other case is symmetrical. At each internal node \(y\in Y\), the algorithm chooses Step Q4 (because \(l_i\ge l[y]\)). There are two cases for what happens at \(y\mathrm L\) and \(y\mathrm R\). If \(l_i\le {\text {med}}_i(P_y)\) then \(P_{y\mathrm R}\) satisfies \(l_i\le \min _i(P_{y\mathrm R}) \le r_i\), so the call to \(y\mathrm R\) will choose Step Q3. By induction, this incurs \(2^{d-1}\left( {\begin{array}{c}d-1+i\\ d-1\end{array}}\right)\) visits, where i is the height of y. In the other case, the call to \(y\mathrm L\) will choose Step Q1, which incurs no extra visits. Thus, the number of nodes visited on the left path is at most

$$\begin{aligned}&h + \sum _{i=0}^{h-1} 2^{d-1} \left( {\begin{array}{c}d-1+i\\ d-1\end{array}}\right) \le 2^{d-1}\sum _{i=0}^h \left( {\begin{array}{c}d-1+i\\ d-1\end{array}}\right) = 2^{d-1}\sum _{i=0}^h \left( {\begin{array}{c}d-1+i\\ i\end{array}}\right) \\&\quad = 2^{d-1} \left( {\begin{array}{c}d+h\\ h\end{array}}\right) \,, \end{aligned}$$

where the inequality uses the bound \(h\le 2^{d-1}\left( {\begin{array}{c}d-1+h\\ d-1\end{array}}\right)\), which is immediate for \(d\ge 2\) and \(h\ge 0\). The total number of nodes visited is at most twice that value, and therefore bounded by (8). \(\square\)

3.3 Discussion

The textbook analysis of range trees and similar d-dimensional spatial algorithms and data structures sets up a recurrence relation like

$$\begin{aligned} r(n,d) = 2r(n/2,d) + r(n,d-1)\,, \end{aligned}$$

for the construction and

$$\begin{aligned} r(n,d) = r(n/2,d) + r(n,d-1)\,, \end{aligned}$$

for the query time. One then observes that \(n\log ^d n\) and \(\log ^d n\) are the solutions to these recurrences. This analysis goes back to Bentley’s original paper [3].

Along the lines of the previous section, one can show that the functions \(n\cdot B(n,d)\) and B(n,d) solve these recurrences as well. A detailed derivation can be found in [16], which also contains combinatorial arguments for how to interpret the binomial coefficients in the context of spatial data structures. A later paper of Chan [8] also takes the recurrences as a starting point, and observes asymptotically improved solutions for the related question of dominance queries.
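Pascal's rule \(\binom{h+d}{d}=\binom{h-1+d}{d}+\binom{h+d-1}{d-1}\) is what makes B(n,d) solve these recurrences. The check below assumes \(n=2^h\), so that \(B(n,d)=\binom{h+d}{d}\), with base values \(r(1,d)=1\) and \(r(n,0)=1\) (respectively n for construction), and takes the query recurrence in the additive form \(r(n,d) = r(n/2,d) + r(n,d-1)\). It verifies the identities exactly for small h and d; the function names are illustrative.

```python
from math import comb


def query_solution(h: int, d: int) -> int:
    """B(n, d) for n = 2**h."""
    return comb(h + d, d)


def build_solution(h: int, d: int) -> int:
    """n * B(n, d) for n = 2**h."""
    return 2 ** h * comb(h + d, d)


def recurrences_hold(max_h: int = 15, max_d: int = 10) -> bool:
    for h in range(1, max_h + 1):
        for d in range(1, max_d + 1):
            # Construction: r(n, d) = 2 r(n/2, d) + r(n, d - 1)
            if build_solution(h, d) != 2 * build_solution(h - 1, d) + build_solution(h, d - 1):
                return False
            # Query, additive form: r(n, d) = r(n/2, d) + r(n, d - 1)
            if query_solution(h, d) != query_solution(h - 1, d) + query_solution(h, d - 1):
                return False
    return True


print(recurrences_hold())  # prints True
```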

4 Graph Distances

4.1 Preliminaries

We consider an undirected graph G whose edges have nonnegative integer weights. The set of vertices is V(G) and has size n. For a vertex subset U we write G[U] for the induced subgraph. The neighbourhood N(v) of v is the set of vertices that share an edge with v.

A path from u to v is called a uv-path and denoted P. The length of a path, denoted l(P), is the sum of its edge lengths.

The distance from vertex u to vertex v, denoted d(u,v), is the length of a shortest uv-path, i.e., the minimum of l(P) over all uv-paths P. The Wiener index of G, denoted \({\text {wien}}(G)\), is \(\sum _{u,v\in V(G)} d(u,v)\). The eccentricity of a vertex u, denoted e(u), is given by \(e(u) = \max \{\, d(u,v):v\in V(G)\,\}\). The diameter of G, denoted \({\text {diam}}(G)\), is \(\max \{\, e(u):u\in V(G)\,\}\). The radius of G, denoted \({\text {rad}}(G)\), is \(\min \{\, e(u):u\in V(G)\,\}\).
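For reference, these definitions can be evaluated directly by the straightforward baseline from the introduction: one shortest-path computation per vertex (Dijkstra's algorithm for nonnegative lengths; breadth-first search would do for unit lengths). This Python sketch assumes a connected graph given as an adjacency dictionary; it is not the parameterised algorithm of this paper, and the names `dijkstra` and `distance_measures` are illustrative. It follows the displayed Wiener index formula literally, summing over ordered pairs, so each unordered pair contributes twice.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, int]]]  # vertex -> list of (neighbour, edge length)


def dijkstra(G: Graph, s: int) -> Dict[int, float]:
    """Single-source distances for nonnegative edge lengths."""
    dist: Dict[int, float] = {v: float("inf") for v in G}
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in G[u]:
            if du + w < dist[v]:
                dist[v] = du + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def distance_measures(G: Graph):
    """Eccentricities, diameter, radius and Wiener index of a connected graph."""
    D = {u: dijkstra(G, u) for u in G}
    ecc = {u: max(D[u].values()) for u in G}
    diam = max(ecc.values())
    rad = min(ecc.values())
    wiener = sum(D[u][v] for u in G for v in G)  # ordered pairs, as in the formula
    return ecc, diam, rad, wiener
```

On the path 0–1–2 with edge lengths 1 and 2, the sketch yields eccentricities 3, 2, 3, diameter 3, radius 2, and Wiener index 12 (that is, 6 if each unordered pair is counted once).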

4.2 Separated Eccentricities

We follow the construction of [7].

Given a graph G, let \({\mathscr {S}}_{x,w}\) denote the set of shortest xw-paths. We refine the notion of eccentricity to a subset W of vertices. Formally,

$$\begin{aligned} e(u,W) = \max _{w\in W} \{\, l(P) :P\in {\mathscr {S}}_{u, w}\,\}\,. \end{aligned}$$

In particular, \(e(u)= e(u,V(G))\) and \(e(u)=\max \{ e(u,X), e(u,Y)\}\) if \(X\cup Y = V(G)\).

A vertex subset Z separates X and Y if every xy-path with \(x\in X\) and \(y\in Y\) and \(x\ne y\) contains a vertex from Z.

Enumerate \(Z= \{z_1,\ldots , z_k\}\). For \(i\in \{1,\ldots , k\}\) define the ith eccentricity \(e_i(x,Y)\) as the maximum distance from x to any vertex in Y ‘via \(z_i\).’ Formally,

$$\begin{aligned} e_i(x,Y) =&\max _{y\in Y} \{ \, l(P):P\in {\mathscr {S}}_{x,y}, z_i\in V(P) \,\}\,. \end{aligned}$$

See Fig. 3 for a small example.

Fig. 3

Left: Example with \(Z=\{z_1,z_2,z_3\}\) and \(Y=Z\cup \{y,y',y''\}\). We have \(e(x,Y)=5\) (along \(xz_1y\)) and \(e(x',Y)=3\). For the case \(i=3\) we see \(e_3(x,Y) = 4\) along \(xz_3y''\), because there are no shortest paths from x via \(z_3\) to y or \(y'\), and the one-edge path \(xz_3\) itself is shorter. Similarly, \(e_3(x', Y) = 3\) (along \(x'z_3y'\)). Middle: The corresponding points in \({\mathbf {Z}}^2\), only the first two coordinates are shown, and only for the points in \(Y\setminus Z\). The points corresponding to \(y'\) and \(y''\) both belong to the rectangle for \(x'\), certifying that there are shortest \(x',y'\)- and \(x',y''\)-paths through \(z_3\). Right: Over \(R_{x'}\), the point \(p_{y'}\) maximises f. We have \(e_3(x',Y)= l(x'z_3y') = d(x',z_3) + f(p_{y'})=1+2=3\)

Lemma 9

If Z separates X and Y then \(e(x,Y)= \max _{i=1}^k e_i(x,Y)\) for \(x\in X\).

Proof

Let \(y\in Y\) be a vertex with \(d(x,y)=e(x,Y)\). A shortest xy-path must contain a vertex from Z, say \(z_i\), so \(e(x,Y)\le e_i(x,Y)\). Conversely, \(e(x,Y)\ge e_j(x,Y)\) for all \(j\in \{1,\ldots ,k\}\) from the definition. \(\square\)

Now we can write the eccentricity via \(z_i\) as the distance to \(z_i\) plus a range query:

Lemma 10

Let \(i\in \{1,\ldots ,k\}\) and assume \(\{z_1,\ldots , z_k\}\) separates X and Y. We will define a set of points \(\{\,p_y:y\in Y\,\}\) and a function f on this set as follows. Define for each \(y\in Y\) the k-dimensional point

$$\begin{aligned} p_y = \begin{pmatrix} d(z_i, y)-d(z_1, y)\\ \vdots \\ d(z_i,y)-d(z_k, y) \end{pmatrix}\quad \text {with } f(p_y)= d(z_i,y)\,. \end{aligned}$$
(9)

Define for each \(x\in X\) the rectangle

$$\begin{aligned} R_x=I_1\times \cdots \times I_k\,,\quad \text {where }I_j =[-\infty , d(x,z_j)-d(x,z_i)] \,. \end{aligned}$$
(10)

Then

$$\begin{aligned} e_i(x,Y) = d(x,z_i) + \max _{y:p_y\in R_x} f(p_y)\,. \end{aligned}$$

Proof

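Consider \(y\in Y\) and a shortest xy-path P containing \(z_i\in Z\). No xy-path is shorter than P, so in particular we have

$$\begin{aligned} d(x,z_i) + d(z_i,y)&\le d(x,z_j) + d(z_j,y)\,,\qquad \text {for all }j\in \{1,\ldots ,k\}\,, \end{aligned}$$

equivalently,

$$\begin{aligned} d(z_i,y) - d(z_j,y)&\le d(x,z_j)-d(x,z_i)\,,\qquad \text {for all }j\in \{1,\ldots , k\}\,, \end{aligned}$$
(11)

which means \(p_y\in R_x\). Conversely, if \(p_y\in R_x\), then (11) holds for all j. Since Z separates X and Y, every xy-path contains some \(z_j\), so \(d(x,y)=\min _{j} \{d(x,z_j)+d(z_j,y)\}\), and by (11) this minimum is attained at \(j=i\). Hence some shortest xy-path passes through \(z_i\), and its length is \(d(x,z_i)+d(z_i,y)=d(x,z_i)+f(p_y)\). The points in \(R_x\) are therefore exactly the \(p_y\) for which a shortest xy-path via \(z_i\) exists, and maximising \(f(p_y)=d(z_i,y)\) over them gives \(e_i(x,Y)-d(x,z_i)\). \(\square\)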
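The reduction in Lemma 10 can be checked numerically with a short sketch. It builds the points \(p_y\) of (9) and the upper corner of \(R_x\) of (10) for a small hypothetical graph in which Z = {2, 5} separates X from Y, replaces the range tree by a linear scan over the points (an assumption made purely for illustration), and compares the result with the definition of \(e_i(x,Y)\). All function names are hypothetical.

```python
import math

# Sketch only: the graph and all function names are hypothetical.
N = 8
EDGES = [(0, 1, 1), (0, 2, 2), (1, 2, 1), (1, 5, 4), (0, 5, 3),
         (2, 3, 2), (2, 4, 5), (3, 4, 1), (5, 4, 1), (5, 6, 2),
         (6, 7, 3), (3, 7, 4), (2, 5, 6)]
X, Y, Z = [0, 1, 2, 5], [2, 5, 3, 4, 6, 7], [2, 5]


def all_pairs(n, edges):
    """Floyd-Warshall distances for an undirected graph with positive weights."""
    d = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        d[v][v] = 0
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    for m in range(n):
        for a in range(n):
            for b in range(n):
                d[a][b] = min(d[a][b], d[a][m] + d[m][b])
    return d


def point(d, Z, i, y):
    """p_y from (9) together with its value f(p_y) = d(z_i, y)."""
    zi = Z[i]
    return tuple(d[zi][y] - d[zj][y] for zj in Z), d[zi][y]


def corner(d, Z, i, x):
    """Upper corner of R_x from (10); interval I_j is [-inf, corner[j]]."""
    return tuple(d[x][zj] - d[x][Z[i]] for zj in Z)


def ecc_via_range(d, Z, i, x, Y):
    """Right-hand side of Lemma 10, with the range query done by a linear scan."""
    r = corner(d, Z, i, x)
    best = -math.inf
    for y in Y:
        p, f = point(d, Z, i, y)
        if all(pj <= rj for pj, rj in zip(p, r)):  # p_y lies in R_x
            best = max(best, f)
    return d[x][Z[i]] + best


def ecc_via(d, x, Y, z):
    """Definition of e_i(x, Y) with z = z_i, for comparison."""
    return max((d[x][y] for y in Y if d[x][z] + d[z][y] == d[x][y]),
               default=-math.inf)


d = all_pairs(N, EDGES)
for x in X:
    for i, z in enumerate(Z):
        print(x, z, ecc_via_range(d, Z, i, x, Y), ecc_via(d, x, Y, z))
```

The scan also makes the remark after the lemma visible: the ith coordinate of every point, and the ith corner coordinate, is 0.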

We note that the ith coordinate of \(p_y\) is always 0 and the ith interval of \(R_x\) is always \([-\infty , 0]\), so the reduction is actually to a \((k-1)\)-dimensional range query instance. However, since we are mainly interested in the asymptotic dependency on k, we forgo the possible (but tedious) improvement that arises from this observation.

We have arrived at the following algorithm:


Algorithm S (Separated Eccentricities). Given an undirected, connected graph G with nonnegative integer weights, vertex subsets X and Y such that \(V(G)=X\cup Y\), and a set Z of k vertices that separates X and Y, this algorithm computes the eccentricity \(e(x,Y)\) of every vertex \(x\in X\cup Z\).

S1:

[Distances from separator.] Compute \(d(z,v)\) for each \(z\in Z,v\in V(G)\) using k applications of Dijkstra’s algorithm. Compute \(e(z,Y)= \max _{y\in Y} d(z,y)\) for each \(z\in Z\).

S2:

[Build range trees.] For each \(i\in \{1,\ldots , k\}\), construct a k-dimensional range tree for the points \(\{\, p_y:y\in Y\,\}\) given by (9) using the monoid \(({\mathbf {Z}},\max )\).

S3:

[Query range trees.] For each \(x\in X\) and each \(i\in \{1,\ldots ,k\}\), query the ith range tree for the rectangle \(R_x\) given by (10) and add \(d(x,z_i)\). By Lemma 10, the result is \(e_i(x,Y)\). Set \(e(x,Y)= \max _{i=1}^k e_i(x,Y)\).

Algorithm S is correct by the observation in Step S3 (based on Lemma 10) and Lemma 9.
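A compact sketch of Algorithm S follows. It is not the data structure of Lemma 2: as a stand-in it uses a plain recursive range tree specialised to the one-sided rectangles \(R_x\) (every interval has the form \([-\infty , r_j]\)), with \(-\infty\) assumed as the identity of the max monoid. Vertices are integers, `adj[v]` is a list of (neighbour, weight) pairs, and the class and function names (`DominanceTree`, `algorithm_s`) are hypothetical.

```python
import heapq
import math

# Sketch only: names are hypothetical; -inf is assumed as the identity of max.
NEG_INF = -math.inf


def dijkstra(adj, s):
    """Distances from s; adj[v] is a list of (neighbour, weight) pairs."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if du + w < dist.get(v, math.inf):
                dist[v] = du + w
                heapq.heappush(heap, (du + w, v))
    return dist


class DominanceTree:
    """Range tree answering max{f(p) : p[j] <= r[j] for all j >= dim}."""

    def __init__(self, pts, dim=0):
        # pts is a list of (coordinate tuple, value) pairs
        self.dim = dim
        self.best = max((f for _, f in pts), default=NEG_INF)
        self.done = bool(pts) and dim == len(pts[0][0])  # no coordinates left
        if not pts or self.done:
            return
        pts = sorted(pts, key=lambda pf: pf[0][dim])
        self.lo_key, self.hi_key = pts[0][0][dim], pts[-1][0][dim]
        self.sub = DominanceTree(pts, dim + 1)  # same points, next coordinate
        self.kids = []
        if len(pts) > 1:
            mid = len(pts) // 2
            self.kids = [DominanceTree(pts[:mid], dim), DominanceTree(pts[mid:], dim)]

    def query(self, r):
        if self.best == NEG_INF or self.done:
            return self.best
        if r[self.dim] >= self.hi_key:  # whole node inside the slab
            return self.sub.query(r)
        if r[self.dim] < self.lo_key:   # whole node outside the slab
            return NEG_INF
        return max(kid.query(r) for kid in self.kids)  # boundary node: split


def algorithm_s(adj, X, Y, Z):
    """Sketch of Algorithm S: returns e(v, Y) for every v in X and in Z."""
    k = len(Z)
    dz = [dijkstra(adj, z) for z in Z]                        # Step S1
    ecc = {z: max(dz[i][y] for y in Y) for i, z in enumerate(Z)}
    trees = []
    for i in range(k):                                        # Step S2: points (9)
        pts = [(tuple(dz[i][y] - dz[j][y] for j in range(k)), dz[i][y]) for y in Y]
        trees.append(DominanceTree(pts))
    for x in X:                                               # Step S3: rectangles (10)
        dx = [dz[j][x] for j in range(k)]                     # d(x, z_j), undirected
        ecc[x] = max(dx[i] + trees[i].query(tuple(dx[j] - dx[i] for j in range(k)))
                     for i in range(k))
    return ecc


# Hypothetical example: no edge joins {0, 1} to {3, 4, 6, 7}, so Z = [2, 5] separates them.
EDGES = [(0, 1, 1), (0, 2, 2), (1, 2, 1), (1, 5, 4), (0, 5, 3),
         (2, 3, 2), (2, 4, 5), (3, 4, 1), (5, 4, 1), (5, 6, 2),
         (6, 7, 3), (3, 7, 4), (2, 5, 6)]
adj = {v: [] for v in range(8)}
for u, v, w in EDGES:
    adj[u].append((v, w))
    adj[v].append((u, w))
print(algorithm_s(adj, [0, 1], [2, 5, 3, 4, 6, 7], [2, 5]))
```

Because every query range is a prefix in each coordinate, at most one node per level of each tree straddles the boundary, which is what keeps the recursion in `query` narrow.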

Lemma 11

Algorithm S runs in time \(O\bigl (k m\log n + n 2^k B(n,k) \bigr )\).

Proof

The first term accounts for Step S1. Using Lemma 2, we see that Steps S2 and S3 take time \(O(|Y|k^2\cdot B(|Y|,k))\) and \(O(|X|2^k \cdot B(|Y|,k))\) for each \(i\in \{1,\ldots ,k\}\). Since \(k^2=O(2^k)\) and both |Y| and |X| are at most n, both expressions are asymptotically dominated by the second term. \(\square\)

4.3 Parameterization by Vertex Cover Number

Graphs with small vertex cover number allow for a particularly simple application of the construction from Sect. 4.2, because the same small separator (namely, the vertex cover itself) separates every vertex from the rest of the graph.

A vertex cover is a vertex subset C of V(G) such that every edge in G has at least one endpoint in C. The smallest k for which a vertex cover of size k exists is the vertex cover number of a graph, denoted \({\text {vc}}(G)\). The number of edges in such a graph is at most \(n\cdot {\text {vc}}(G)\).

Proof of Theorem 1.1, distances

Set \(X=V(G)-C\), \(Y= V(G)\), and \(Z=C\). Every vertex x is separated from all other vertices by its neighbourhood N(x). Since C is a vertex cover, \(N(x)\subseteq C\) for all \(x\notin C\). Thus, Z separates X and Y. Note that \(e(x)=e(x,Y)\), so it suffices to run Algorithm S, which computes \(e(x,Y)\) for every \(x\in X\cup Z=V(G)\). The running time is immediate from Lemma 11 with Z of size \(k\le {\text {vc}}(G)\) and \(m\le kn\): the term \(km\log n\le k^2n\log n\) is dominated by \(n2^kB(n,k)\), since \(k^2=O(2^k)\) and \(B(n,k)\ge \lceil \log n\rceil\). From the eccentricities, the radius and diameter can be computed in linear time using their definition. \(\square\)
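The separation argument can be checked with a brute-force sketch. For \(x\notin C\) every neighbour lies in C, so \(d(x,y)=\min _{c\in N(x)}\{w(x,c)+d(c,y)\}\) for every \(y\ne x\), where \(w(x,c)\) denotes the weight of the edge xc. The sketch computes all eccentricities from k Dijkstra runs out of C using this identity. It replaces the range queries of Algorithm S by a direct scan over N(x), so it takes quadratic time and only illustrates why C separates X and Y; the graph, the cover, and the names are hypothetical.

```python
import heapq
import math

# Sketch only: graph, cover and function names are hypothetical.


def dijkstra(adj, s):
    """Distances from s; adj[v] is a list of (neighbour, weight) pairs."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if du + w < dist.get(v, math.inf):
                dist[v] = du + w
                heapq.heappush(heap, (du + w, v))
    return dist


def ecc_from_cover(adj, C):
    """Eccentricities of all vertices, using only distances from the vertex cover C."""
    dC = {c: dijkstra(adj, c) for c in C}          # k runs of Dijkstra
    ecc = {c: max(dC[c].values()) for c in C}
    for x in adj:
        if x in dC:
            continue
        # N(x) is a subset of C, so every path leaving x enters C after one edge
        ecc[x] = max(min(w + dC[c][y] for c, w in adj[x]) for y in adj if y != x)
    return ecc


# Hypothetical graph with vertex cover C = {0, 1}: every edge has an endpoint in C.
EDGES = [(0, 2, 3), (0, 3, 1), (1, 3, 2), (1, 4, 5), (0, 1, 4), (2, 1, 1)]
adj = {v: [] for v in range(5)}
for u, v, w in EDGES:
    adj[u].append((v, w))
    adj[v].append((u, w))
print(ecc_from_cover(adj, [0, 1]))
```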

4.4 Parameterization by Treewidth

In the more general case, not all paths from x need to pass through the separator Z. Therefore we cannot determine e(x) from \(e(x,Y)\) alone, as we did in Sect. 4.3. Instead, since \(e(x)=\max \{e(x,X),e(x,Y)\}\), it remains to compute \(e(x,X)\). This quantity is entirely determined by the subgraph G[X] once we add shortcuts inside Z, so we can handle it recursively. The necessary recursive decomposition is provided by a tree decomposition. We need the approximate treewidth construction of Bodlaender et al. [6]. The analysis of the resulting recurrence for superconstant dimension follows from Abboud et al. [1].

We need a decomposition from [7]. Let \(k+1<n\). A skew k-separator tree T of an n-vertex graph G is a binary tree such that each node t of T is associated with a vertex set \(Z_t\subseteq V(G)\) such that

  • \(|Z_t| \le k\),

  • If \(L_t\) and \(R_t\) denote the vertices of G associated with the left and right subtrees of t, respectively, then \(Z_t\) separates \(L_t\) and \(R_t\) and

    $$\begin{aligned} \frac{n}{k+1}\le |L_t\cup Z_t|\le \frac{nk}{k+1}\,, \end{aligned}$$
    (12)
  • T remains a skew k-separator tree even if edges between vertices of \(Z_t\) are added.
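The first two conditions can be checked for a single node t with a short sketch. It assumes, as an assumption of the sketch, that \(L_t\), \(Z_t\), and \(R_t\) partition the vertex set; then \(Z_t\) separates \(L_t\) and \(R_t\) exactly when no edge joins \(L_t\) and \(R_t\). Inequality (12) is tested in integer arithmetic as \(n\le (k+1)|L_t\cup Z_t|\le nk\). The graph and the function name `is_skew_node` are hypothetical.

```python
# Sketch only: graph and function name are hypothetical.


def is_skew_node(adj, L, Z, R, k):
    """Check |Z| <= k, that Z separates L and R, and the balance condition (12).

    Assumes L, Z and R partition the vertices; adj[v] lists (neighbour, weight).
    """
    n = len(L) + len(Z) + len(R)
    if len(Z) > k:
        return False
    right = set(R)
    # with a partition, an L-R path avoiding Z exists iff some edge joins L and R
    if any(v in right for u in L for v, _ in adj[u]):
        return False
    s = len(L) + len(Z)                  # |L_t ∪ Z_t|
    return n <= (k + 1) * s <= n * k     # n/(k+1) <= s <= nk/(k+1)


# Hypothetical weighted graph on vertices 0..7.
EDGES = [(0, 1, 1), (0, 2, 2), (1, 2, 1), (1, 5, 4), (0, 5, 3),
         (2, 3, 2), (2, 4, 5), (3, 4, 1), (5, 4, 1), (5, 6, 2),
         (6, 7, 3), (3, 7, 4), (2, 5, 6)]
adj = {v: [] for v in range(8)}
for u, v, w in EDGES:
    adj[u].append((v, w))
    adj[v].append((u, w))
print(is_skew_node(adj, [0, 1], [2, 5], [3, 4, 6, 7], k=2))
```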

It is known that such a tree can be found from a tree decomposition, and an approximate tree decomposition can be found in single-exponential time. We summarise these results in the following lemma:

Lemma 12

([7, Lemma 3] with [6, Theorem 1]) For a given n-vertex input graph G, a skew \((5{\text {tw}}(G)+4)\)-separator tree can be computed in time \(n\exp O({\text {tw}}(G)).\)

We are ready for the algorithm.


Algorithm E (Eccentricities). Given an undirected, connected graph G with nonnegative integer weights and a skew k-separator tree with root t, this algorithm computes the eccentricity e(v) of every vertex \(v\in V(G)\). We write \(Z=Z_t\), \(X= L_t\cup Z_t\), and \(Y=R_t \cup Z_t\).

E1:

[Base case.] If \(n/\ln n< 3k(k+1)\), find all distances using Dijkstra’s algorithm and terminate.

E2:

[Find e(x, Y).] Compute \(e(x,Y)\) for all \(x\in X\) using Algorithm S.

E3:

[Add shortcuts.] For each pair \(z,z'\in Z\), add the edge \(zz'\) to G, weighted by \(d(z,z')\). Remove duplicate edges, retaining the shortest.

E4:

[Recurse on G[X] and combine.] Recursively compute the distances in G[X] using the left subtree of t as a skew k-separator tree. The results are the eccentricities \(e(x,X)\) for each \(x\in X\). Set \(e(x) = \max \{ e(x,X), e(x,Y)\}\).

E5:

[Flip.] Repeat Steps E2–4 with the roles of X and Y exchanged.
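One level of Algorithm E, without the recursion and without the flip of Step E5, can be sketched to show why the shortcuts of Step E3 suffice: after adding an edge \(zz'\) of weight \(d(z,z')\) for all \(z,z'\in Z\), distances inside G[X] agree with distances in G, so \(e(x)=\max \{e(x,X),e(x,Y)\}\). In this sketch \(e(x,X)\) is computed by plain Dijkstra in the shortcut graph instead of a recursive call, and \(e(x,Y)\) by brute force instead of Algorithm S; the graph, the split, and the function name `one_level` are hypothetical.

```python
import heapq
import math

# Sketch only: graph, split and function names are hypothetical.


def dijkstra(adj, s):
    """Distances from s; adj[v] is a list of (neighbour, weight) pairs."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if du + w < dist.get(v, math.inf):
                dist[v] = du + w
                heapq.heappush(heap, (du + w, v))
    return dist


def one_level(adj, L, Z, R):
    """e(x) for x in X = L + Z via Steps E2-E4, with the recursion replaced by Dijkstra."""
    X, Y = L + Z, R + Z
    dz = {z: dijkstra(adj, z) for z in Z}           # distances from Z, as in Step S1
    inX = set(X)
    H = {v: [(u, w) for u, w in adj[v] if u in inX] for v in X}   # G[X]
    for a in Z:                                     # Step E3: shortcut edges
        for b in Z:
            if a != b:
                H[a].append((b, dz[a][b]))          # parallel edges are harmless here
    ecc = {}
    for x in X:
        e_xX = max(dijkstra(H, x).values())         # stand-in for the recursive call
        e_xY = max(dijkstra(adj, x)[y] for y in Y)  # stand-in for Algorithm S
        ecc[x] = max(e_xX, e_xY)
    return ecc, H


EDGES = [(0, 1, 1), (0, 2, 2), (1, 2, 1), (1, 5, 4), (0, 5, 3),
         (2, 3, 2), (2, 4, 5), (3, 4, 1), (5, 4, 1), (5, 6, 2),
         (6, 7, 3), (3, 7, 4), (2, 5, 6)]
adj = {v: [] for v in range(8)}
for u, v, w in EDGES:
    adj[u].append((v, w))
    adj[v].append((u, w))
ecc, H = one_level(adj, [0, 1], [2, 5], [3, 4, 6, 7])
print(ecc)
```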

Lemma 13

The running time of Algorithm E is \(O(n\cdot B(n,k)\cdot 2^k \log n)\).

Proof

Assume \(n\ge 8\). Let \(T(n,k)\) denote the running time of Algorithm E.

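The graph G has treewidth O(k), so it has O(nk) edges. Step E1 consists of n executions of Dijkstra’s algorithm with n bounded by \(O(k^2\log k)\). This takes time \(O(k^5\log ^2 k)\), which is bounded by \(O(2^k)\). Step E2 was analysed in Lemma 11 and takes time \(O(n2^k B(n,k))\); here the term \(km\log n = O(k^2 n\log n)\) is dominated because \(k^2=O(2^k)\) and \(B(n,k)\ge \lceil \log n\rceil\). Step E3 adds \(O(k^2)\) edges whose weights \(d(z,z')\) were already computed in Step S1. Accounting for the recursive calls in Step E4 (for X) and its repetition in Step E5 (for Y) using \(|Y| \le n - |X| + k\), we arrive at the divide-and-conquer recurrence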

$$\begin{aligned} T(n,k) ={\left\{ \begin{array}{ll} O(k^5\log ^2 k) \,, &{}\text {if } n/\ln n< 3k(k+1)\,;\\ n\cdot S(n,k) + T(|X|, k) + T(n-|X|+k,k)\,, &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$

for some non-decreasing function S satisfying \(S(n,k) = O\bigl (2^k B(n,k)\bigr )\,.\) We would expect this recurrence to solve to roughly \(S(n,k)\cdot n\log n\) if the partition were perfectly balanced and k were constant, but the dependence on k is not clear, so we give a careful analysis.

The lemma is implied by the bound

$$\begin{aligned} T(n,k) \le 3 (k+1)\cdot S(n,k)\cdot n\ln n\,, \end{aligned}$$
(13)

which we will show by strong induction in n for all k. Write \(s=|X|\) and \(r=n-s+k\). By induction, we can bound

$$\begin{aligned} \frac{T(s,k) + T(r,k)}{3(k+1)} \le S(s,k)\cdot s\ln s + S(r,k)\cdot r\ln r \le S(n,k)\cdot (s\ln s +r\ln r)\,. \end{aligned}$$
(14)

From the bounds (12) on s we have \(s\le nk/(k+1)\) and \(r \le n-n/(k+1) +k = (nk/(k+1)) +k\), so if we set

$$\begin{aligned} t=\frac{nk}{k+1} +k\,, \end{aligned}$$

then both \(s\le t\) and \(r\le t\). Thus, we can get rid of s and r in the last term of (14) as

$$\begin{aligned} s\ln s + r\ln r \le s\ln t + r\ln t = s\ln t +(n-s+k)\ln t = n\ln t + k\ln t \le n\ln t + k\ln n\,. \end{aligned}$$
(15)

Since \(n\ge 8\) implies \(\ln n\ge 1\), Step E1 ensures \(k(k+1)\le n/(3\ln n) \le \textstyle \frac{1}{3}n\,,\) so we get

$$\begin{aligned} t = \frac{nk}{k+1} + \frac{k(k+1)}{k+1} \le n \cdot \frac{k+\frac{1}{3}}{k+1}\,. \end{aligned}$$

Using the bound \(\ln y\le y-1\) for \(y >0\), we have

$$\begin{aligned} \ln t \le \ln n + \ln \biggl (\frac{k+\frac{1}{3}}{k+1}\biggr )\le \ln n + \biggl (\frac{k+\frac{1}{3}}{k+1} - 1\biggr ) = \ln n - \frac{2}{3(k+1)} \,. \end{aligned}$$
(16)

Using this in (15), and then going back to (14), we have

$$\begin{aligned} \frac{T(s,k)+T(r,k)}{3(k+1)\cdot S(n,k)} \le n\ln t + k\ln n\le n\ln n - \frac{2n}{3(k+1)}+k\ln n \le n\ln n-\frac{n}{3(k+1)} \,, \end{aligned}$$

where the last step uses \(k\ln n \le n/(3(k+1))\), which is ensured by Step E1. Returning to the recurrence, we can now verify

$$\begin{aligned} T(n,k) = n\cdot S(n,k) + T(r,k)+T(s,k) \le n\cdot S(n,k) + n\cdot S(n,k)\cdot (3(k+1)\ln n - 1)\,, \end{aligned}$$

which simplifies to (13). \(\square\)
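The bound (13) can also be sanity-checked numerically. The sketch below evaluates the recurrence of the proof under the simplifying, hypothetical choices \(S(n,k)=1\) and a base-case cost of n (chosen so that the base case satisfies (13)), takes the worst case over every integer split s permitted by (12), and compares the result with \(3(k+1)\,n\ln n\). It restricts to \(k\ge 2\) so that a valid integer split always exists. This is a check of the arithmetic on small inputs, not a replacement for the induction.

```python
import math
from functools import lru_cache

# Sketch only: S(n, k) = 1 and base cost n are hypothetical choices.


@lru_cache(maxsize=None)
def worst_T(n, k):
    """Recurrence from the proof of Lemma 13 with S = 1, maximised over all splits."""
    assert k >= 2, "for k = 1 an odd n admits no integer split satisfying (12)"
    if n <= 2 or n / math.log(n) < 3 * k * (k + 1):
        return n                          # base case (Step E1), charged n
    lo = -(-n // (k + 1))                 # smallest integer s >= n/(k+1)
    hi = (n * k) // (k + 1)               # largest integer s <= nk/(k+1)
    return n + max(worst_T(s, k) + worst_T(n - s + k, k) for s in range(lo, hi + 1))


def bound(n, k):
    """Right-hand side of (13) with S = 1."""
    return 3 * (k + 1) * n * math.log(n)


for k in (2, 3):
    ratio = max(worst_T(n, k) / bound(n, k) for n in range(2, 1001))
    print(k, round(ratio, 3))
```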

We can now establish Theorem 1:

Proof of Thm. 1.2, distances

To compute all eccentricities for a given graph, we find a skew k-separator tree for \(k=5{\text {tw}}(G)+4\) using Lemma 12 in time \(n\exp O({\text {tw}}(G))\). We then run Algorithm E, using Lemma 13 to bound the running time. From the eccentricities, the radius and diameter can be computed in linear time using their definition. \(\square\)

4.5 Extension to Wiener Index

Algorithm E can be modified to compute the Wiener index, as described in [7, Sec. 4], completing the proof of Theorem 1. Instead of repeating those arguments, we content ourselves here with pointing out the necessary modifications in our presentation.

The orthogonal range queries for vertex \(x\in X\) now need to report the sum of distances to every \(y\in Y\), rather than just the maximum distance \(e(x,Y)\). Such a query can be handled with our data structure and a more careful choice of monoid, but another technical issue appears. While the distance maxima satisfy \(e(x,Y)= \max \{e_1(x,Y), \ldots , e_k(x,Y)\}\) according to Lemma 9, no similar expression holds for distance sums, simply because there can be shortest xy-paths via two different \(z_i\), and counting both would overcount \(d(x,y)\). The solution is to associate the distance \(d(x,y)\) with exactly one \(i\in \{1,\ldots , k\}\), namely the smallest i for which a shortest xy-path passes through \(z_i\).

This leads to the following (somewhat laborious) construction. For \(x\in X\), partition \(Y=Y_1\cup \cdots \cup Y_k\) into disjoint sets such that \(y\in Y_i\) if and only if (i) there is a shortest xy-path through \(z_i\) and (ii) there is none through \(z_j\) for \(j<i\). Then define \(s_i(x,Y)\) as the sum of distances from x to \(Y_i\):

$$\begin{aligned} s_i(x,Y) = \sum _{y\in Y_i} d(x,y)\,. \end{aligned}$$

We observe that \(s_1(x,Y)+\cdots +s_k(x,Y)\) is the sum of distances from x to all vertices in Y.
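A brute-force sketch of this partition: each \(y\in Y\) is charged to the smallest i for which \(d(x,z_i)+d(z_i,y)=d(x,y)\), that is, for which a shortest xy-path passes through \(z_i\) (positive weights are assumed), and the sums \(s_i(x,Y)\) then add up to the total distance from x to Y. The small unit-weight graph, in which Z = {2, 3} separates {0, 1} from {4, 5} and many shortest paths are tied, and the function names are hypothetical.

```python
import math

# Sketch only: graph and function names are hypothetical.


def all_pairs(n, edges):
    """Floyd-Warshall distances for an undirected graph with positive weights."""
    d = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        d[v][v] = 0
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    for m in range(n):
        for a in range(n):
            for b in range(n):
                d[a][b] = min(d[a][b], d[a][m] + d[m][b])
    return d


def partition_sums(d, x, Y, Z):
    """Return [s_1(x,Y), ..., s_k(x,Y)]; y goes to the smallest i with a shortest path via z_i."""
    s = [0] * len(Z)
    for y in Y:
        # Z separates x from y, so at least one index qualifies
        i = next(j for j, z in enumerate(Z) if d[x][z] + d[z][y] == d[x][y])
        s[i] += d[x][y]
    return s


# Hypothetical unit-weight graph: layers {0, 1}, {2, 3}, {4, 5}, complete between layers.
EDGES = [(a, b, 1) for a in (0, 1) for b in (2, 3)] + \
        [(a, b, 1) for a in (2, 3) for b in (4, 5)]
d = all_pairs(6, EDGES)
X, Y, Z = [0, 1], [2, 3, 4, 5], [2, 3]
for x in X:
    print(x, partition_sums(d, x, Y, Z), sum(d[x][y] for y in Y))
```

Without the "smallest index" rule, vertices 4 and 5 would be counted once for each of \(z_1=2\) and \(z_2=3\).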

To compute \(s_i(x,Y)\), we modify the construction from Lemma 10 slightly. The coordinates of \(p_y\) are as before. The rectangle \(R_x\) associated with x for \(i\in \{1,\ldots , k\}\) now becomes \([-\infty , r_1]\times \cdots \times [-\infty , r_k]\) where

$$\begin{aligned} r_j = {\left\{ \begin{array}{ll} d(x,z_j) - d(x,z_i) -1\,,&{} j<i\,; \\ d(x,z_j) - d(x,z_i) \,,&{} j\ge i\,. \end{array}\right. } \end{aligned}$$

Following the proof of Lemma 10, and using that all distances are integers because all weights are, the ‘\(-1\)’ above for \(j<i\) ensures that \(p_y\in R_x\) now also requires

$$\begin{aligned} d(x,z_i) + d(z_i, y)< d(x,z_j) + d(z_j, y)\qquad (j<i)\,. \end{aligned}$$

In other words, xy-paths through \(z_j\) for \(j<i\) cannot be shortest paths, so we avoid overcounting. The codomain of the function f in (9) is changed to the monoid of nonnegative integer pairs (a, b) with the operation \((a,b)\oplus (a',b') = (a+a', b+b')\) and identity element (0, 0). The value associated with the point \(p_y\) is changed to \(f(p_y) = (1, d(z_i,y))\).

It then holds that the query for \(R_x\) will return a tuple (N, S) where

$$\begin{aligned} N = |Y_i| \text { and } S = \sum _{y\in Y_i} d(z_i, y) \end{aligned}$$
(17)

so that \(s_i(x,Y)\) can be computed as

$$\begin{aligned} s_i(x,Y) = N \cdot d(x,z_i) + S\,. \end{aligned}$$
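The modified query can be sketched as well: the corner of \(R_x\) uses \(r_j\) with the extra \(-1\) for \(j<i\), each point carries the pair \((1, d(z_i,y))\), and pairs are combined componentwise. A linear scan stands in for the range tree (an assumption of this sketch), and integer weights are required for the \(-1\) trick. The unit-weight graph, with tied shortest paths through both separator vertices, and the function names are hypothetical.

```python
import math

# Sketch only: graph and function names are hypothetical; weights must be integers.


def all_pairs(n, edges):
    """Floyd-Warshall distances for an undirected graph with positive weights."""
    d = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        d[v][v] = 0
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    for m in range(n):
        for a in range(n):
            for b in range(n):
                d[a][b] = min(d[a][b], d[a][m] + d[m][b])
    return d


def s_i_by_query(d, Z, i, x, Y):
    """s_i(x, Y) = N * d(x, z_i) + S from one (scanned) query over the modified R_x."""
    zi = Z[i]
    r = [d[x][zj] - d[x][zi] - (1 if j < i else 0) for j, zj in enumerate(Z)]
    N, S = 0, 0                                  # identity (0, 0) of the pair monoid
    for y in Y:
        p = [d[zi][y] - d[zj][y] for zj in Z]    # coordinates of p_y, as in (9)
        if all(pj <= rj for pj, rj in zip(p, r)):
            N, S = N + 1, S + d[zi][y]           # combine with f(p_y) = (1, d(z_i, y))
    return N * d[x][zi] + S


def s_i_by_definition(d, Z, i, x, Y):
    """Sum of d(x, y) over the y whose smallest shortest-path index is i (the set Y_i)."""
    total = 0
    for y in Y:
        first = next(j for j, z in enumerate(Z) if d[x][z] + d[z][y] == d[x][y])
        if first == i:
            total += d[x][y]
    return total


# Hypothetical unit-weight graph: layers {0, 1}, {2, 3}, {4, 5}, complete between layers.
EDGES = [(a, b, 1) for a in (0, 1) for b in (2, 3)] + \
        [(a, b, 1) for a in (2, 3) for b in (4, 5)]
d = all_pairs(6, EDGES)
X, Y, Z = [0, 1], [2, 3, 4, 5], [2, 3]
for x in X:
    print(x, [s_i_by_query(d, Z, i, x, Y) for i in range(len(Z))])
```

Dropping the `- 1` for \(j<i\) would let vertices 4 and 5 into both queries, which is exactly the overcounting the construction avoids.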

These changes suffice to establish the Wiener index part of Theorem 1.1, finishing the argument for vertex cover.

Extending these results to the recursive construction for treewidth from Sect. 4.4 requires only some care regarding how sums of distances cross the separator. This part is carefully argued in [7, Lemma 8], and there is no reason to repeat it here. With these changes, Theorem 1.2 is established.