1 Introduction

Given n points in the unit square, Newman [20, Problem 57] proved that there is a closed polygonal Hamiltonian cycle (tour) H through the n points such that the sum of the squares of its edge-lengths is at most 4. The upper bound of 4 cannot be improved: Fig. 1 shows three different point sets whose optimal tours yield exact equality. More importantly, the above upper bound is independent of n.

Fig. 1

Tight examples with 4, 2, and 5 points: \(1+1+1+1= 2+2 = 1+1+1+ 1/2+1/2\)

Meir [19] considered the extension of this problem to higher dimensions. For a point \(x \in \mathbb {R}^k\), let |x| denote the Euclidean length of x; namely, if \(x=(\xi _1,\xi _2,\ldots ,\xi _k)\), then

$$\begin{aligned} |x|= \left( \sum _{i=1}^k \xi _i^2 \right) ^{1/2}. \end{aligned}$$

For two points \(x,y \in \mathbb {R}^k\), let the weight of the edge \(e=xy\) be \(|e|:=|x-y|\), i.e., the Euclidean distance between x and y.

Let X be an n-element point set in the unit cube \([0,1]^k\). For a graph G on vertex set X, set

$$\begin{aligned} S_k(G)&= \sum _{e\in G} |e|^k \quad \text { and } \quad s_k(G) = \left( \sum _{e\in G} |e|^k\right) ^{1/k}. \end{aligned}$$
(1)

We refer to \(S_k(G)\) and \(s_k(G)\) as the unscaled and scaled costs, respectively. Denote by \(S_k^{\texttt {HC}}(X)\), \(S_k^{\texttt {ST}}(X)\) and \(S_k^{\texttt {HP}}(X)\) (respectively, \(s_k^{\texttt {HC}}(X)\), \(s_k^{\texttt {ST}}(X)\) and \(s_k^{\texttt {HP}}(X)\)) the minimum of \(S_k(G)\) (respectively, \(s_k(G)\)), where G ranges over the Hamiltonian cycles, the spanning trees, and the Hamiltonian paths with vertex set X, in this order. Further, let

$$\begin{aligned} s_k^{\texttt {HC}}(n)= & {} \sup \{ s_k^{\texttt {HC}}(X) :X\subseteq [0,1]^k, |X|=n \},\\ s_k^{\texttt {ST}}(n)= & {} \sup \{ s_k^{\texttt {ST}}(X) :X\subseteq [0,1]^k, |X|=n \}, \\ s_k^{\texttt {HP}}(n)= & {} \sup \{ s_k^{\texttt {HP}}(X) :X\subseteq [0,1]^k, |X|=n \}, \\ s_k^{\texttt {HC}}= & {} \sup _{n \ge 2} s_k^{\texttt {HC}}(n), \quad s_k^{\texttt {ST}} = \sup _{n \ge 2} s_k^{\texttt {ST}}(n) \quad \text {and} \\ s_k^{\texttt {HP}}= & {} \sup _{n \ge 2} s_k^{\texttt {HP}}(n). \end{aligned}$$
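For very small point sets, the quantities just defined can be evaluated by exhaustive search. The following Python sketch is only an illustration: the function names `unscaled_cost` and `min_cycle_cost` are ours, and the coordinates used for the 5-point set (the four corners of the unit square together with its center) are an assumption about the configuration shown in Fig. 1.

```python
from itertools import permutations
from math import dist

def unscaled_cost(tour, k):
    """S_k of a closed tour: sum of the k-th powers of its edge lengths."""
    n = len(tour)
    return sum(dist(tour[i], tour[(i + 1) % n]) ** k for i in range(n))

def min_cycle_cost(points, k):
    """S_k^HC(X) by exhaustive search; feasible only for a handful of points."""
    first, rest = points[0], points[1:]
    return min(unscaled_cost((first, *perm), k) for perm in permutations(rest))

corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
# Assumed coordinates for the 4-, 2-, and 5-point examples of Fig. 1.
examples = [corners, [(0, 0), (1, 1)], corners + [(0.5, 0.5)]]
for X in examples:
    S = min_cycle_cost(X, 2)
    print(len(X), round(S, 9), round(S ** 0.5, 9))  # n, unscaled cost, scaled cost
```

For \(n=2\) the cycle traverses the single edge twice, which is why the 2-point example contributes \(2+2\).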

It is clear that \(s_k^{\texttt {HC}}(n) \ge s_k^{\texttt {HC}}(m)\), whenever \(n \ge m\) (by clustering points and taking the limit). In this notation, Newman’s result mentioned earlier reads \(s_2^{\texttt {HC}}(n) =2\) for every \(n \ge 2\). A more recent reference to this result can be found in [6, Problem 124]. Currently this is the only exact value known. Meir [19] asked whether \(s_k^{\texttt {HC}}(n)\) is bounded from above by a constant \(c_k>0\) for every k. Soon after, Bollobás and Meir [7] answered Meir’s question in the affirmative by proving that \( s_k^{\texttt {HC}}(n) \le 9 \left( \frac{2}{3} \right) ^{1/k} \cdot \sqrt{k}\) for every \(k \ge 3\) and \(n \ge 2\) (and recall that \(c_2=2\)). From the other direction, the 2-point example consisting of two opposite vertices of \(\{0,1\}^k\) shows that \(s_k^{\texttt {HC}}(n) \ge 2^{1/k} \cdot \sqrt{k}\) for every \(k \ge 2\) and \(n \ge 2\); see Fig. 1 (center). We record their result below.

Theorem 1.1

(Bollobás and Meir [7]). Let \(k\ge 3\) and \(n\ge 2\). Then,

$$\begin{aligned} 2^{1/k}\sqrt{k} \le s_k^{\texttt {HC}}(n) \le 3^{2-\frac{1}{k}}2^{1/k}\sqrt{k}. \end{aligned}$$

In the conclusion of their paper [7], the authors conjectured that \(s_k^{\texttt {HC}}(n) = 2^{1/k} \cdot \sqrt{k}\) for every \(k \ge 2\) and \(n \ge 2\). Meir [19] also asked for an algorithm that computes a tour whose cost is bounded by a constant depending on k. As we will see in more detail in Sect. 2, Bollobás and Meir’s proof implicitly gives a positive answer to this latter question. Similarly, our new bounds in Theorem 1.3 and Corollary 5.1 are constructive too.

Background and related work. The traveling salesman problem (TSP) is perhaps the most studied problem in the theory of combinatorial optimization. Its approximability depends on the particular version of the problem. Specifically, TSP with Euclidean distances admits a polynomial-time approximation scheme [3, 16]. If the distances form a metric, then the problem is \(\textsf {MaxSNP}\)-hard [21] and the best approximation ratio known is essentially 3/2 [8, 13].

Estimating the length of a shortest tour of n points in the unit square with respect to Euclidean distances has been studied as early as the 1940s and 1950s by Fejes Tóth [10], Few [11], and Verblunsky [30]. Few [11] proved that the (Euclidean) length of a shortest cycle (tour) through n points in the unit square \([0,1]^2\) is at most \(\sqrt{2n}+7/4\). The same upper bound holds for the minimum spanning tree [11]. Few’s bound was rediscovered in 1983 by Supowit, Reingold, and Plaisted [26]. A slightly better upper bound for the shortest cycle, \(1.392 \sqrt{n} + 7/4\), has been derived by Karloff [14], who also emphasized the difficulty of the problem. The current best lower bound for the length of such a cycle is due to Fejes Tóth [10] and Few [11]: it is \(\left( \frac{4}{3}\right) ^{1/4} \sqrt{n} - o(\sqrt{n})\), where \((4/3)^{1/4} = 1.075\ldots \). For every dimension \(k \ge 3\), Few showed that the maximum length of a shortest tour through n points in the unit cube is \(\Theta (n^{1-1/k})\). Moran [18] studied the length of the shortest traveling salesman tour through a set of n points of unit diameter in \(\mathbb {R}^k\).

The length of a shortest tour through a random sample \(\{ X_1,\ldots ,X_n\}\) of n points in the unit cube \([0,1]^k\) was determined by Beardwood, Halton, and Hammersley. Let this length be denoted by \(L(X_1,\ldots ,X_n)\). If \(\{X_i\}\) is a sequence of independent random variables with the uniform distribution on \([0,1]^k\), then there is a constant \(\beta (k)>0\) such that

$$\begin{aligned} {L(X_1,\ldots ,X_n) / n^{1-1/k}} \rightarrow \beta (k) \end{aligned}$$

with probability one [4]. Later, Rhee [22] proved that \({\beta (k) / \sqrt{k}} \rightarrow {1 / \sqrt{2 \pi e}}\), see also [25]. The relevance of the cube diagonal, \(\sqrt{k}\), in the above formulas can also be observed in our estimates for \(s_k^{\texttt {HC}}(n)\); see Theorem 1.3 (ii) and Conjecture 5.5.

Expressions for the cost of a Hamiltonian cycle of the kind in (1) have been considered in the context of power assignment problems in wireless networks. Let X be an n-element point set in the unit cube \([0,1]^k\) and \(\alpha \ge 1\) be a real number. For a Hamiltonian cycle H as above, one is interested in minimizing a cost of the form

$$\begin{aligned} \textrm{cost}(H)= \sum _{i=1}^n |x_i - x_{i+1}|^\alpha . \end{aligned}$$
(2)

Such costs typically reflect the energy costs along the edges that make up the cycle [9, 15] in wireless network transmission. An illustrative example is that of a virtual token floating through the network, where sensor nodes can attach or read data from the token before sending it to the next node on the cycle. One can speak about finding a traveling salesman tour (TSP tour) of minimum energy cost [12]. The fact that k is the smallest value of \(\alpha \) for which the cost in (2) is bounded from above by a constant (depending on k but independent of n) should be noted [7, 15]; a fine grid section in the cube proves this point.
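The grid argument can be spelled out as follows. This is a sketch in the notation of (2); the grid parameter m is introduced here only for illustration. Let X be the grid \(\{1/m, 2/m, \ldots , 1\}^k\), so that \(n=m^k\) and any two points of X are at distance at least 1/m. Every Hamiltonian cycle H on X has n edges, hence

$$\begin{aligned} \textrm{cost}(H) \ge n \cdot m^{-\alpha } = m^{k-\alpha } = n^{1-\alpha /k}, \end{aligned}$$

which is unbounded as \(m \rightarrow \infty \) whenever \(\alpha < k\). For \(\alpha =k\) the same estimate only gives \(\textrm{cost}(H) \ge 1\), consistent with the boundedness results cited above.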

As pointed out in several places in the literature [2, 5, 9, 12], simply computing a short (even optimal) tour for the underlying Euclidean instance does not work, i.e., does not provide a good approximation with respect to the power costs in (2). Funke, Laue, Lotker and Naujoks [12] showed that the cost of an optimal tour for the Euclidean instance can be a factor of \(\Omega (n)\) larger than that of an optimal tour for the power costs (a simple example can be constructed with equidistant points on a line or on a circle of large radius).

In [12], a recursive algorithm was also presented that, given n points in \(\mathbb {R}^2\), constructs a TSP tour for edge costs \(|pq|^\alpha = |e|^\alpha \) whose cost is at most \(2 \cdot 3^{\alpha -1}\) times that of a minimum spanning tree (MST) of the point set. Since the cost of an MST does not exceed that of an optimal TSP tour for the same edge costs, their algorithm is a \(2 \cdot 3^{\alpha -1}\)-factor approximation for the TSP with power costs as in (2). The authors further show that the approach extends to \(\mathbb {R}^k\) with the same ratio:

Theorem 1.2

(Funke, Laue, Lotker, and Naujoks [12]). There exists a \(2 \cdot 3^{\alpha -1}\)-approximation algorithm for the TSP in \(\mathbb {R}^k\) if the edge weights are Euclidean distances to the power \(\alpha \).

If for some \(\tau >1\) distances of a TSP instance satisfy

$$\begin{aligned} \textrm{dist}(x,z) \le \tau \left( \textrm{dist}(x,y) + \textrm{dist}(y,z) \right) , \end{aligned}$$

for any three vertices x, y, z, we say that they satisfy the relaxed triangle inequality, see [2, 5, 17]. It is important to note that the metric with Euclidean distances to the power \(\alpha \) satisfies the relaxed triangle inequality with \(\tau = 2^{\alpha -1}\); see [9, 12]. For \(\alpha =2\) (i.e., TSP with squared distances), Theorem 1.2 yields a 6-approximation. De Berg, van Nijnatten, Sitters, Woeginger and Wolff [9] obtained a 5-approximation.
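The value \(\tau = 2^{\alpha -1}\) can be recovered in one line from the ordinary triangle inequality and the convexity of \(t \mapsto t^\alpha \) for \(\alpha \ge 1\). The short derivation below is included as a reminder and is not taken from [9, 12]:

$$\begin{aligned} |x-z|^\alpha \le \left( |x-y|+|y-z|\right) ^\alpha = 2^\alpha \left( \frac{|x-y|+|y-z|}{2}\right) ^\alpha \le 2^{\alpha -1} \left( |x-y|^\alpha +|y-z|^\alpha \right) . \end{aligned}$$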

Our results. The upper bound \(s_k^{\texttt {HC}}(n) \le 9 \left( \frac{2}{3} \right) ^{1/k} \cdot \sqrt{k}\), where \(k \ge 3\), has stood unchanged for 30 years [7]. Here we obtain several improvements.

Theorem 1.3

The following bounds are in effect:

(i) There exists a 4-element point set in \([0,1]^3\) such that the cost of the shortest tour is at least \(2^{7/6}=2.24\ldots \). Consequently, \(s_3^{\texttt {HC}}(n) \ge 2^{7/6}=2.24\ldots \), for every \(n \ge 4\).

(ii) Let X be an n-element point set in the k-dimensional unit cube \([0,1]^k\), \(k \ge 3\). Then there exists a tour \(H=x_1, x_2, \ldots , x_n\) through the n points, such that \(\left( \sum _{i=1}^n |x_i - x_{i+1}|^k \right) ^{1/k} \le 3 \sqrt{5} \left( \frac{2}{3} \right) ^{1/k} \cdot \sqrt{k}\). Consequently, \(s_k^{\texttt {HC}}(n) \le 3 \sqrt{5} \left( \frac{2}{3} \right) ^{1/k} \cdot \sqrt{k} = 6.708\ldots \cdot \left( \frac{2}{3} \right) ^{1/k} \cdot \sqrt{k}\).

(iii) H can be computed in time proportional to that needed for computing an MST of the points, in particular, in subquadratic time.

Several sharper bounds are obtained for sufficiently large k. We note that the conjectured optimal configuration consisting of a diameter pair of the cube as well as the lower bound construction we will present for \(k=3\) in Theorem 1.3 (i) are subsets of \(\{0,1\}^k\). This raises the natural question if one can determine the maximum of \(s_k^{\texttt {HC}}(X)\) if the point set X is in \(\{0,1\}^k\). We answer this question.

Theorem 1.4

There exists an integer \(k_0\) such that for all \(k\ge k_0\) the following holds. If X is an arbitrary subset of vertices of \(\{0,1\}^k\), then there exists a Hamiltonian cycle H through X such that \(s_k(H) \le 2^{1/k}\sqrt{k}\).

The “sufficiently large” requirement for Theorem 1.4 is in fact quite modest. The threshold \(k_0\) is below 30. Note that the bound in Theorem 1.4 is attained for \(|X|=2\).

Theorem 1.5

For the family of minimum spanning trees, we have

$$\begin{aligned} s_k^{\texttt {ST}} \le \sqrt{k} \ (1+o_k(1)). \end{aligned}$$

Apart from the error term, this bound is best possible.

By transforming a minimum spanning tree into a Hamiltonian cycle by using the method of Sekanina [23] and Bollobás and Meir [7], we obtain \(s_k^{\texttt {HC}} \le 3\sqrt{k} \ (1+o_k(1))\). A further refinement based on a two-phase algorithm and a new greedy algorithm that maintains a collection of spanning paths allows us to obtain the following sharper bound.

Theorem 1.6

For the family of Hamiltonian cycles, we have

$$\begin{aligned} s_k^{\texttt {HC}} \le 2.91 \sqrt{k} \ (1+o_k(1)). \end{aligned}$$

When the number of points n is bounded by a constant (independent of k), we can obtain a better asymptotic bound, close to the conjectured value \(2^{1/k}\sqrt{k}\).

Theorem 1.7

Let \(n\ge 2\) be fixed. For the family of Hamiltonian cycles, we have

$$\begin{aligned} s_k^{\texttt {HC}}(n)= 2^{1/k}\sqrt{k} \ (1+o_k(1)). \end{aligned}$$

Note however, that in Theorem 1.7 we require n to be constant; it does not imply \(s_k^{\texttt {HC}}= 2^{1/k}\sqrt{k} \ (1+o_k(1))\).

The improved upper bounds in Theorems 1.3 and 1.6 have implications for the existence of Hamiltonian paths and perfect matchings whose costs are bounded from above by constants depending on k. These are discussed in Sect. 5.

2 Hamiltonian Cycles: Exact Upper and Lower Bounds

2.1 An Improved Lower Bound for \(k=3\)

In this subsection we prove Theorem 1.3(i). Consider the four-element point set

$$\begin{aligned} X=\{(0,0,0),(0,1,1),(1,0,1),(1,1,0)\} \subset [0,1]^3. \end{aligned}$$

X is in fact a binary code of length 3 with minimum Hamming distance 2; see, e.g., [29, Ch. 5]. As such, the corresponding Euclidean pairwise distances are at least \(\sqrt{2}\). Consequently, the unscaled cost of any TSP tour H satisfies \(S_3(H)\ge 4 \cdot (\sqrt{2})^3 = 11.31\ldots \), and its scaled cost satisfies \(s_3(H) \ge (4 \cdot 2^{3/2})^{1/3} = 2^{7/6}\). On the other hand, the conjectured [7] optimal unscaled cost was \(2 \cdot (\sqrt{3})^3 = 10.39\ldots \). \(\square \)
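The arithmetic in this proof is easy to confirm numerically. The Python sketch below is only a sanity check (the helper name `tour_cost` is ours): it lists the pairwise distances in X and minimizes the unscaled cost over all tours by exhaustive search.

```python
from itertools import combinations, permutations
from math import dist

X = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

# Every pair differs in exactly two coordinates, so every distance is sqrt(2).
pair_dists = [dist(p, q) for p, q in combinations(X, 2)]

def tour_cost(tour, k=3):
    """Unscaled cost S_k of a closed tour."""
    n = len(tour)
    return sum(dist(tour[i], tour[(i + 1) % n]) ** k for i in range(n))

best = min(tour_cost((X[0], *perm)) for perm in permutations(X[1:]))
conjectured = 2 * 3 ** 1.5  # unscaled cost of the 2-point diagonal example

print(pair_dists)
print(best, best ** (1 / 3), 2 ** (7 / 6), conjectured)
```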

It is possible that the new lower bound gives the right value of \(s_3^{\texttt {HC}}(n)\) for \(n \ge 4\), see Conjecture 5.5 in Sect. 5.

Remark. Interestingly enough, for \(k=4\), there exist (at least) two different point sets, one with \(n=2\) and the other with \(n=8\), whose shortest tours have the same cost \(S_4^{\texttt {HC}}(X)\) as the conjectured value, \(S_4^{\texttt {HC}}(n) = 2 \cdot (\sqrt{4})^4 = 32\). The former set consists of a pair of diagonally opposite vertices, say, \(\{(0,0,0,0),(1,1,1,1)\}\). This is in fact the point set that is behind the conjectured maximum cost for every k. The latter set is a binary code of length 4 with minimum distance 2; for example, one can take the eight binary vectors with an even number of ones:

$$\begin{aligned} X&= \{(0,0,0,0),(0,0,1,1),(0,1,0,1),(0,1,1,0),\\&(1,0,0,1), (1,0,1,0), (1,1,0,0), (1,1,1,1)\}. \end{aligned}$$

Then \(S_4^{\texttt {HC}}(X) \ge 8 (\sqrt{2})^4 = 32\) and this value can be attained; equivalently, \(s_4^{\texttt {HC}}(X) \ge 2^{5/4}\). We were not able to find two different sets X with \(s_k^{\texttt {HC}}(X) \ge 2^{1/k} \cdot \sqrt{k}\) for any other \(k \ge 5\).
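That the value 32 is attained can be seen from an explicit tour. The cyclic order used in the Python sketch below is one possible choice made for this illustration; consecutive vectors differ in exactly two coordinates, so each of the eight edges contributes \((\sqrt{2})^4=4\).

```python
from itertools import product
from math import dist

# The eight binary vectors of length 4 with an even number of ones.
code = [v for v in product((0, 1), repeat=4) if sum(v) % 2 == 0]

# One explicit cyclic order (our choice) with Hamming distance 2 between neighbors.
tour = [(0, 0, 0, 0), (0, 0, 1, 1), (0, 1, 1, 0), (0, 1, 0, 1),
        (1, 1, 0, 0), (1, 1, 1, 1), (1, 0, 1, 0), (1, 0, 0, 1)]

n = len(tour)
hamming = [sum(a != b for a, b in zip(tour[i], tour[(i + 1) % n])) for i in range(n)]
cost = sum(dist(tour[i], tour[(i + 1) % n]) ** 4 for i in range(n))

print(hamming, cost, cost ** 0.25, 2 ** 1.25)
```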

2.2 An Improved Upper Bound for Every \(k \ge 3\)

In this section we prove the last two items in Theorem 1.3. Our proof is modeled on that in [7]. It uses a ball packing argument based on the following lemma. (A similar lemma, however, with smaller ball radii, can be found in [15].)

Lemma 2.1

(Bollobás and Meir [7]). Let \(T=(V,E)\) be a minimum spanning tree for a finite point set \(X \subset \mathbb {R}^k\). For each edge \(e=xy \in E\) let \(B_{e}\) be the open ball of radius \(\frac{1}{4} |x-y|\) centered at \(\frac{1}{2} (x+y)\). Then \(B_{e} \cap B_{e'} =\emptyset \) whenever e and \(e'\) are distinct edges of T. The factor \(\frac{1}{4}\) is as large as possible.

In addition, a suitable order of traversing the vertices of a minimum spanning tree first developed by Sekanina [23, 24] is needed. The algorithm can be made to run in linear time. A proof of this traversal result, in slightly different terms, also appears in [7]. A few definitions and notations (from [7]) are as follows. The h-th power \(G^h\) of a graph \(G=(V,E)\) is the graph with vertex set V and edge set \(E(G^h) = \{xy :x,y \in V, 1 \le d(x,y) \le h\}\). Here \(d(x,y)\) is the distance between x and y in the graph. Let T be a tree and \(xy \in E(T^h)\). An edge \(uv \in E(T)\) is said to be used by xy if the edge uv is on the unique path in T (of length at most h) from x to y. If H is a subgraph of \(T^h\), then an edge of T is used t times by H if it is used by t edges of H.

Lemma 2.2

(Sekanina [23], Bollobás and Meir [7]). Let x be a vertex of a tree T with at least 3 vertices. Then \(T^3\), the cube of T, contains a Hamiltonian cycle H such that every edge of T is used exactly twice by H, and one of the edges of H incident to x is an edge of T.
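A cyclic order with the properties in Lemma 2.2 can be produced by a single depth-first search. The Python sketch below uses a standard ordering (even-depth vertices are listed on entry, odd-depth vertices on exit); it is meant as an illustration and is not claimed to be the construction of [23] or [7]. The names `prepostorder` and `tree_distance`, the adjacency-list input format, and the sample tree are our assumptions.

```python
from collections import deque

def prepostorder(adj, root):
    """Cyclic vertex order of a tree: even-depth vertices are listed when the
    depth-first search enters them, odd-depth vertices when it leaves them."""
    order = []
    stack = [(root, None, 0, False)]  # (vertex, parent, depth, leaving?)
    while stack:
        v, parent, depth, leaving = stack.pop()
        if leaving:
            order.append(v)
            continue
        if depth % 2 == 0:
            order.append(v)
        else:
            stack.append((v, parent, depth, True))  # list v after its subtree
        for w in reversed(adj[v]):
            if w != parent:
                stack.append((w, v, depth + 1, False))
    return order

def tree_distance(adj, s, t):
    """Number of edges on the tree path from s to t (breadth-first search)."""
    seen = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen[w] = seen[v] + 1
                queue.append(w)
    return seen[t]

# Sample tree on 9 vertices, given by adjacency lists.
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 6], 3: [1, 5], 4: [1],
       5: [3], 6: [2, 7, 8], 7: [6], 8: [6]}
order = prepostorder(adj, 0)
n = len(order)
gaps = [tree_distance(adj, order[i], order[(i + 1) % n]) for i in range(n)]
print(order, gaps, sum(gaps))
```

In this example every gap is at most 3, the gaps sum to \(2(n-1)\), in agreement with every tree edge being used exactly twice, and the closing edge from the last vertex back to the root is a tree edge.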

Lemma 2.2 implies the following lemma, which is not stated explicitly in [7] but is used in the proof of their Theorem 3. For completeness, we include their proof here.

Lemma 2.3

(Bollobás and Meir [7]). Let T be a spanning tree for a finite point set \(X \subset \mathbb {R}^k\). Then there exists a Hamiltonian cycle H on X such that

$$\begin{aligned} S_k(H)\le \frac{2}{3}\cdot 3^k \cdot S_k(T). \end{aligned}$$

Proof

Let \(e_1,\ldots ,e_n\) be the edges of a Hamiltonian cycle H in \(T^3\) guaranteed by Lemma 2.2. Suppose that the edges of T used by \(e_i\) have lengths \(d_{i_{1}},\ldots , d_{i_{\ell }}\), where \(\ell \le 3\). Set \(f_i = d_{i_{1}}+\ldots +d_{i_{\ell }}\) and \(f=(f_i)_{i\in [n]}\in {\mathbb {R}}^n\). Then \(|e_i|\le f_i\) for every i, each \(f_i\) is a sum of at most three \(d_j\)’s and each \(d_j\) occurs in the representations of two \(f_i\)’s.

Now, we can form three vectors \(v_1,v_2,v_3\in {\mathbb {R}}^n\) such that \(f=v_1+v_2+v_3\), every coordinate of \(v_i\) is a \(d_j\) or 0, and every \(d_j\) occurs exactly twice as a coordinate in the three \(v_i\)’s. Therefore, \(\sum _{i=1}^3 \Vert v_i \Vert _k^k = 2\sum _{j=1}^{n-1}d_j^k\). Hence, by the triangle-inequality and Jensen’s inequality,

$$\begin{aligned} \Vert f \Vert _k&=\Vert v_1+v_2+v_3 \Vert _k \le \sum _{i=1}^3 \Vert v_i \Vert _k\le 3 \left( \frac{1}{3} \sum _{i=1}^3 \Vert v_i \Vert _k^k \right) ^{1/k} \\&= 3 \left( \frac{2}{3} \sum _{j=1}^{n-1}d_j^k \right) ^{1/k}= 3 \left( \frac{2}{3}\right) ^{1/k} \cdot s_k(T), \end{aligned}$$

and thus

$$\begin{aligned} S_k(H)=\sum _{i=1}^n |e_i|^k\le \Vert f \Vert _k^k\le \frac{2}{3}\cdot 3^k \cdot S_k(T). \end{aligned}$$

\(\square \)

For convenience, here we work with the unit cube \(U=[-1/2,1/2]^k\) centered at the origin \(o=(0,\ldots ,0)\). Assume that \(n \ge 3\), since it is clear otherwise that \(s_k(H) \le 2^{1/k} \cdot \sqrt{k}\). It was shown in [7] that \(\cup _{e\in T} B_e \) is contained in the ball of radius \(0.75 \sqrt{k}\) centered at the origin o. We next show that \(\cup _{e\in T} B_e\) is contained in the ball of radius \( \frac{\sqrt{5}}{4} \sqrt{k} =0.559 \ldots \cdot \sqrt{k}\) centered at o. The idea for the improvement is that centers of balls corresponding to long edges of T cannot be too far from the center of the cube. The key step is the following.

Lemma 2.4

Let \(U=[-1/2,1/2]^k\) and \(u,v \in U\). Then

$$\begin{aligned} \frac{|u+v|}{2} + \frac{|u-v|}{4} \le \frac{\sqrt{5}}{4} \sqrt{k}. \end{aligned}$$
(3)

This inequality is the best possible.

Proof

To start with, note that

$$\begin{aligned} |u|^2 \le \sum _1 ^k \frac{1}{4} = \frac{k}{4}, \quad \quad |v|^2 \le \sum _1 ^k \frac{1}{4} = \frac{k}{4} \quad \quad \text {and} \quad \quad |u -v| \le \sqrt{k}. \end{aligned}$$

The first two relations immediately yield

$$\begin{aligned} |u|^2 + |v|^2 \le \frac{k}{2}. \end{aligned}$$
(4)

Recall the identities

$$\begin{aligned} |u+v|^2 = |u|^2 + |v|^2 + 2 uv, \quad \quad |u-v|^2&= |u|^2 + |v|^2 - 2 uv. \end{aligned}$$
(5)

Here uv is the dot product of u and v. We deduce that

$$\begin{aligned} |u-v|^2 = 2 (|u|^2+|v|^2) - |u+v|^2 \le 2 (|u|^2+|v|^2). \end{aligned}$$

We can thus write \(|u-v| = \lambda \sqrt{|u|^2+|v|^2}\), where \(0 \le \lambda \le \sqrt{2}\), whence

$$\begin{aligned} |u-v|^2 = \lambda ^2 (|u|^2+|v|^2). \end{aligned}$$

From the two equations in (5) we also obtain

$$\begin{aligned} |u+v| = \sqrt{2(|u|^2 + |v|^2) - |u-v|^2} = \sqrt{(2 - \lambda ^2) (|u|^2 + |v|^2)}. \end{aligned}$$

Substituting the expressions of \(|u+v|\) and \(|u-v|\) and using (4) yields

$$\begin{aligned} \frac{|u+v|}{2} + \frac{|u-v|}{4}&= \frac{\sqrt{(2 - \lambda ^2) (|u|^2 + |v|^2)}}{2} + \frac{\lambda \sqrt{|u|^2+|v|^2}}{4} \\&= \left( \frac{ \sqrt{2 - \lambda ^2}}{2} + \frac{\lambda }{4} \right) \sqrt{|u|^2+|v|^2} \\&\le \frac{1}{4} \left( \lambda + 2 \sqrt{2 -\lambda ^2} \right) \sqrt{ \frac{k}{2}}. \end{aligned}$$

A standard calculation shows that the function \(f(\lambda ) = \lambda + 2 \sqrt{2 -\lambda ^2}\), where \(0 \le \lambda \le \sqrt{2}\), attains its maximum, \(\sqrt{10}\), at \(\lambda = \sqrt{\frac{2}{5}}\). Consequently,

$$\begin{aligned} \frac{|u+v|}{2} + \frac{|u-v|}{4}&\le \frac{1}{4} \sqrt{10} \sqrt{\frac{k}{2}} = \frac{\sqrt{5}}{4} \sqrt{k}. \end{aligned}$$

This concludes the proof of the upper bound.
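The standard calculation for \(f(\lambda )\) can also be replaced by a one-line application of the Cauchy–Schwarz inequality; the following display is added for the reader's convenience:

$$\begin{aligned} \left( 1 \cdot \lambda + 2 \cdot \sqrt{2-\lambda ^2}\right) ^2 \le \left( 1^2+2^2\right) \left( \lambda ^2 + 2 - \lambda ^2\right) = 10, \end{aligned}$$

with equality if and only if \(\sqrt{2-\lambda ^2} = 2\lambda \), that is, \(\lambda = \sqrt{\frac{2}{5}}\).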

For a tight example, assume that k is a multiple of 5 and let \(u=(u_1,\ldots ,u_k)\) and \(v=(v_1,\ldots ,v_k)\), where

$$\begin{aligned} u_i&= \left\{ \begin{array}{ll} +\frac{1}{2}, &{} \text { for } i=1,\ldots ,\frac{4k}{5}, \\ -\frac{1}{2}, &{} \text { for } i=\frac{4k}{5}+1,\ldots ,k. \end{array} \right. \\ v_i&= +\frac{1}{2}, \ \ \ \ \text { for } i=1,\ldots ,k. \end{aligned}$$

It is now easily verified that

$$\begin{aligned} \frac{|u+v|}{2}&= \sqrt{\frac{4k}{5} \cdot \frac{1}{4}}, \frac{|u-v|}{4} = \sqrt{\frac{k}{5} \cdot \frac{1}{16}}, \quad \quad \text {and} \\ \frac{|u+v|}{2} + \frac{|u-v|}{4}&= \frac{5}{4} \cdot \sqrt{\frac{k}{5}} = \frac{\sqrt{5}}{4} \sqrt{k}, \end{aligned}$$

as required. \(\square \)
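Inequality (3) and its tight example can be checked numerically. The Python sketch below is illustrative only; the function names, the random seed, and the sample size are arbitrary choices of ours, and random sampling is of course not a proof.

```python
import random
from math import sqrt

def lhs(u, v):
    """|u+v|/2 + |u-v|/4 for two points given as coordinate lists."""
    plus = sqrt(sum((a + b) ** 2 for a, b in zip(u, v)))
    minus = sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return plus / 2 + minus / 4

def bound(k):
    """Right-hand side of (3)."""
    return sqrt(5) / 4 * sqrt(k)

def tight_pair(k):
    """The extremal pair from the proof; k must be a multiple of 5."""
    u = [0.5] * (4 * k // 5) + [-0.5] * (k // 5)
    v = [0.5] * k
    return u, v

random.seed(0)
k = 10
worst = max(
    lhs([random.uniform(-0.5, 0.5) for _ in range(k)],
        [random.uniform(-0.5, 0.5) for _ in range(k)])
    for _ in range(10000)
)
print(worst <= bound(k), lhs(*tight_pair(k)), bound(k))
```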

Final argument in the proof of Theorem 1.3. Let \(u,v \in U\) such that \(e=uv\) is an edge of the MST T. By the triangle inequality, the distance from the center of the cube to any point in the ball \(B_{e}\) is at most \(\frac{1}{2} |u+v| + \frac{1}{4} |u-v|\). By Lemma 2.4 this distance is at most \(\frac{\sqrt{5}}{4} \sqrt{k}\), thus \(\cup _{e\in T} B_e \subset B\), where B is the ball of radius \( \frac{\sqrt{5}}{4} \sqrt{k} =0.559 \ldots \cdot \sqrt{k}\) centered at o.

The ball packing argument in [7] yields \(S_k(T) \le (3 \sqrt{k})^k\). Using Lemma 2.4 instead improves this bound to \(S_k(T) \le (\sqrt{5k})^k\). By Lemma 2.3 we obtain a Hamiltonian cycle H through X satisfying

$$\begin{aligned} S_k(H) \le \frac{2}{3}\cdot 3^k \cdot S_k(T) \le \frac{2}{3} \cdot 3^k \cdot (5k)^{k/2}. \end{aligned}$$
(6)

Taking the k-th root completes the proof of item (ii). Note that the only change in the calculation is replacing a multiplicative factor of 3 by \(\sqrt{5}\) (in Inequality (2) from [7]). The improvement carries over proportionally and is reflected in the final bound.

Recall that the traversal of the MST T using the algorithm of Sekanina [23, 24] takes linear time. As such, the running time for computing the TSP tour is determined by the time to compute T. This proves item (iii) and completes the proof of Theorem 1.3. \(\square \)

An alternative way to verify the upper bound in (6) is by using Theorem 1.2. The details are left to the reader.

3 Hamiltonian Cycles for Subsets of Cube Vertices

In this section we consider our problem (the study of extremal values for Hamiltonian cycles and paths in \([0,1]^k\)) when the input is restricted to subsets of cube vertices. Note that this restriction is quite natural, since all known best constructions are attained or matched by such subsets. We will use some results on binary codes.

3.1 Preparation: Binary Codes

First we prove an optimization result which will be used multiple times throughout this paper.

Lemma 3.1

Let \(q_1,q_2,\ldots , q_m\in [0,1]\). Then,

$$\begin{aligned} \sum _{i<j} |q_i-q_j|^2 \le \left\lfloor \frac{m}{2} \right\rfloor \cdot \Big \lceil \frac{m}{2} \Big \rceil . \end{aligned}$$

Proof

We prove this result by induction on m. The statement holds trivially for \(m=1\) and \(m=2\). Let \(q_1,q_2,\ldots , q_m\in [0,1]\) for some \(m\ge 3\). We can assume \(0=q_1\le q_2\le \ldots \le q_m=1\). By the induction assumption,

$$\begin{aligned} \sum _{1<i<j<m} |q_i-q_j|^2 \le \left\lfloor \frac{m-2}{2} \right\rfloor \cdot \Big \lceil \frac{m-2}{2} \Big \rceil . \end{aligned}$$

Observe that the maximum of the quadratic function \(f(x) = x^2 + (1-x)^2\) over the interval [0, 1] is attained at \(x=0\) or \(x=1\). Thus, \(|q_1-q_j|^2+|q_m-q_j|^2=q_j^2+(1-q_j)^2\le 1\) for \(j\in \{2,\ldots ,m-1\}\). Therefore,

$$\begin{aligned} \sum _{i<j} |q_i-q_j|^2&= |q_1-q_m|^2+\sum _{1<j<m} (|q_1-q_j|^2+|q_m-q_j|^2) + \sum _{1<i<j<m} |q_i-q_j|^2 \\&\le 1+(m-2)+ \left\lfloor \frac{m-2}{2} \right\rfloor \cdot \Big \lceil \frac{m-2}{2} \Big \rceil = \left\lfloor \frac{m}{2} \right\rfloor \cdot \Big \lceil \frac{m}{2} \Big \rceil , \end{aligned}$$

completing the proof of this lemma. \(\square \)
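The bound of Lemma 3.1 is attained when half of the points sit at 0 and the other half at 1. The Python sketch below (function names, seed, and sample sizes are ours) compares the bound with this extremal configuration and with random samples.

```python
import random

def pair_sum(q):
    """Sum of squared differences over all unordered pairs."""
    return sum((q[i] - q[j]) ** 2
               for i in range(len(q)) for j in range(i + 1, len(q)))

def lemma_bound(m):
    """floor(m/2) * ceil(m/2)."""
    return (m // 2) * ((m + 1) // 2)

def extremal(m):
    """floor(m/2) points at 0 and ceil(m/2) points at 1."""
    return [0] * (m // 2) + [1] * ((m + 1) // 2)

random.seed(1)
for m in range(1, 9):
    sampled = max(pair_sum([random.random() for _ in range(m)])
                  for _ in range(2000))
    print(m, lemma_bound(m), pair_sum(extremal(m)), round(sampled, 3))
```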

Lemma 3.2

Let \(\delta ,\gamma > 0\), and \(k_1,k_2\) be non-negative integers. Let \(X\subseteq [0,\delta ]^{k_1} \times [0,\gamma ]^{k_2}\) be a finite set of size \(|X|\ge m\ge 2\). Then there exist two distinct points \(p,q\in X\) such that

$$\begin{aligned} |p-q|^2\le \frac{\lfloor \frac{m}{2} \rfloor \lceil \frac{m}{2} \rceil }{\left( {\begin{array}{c}m\\ 2\end{array}}\right) } \, (\delta ^2k_1+\gamma ^2k_2). \end{aligned}$$

Proof

Let \(p_1,p_2,\ldots ,p_m\) be any m points from X. Given integers i and j, we denote by \({p_i}_j\) the j-th coordinate of \(p_i\). By applying Lemma 3.1 and scaling we obtain

$$\begin{aligned} \sum _{i<i'} |{p_i}_j -{p_{i'}}_j|^2&\le \left\lfloor \frac{m}{2} \right\rfloor \cdot \Big \lceil \frac{m}{2} \Big \rceil \cdot \delta ^2 \quad \quad \text {for every}\, j\in [k_1],\, \text {and} \end{aligned}$$
(7)
$$\begin{aligned} \sum _{i<i'} |{p_i}_j -{p_{i'}}_j|^2&\le \left\lfloor \frac{m}{2} \right\rfloor \cdot \Big \lceil \frac{m}{2} \Big \rceil \cdot \gamma ^2 \quad \quad \text {for every}\, j\in [k_1+k_2]\setminus [k_1]. \end{aligned}$$
(8)

By summing up the inequalities (7) and (8), we obtain

$$\begin{aligned} \sum _{i<i'} |p_{{i}} -p_{{i'}}|^2 \le \left\lfloor \frac{m}{2} \right\rfloor \cdot \Big \lceil \frac{m}{2} \Big \rceil \cdot (\delta ^2k_1+\gamma ^2k_2). \end{aligned}$$

Thus, by averaging over all pairs of points, the minimizing pair satisfies the claimed inequality. \(\square \)

Applying Lemma 3.2 with \(\delta =\gamma =1\), \(k_1=k\) and \(k_2=0\), immediately yields the following symmetric version.

Lemma 3.3

Let \(X\subseteq [0,1]^{k}\) be a finite set of size \(|X|\ge m \ge 3\). Then there exist two distinct points \(p,q \in X\) such that

$$\begin{aligned} |p-q|^2 \le \frac{\lfloor \frac{m}{2} \rfloor \cdot \lceil \frac{m}{2} \rceil }{{m \atopwithdelims ()2}} \cdot k. \end{aligned}$$

Let \(A(k,d)\) denote the maximum cardinality of a binary code of length k with minimum distance d. We recall the following fact [28]:

Lemma 3.4

(Singleton bound). \(A(k,d) \le 2^{k-d+1}\).

We need the following improvement.

Lemma 3.5

If \(d<\frac{2}{3}k\), then \(A(k,d) \le 2^{k-\frac{3}{2}d+2}\).

Proof

Towards a contradiction, assume that there exists \(X\subseteq \{0,1\}^k\) of size \(|X|>2\cdot 2^{k-\frac{3}{2}d+1}\) such that \(|p-q|^2\ge d\) for every two distinct points \(p,q\in X\). By the pigeonhole principle, there exist three distinct points \(p,q,r \in X\) which coincide on the first \(\lfloor k-\frac{3}{2}d+1 \rfloor \) coordinates. By Lemma 3.3, applied with \(m=3\) to the last \(\lceil \frac{3}{2}d\rceil -1\) coordinates, we get that

$$\begin{aligned} \min \{|p-q|^2,|p-r|^2,|r-q|^2\} \le \frac{2}{3} \left( \Big \lceil \frac{3}{2}d\Big \rceil -1\right) < d, \end{aligned}$$

a contradiction. \(\square \)

3.2 Building a Path Greedily

In the proofs of some of our results we will analyze a greedy algorithm which takes a discrete point set \(X\subseteq [0,1]^k\) of size \(|X|=n\) as an input and creates a Hamiltonian path F through X. It processes the point pairs in nondecreasing order of distance and maintains a collection of paths.

Algorithm 1: Initially, set \(F_0\) to be the empty graph on X. For \(i\in [n-1]\), let \(e_i\) be an edge of smallest weight among all edges \(e\not \in F_{i-1}\) which satisfy that \(F_{i-1}+e\) is a vertex-disjoint union of paths. Set \(F_{i}:=F_{i-1}+e_i\). Then, \(F:=F_{n-1}\) is a Hamiltonian path.
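Algorithm 1 can be implemented with a single pass over the point pairs sorted by distance, because an edge that cannot be added at some step can never be added later. The Python sketch below is one possible implementation under that observation; the function name `greedy_path`, the union-find bookkeeping, and the tie-breaking (the order produced by the sort) are our choices, and the quadratic number of pairs makes it suitable only for small inputs.

```python
from itertools import combinations, product
from math import dist

def greedy_path(points):
    """Edges (index pairs, in selection order) of the greedy Hamiltonian path."""
    n = len(points)
    parent = list(range(n))  # union-find forest over the current paths
    degree = [0] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = sorted(combinations(range(n), 2),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
    edges = []
    for a, b in pairs:
        # F + ab stays a vertex-disjoint union of paths iff a and b are path
        # endpoints (degree < 2) that lie on two different paths.
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1
            edges.append((a, b))
    return edges

cube = list(product((0, 1), repeat=3))  # all vertices of {0,1}^3
path = greedy_path(cube)
print(path)
print([round(dist(cube[a], cube[b]) ** 2) for a, b in path])  # squared lengths
```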

Lemma 3.6

Let \(j\in [k]\). The number of edges \(e\in F\) satisfying \(|e|^2\ge j\) is less than \(A(k,j)\).

Proof

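Since the set of edges that may be added only shrinks as the algorithm proceeds, the edges \(e_1,\ldots ,e_{n-1}\) are selected in nondecreasing order of weight. Let \(\ell \) be the largest index such that \(|e_\ell |^2< j\) (with \(\ell =0\) if there is no such edge). The number of edges \(e\in F\) satisfying \(|e|^2\ge j\) is then \(n-1-\ell \), which is less than the number of components in \(F_{\ell }\), namely \(n-\ell \). Let \(P_\ell \subseteq X\) be a set containing one endpoint of each path in \(F_{\ell }\). Any two points of \(P_\ell \) are endpoints of different paths of \(F_\ell \), so the edge joining them may still be added at step \(\ell +1\) and hence has squared length at least j. The set \(P_\ell \) is therefore a binary code of length k with minimum distance at least j, and so \(n-\ell \le A(k,j)\). Thus, the number of edges \(e\in F\) satisfying \(|e|^2\ge j\) is less than \(A(k,j)\). \(\square \)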

Proof of Theorem 1.4

If \(|X|=2\), the statement holds trivially. Assume \(n:=|X|\ge 3\). Let F be the Hamiltonian path created by Algorithm 1. We partition the edges \(e\in F\) into four classes.

1. short edges: \(|e|^2 \le \frac{k}{5}\).

2. medium edges: \(\frac{k}{5} < |e|^2 \le \frac{3k}{5}\).

3. long edges: \(\frac{3k}{5} < |e|^2 \le \frac{2k}{3} \).

4. very long edges: \(\frac{2k}{3} <|e|^2 \).

Denote by \(F^s,F^m,F^l,F^{vl}\) the subgraphs of F containing all short, medium, long and very long edges, respectively. They partition F and thus \(S_k(F)= S_k(F^s)+S_k(F^m)+S_k(F^l)+S_k(F^{vl})\). We will provide upper bounds for the four contributions separately.

Since \(n\le 2^k\), the number of short edges is trivially at most \(2^k\). Thus,

$$\begin{aligned} S_k(F^s)\le 2^k \left( \sqrt{\frac{k}{5}}\right) ^k. \end{aligned}$$

Now, we estimate \(S_k(F^m)\). Let j be an integer satisfying \(\frac{k}{5} < j \le \frac{3k}{5}\). The number of edges \(e\in F\) satisfying \(|e|^2\ge j\) is less than \( A(k,j)\le 2^{k-\frac{3}{2}j+2}\) by Lemmas 3.5 and  3.6. Therefore,

$$\begin{aligned} S_k(F^m)\le & {} \sum _{j=\Big \lceil \frac{k}{5}\Big \rceil }^{\left\lfloor \frac{3k}{5}\right\rfloor } 2^{k-\frac{3}{2}j+2} \left( \sqrt{j}\right) ^k \\\le & {} 4\cdot \left( \left\lfloor \frac{3k}{5}\right\rfloor -\Big \lceil \frac{k}{5}\Big \rceil \right) \cdot \left( 0.842\sqrt{k}\right) ^k \\\le & {} \frac{8k}{5} \cdot \left( 0.842\sqrt{k}\right) ^k. \end{aligned}$$

Here we used that the function \(f(x)=2^{1-3x/2} \sqrt{x}\), where \(x\ge 0\), is maximized for \(x=\frac{1}{\log (8)}\) and thus \(2^{1-3x/2} \sqrt{x}\le 0.842\).
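The maximization of \(f(x)=2^{1-3x/2} \sqrt{x}\) can be verified as follows, where \(\ln \) denotes the natural logarithm (this is how we read \(\log \) in the preceding sentence). Since \(\ln f(x) = \left( 1-\frac{3x}{2}\right) \ln 2 + \frac{1}{2} \ln x\) is concave, its unique critical point is the maximizer:

$$\begin{aligned} \frac{d}{dx} \ln f(x) = -\frac{3}{2} \ln 2 + \frac{1}{2x} = 0 \quad \Longleftrightarrow \quad x = \frac{1}{3 \ln 2} = \frac{1}{\ln 8} = 0.4808\ldots , \end{aligned}$$

and at this point \(2^{-3x/2} = e^{-1/2}\), so that

$$\begin{aligned} \max _{x \ge 0} f(x) = \frac{2\, e^{-1/2}}{\sqrt{3 \ln 2}} = \frac{2}{\sqrt{3 e \ln 2}} = 0.8412\ldots \le 0.842. \end{aligned}$$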

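Next, we estimate \(S_k(F^l)\). By Lemma 3.3, applied with \(m=5\), any five points of \([0,1]^k\) contain two at squared distance at most \(\frac{3k}{5}\), and hence \(A(k,\lfloor \frac{3k}{5}\rfloor +1)\le 4\). By Lemma 3.6, the number of edges \(e\in F\) satisfying \(|e|^2>\frac{3k}{5}\) is less than \(A(k,\lfloor \frac{3k}{5}\rfloor +1)\), so there are at most three long edges. Therefore,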

$$\begin{aligned} S_k(F^l)\le 3 \cdot \left( \sqrt{\frac{2}{3}k}\right) ^k. \end{aligned}$$

Last, we estimate \(S_k(F^{vl})\). The number of edges \(e\in F\) satisfying \(|e|^2>\frac{2k}{3}\) is less than \(A(k,\lfloor \frac{2k}{3}\rfloor +1)\le 2\) by Lemma 3.3, applied with \(m=3\), and Lemma 3.6. Thus, there is at most one very long edge e in F. This very long edge has length at most \(|e| \le \sqrt{k-1}\) by the following argument. Consider the last step of the greedy algorithm, when the last two paths, call them \(P_1\) and \(P_2\), are being joined. Since \(|X|\ge 3\), one of them, say \(P_1\), contains at least two vertices. An endpoint of the path \(P_2\) has distance at most \(\sqrt{k-1}\) to one of the endpoints of \(P_1\), since not both endpoints can be opposite on the cube. Thus, \(|e| \le \sqrt{k-1}\). We get \(S_k(F^{vl})\le \left( \sqrt{k-1}\right) ^k\).

Adding up the four contributions to \(S_k(F)\) yields

$$\begin{aligned} S_k(F)&= S_k(F^s)+S_k(F^m)+S_k(F^l)+S_k(F^{vl})\nonumber \\&\le 2^k \left( \sqrt{\frac{k}{5}} \right) ^k + \frac{8k}{5} \cdot \left( 0.842\sqrt{k}\right) ^k +3 \cdot \left( \sqrt{\frac{2}{3}k}\right) ^k+\left( \sqrt{k-1}\right) ^k\nonumber \\&= \left( \sqrt{k}\right) ^k\left( \left( \frac{2}{\sqrt{5}}\right) ^k+\frac{8k}{5}\cdot 0.842^k +3 \cdot \left( \sqrt{\frac{2}{3}}\right) ^k + \left( \sqrt{\frac{k-1}{k}}\right) ^k \right) \nonumber \\&< \left( \sqrt{k}\right) ^k, \end{aligned}$$
(9)

where the last inequality holds for k sufficiently large. We used the fact that \(\left( \sqrt{\frac{k-1}{k}}\right) ^k\) converges to \(e^{-1/2}\). Let H be the Hamiltonian cycle obtained from F by connecting the two endpoints. Then

$$\begin{aligned} S_k(H) \le S_k(F)+ \left( \sqrt{k}\right) ^k \le 2\left( \sqrt{k}\right) ^k. \end{aligned}$$

\(\square \)

We remark that the proof of Theorem 1.4 works for \(k_0=29\). The last inequality in (9) is strict. Thus, Theorem 1.4 is tight only for \(|X|=2\).
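The last inequality in (9) can be sanity-checked numerically. The sketch below only evaluates the parenthesized factor from the third line of (9) on a finite range; the function name `bracket` and the cutoff `K_MAX` are hypothetical choices made for this illustration, and the check does not replace the other steps of the proof.

```python
import math

def bracket(k: int) -> float:
    """Parenthesized factor multiplying (sqrt(k))^k in the third line of (9)."""
    return (
        (2 / math.sqrt(5)) ** k
        + (8 * k / 5) * 0.842 ** k
        + 3 * math.sqrt(2 / 3) ** k
        + math.sqrt((k - 1) / k) ** k
    )

K_MAX = 2000  # hypothetical cutoff: this is a finite check only
values = {k: bracket(k) for k in range(2, K_MAX)}

# Smallest k such that the factor is below 1 for every tested k' >= k.
threshold = max(k for k, v in values.items() if v >= 1) + 1
print(threshold, values[28], values[29])
```

On the tested range the factor drops below 1 between \(k=28\) and \(k=29\), which is consistent with the value \(k_0=29\). Beyond the cutoff, the first three terms are decreasing in k and the last term is bounded by \(e^{-1/2}\), so the factor stays below 1.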

4 Hamiltonian Cycles: Asymptotic Upper Bounds

In this section we prove Theorems 1.5, 1.6 and 1.7.

4.1 Preparation

Lemma 4.1

Let \(0<\alpha < 1\) and \(Y\subseteq [0,1]^k\) such that \(|u-v|> \alpha \sqrt{k}\) for every two distinct points \(u,v\in Y\). Let \(m\in \mathbb {N}\). Then,

$$\begin{aligned} |Y|\le 2m \cdot \Big \lceil \sqrt{\frac{1}{2}\left( 1+\frac{1}{2m-1}\right) }\alpha ^{-1}\Big \rceil ^k. \end{aligned}$$

Proof

Let \(\beta =\Big \lceil \sqrt{\frac{1}{2}\left( 1+\frac{1}{2\,m-1}\right) }\alpha ^{-1}\Big \rceil \). Assume for contradiction that \(|Y|>2m\cdot \beta ^k\). Partition the unit box \([0,1]^k\) into \(\beta ^k\) boxes \(B_1,B_2,\ldots ,B_{\beta ^k}\) as follows: in each coordinate, we split [0, 1] into \( \beta \) disjoint consecutive intervals of length \(\beta ^{-1}\) each. This gives \(\beta ^k\) boxes in total.

Since \(|Y|> 2m\cdot \beta ^k\), there exists a box \(B_j\) such that at least 2m points from Y are contained in it. By Lemma 3.2, applied with \(\gamma =\delta =\beta ^{-1}, k_1=k\) and \(k_2=0\), there exist \(p,q\in B_j\cap Y\) such that \(|p-q|^2\le \frac{1}{2}\left( 1+\frac{1}{2m-1}\right) \beta ^{-2}k\). We conclude

$$\begin{aligned} \alpha ^2 k< |p-q|^2\le \frac{1}{2}\left( 1+\frac{1}{2m-1}\right) \beta ^{-2}k, \quad \text {implying} \quad \alpha < \sqrt{\frac{1}{2}\left( 1+\frac{1}{2m-1}\right) }\beta ^{-1}. \end{aligned}$$

However, by the choice of \(\beta \), we have \(\alpha < \sqrt{\frac{1}{2}\left( 1+\frac{1}{2\,m-1}\right) }\beta ^{-1}\le \alpha \), a contradiction. \(\square \)
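As an illustration of how the bound is evaluated (the values \(m=100\) and \(\alpha =\frac{1}{4}\) are chosen here only as an example), Lemma 4.1 gives

$$\begin{aligned} \sqrt{\frac{1}{2}\left( 1+\frac{1}{199}\right) }=\sqrt{\frac{100}{199}}\approx 0.7089, \qquad \beta =\Big \lceil \frac{0.7089}{0.25}\Big \rceil =\lceil 2.8355\ldots \rceil =3, \qquad |Y|\le 200\cdot 3^k, \end{aligned}$$

so that \(|Y|\alpha ^k\le 200\cdot \left( \frac{3}{4}\right) ^k\) in this case.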

The following lemma is a version of Lemma 4.1 which improves the bound in a certain range of \(\alpha \).

Lemma 4.2

Let \(\sqrt{\frac{100}{1791}}<\alpha < \sqrt{\frac{100}{199}}\) and \(Y\subseteq [0,1]^k\) such that \(|u-v|> \alpha \sqrt{k}\) for every two distinct points \(u,v\in Y\). Then,

$$\begin{aligned} |Y|\le 600 \cdot 3^{\frac{9}{8}\left( 1-\frac{199}{100}\alpha ^2\right) k}. \end{aligned}$$

Proof

Let \(a=\frac{9}{8}(1-\frac{199}{100}\alpha ^2)\). Note that \(0<a <1\). Partition the unit box \([0,1]^k\) into \(3^{\lceil ak\rceil }\) boxes \(B_1,B_2,\ldots ,B_{3^{\lceil ak \rceil }}\) as follows: Let \(I=\{1,2,\ldots , \lceil ak \rceil \}\subseteq [k]\). For each coordinate in I, we split [0, 1] into 3 disjoint consecutive intervals of length \(\frac{1}{3}\) each, \([0,1]=[0,\frac{1}{3})\cup [\frac{1}{3},\frac{2}{3}) \cup [\frac{2}{3},1]\); the coordinates outside I are not split. If \(|Y|> 200 \cdot 3^{\lceil ak \rceil }\), then there exists a box \(B_j\) such that at least 200 points from Y are contained in it. By Lemma 3.2, applied with \(m=200\), \(\delta =\frac{1}{3}\), \(\gamma =1\), \(k_1=\lceil ak \rceil \) and \(k_2=k-k_1\), there exist \(p,q\in B_j\cap Y\) such that

$$\begin{aligned} \frac{|p-q|^2}{k}\le \frac{100}{199}\left( \frac{1}{3}\right) ^2 \frac{\lceil ak \rceil }{k}+ \frac{100}{199} \frac{k-\lceil ak \rceil }{k} \le \frac{100}{199}-\frac{8}{9}\frac{100}{199}a=\alpha ^2, \end{aligned}$$

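contradicting \(\alpha ^2 k< |p-q|^2\). Since \(\lceil ak\rceil \le ak+1\), we conclude that

$$\begin{aligned} |Y|\le 200\cdot 3^{\lceil ak \rceil }\le 600 \cdot 3^{ak}= 600 \cdot 3^{\frac{9}{8}\left( 1-\frac{199}{100}\alpha ^2\right) k}. \end{aligned}$$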

\(\square \)

Lemma 4.3

There exists \(k_0\) such that for all integers \(k\ge k_0\) the following holds. Let \(0< \alpha < 0.99\) and let \(Y\subseteq [0,1]^k\) such that \(|u-v|> \alpha \sqrt{k}\) for every two distinct points \(u,v\in Y\). Then \(|Y|\alpha ^k\le 0.999^k\).

Proof

Let \(k_0\) be sufficiently large for the following proof to hold. First, assume \(\sqrt{\frac{100}{199}}<\alpha <0.99\). Then \(|Y|\le 200\) by Lemma 3.3, applied with \(m=200\). Thus,

$$\begin{aligned} |Y|\alpha ^k \le 200 \alpha ^k \le 0.999^k. \end{aligned}$$

Next, assume \(0.29\le \alpha \le \sqrt{\frac{100}{199}}\). Then by Lemma 4.2,

$$\begin{aligned} |Y|\alpha ^k \le 600 \cdot \left( 3^{\frac{9}{8}\left( 1-\frac{199}{100}\alpha ^2\right) } \alpha \right) ^k \le 0.999^k. \end{aligned}$$

Finally, assume \(0<\alpha \le 0.29\). Then by Lemma 4.1, applied with \(m=100\),

$$\begin{aligned} |Y|\alpha ^k \le 200 \left( \Big \lceil \frac{\sqrt{\frac{1}{2}\left( 1+\frac{1}{199}\right) }}{\alpha }\Big \rceil \alpha \right) ^k \le 200 \left( \sqrt{\frac{100}{199}}+\alpha \right) ^k\le 0.999^k. \end{aligned}$$

\(\square \)
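The three case distinctions in the proof of Lemma 4.3 each reduce to a bound of the form \(C\cdot b^k\) with a base \(b<0.999\). The following sketch evaluates these bases numerically; the grid size `N` and the names `middle_base`, `case2`, `case3` are hypothetical choices for this illustration, and a grid search is only a numerical indication, not a proof.

```python
import math

C = math.sqrt(100 / 199)  # the constant sqrt(100/199) separating the cases

def middle_base(alpha: float) -> float:
    """Base 3^{(9/8)(1 - (199/100) alpha^2)} * alpha from the second case."""
    return 3 ** (9 / 8 * (1 - 199 / 100 * alpha**2)) * alpha

# Case 1 (sqrt(100/199) < alpha < 0.99): the base is alpha itself, below 0.99.
case1 = 0.99

# Case 2 (0.29 <= alpha <= sqrt(100/199)): grid search for the largest base.
N = 100_000  # arbitrary grid resolution
case2 = max(middle_base(0.29 + (C - 0.29) * i / N) for i in range(N + 1))

# Case 3 (0 < alpha <= 0.29): ceil(C / alpha) * alpha <= C + alpha <= C + 0.29.
case3 = C + 0.29

print(case1, case2, case3)
```

All three bases are strictly below 0.999, so the constant factors 200 and 600 are absorbed once k is large enough; the third case has the smallest margin.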

4.2 Proofs of Theorems 1.5, 1.6, and 1.7

First, we quickly demonstrate how Lemma 3.3 implies Theorem 1.7.

Proof of Theorem 1.7

Let \(X\subseteq [0,1]^k\) be a point set of size n. We run Algorithm 1 from Sect. 3.2. Let \(F_i\) be the collection of paths at the i-th step, let \(e_i\) be the edge added in the i-th step, and let \(F=F_{n-1}\) be the final Hamiltonian path.

We claim that \(|e_i| \le \sqrt{\frac{2}{3}k}\) for \(i\le n-2\). Let \(e_i=xy\). The vertices x and y are endpoints of two different paths in \(F_{i-1}\). Since \(F_{i-1}\) has at least \(n-(i-1)\ge n-(n-2-1)=3\) components, there exists a component containing neither x nor y. Let \(z\in X\) be an endpoint of the path forming this component. Since \(e_i=xy\) was chosen in step i, but xz and yz were not, we have \(|xy| \le |xz|\) and \(|xy| \le |yz|\). By applying Lemma 3.3 to the set \(\{x,y,z\}\), we get that \(|e_i|=|xy| \le \sqrt{\frac{2}{3}k}\). Note that \(|e_{n-1}| \le \sqrt{k}\) trivially.

Now, let \(f=ab\) be the edge where a and b are the two endpoints of the final path F. Set \(H=F+f\) to be the Hamiltonian cycle when f is added to F. Since \(|f| \le \sqrt{k}\) trivially, we get

$$\begin{aligned} S_k(H)&=\sum _{e\in H}|e|^k= |f|^k+|e_{n-1}|^k+ \sum _{i=1}^{n-2} |e_i|^k \le 2\left( \sqrt{k}\right) ^k+ (n-2)\left( \sqrt{\frac{2}{3}k}\right) ^k. \end{aligned}$$

Consequently,

$$\begin{aligned} s_k^{\texttt {HC}}(n)\le s_k(H) \le 2^{1/k}\sqrt{k} \ (1+o_k(1)). \end{aligned}$$

\(\square \)
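The greedy procedure used in this proof can be sketched as follows. This is a minimal illustration written from the description in the proof (repeatedly join two path endpoints from different paths by a shortest available edge); it is an assumption that this matches Algorithm 1 of Sect. 3.2 in all details, and the names `greedy_path_edges`, `X`, `k`, `n` are hypothetical.

```python
import itertools
import math
import random

def greedy_path_edges(points):
    """Greedily join endpoints of different paths by a shortest admissible edge.

    Returns the chosen edges (index pairs) in the order they were added.
    """
    n = len(points)
    parent = list(range(n))  # union-find over the current path components
    degree = [0] * n         # degree of each vertex in the current paths

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = sorted(
        itertools.combinations(range(n), 2),
        key=lambda uv: math.dist(points[uv[0]], points[uv[1]]),
    )
    edges = []
    for u, v in pairs:
        # Admissible: both vertices are path endpoints lying in different paths.
        if degree[u] < 2 and degree[v] < 2 and find(u) != find(v):
            parent[find(u)] = find(v)
            degree[u] += 1
            degree[v] += 1
            edges.append((u, v))
    return edges

random.seed(1)
k, n = 4, 25
X = [tuple(random.random() for _ in range(k)) for _ in range(n)]
path = greedy_path_edges(X)
lengths = [math.dist(X[u], X[v]) for u, v in path]
# All edges except possibly the last obey the sqrt(2k/3) bound from the claim.
print(len(path), max(lengths[:-1]) <= math.sqrt(2 * k / 3))
```

Scanning the pairs once in sorted order is equivalent to picking a shortest admissible edge at every step, because a pair that is inadmissible at some point never becomes admissible again.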

Proof of Theorem 1.5

Let k be sufficiently large and let \(X\subseteq [0,1]^k\) be a finite point set. Set

$$\begin{aligned} \ell :=\Big \lceil \log _{1+\frac{1}{k}}\left( 0.9 k^{\frac{3}{4}}\right) \Big \rceil = O(k \log k) \quad \quad \text {and} \quad \quad a_i:=\frac{(1+\frac{1}{k})^i}{k^{\frac{3}{4}}} \end{aligned}$$

for integers i, \(0\le i\le \ell \). Note that

$$\begin{aligned}&\frac{a_{i+1}}{a_i}=1+\frac{1}{k}\ \text {for} \ i\in \{0,1,\dots ,\ell -1\}, \quad \text {and}\\&\qquad \qquad \qquad \quad a_0< a_1< a_2< \cdots < a_{\ell -1}\le 0.9 \le a_\ell . \end{aligned}$$

Construct a minimum spanning tree T on vertex set X by successively joining points from X at minimal distance from each other, provided that the new edge does not create a cycle. For \(0\le i\le \ell \), let \(F_i\) be the forest with vertex set X and edges \(e\in T\) such that \(|e|\le a_i\sqrt{k}\). Then, \(F_0\subseteq F_1 \subseteq \dots \subseteq F_{\ell } \subseteq T\) since the sequence \((a_i)\) is increasing. If \(x,y\in X\) are in different components of \(F_i\), then \(|x-y|> a_i \sqrt{k}\).

We have \(a_0= k^{-3/4}\). For an edge \(e=xy\in F_0\), let \(B_e\) be the open ball of radius |e|/4 and center \(\frac{1}{2}(x+y)\). Since \(F_0\subseteq T\), by Lemma 2.1, the balls \(B_e\), \(e\in F_0\), are disjoint. Also, \(|e|\le a_0 \sqrt{k}=k^{-1/4}\). Denote by \(V_k\) the volume of the k-dimensional unit ball. It is well-known that

$$\begin{aligned} V_k= {\left\{ \begin{array}{ll} \dfrac{\pi ^{k/2}}{(k/2)!} &{} \mathrm{if \ } k \ \mathrm{is \ even}, \\ \dfrac{2^k \cdot \pi ^{(k-1)/2} \, ((k-1)/2)!}{k!} &{} \mathrm{if \ } k \ \mathrm{is \ odd}. \end{array}\right. } \end{aligned}$$
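The closed form above can be cross-checked against the equivalent Gamma-function expression \(V_k=\pi ^{k/2}/\Gamma (k/2+1)\). The sketch below does this for a few dimensions; the function names are hypothetical and only Python's standard `math` module is used.

```python
import math

def unit_ball_volume(k: int) -> float:
    """Volume of the k-dimensional unit ball via the even/odd closed form."""
    if k % 2 == 0:
        return math.pi ** (k // 2) / math.factorial(k // 2)
    h = (k - 1) // 2
    return 2**k * math.pi**h * math.factorial(h) / math.factorial(k)

def unit_ball_volume_gamma(k: int) -> float:
    """The same volume written as pi^{k/2} / Gamma(k/2 + 1)."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)

for k in (1, 2, 3, 10, 25):
    print(k, unit_ball_volume(k), unit_ball_volume_gamma(k))
```

Since \(V_k^{1/k}\) behaves like \(\sqrt{2\pi e/k}\) for large k, the quantity \(4/V_k^{1/k}\) behaves like \((4/\sqrt{2\pi e})\sqrt{k}\approx 0.968\sqrt{k}\), which indicates where a constant slightly below 1 in front of \(\sqrt{k}\) comes from.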

By Stirling’s approximation, \(V_k \sim \frac{1}{\sqrt{k\pi }} (\frac{2\pi e}{k})^{k/2}\). Since \(\bigcup _{e\in F_0} B_e\subseteq [-k^{-1/4},1+k^{-1/4}]^k\), we have

$$\begin{aligned} \sum _{e\in F_0}\left( \frac{|e|}{4}\right) ^k V_k&\le \left( 1+2k^{-1/4}\right) ^k, \quad \text {and thus} \\ \sum _{e\in F_0}|e|^k&\le \frac{4^k(1+2k^{-1/4})^k}{V_k} \le (0.97\sqrt{k})^k, \end{aligned}$$

for k sufficiently large. Now, let \(i\in \{0,1,\ldots ,\ell -1\}\). Let \(Y \subseteq X\) be a set of vertices containing exactly one vertex from every component of \(F_i\). Then \(|y-y'|> a_i \sqrt{k}\) for every pair \(y\ne y'\in Y\), and \(|F_{i+1}{\setminus } F_i|\le |Y|-1\). By Lemma 4.3 we have \(a_i^k|Y| \le 0.999^k\) for \(i\le \ell \). Thus,

$$\begin{aligned} \sum _{e\in F_{i+1}\setminus F_i} |e|^k\le (a_{i+1}\sqrt{k})^k|Y|= (a_{i}\sqrt{k})^k|Y|\left( 1+\frac{1}{k}\right) ^k \le 3 \cdot (0.999 \sqrt{k})^k, \end{aligned}$$

for \(i\le \ell \). Therefore,

$$\begin{aligned} \sum _{e\in F_\ell } |e|^k&= \sum _{e\in F_0} |e|^k +\sum _{i=0}^{\ell -1} \sum _{e\in F_{i+1}\setminus F_i} |e|^k \le \left( 0.97\sqrt{k}\right) ^k+ 3 \ell \cdot (0.999 \sqrt{k})^k, \end{aligned}$$

for k sufficiently large. If the forest \(F_\ell \) consists of at least three components, then three points \(p,q,r\in X\), one from each of three different components, have pairwise distances greater than \(a_\ell \sqrt{k}\ge 0.9\sqrt{k}> \sqrt{\frac{2}{3}k}\). This contradicts Lemma 3.3. Therefore, \(F_\ell \) has at most 2 components and thus there is at most one edge f in T which is not in \(F_\ell \). We conclude

$$\begin{aligned} \sum _{e\in T} |e|^k=\sum _{e\in F_\ell } |e|^k+ |f|^k\le \left( \sqrt{k}\right) ^k(1+o_k(k^{-1})), \end{aligned}$$

which implies that for the family of minimum spanning trees, we have \(s_k^{\texttt {ST}} \le \sqrt{k} \ (1+o_k(1))\), completing the proof of Theorem 1.5. \(\square \)

We remark that by applying Lemma 2.3 to T, there exists a Hamiltonian cycle H on vertex set X satisfying

$$\begin{aligned} \sum _{e\in H} |e|^k\le \frac{2}{3}\cdot 3^k \sum _{e\in T} |e|^k \le \left( 3\sqrt{k}\right) ^k(1+o_k(k^{-1})), \end{aligned}$$

implying that for the family of Hamiltonian cycles, we have \(s_k^{\texttt {HC}} \le 3\sqrt{k} \ (1+o_k(1))\).

Proof of Theorem 1.6

Create a forest F by successively joining points from X at minimal distance from each other, provided that the new edge e does not create a cycle and satisfies \(|e|\le k^{-1/4}\). This process stops when there is no such edge left. Let the trees \(T_1,\ldots , T_N\) be the components of F. Any two vertices from different trees \(T_i\) are at distance greater than \(k^{-1/4}\).

For an edge \(e=xy\in F\), let \(B_e\) be the open ball of radius |e|/4 and center \(\frac{1}{2}(x+y)\). By Lemma 2.1, the balls \(B_e\), \(e\in F\), are disjoint. Also, \(|e|\le k^{-\frac{1}{4}}\). We have \(\bigcup _{e\in F} B_e\subseteq [-k^{-1/4},1+k^{-1/4}]^k\). Writing \(V_k\) for the volume of the k-dimensional unit ball, we have

$$\begin{aligned} \sum _{e\in F}\left( \frac{|e|}{4}\right) ^k V_k&\le \left( 1+2k^{-1/4}\right) ^k \quad \text {and thus} \\ \sum _{e\in F}|e|^k&\le \frac{4^k(1+2k^{-1/4})^k}{V_k} \le (0.97\sqrt{k})^k, \end{aligned}$$

for k sufficiently large. Since the trees \(T_1,\ldots , T_N\) decompose the edge set of the forest F, we have

$$\begin{aligned} \sum _{i=1}^N \sum _{e\in T_i} |e|^k= \sum _{e\in F}|e|^k\le (0.97\sqrt{k})^k. \end{aligned}$$
(10)

By Lemma 2.3, for each \(i\in [N]\), there exists a Hamiltonian cycle \(H_i\) on \(V(T_i)\) such that

$$\begin{aligned} \sum _{e\in H_i} |e|^k\le 3^k \sum _{e\in T_i}|e|^k. \end{aligned}$$
(11)

Let \(F_0\) be the collection of paths obtained by taking the union of all \(H_i\), and removing an edge from each cycle. Then, by using (10) and (11), we obtain

$$\begin{aligned} \sum _{e\in F_0} |e|^k \le \sum _{i=1}^N\sum _{e\in H_i} |e|^k\le 3^k \sum _{i=1}^N\sum _{e\in T_i}|e|^k \le (2.91\sqrt{k})^k. \end{aligned}$$
(12)

Now, run Algorithm 1 from Sect. 3.2 initialized with \(F_0\) (instead of the empty graph). Recall that this algorithm adds edges of minimum weight such that in each step we maintain a collection of paths. Denote by Q the final path which is created by this algorithm. Set

$$\begin{aligned} \ell :=\Big \lceil \log _{1+\frac{1}{k}}\left( 0.9 k^{\frac{3}{4}}\right) \Big \rceil = O(k \log k) \quad \quad \text {and} \quad \quad a_i:=\frac{(1+\frac{1}{k})^i}{k^{\frac{3}{4}}} \end{aligned}$$

for integers i, \(0\le i\le \ell \). For \(0\le i\le \ell \), let \(F_i\) be the collection of paths with vertex set X and edges \(e\in Q\) such that \(|e|\le a_i\sqrt{k}\). Then, \(F_0\subseteq F_1 \subseteq \dots \subseteq F_{\ell } \subseteq Q\) since the sequence \((a_i)\) is increasing. If \(x,y\in X\) are in different components of \(F_i\), then \(|x-y|> a_i \sqrt{k}\). Now, let \(i\in \{0,1,\ldots ,\ell -1\}\). Let \(Y \subseteq X\) be a set of vertices containing exactly one endpoint of each path of \(F_i\). Then \(|y-y'|> a_i \sqrt{k}\) for every pair \(y\ne y'\in Y\), and \(|F_{i+1}\setminus F_i|\le |Y|-1\). By Lemma 4.3 we have \(a_i^k|Y| \le 0.999^k\le 1\) for \(i\le \ell \). Thus,

$$\begin{aligned} \sum _{e\in F_{i+1}\setminus F_i} |e|^k\le (a_{i+1}\sqrt{k})^k|Y|= (a_{i}\sqrt{k})^k|Y|\left( 1+\frac{1}{k}\right) ^k \le 3 \cdot \sqrt{k}^k, \end{aligned}$$
(13)

for \(i\le \ell \). Therefore, by combining (12) with (13), we obtain

$$\begin{aligned} \sum _{e\in F_\ell } |e|^k&= \sum _{e\in F_0} |e|^k +\sum _{i=0}^{\ell -1} \sum _{e\in F_{i+1}\setminus F_i} |e|^k \le \left( 2.91\sqrt{k}\right) ^k+ 3 \ell \cdot \sqrt{k}^k, \end{aligned}$$
(14)

for k sufficiently large. Similarly, as in the proof of Theorem 1.5, \(F_\ell \) has at most 2 components. Thus, using (14), the path Q satisfies

$$\begin{aligned} \sum _{e\in Q}|e|^k \le \sum _{e\in F_\ell }|e|^k +\sqrt{k}^k \le (2.91\sqrt{k})^k (1+o_k(1)). \end{aligned}$$

Adding one final edge f of weight \(|f| \le \sqrt{k}\) to Q, we obtain a Hamiltonian cycle with the desired properties. \(\square \)

5 Concluding Remarks

The upper bounds we obtained on the costs of Hamiltonian cycles imply the existence of perfect matchings whose cost is bounded from above by a constant (depending on k). For example, Theorems 1.3 and 1.4 imply Corollaries 5.1 and 5.2 below, whose proofs are analogous to the proof of Corollary 5.4.

Corollary 5.1

Given n points in \([0,1]^k\), where \(k \ge 3\), and n is even, there exists a perfect matching M of the n points such that \(\left( \sum _{e \in M} |e|^k \right) ^{1/k} \le 3 \sqrt{5} \left( \frac{1}{3} \right) ^{1/k} \cdot \sqrt{k}\). The matching M can be computed in time proportional to that needed for computing an MST of the points, in particular, in subquadratic time.

Corollary 5.2

There exists an integer \(k_0\) such that for all \(k\ge k_0\) the following holds. If X is any even-size subset of vertices of \(\{0,1\}^k\), then there exists a perfect matching M of X such that \(s_k(M) \le \sqrt{k}\). This bound is best possible.

Recall that an MST of n points in \(\mathbb {R}^k\) (with respect to Euclidean distances) can be computed in \(O\left( n^{2 - \frac{2}{\lceil k/2 \rceil +1} + \varepsilon } \right) \) time, for any \(\varepsilon >0\) [1]. We also deduce the following related results (formulated here for the planar case, \(k=2\)).

Corollary 5.3

Let \(x_1,\ldots ,x_n\) be \(n \ge 2\) points in the unit square. Let \(d_i\) be the distance between \(x_i\) and its nearest point (other than \(x_i\)). Then the following inequality holds: \(\sum _{i=1}^n d_i^2 \le 4\).

Proof

Consider a Hamiltonian cycle, say \(H=x_1,\ldots ,x_n\), whose cost \(S_2(H)\) is at most 4. The distance from \(x_i\) to its nearest point is at most \(|x_i - x_{i+1}|\), for \(i=1,\ldots ,n\), where \(x_{n+1}=x_1\). By squaring the n inequalities and adding them up, the claimed inequality follows. \(\square \)
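The inequality of Corollary 5.3 is easy to test numerically. The sketch below (with a hypothetical helper `sum_squared_nn_distances`) evaluates the sum for the four corners of the unit square, where equality holds, and for a random sample, where the corollary guarantees a value of at most 4.

```python
import math
import random

def sum_squared_nn_distances(points):
    """Sum over all points of the squared distance to the nearest other point."""
    total = 0.0
    for i, p in enumerate(points):
        d = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += d * d
    return total

corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(sum_squared_nn_distances(corners))  # each d_i equals 1 here

random.seed(0)
sample = [(random.random(), random.random()) for _ in range(50)]
print(sum_squared_nn_distances(sample) <= 4)
```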

An alternative proof of Corollary 5.3 can be found in [27, Problem G.27].

Corollary 5.4

Let \(x_1,\ldots ,x_n\) be \(n \ge 2\) points in the unit square, where n is even. Then there exists a perfect matching M such that \(\sum _{e \in M} |e|^2 \le 2\). This bound is the best possible.

Proof

Consider a Hamiltonian cycle, say \(H=x_1,\ldots ,x_n\), whose cost \(S_2(H)\) is at most 4. H can be decomposed into two perfect matchings, one of which has a cost at most 2, as required.

The lower bounds for \(n=2\) and \(n=4\) are immediate (see Fig. 1). For every even \(n \ge 6\) and \(\varepsilon >0\), there are n points (in the neighborhoods of the four corners of the square) such that every perfect matching M satisfies \(\sum _{e \in M} |e|^2 \ge 2-\varepsilon \). \(\square \)
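The decomposition step in this proof can be sketched in a few lines: taking every other edge of an even cycle gives two perfect matchings whose costs add up to the cost of the cycle, so the cheaper one costs at most half. The helper names `alternating_matchings` and `cost` are hypothetical.

```python
import math

def alternating_matchings(cycle):
    """Split the edges of an even cycle (points in cyclic order) into two perfect matchings."""
    n = len(cycle)
    assert n % 2 == 0, "the decomposition needs an even number of points"
    edges = [(cycle[i], cycle[(i + 1) % n]) for i in range(n)]
    return edges[0::2], edges[1::2]

def cost(edges, k=2):
    """Unscaled cost: sum of the k-th powers of the Euclidean edge lengths."""
    return sum(math.dist(p, q) ** k for p, q in edges)

# Tight example: the four corners of the unit square, cycle cost 4.
H = [(0, 0), (1, 0), (1, 1), (0, 1)]
M1, M2 = alternating_matchings(H)
print(cost(M1), cost(M2), min(cost(M1), cost(M2)))
```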

We have improved the upper bound of Bollobás and Meir [7] by more than 25 percent in the exact formulation and by more than 67 percent in the asymptotic formulation. Apart from some doubt concerning the values of \(s_3^{\texttt {HC}}(n)\) and \(s_4^{\texttt {HC}}(n)\), we think that their lower bound gives the right answer for every higher dimension. In view of Theorem 1.3 (i) we adjust their conjecture as follows:

Conjecture 5.5

For Hamiltonian cycles, the following equalities hold:

$$\begin{aligned} s_k^{\texttt {HC}}= {\left\{ \begin{array}{ll} 2^{7/6}, &{} \text { for } k =3, \\ 2^{1/k} \cdot \sqrt{k}, &{} \text { for } k \ge 4. \end{array}\right. } \end{aligned}$$

Hamiltonian path. If one were looking for a Hamiltonian path, instead of a Hamiltonian cycle, then the 2-point extremal lower bound example (given by a cube diagonal) loses a factor of 2 (or with scaling \(2^{1/k}\)); and so the question arises: is it still the best example, or maybe only for large k? Analogous to the situation for Hamiltonian cycles, we think that there is a threshold value for k after which the extremal example stabilizes at the 2-point example. The threshold values for cycles and paths seem to differ, see Conjecture 5.6 below.

The current upper bound proofs essentially remain the same as for Hamiltonian cycles, with the change that the last edge is not needed. Some upper bounds remain unchanged, and others do improve. In particular, \(s_2^{\texttt {HP}} \le s_2^{\texttt {HC}} = 2\) remains unchanged, whereas \(s_2^{\texttt {HP}} \ge \sqrt{3}\) is implied by the two extremal examples in Fig. 1 (left and right).

From the other direction, for small values of k consider once again a binary code of length k with minimum distance 2 given by the set of all \(x \in \{0,1\}^k\) with an even number of 1’s. It yields the values specified below.

Conjecture 5.6

For Hamiltonian paths, the following equalities hold:

$$\begin{aligned} s_k^{\texttt {HP}}= {\left\{ \begin{array}{ll} \sqrt{3}, &{} \text { for } k =2, \\ \left( 2^{k-1} -1 \right) ^{1/k} \cdot \sqrt{2}, &{} \text { for } k = 3,4,5,6, \\ \sqrt{k}, &{} \text { for } k \ge 7. \end{array}\right. } \end{aligned}$$
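The case distinction in Conjecture 5.6 can be compared with the two lower-bound constructions discussed above: the even-weight binary code (\(2^{k-1}\) points at pairwise distance at least \(\sqrt{2}\), hence \(2^{k-1}-1\) path edges of length at least \(\sqrt{2}\)) and the two endpoints of a cube diagonal. The sketch below simply evaluates both expressions; the function names are hypothetical.

```python
import math

def even_weight_code_bound(k: int) -> float:
    """Scaled path cost forced by 2^{k-1} points at pairwise distance >= sqrt(2)."""
    return (2 ** (k - 1) - 1) ** (1 / k) * math.sqrt(2)

def two_point_bound(k: int) -> float:
    """Scaled path cost for the two endpoints of a cube diagonal."""
    return math.sqrt(k)

for k in range(3, 11):
    code, diag = even_weight_code_bound(k), two_point_bound(k)
    print(k, round(code, 4), round(diag, 4), "code" if code > diag else "diagonal")
```

The code-based value is the larger one for \(k=3,4,5,6\), and the diagonal takes over from \(k=7\) on, in line with the ranges stated in the conjecture.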

Further improvement. One might wonder where the next possible improvement lies. We feel that it is in Lemma 2.2: it states that there is a Hamiltonian cycle such that each edge of the cycle uses at most 3 tree edges, yet the average usage is slightly less than 2. If it were true that every tree edge is used at most twice, then we would get a 2/3 factor improvement in the upper bound. However, the example of a tree with edges ab, bc, cd, de, cf, fg shows that this is not the case. Still, it is likely that there is a way to gain more in a tree-to-cycle or tree-to-path conversion.

A different version. We conclude with yet another version of the problem. Instead of the unit cube \([0,1]^k \subset \mathbb {R}^k\), let the diameter of the point set be at most 1: That is, \(\textrm{diam}(X) \le 1\), where \(X \subset \mathbb {R}^k\) and \(|X|=n\). What are the extremal values of the (say, unscaled) costs of a shortest Hamiltonian cycle (and path) for n points in \(\mathbb {R}^k\) under this constraint? Are they given by the vertices of a unit simplex in \(\mathbb {R}^k\) (\(k+1\) and k, respectively)?