1 Introduction

The Branch-and-Cut (B &C) algorithms are powerful tools to solve NP-hard problems. A core component of these algorithms is the solution of several Linear Programming (LP) relaxations that appear while searching for an integral optimal solution. Given an instance of an ILP model, its integrality gap for one linear relaxation is the ratio between the optimal solution of the integer problem and the corresponding LP relaxation. In practice, this is a measure of how much the relaxation is good to approximate the original problem. As a rule of thumb, given an Integer Linear Programming (ILP) model for an NP-hard problem, the larger is the integrality gap of the initial LP relaxation, the longer is the runtime for a B &C algorithm to prove that the best solution found is optimal. In practice, a large integrality gap at the root node very often implies that a great number of branch-and-bound nodes must be visited before proving the optimality of an integral solution. For this reason, a significant effort in designing efficient B &C algorithms is spent looking for tight LP relaxations [26].

Let us consider, for instance, the Travelling Salesman Problem (TSP): Given a set of n nodes \(V=\{1,\dots ,n\}\), a square matrix \({\varvec{C}} \in \mathbb {R}^{n \times n}_+\) whose entries \(c_{ij}\) represent the cost of going from node i to node j, we have to find the cyclic permutation \(\pi \) of V such that the total cost \(\sum _{i \in V} c_{i,\pi _i}\) is minimum. If the matrix \({\varvec{C}}\) is (i) symmetric, that is, \(c_{ij}=c_{ji}\) for all \(i,j \in V\), (ii) satisfies the triangle inequalities \(c_{ij} + c_{jk} \ge c_{ik}, \forall i,j,k \in V\), and (iii) \(c_{ij}=0\) if and only if \(i=j\), then the matrix \({\varvec{C}}\) defines a metric on V and gives a Metric TSP instance. Indeed, there exist several ILP models to solve the TSP [2, 9, 25, 27]. However, the most successful exact method is the branch-and-cut algorithm introduced by Padberg and Rinaldi [29], which includes as a core component the solution of an LP relaxation based on the subtour elimination constraints [28]. This LP relaxation is known in the literature as the Subtour Elimination Problem (SEP). Nowadays, the state-of-the-art exact software for the symmetric TSP is Concorde [1], a solver which implements a sophisticated B &C algorithm and which includes the separation of several facet-defining inequalities [2]. Concorde has a track record in solving the instances from the TSPLIB [31], a wide set of TSP benchmark instances. Note that most of the TSPLIB instances have a small integrality gap for the SEP. The largest integrality gap is equal to 1.095, achieved by the ts225 instance, which was specifically designed to foil TSP software. Except for that one instance, Johnson and McGeoch [20] are not aware of any testbed instance with a percentage integrality gap exceeding 1.03.

While studying theoretically the performance of the Held-Karp lower bound for the TSP, which is equivalent to the optimal lower bound of the SEP, Wolsey proved in [36] that the integrality gap for SEP is at most \(\frac{3}{2}\). Later, in his master thesis, Williamson conjectured that for metric TSP the integrality gap of subtour elimination relaxation is equal to \(\frac{4}{3}\) [35]. So far, this conjecture is only proved for a very special class of instances [6]. For cubic graphs, we know that \(\frac{4}{3}\) is an upper bound for the integrality gap [8]. In [3, 7], Boyd and Benoit have computed the exact integrality gap for every \(n \le 12\), and, thus, they have verified by exhaustive enumeration the conjecture for all Metric TSP instances with at most 12 nodes. In addition, Benoit and Boyd [3] have introduced a new class of graphs having an integrality gap that asymptotically converges to \(\frac{4}{3}\) as n tends to infinity. To the best of our knowledge, nobody has been able to find, for a given number of nodes, an instance with a higher integrality gap than the one they proposed. Curiously, we have computationally verified that those instances are not challenging for Concorde. More recently, new families of instances with an integrality gap that asymptotically converges to \(\frac{4}{3}\) were introduced in [19] by exploiting Tetrahedron configurations, and in [18, 38] by exploiting three paths configurations. All those families of instances are solvable in polynomial time: the instances of [18] are convex-hull-and-line TSP instances, and they are solvable in polynomial time with the algorithm proposed by Deineko et al. [10]. The instances of [19] can be solved using the polynomial time algorithm introduced in [33]. The instances of [38] can be solved in linear time, as shown by the same author. Nevertheless, the TSP instances introduced in [38] are remarkable: in practice, they are extremely challenging for Concorde.

In this paper, we introduce a new challenging integer programming problem, herein called the Integer Heuristic-OPT (IH-OPT) problem, whose optimal solution provides a Metric TSP instance with a large integrality gap. Our approach is purely computational, and we do not restrict to the generation of Euclidean or Rectilinear TSP instances as in [18, 19, 38], but we still require that the generated instance are metric. The decision variables of our problem are the entries of the symmetric cost matrix \({\varvec{C}}\), while the constraints are defined by the intersection of the metric cone [21] with an exponential number of inequalities, one for each possible permutation of V, that is, one for each possible solution of the TSP. For example, starting from an instance \(C_0\) of the TSPLIB, taking an optimal solution of the LP relaxation of the corresponding subtour elimination model, and by solving the IH-OPT problem, we can generate a new TSP instance \(C^*\) having an integrality gap larger than or equal to the integrality gap of the original instance \(C_0\) (e.g., see Table 2). In addition, by integrating the solution of the IH-OPT problem into a sampling procedure, we can generate several TSP instances that have a large integrality gap and are challenging in terms of runtime for Concorde (e.g., see Table 5). As a by-product of our work, we introduce the Hard-TSPLIB, a collection of 41 small TSP instances (i.e., with \(n \le 76\)) which are very challenging for Concorde in terms of runtime and number of branch-and-bound nodes.

The outline of this paper is as follows. In Sect. 2, we review the background material, and we fix the notation. Section 3 formally introduces the IH-OPT problem and discusses how it is related to previous works. Section 4 presents the sampling procedure that we use to generate TSP instances which are hard for Concorde in terms of runtime. In Sect. 5, we present our extensive computational results and describe how we generated the instances that we have included in the Hard-TSPLIB. Finally, in Sect. 6, we conclude with a discussion on future works.

2 Background material

In this section, we review the main formulation for the symmetric TSP and we introduce the Subtour Elimination Problem polytope. We formally define the integrality gap for the SEP, presenting the work of [3], which is the foundation of our work.

An instance of the symmetric TSP can be completely defined by the symmetric matrix \({\varvec{C}}\). Otherwise, we can define a TSP instance using a complete undirected graph \(K_n = (V, E)\) along with a cost vector \({\varvec{c}} \in \mathbb {R}_+^{\vert E \vert }\), which is given by the upper triangular matrix of \({\varvec{C}}\) (without the diagonal). While in this paper we focus on the general Metric TSP, and we denote it only with TSP, as defined in Sect. 1, two special cases that are relevant to compare our work with the literature:

  1. 1.

    The Euclidean TSP, where the input is a collection of n points in \(\mathbb {R}^d\), whose reciprocal distances are computed using the Euclidean norm. These are the type of TSP instances used in [18, 19], with \(d = 2\).

  2. 2.

    The Rectilinear TSP, where the input is again a collection of n points in \(\mathbb {R}^d\), but where the Manhattan norm (or city block or \(L^1\) norm) is used to compute the distances. These are the type of TSP instances used in [38], with \(d = 3\).

In both cases, using the collection of points given as input, and the corresponding distance function, it is possible to define a complete graph \(K_n\) with the set of nodes \(V=\{1,\dots , n\}\) and the set of edges \(E = V \times V\), with the corresponding cost vector \({\varvec{c}} \in \mathbb {R}_+^{\vert E \vert }\). In the following, we use \(\delta (S)\) with \(S \subset V\), to denote the set of weighted edges \(e=\{v,w\}, \ v \ne w\) with either \(v\in S\), \(w\not \in S\) or either \(w\in S\), \(v\not \in S\). We use later the two following collection of subsets of vertices: \({\mathscr {S}}:= \{ S \mid S \subset V, 3 \le |S| \le n - 3 \}\) and \({\mathscr {S}}_{ij}:= \{ S \mid S \in {\mathscr {S}}, \{i, j\} \in \delta (S) \}\).

Given a connected graph \(G=(V,E)\) and the cost vector \({\varvec{c}} \in \mathbb {R}_+^{\vert E \vert }\), the Travelling Salesman Problem, originally proposed in [9], is formulated as follows:

(1)
(2)
(3)
(4)
(5)

The decision variable \(x_e\) is equal to 1 if the edge \(e=\{v,w\}\) is part of an optimal tour. The objective function (1) minimizes the overall tour length. Constraints (2) state that each node v must have two incident edges. The Subtour Elimination Constraints (3) force the cut set \(\delta (S)\) of every proper subset S of V to contain at least two edges.

If we relax the integrality constraints (5), we can define the Subtour Elimination Problem (SEP), that, given a cost vector \({\varvec{c}}\), provides a lower bound of the optimal solution. In the next sections, we will denote as polytope of the SEP the set

$$\begin{aligned} P_{SEP}:= \{ x \in \mathbb {R}^{|E|}_+ \mid (2),(3),(4)\}. \end{aligned}$$
(6)

If we optimize the objective function (1) over \(P_{SEP}\), we get exactly the Subtour Elimination Problem.

Let us consider the complete graph \(K_n\). Let us denote by \(TOUR({\varvec{c}})\) and \(SUBT({\varvec{c}})\), respectively the optimal value of the TSP and SEP instance defined by the cost vector \({\varvec{c}}\). Similarly to [3], we denote by \(\alpha _n\) the integrality gap of SEP for \(K_n\), that is, the largest possible ratio between \(TOUR({\varvec{c}})\) and \(SUBT({\varvec{c}})\):

$$\begin{aligned} \alpha _n = \max _{\begin{array}{c} {\varvec{c}}\ge 0 \text { is metric} \end{array}} \dfrac{TOUR({\varvec{c}})}{SUBT({\varvec{c}})}. \end{aligned}$$
(7)

For \(n \le 5\), we have that \(\alpha _n=1\). The exact value for \(6 \le n \le 12\) was computed in [3, 7]. Williamson’s conjecture states that \(\alpha _n \le \frac{4}{3}\) for any n [35]. Let’s call \(\alpha _n({\varvec{c}})\), the integrality gap of a specific TSP instance on n nodes with cost vector \({\varvec{c}}\), that is

$$\begin{aligned} \alpha _n({\varvec{c}}) = \dfrac{TOUR({\varvec{c}})}{SUBT({\varvec{c}})} \end{aligned}$$

We can thus write

$$\begin{aligned} \alpha _n = \max _{\begin{array}{c} {\varvec{c}}\ge 0 \text { is metric} \end{array}} \alpha _n({\varvec{c}}) \end{aligned}$$

The families of instances introduced in [18, 19, 38] have all the properties that \(\alpha _n({\varvec{c}}) < \frac{4}{3}\) and \(\lim _{n \rightarrow \infty } \alpha _n({\varvec{c}}) = \frac{4}{3}\). However, the largest integrality gap for the instances in the TSPLIB is equal to 1.095 [31], which is largely less than \(\frac{4}{3} = 1.\overline{333}\).

Following the idea presented in [3], we can divide the cost vector \({\varvec{c}}\) by the optimum tour value \(TOUR({\varvec{c}})\), obtaining a new cost vector \({\varvec{c}}'\) that still satisfies the triangle inequalities, and which leaves \(\alpha _n({\varvec{c}})\) unchanged. Hence, we can restrict ourselves to metric cost vectors \({\varvec{c}}\) such that \(TOUR({\varvec{c}}) = 1\), and we can transform the maximization problem into a minimization:

$$\begin{aligned} \dfrac{1}{\alpha _n} = \min _{\begin{array}{c} {\varvec{c}}\ge 0\text { is metric,}\\ TOUR({\varvec{c}}) = 1 \end{array}} SUBT({\varvec{c}}). \end{aligned}$$
(8)

Problem (8) can be formulated as a mixed integer quadratic problem, where the decision variables are both the cost vector \({\varvec{c}}\) and the incidence vector \({\varvec{x}}\) of vertices of \(P_{SEP}\). Unfortunately, despite the recent improvements in the implementation of commercial optimization solvers, the quadratic model is intractable even for small values of n. In [3], the authors propose a clever idea for bypassing the quadratic model using the vertex representation of \(P_{SEP}\). That is, they represent \(P_{SEP}\) as the convex combinations of its vertices \(\{{\varvec{{\bar{x}}}}^{(1)},\ldots , {\varvec{\bar{x}}}^{(t)}\}\), which are finitely many. Then, for each vertex \({\varvec{{\bar{x}}}}^{(h)}\) of \(P_{SEP}\), they define an LP problem, called \(\text{ OPT }(\bar{{\varvec{x}}}^{(h)})\), having an exponential number of variables and constraints. Theoretical results in [3, 7] guarantee that only a subset of vertices is necessary to compute the integrality gap. By solving the \(\text{ OPT }(\bar{{\varvec{x}}}^{(h)})\) subproblem on each vertex of the previously mentioned subset, they were able to compute \(\alpha _n\) for \(n\le 10\) in [3], and for \(n \le 12\) in [7].

Given the complete graph on n nodes \(K_n = (V,E)\), we introduce one vector \({\varvec{{\bar{z}}}} \in \mathbb {R}^{|E|}\) for each \(\pi \) permutation of nodes, such that

$$\begin{aligned} \bar{z}_{ij} = {\left\{ \begin{array}{ll} 1 &{} \text{ if } \pi (i) = j \text{ or } \pi (j) = i, \\ 0 &{} \text{ otherwise }. \end{array}\right. } \end{aligned}$$

Let \({\mathscr {T}}_n\) be the collection of the incidence vectors \({\varvec{{\bar{z}}}} \in \mathbb {R}^{|E|}\) of all the possible tours of \(K_n\). Given a vertex \({\varvec{{\bar{x}}}}^{(h)}\) of \(P_{SEP}\), the OPT\((\bar{{\varvec{x}}}^{(h)})\) problem is defined in [3] as

(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17)
(18)

Constraints (10) ensure that the optimal solution \({\varvec{c}}^*\) of \(\text{ OPT }(\bar{x}^{(h)})\) for every \({\varvec{{\bar{x}}}}^{(h)}\) is such that \(TSP({\varvec{c}}^*) = 1\), as discussed in [3]. Herein, we call the inequalities (10) the TSP constraints. Constraints (11) and (12) ensure that the cost vectors represent a semi-metric. Constraints (13)–(15) are the dual constraints associated to the dual problem of (1)–(4). Constraints (16)–(18) ensure that the vertex \({\varvec{{\bar{x}}}}^{(h)}\) remains the optimal solution of the SEP. If we denote by \({\mathcal {C}}^*\) the set of the optimal solutions of \(\text{ OPT }(\bar{{\varvec{x}}}^{(h)})\), the dual slackness constraints are introduced to guarantee

$$\begin{aligned} \arg \min \text{ SEP }({\varvec{c}}^*) = {\varvec{x}}^{(h)}, \quad \forall \ {\varvec{c}}^* \in {\mathcal {C}}^*. \end{aligned}$$
(19)

Note that, in [3, 7], the authors observed that, for \(6 \le n \le 12\), \(\vert {\mathcal {C}}^*\vert = 1\).

Hence, given the list of the vertices \(\{{\varvec{{\bar{x}}}}^{(1)},\ldots , {\varvec{{\bar{x}}}}^{(t)}\}\) of \(P_{SEP}\), the integrality gap \(\alpha _n\) of \(K_n\) is computed by solving

$$\begin{aligned} \frac{1}{\alpha _n} = \min _{h = 1, \ldots , t} \text{ OPT }(\bar{{\varvec{x}}}^{(h)}) \quad \quad \Rightarrow \quad \quad \alpha _n = \max _{h = 1, \ldots , t} \frac{1}{\text{ OPT }(\bar{{\varvec{x}}}^{(h)})}. \end{aligned}$$
(20)

Part of the original contribution presented in [3] is to prove that we need to consider only a subset of the t vertices of \(P_{SEP}\), namely, such vertices which support graphs satisfy certain properties. We refer the reader to [3] for the details.

In the following section, we modify the single problem \(\text{ OPT }(\bar{{\varvec{x}}}^{(h)})\) in order to introduce a new NP-hard problem that we use to generate metric TSP instances with a large integrality gap.

3 The integer heuristic-OPT problem

The goal of our work is to devise an efficient computational procedure to generate instances with a large integrality gap for the Metric TSP. We are not interested in computing the exact value \(\alpha _n\) for a fixed n as in [3], but we focus on finding a heuristic solution to problem (20).

Notice that problem (9)–(15) has an exponential number of variables due to the dual variables \(d_S\). Furthermore, the number of constraints (10) is equal to the number of tours, that is \(\frac{(n-1)!}{2}\), and the number of triangular inequalities constraints (11) is \(O(n^3)\). In practice, the exact solution of \(\text{ OPT }(\bar{{\varvec{x}}}^{(h)})\) is intractable even for small values of n. Note that the computation of \(\alpha _n\) for \(n=12\) required around 24 days [7]. In order to solve problem (20) heuristically, we introduce a new problem, called Heuristic-OPT (H-OPT), which is related to problem \(\text{ OPT }(\bar{{\varvec{x}}}^{(h)})\), but can be solved for larger values of n. The two key ideas for introducing the new problem are:

  1. (i)

    To generate a TSP instance with a sufficiently large integrality gap it is unnecessary to enumerate all vertices \(\{{\varvec{{\bar{x}}}}^{(1)},\ldots , {\varvec{{\bar{x}}}}^{(t)}\}\) of \(P_{SEP}\). We can sample a subset of vertices and take the instance providing the largest integrality gap.

  2. (ii)

    Since we do not perform exhaustive vertex enumeration, it is unnecessary to impose the complementary slackness constraints (13)-(15) to force that a given vertex \({\varvec{{\bar{x}}}}^{(h)}\) remains the same vertex of \(P_{SEP}\). Hence, we can remove all the variables and constraints related to the slackness conditions.

For these two reasons, given a vertex \({\varvec{{\bar{x}}}}^{(h)} \in P_{SEP}\), we define the following LP problem:

(21)
(22)
(23)
(24)

In practice, we have relaxed the problem \(\text{ OPT }(\bar{{\varvec{x}}}^{(h)})\) by removing constraints (13)–(15). Notice that in H-OPT we have only |E| cost variables, \(\frac{(n-1)!}{2}\) TSP constraints (22) (one for each tour), and \(O(n^3)\) triangles inequalities (23) that define the metric cone [21]. We can solve this LP problem by cutting planes, by separating both families of constraints.

A critical point in the solution of the H-OPT problem by branch-and-cut is the separation of (maximally violated) TSP constraints (22). The separation problem SP is defined as follows. Given a cost vector \({\varvec{{\bar{c}}}} \in \mathbb {R}^{|E|}_+\), we look for a tour whose incidence vector \({\varvec{\bar{z}}}\) verify the following:

$$\begin{aligned} \sum _{\{i,j\} \in E} {\bar{z}}_{ij} \bar{c}_{ij} < 1. \end{aligned}$$
(25)

Notice that any tour that satisfies the previous relation gives a violated TSP constraint. However, to prove that no tour violates the TSP constraint, we need to verify the following:

$$\begin{aligned} \min _{{\varvec{z}} \in {\mathscr {T}}_n} \left\{ \sum _{\{i,j\} \in E} z_{ij} \bar{c}_{ij} \right\} \ge 1. \end{aligned}$$
(26)

Thus, to add a new TSP constraint, we solve a TSP instance for a specific cost vector \({\varvec{{\bar{c}}}}\). In our implementation, we separate TSP constraints by first solving the TSP instance given by \({\varvec{c}}^*\) using the LK-H heuristic [16, 17], and whenever the heuristic fails to find a violated tour, we solve (26) by embedding Concorde in the code. Section 5.5 describes the details of our implementation. Clearly, the TSP constraints make this problem challenging, as stated in the following lemma.

Lemma 1

The H-OPT problem is NP-hard.

First of all, we prove the following Lemma.

Lemma 2

Let \(n=|V|\). If H-OPT can be solved in \(n^{O(1)}\) time, then the separation problem SP can be solved in \(n^{O(1)}\) time.

Proof

(Sketch) Let P be the polyhedron defined by Eqs. (22)–(24). The proof is implicit in the proof of a well-known theorem of Grötschel et al. [14] which states that for well-described polyhedron P the optimization problem can be solved in polynomial time if and only if the separation problem SP can be solved in polynomial time. Note that the requirement of well-described polyhedral P is only used for obtaining a time complexity polynomial in the input size of the problem. For example, if the length L of the input needed to describe the polyhedral P does not satisfy \(n\le L\), then even if the separation could be resolved in time \(n^{O(1)}\) we could not claim that the separation problem SP can be resolved in polynomial time.

To see how the proof of this lemma is implicit in [14], it is sufficient to note that in [14] it is not asked to provide the input size as tight as possible. Actually, it is possible to make the encoding dimension L of a polyhedron P greater than n by adding, for example, a string with n zeros. This ensures \(n\le L\), and the claim follows. \(\square \)

Proof

(Lemma 1) The claim follows by showing that the Hamiltonian Cycle problem (HC) can be solved in polynomial time by an oracle machine with an oracle for H-OPT. More precisely, we reduce in polynomial time HC to the separation problem SP of H-OPT. Then, by Lemma 2 an oracle machine for H-OPT that takes \(n^{O(1)}\) time implies that the separation problem SP (and therefore HC) can be solved in \(n^{O(1)}\) time.

Consider the following sets:

$$\begin{aligned} M= & {} \{ {\varvec{c}} \in {\textbf{R}}^m \; \vert \; c_e > 0, \forall e\in [m] \, c_{ij} \le c_{ik} + c_{jk} \; \forall i,j,k \in [n]\}, \end{aligned}$$
(27)
$$\begin{aligned} TSP= & {} \{ {\varvec{c}} \in {\textbf{R}}^m \; \vert , {\varvec{c}}^T{\varvec{x}} \ge 1 \; \forall {\varvec{x}} \in {\mathscr {T}} \}, \end{aligned}$$
(28)
$$\begin{aligned} P= & {} M \cap TSP. \end{aligned}$$
(29)

Let \(G=(V,E)\) be an undirected graph on n nodes, that is \(\vert V \vert = n\). This graph defines an instance of the HC problem, namely the decision problem that searches for a cycle in a graph. We can perform a standard polynomial time reduction to get a Metric TSP and then normalize costs as follows:

$$\begin{aligned} {\hat{c}}_e = {\left\{ \begin{array}{ll} (1 - \frac{\epsilon }{2})/n &{} e \in E, \\ (2-\epsilon )/n &{} \text{ otherwise, } \end{array}\right. } \end{aligned}$$
(30)

where \(1> \beta (n)> \epsilon > 0\), and the definition of \(\beta (n)\) will be clarified later in the proof of this theorem.

Note that this instance of the TSP is metric: identity and symmetry are obvious, and the triangle inequalities can be verified case-by-case. Note also that, the graph contains a Hamiltonian cycle if and only of the optimal solution of the TSP is \(1-\frac{\epsilon }{2} < 1\), and thus such cost vector is in \(M {\setminus } TSP\). On the opposite side, if the graph does not admit Hamiltonian cycle, then the TSP solution must contain at least one edge of length \((2-\epsilon )/n\). Then, the value of the optimal tour would be at least

$$\begin{aligned} \dfrac{2 - \epsilon }{n} + \dfrac{(n-1)(1-\frac{\epsilon }{2})}{n} = \frac{1}{n} + 1 - \frac{\epsilon }{n}\left( \frac{1}{2} + \frac{n}{2} \right) . \end{aligned}$$
(31)

We observe that if \(\varepsilon < \dfrac{2}{1+n}\), then (31) is greater than 1. Thus, we can set \(\beta (n) = \dfrac{2}{(n+1)}\). Note that from now on we have proved that a graph \(G = (V, E)\) on n nodes admits a Hamiltonian cycle if and only if the solution of the associated TSP with cost vector \(\hat{{\varvec{c}}}\) is less than 1. By putting everything together, we see that the Hamiltonian Cycle problem (HC) can be solved in polynomial time by an oracle machine with an oracle for H-OPT as follows:

  1. 1.

    Let \(G= (V, E)\) be an undirected graph on n nodes.

  2. 2.

    Use Eq. (30) to obtain a TSP instance with cost vector \(\hat{{\varvec{c}}}\).

  3. 3.

    By Lemma 2, an oracle machine for H-OPT that takes \(n^{O(1)}\) time implies that we can decide whether \(\hat{{\varvec{c}}}\in M\) in \(n^{O(1)}\) time. If \(\hat{{\varvec{c}}}\in M\) then G does not admit a Hamiltonian cycle. Otherwise, it admits a Hamiltonian cycle.

\(\square \)

We remark that our proof only yields to NP-hardness under Turing reductions, as we have shown that there is an NP-complete decision problem, namely HC, that can be Turing-reduced to H-OPT. However, this also implies “hardness” for our problem, as an existence of a polynomial time algorithm for H-OPT would imply \(P=NP\).

Note also that since we have removed from our problem the dual slackness constraints, the relation (19) is not necessarily satisfied when \({\varvec{c}}^*\) is the optimal solution of H-OPT\((\bar{{\varvec{x}}}^{(h)})\). However, we can prove the following lemma which states that the solution of the H-OPT\((\bar{{\varvec{x}}}^{(h)})\) problem provides a cost vector \({\varvec{c}}^*\) corresponding to a TSP instance with an integrality gap greater than or equal to any instance yielding \(\bar{{\varvec{x}}}^{(h)}\).

Lemma 3

Let us consider a TSP instance \({\varvec{c}}_0\) such that \(TOUR({\varvec{c}}_0) = 1\), and let \({\varvec{\bar{x}}}^{(0)}\) be an optimal solution of \(SEP({\varvec{c}}_0)\). We define a second TSP instance by the cost vector

$$\begin{aligned} {\varvec{c}}_1 = \arg \min \text{ H-OPT }({\varvec{\bar{x}}}^{(0)}). \end{aligned}$$

Then, the following relation holds

$$\begin{aligned} \frac{TOUR({\varvec{c}}_1)}{SUBT({\varvec{c}}_1)} \ge \frac{TOUR({\varvec{c}}_0)}{SUBT({\varvec{c}}_0)}. \end{aligned}$$
(32)

Proof

Since \({\varvec{\bar{x}}}^{(0)}\) is a feasible solution of the SEP, we have that

$$\begin{aligned} SUBT({\varvec{c}}_1) \le {\varvec{c}}_1^T {\varvec{\bar{x}}}^{(0)} \end{aligned}$$

By definition of H-OPT\(({\varvec{\bar{x}}}^{(h)})\), we have \({\varvec{c}}_1\) is the cost vector that realizes the minimum of \({\varvec{c}}^T {\varvec{\bar{x}}}^{(h)}\) among all the metric vectors such that \(TOUR({\varvec{c}})=1\). By hypothesis, \(TOUR({\varvec{c}}_0) = 1\) and this implies, \({\varvec{c}}_0 {\varvec{z}} \ge 1\) for all \({\varvec{z}}\) 0-1 incidence vector. Thus, \({\varvec{c}}_0\) is feasible for H-OPT\(({\varvec{\bar{x}}}^{(0)})\) and it holds

$$\begin{aligned} {\varvec{c}}_1^T {\varvec{\bar{x}}}^{(0)} \le {\varvec{c}}_0^T {\varvec{\bar{x}}}^{(0)} = SUBT({\varvec{c}}_0), \end{aligned}$$

where the last equation holds by definition. Thus,

$$\begin{aligned} \dfrac{TOUR({\varvec{c}}_0)}{SUBT({\varvec{c}}_0)} = \dfrac{1}{SUBT({\varvec{c}}_0)} \le \dfrac{1}{{\varvec{c}}_1^T {\varvec{\bar{x}}}^{(0)}} \le \dfrac{1}{SUBT({\varvec{c}}_1)} = \dfrac{TOUR({\varvec{c}}_1)}{SUBT({\varvec{c}}_1)}. \end{aligned}$$

\(\square \)

From large integrality gaps to hard instances. The TSP solver Concorde [2] and the LK-H heuristic that we use for separating TSP constraints [16] can handle only integer costs. However, the H-OPT problem generates fractional cost vectors. Hence, we need to devise a method to transform the fractional costs into integer values, by, for example, multiplying for a large constant \(\tau \) and then rounding to the nearest integer, that is, \(c_{ij} = \text{ round }(\tau c^*_{ij})\). Using this cost transformation, we cannot guarantee that the integrality gap for \(c_{ij}\) remains the same of \(c^*_{ij}\).

The TSP constraints (22) have a right-hand side equal to 1 because we have divided the cost vector \({\varvec{c}}\) by the minimum tour length \(TOUR({\varvec{c}})\). If we divide the cost vector by the quantity \(\left( \frac{TOUR({\varvec{c}})}{\Delta } \right) \), where \(\Delta >0\) is a large positive constant, we have to change the right hand side of (22) to \(\Delta \), while still getting an equivalent LP problem. If \(\Delta \) is large enough (see Sect. 5.5), we can also add the integrality constraint on the variable \(c_{ij}\). In practice, in order to generate integer cost vectors, we have introduced the following ILP problem:

(33)
(34)
(35)
(36)

Clearly, IH-OPT is very challenging, as it is an integer program with as many constraints as TSP tours plus the number of triangle inequalities. We have computational evidence that the integrality gap of the integer problem could be smaller than those obtained by solving H-OPT. Surprisingly, our computational results show that the TSP instances obtained while solving the IH-OPT problem are very challenging for Concorde.

In the next section, we show how we use the IH-OPT problem to search for very challenging instances for Concorde.

4 A sampling procedure for generating hard instances

A standard procedure for searching heuristically for (suboptimal) solutions of an optimization problem is based on uniformly sampling points (i.e., solutions) of the feasible region [5]. In our context, to find a heuristic solution for problem (20), we could sample a fixed number of vertices of \(P_{SEP}\) and retain the vertex yielding the minimum value for IH-OPT, that is, yielding the largest integrality gap. However, directly sampling the vertices of the \(P_{SEP}\) is impractical, due to its exponential number of subtour constraints. We could instead easily sample random cost vectors and generate vertices of \(P_{SEP}\) by solving directly problem (1)–(4).

A possibility for generating random cost vectors consists of generating n random points in a Euclidean space, and then computing all the pairwise distances using a given distance (e.g., a distance induced by the Minkowski norm). As observed from our preliminary tests, this procedure leads very often to an integral, and hence useless, vertex of \(P_{SEP}\). In practice, this procedure takes a long time before returning a fractional vertex of \(P_{SEP}\). Furthermore, it only samples instances of the Euclidean TSP, implicitly excluding some remarkable metric TSP instances, such as the ones provided in [3].

To generate a random cost vector, we have designed a different approach. First, we sample a random point within the metric polytope [21] using the hit-and-run algorithm [34], which generates a random metric TSP instances \({\varvec{c}} \in \mathbb {R}_+^ {|E|}\). Second, we get a (random) vertex \({\varvec{{\bar{x}}}}^{(h)}\) by solving SEP\(({\varvec{c}})\) via the simplex algorithm. Since the cost vector is a uniformly random point of the metric polytope, we expect that \({\varvec{{\bar{x}}}}^{(h)}\) is a random vertex of \(P_{SEP}\). We also expect a good variety among the sampled vertices: for instance, with \(n = 15\), after the sampling of 999 vertices, it took only less than three seconds to find one vertex both non-integer and not yet sampled. In the next paragraphs, we briefly review the hit-and-run algorithm, and we detail how we use the sampling procedure to generate hard metric TSP instances.

The hit-and-run algorithm The hit-and-run algorithm [34] is designed to sample from a bounded set P uniformly. The basic steps of the algorithm are:

  1. 1.

    Pick a point \({\varvec{x}}_k \in P \subset \mathbb {R}^m\).

  2. 2.

    Generate a random direction \({\varvec{d}}_k\) uniformly distributed over the unitary hyper-sphere centered in \({\varvec{x}}_k\).

  3. 3.

    Generate a random point \({\varvec{x}}_{k+1} = {\varvec{x}}_{k} + \lambda {\varvec{d}}_k\) uniformly distributed over the line set \(\{{\varvec{x}} \in P \ \vert \ {\varvec{x}} = {\varvec{x}}_k + \lambda {\varvec{d}}_k, \ \lambda \in \mathbb {R}\}\).

  4. 4.

    If a stopping criterion is met, stop (e.g., terminate after a fixed number of iterations). Otherwise, \({\varvec{x}}_{k} \leftarrow {\varvec{x}}_{k+1}\) and repeat from Step 2.

The points sampled with this procedure converge in total variation to a uniform distribution, as proven in [34]. The time required to have a sample that effectively approximates the uniform distribution is polynomial in the dimension, as proven in [24]. Later, a modification of this algorithm to sample a set of points that converges to an arbitrary target distribution was introduced in [32]. The interested reader can find an in-depth review of hit-and-run algorithms in [37].

The random sampling algorithm As noted in Sect. 2, the constraints (11)–(12) define the metric cone:

$$\begin{aligned} C_{MET}:= \{{\varvec{c}} \in \mathbb {R}^{|E|}_+ \mid c_{ij} - c_{ik} - c_{jk} \le 0, \forall i,j,k \in V \}. \end{aligned}$$
(37)

If we add the perimeter inequality (also known as the homogeneous triangle inequality) we get the metric polytope

$$\begin{aligned} P_{MET}:= C_{MET} \cap \{ c_{ij} + c_{ik} + c_{jk} \le 2, \forall i,j,k \in V \}. \end{aligned}$$
(38)

Note that for every metric \({\varvec{c}} \in C_{MET}\), there exist an \(\gamma \) such that \(\gamma {\varvec{c}} \in P_{MET}\) [21]. For this reason, without loss of generality, we can sample from \(P_{MET}\) instead of \(C_{MET}\). Sampling from \(P_{MET}\) instead of \(C_{MET}\) guarantees to remain in the framework described in [24], since \(P_{MET}\) is a convex compact set and, hence, a convex body.

Algorithm 1 presents our procedure for sampling metric TSP instances from the set \({P}_{MET}\). Given the size of the metric space \(m=|E|\) and the number of required sampled points r, the algorithm initialized in Step 1 the empty set R. Then, until the number of sampled points is equal to r, steps 3–7 are iterated. Step 3 generates a uniformly distributed random cost vector \({\varvec{c}}\) from the open set \({P}_{MET}\) using the hit-and-run algorithm. Step 4 generates an optimal vertex \({\varvec{x}}\) for SEP\(({\varvec{c}})\), \({\varvec{x}}\in P_{SEP}\). If the vertex \({\varvec{x}}\) is fractional and does not belong to R, it is added to R. Finally, the procedure returns the set R of r vertices of \(P_{SEP}\) of dimension m.

figure a

5 Generating the Hard-TSPLIB

The objective of our computational experiments is to generate the Hard-TSPLIB, a collection of small metric TSP instances having a large integrality gap and are challenging for the Concorde solver. The instances of the Hard-TSPLIB are generated by using a branch-and-cut solver for problem \(\text{ IH-OPT }({\varvec{\bar{x}}}^{(h)})\).

In the following paragraphs, first, we present the hard instances generated starting from the TSPLIB. Second, we present the hard instances generated using the random sampling procedure presented in Sect. 4. Third, we compare the runtime of Concorde for solving our hard instances with the runtime required for solving the Rectilinear 3-Dimensional instances [38]. In this work, the author compares the introduced instances with other works, namely [3, 18, 19, 30], showing that author’s instances are computatinoally harder. Thus, we compare ourselves with [38] and deduce other comparisons from the context. Then, we visually analyze the structure of the hard instances we have generated. Finally, we discuss the implementation details of our algorithms.

Note that all the computations described in the following paragraphs are executed on a single node of an HPC cluster running CentOS, having an Intel CPU with 32 physical cores working at 2.1 GHz, and 64 GB of RAM. We compiled our code with the GNU C++ compiler v8.3, with the following flags -O2 -D_REENTRANT -m64 -ffast-math -DNDEBUG -Wall -march=native.

5.1 Implementation details

Our computational procedure for generating hard metric TSP instances has two core algorithms:

  1. 1.

    The branch-and-cut algorithm for solving problem IH-OPT\(({\varvec{\bar{x}}}^{(h)})\) (see Sect. 3).

  2. 2.

    The sampling procedure from the metric polytope (see Sect. 4).

In the following paragraphs, we first describe our implementation of the two algorithms.

Solving IH-OPT by branch-and-cut The problem IH-OPT\(({\varvec{\bar{x}}}^{(h)})\) is solved by branch-and-cut by using the Gurobi commercial solver and using the C++ programming language. In our implementation, we dynamically add both the triangle inequalities and the TSP constraints, in order to keep the core LP problem as small as possible. Note that a complete enumeration of triangle inequalities could exhaust the memory of a standard computer for medium values of n.

For the separation of triangular inequalities, we follow the strategies used in [15]: at each iteration, we add a fixed number k of the most violated triangular inequalities, until no more violated triangle inequalities exist. The separation of triangle inequalities is carried over both during the solution of the LP relaxation and when encountering a new incumbent integer solution.

The separation of TSP constraints is more challenging since it corresponds to the solution of a TSP instance, as shown in (26). As we only need a violated cut, we first separate the TSP constraints heuristically by solving the TSP instance using the Lin-Kernigan local search procedure [22], as implemented in Concorde.Footnote 1 Note that the LK-H heuristic only handles integer costs. Hence, we use the support of Gurobi for lazy constraints, which allows running our separation procedure only on incumbent integer solutions. Whenever the LK-H heuristic returns a TSP solution whose incidence vector \({\varvec{z}}\) satisfies \(\sum _{\{i,j\}\in E }z_{ij} c_{ij} \ge \Delta \), with \(\Delta \) as described in Sect. 3, we run a second exact TSP algorithm. For the exact separation of TSP constraints we use Concorde compiled using CPLEX 12.8 as LP solver, by using the default parameter settings.

The optimal solution of the LK-H heuristic is given as warm start to the TSP branch-and-cut implementation.

Notice that before starting the solution of the integer problem IH-OPT\(({\varvec{\bar{x}}}^{(h)})\), we solve its LP relaxation H-OPT\(({\varvec{\bar{x}}}^{(h)})\) via cutting planes by only separating the TSP constraints with the LK-H heuristic (see, for instance, the computational results in Table 3). Once the LK-H heuristic does not find any TSP violated constraint, we stop, and we collect all the triangle inequalities violated by no more than a threshold \(\tau =0.05\), and all the generated TSP constraints, to initialize the first pool of cut for our branch-and-cut algorithm.

Sampling the metric polytope by hit-and-run The sampling procedure described in Algorithm 1 is implemented in Python 3.8.2. For the hit-and-run algorithm at Step 3, we have used the implementation provided by [13], which is based on the original algorithm introduced in [34]. The solution of the SEP problem in Step 4 is implemented using the python wrapper of Gurobi. Since the sampling procedure is very fast, we did not port this procedure to C++.

Parameters tuning In our implementation, we had only to decide a value for the parameter \(\Delta \) in (35). If we only consider the H-OPT formulation, namely the formulation without the integer costs, we can imagine to multiply each cost for a fixed quantity \(\omega \) and obtain a tour that costs \(\omega \Delta \). However, once we move to integer costs, the parameter \(\Delta \) becomes more important because it is strongly related to the number of values that can be taken as cost coefficients. For instance, if \(\Delta = n\), the solution having all \(c_{ij}\) equal to 1 is optimal. This does not lead to an hard-to-solve instance. On the contrary, if \(\Delta \) is too big, we miss such “degeneracy” of the costs on some edges. In practice, we have observed that for different values of \(\Delta \) we get TSP instances of different (runtime) difficulty. Table 1 shows the computational results for solving IH-OPT over sampled TSP instances using different values of \(\Delta \). Since the hardest instances were generated while using \(\Delta =1000\), we fix this value in all of the tests reported in this paper. In future work, we plan to investigate further the impact of the parameter \(\Delta \).

Table 1 Impact of the parameter \(\Delta \) on the instances generated by IH-OPT

5.2 Generating hard instances from the TSPLIB

The TSPLIB contains 20 instances with less than 76 nodes. Only 13 of them have a fractional solution for the SEP (i.e., the integrality gap is greater than 1). For 12 of these 13 instances, we have generated a corresponding hard instance by solving H-OPT\(({\varvec{\bar{x}}}^{(h)})\), where \({\varvec{\bar{x}}}^{(h)}\) is the fractional solution of SEP solved via the simplex algorithm.

If we denote by \({\varvec{c}}_0\) the cost vector of the TSPLIB instance, and by \({\varvec{c}}^*\) the optimal solution of H-OPT\(({\varvec{\bar{x}}}^{(h)})\), by Lemma 3, we have

$$\begin{aligned} \frac{TOUR({\varvec{c}}^*)}{SUBT({\varvec{c}}^*)} \ge \frac{TOUR({\varvec{c}}_0)}{SUBT({\varvec{c}}_0)}, \end{aligned}$$

that is, the TSP instance \({\varvec{c}}^*\) has an integrality gap larger than or equal to \({\varvec{c}}_0\). If instead, we solve the problem IH-OPT\(({\varvec{\bar{x}}}^{(h)})\), we cannot guarantee the previous relation, but in practice, we get more challenging instances with almost the same integrality gap. For this reason, all the following results are obtained by solving the IH-OPT problem. Table 2 reports the detailed results for the generation of hard instances from the TSPLIB. The table first reports the name and the dimension \(n=|V|\) of the original instance. The third and fourth columns report the integrality gap of \({\varvec{c}}_0\) (easy instance) and \({\varvec{c}}^*\) (hard instance). In the remaining six columns, the table shows the average runtime in seconds (with the standard deviations), and the average number of branch and bound nodes for solving with Concorde first the TSPLIB instance, and later the corresponding Hard-TSPLIB instance. The averages are computed over 10 independent runs of Concorde, using 10 different seeds. Finally, the last column reports whether the optimal SEP solution of \({\varvec{c}}_0\) is the same optimal solution of \({\varvec{c}}^*\).

The results of Table 2 show that the small TSPLIB instances have a very small integrality gap and are extremely easy for Concorde. They are solved within a fraction of seconds at the root node of the branch-and-cut tree. On the contrary, the Hard-TSPLIB have a significantly larger integrality gap, and they require, on average, several seconds (or up to several hours) to be solved to optimality by Concorde. As expected, a larger integrality gap at the root node implies a larger branch-and-cut tree, that is, a larger number of nodes (column ‘BC n.’). However, this is not always true, because among the three instances with 48 nodes (att48, gr48, and hk48), the instance with the smaller integrality gap requires the largest number of branch-and-cut nodes. Indeed, visiting a larger search tree implies a longer runtime. We remark that the instance brazil58_hard is not solved by Concorde within a timeout of 24 h.

Table 2 Hard-TSPLIB instances generated from the TSPLIB

Tables 3 and 4 report the computational results for generating the Hard-TSPLIB instances while solving H-OPT\(({\varvec{\bar{x}}}^{(h)})\) and IH-OPT\(({\varvec{\bar{x}}}^{(h)})\), respectively. For each instance, the table reports the number of cuts generated and the runtime for the triangle inequalities separation (trian.), the TSP constraints separated by the LKH heuristic (LKH-cuts), and the TSP constraints separated by Concorde (TSP-cuts). For the solution of IH-OPT\(({\varvec{\bar{x}}}^{(h)})\), the table gives also the total number of branch-and-bound nodes, the lower bounds (LB), and the upper bounds (UB): when the LB and UB are equal the instance is solved to optimality. Concerning the separation algorithms, the violated triangle inequalities are identified in a very short time, and they have almost no impact on the overall runtime. The heuristic separation of TSP constraints using the LKH heuristic is very effective, but the runtime begins to be important. The exact separation of TSP constraints is one of the two runtime bottlenecks for the generation of hard instances. For example, for the instance swiss42, most of the time is spent on the exact separation of a TSP cut. Finally, notice that for the instances pr76, eil76, rat99, kroB100, kroC100, the solution by branch-and-cut of IH-OPT\(({\varvec{\bar{x}}}^{(h)})\) hits the time limit of 24 h (86400 s). Among those instances, only for pr76, we do generate a hard TSP instances; in all other cases, we were not able to find an integer cost vector satisfying all triangle inequalities and all TSP constraints, that is, an optimal integer solution for problem IH-OPT\(({\varvec{\bar{x}}}^{(h)})\).

Table 3 Computational results for the heuristic solution of the LP problem H-OPT\(({\varvec{\bar{x}}}^{(h)})\) by cutting planes
Table 4 Computational results for the solution of the integer problem IH-OPT\(({\varvec{\bar{x}}}^{(h)})\) by branch-and-cut

5.3 Generating hard instances by sampling

We have also generated a collection of instances for the Hard-TSPLIB by using the random sampling procedure discussed in Sect. 4. First, we run Algorithm 1 to generate a random set R of vertices of \(P_{SEP}\), for a fixed size n of the TSP. When generating a random vertex \({\varvec{\bar{x}}}^{(h)}\) in Algorithm 1, we also store the cost vector \({\varvec{c}}_0^{(h)}\) sampled from the metric cone which yields the vertex \({\varvec{\bar{x}}}^{(h)}\). Hence, we can compute the integrality gap of the initial easy TSP instances. Later, for each random vertex \({\varvec{\bar{x}}}^{(h)}\) in R, we solve the IH-OPT\(({\varvec{\bar{x}}}^{(h)})\) problem to get an instance with a larger integrality gap.

Table 5 reports the three hardest instances we are able to provide for each \(n \in \{10, 15, 20,25, 30, 35,40\}\), generating 10 random vertices for each value of n. The table reports first the integrality gap of the sampled cost vector \({\varvec{c}}_0\), of the optimal cost vector \({\varvec{c}}^*\) obtained after solving IH-OPT\(({\varvec{\bar{x}}}^{(h)})\), and the gap \(\alpha ^*_n\) conjectured in [3], that is, to the best of our knowledge, the highest integrality gap available in the literature. Then, in the remaining columns, the table reports the average runtime (with the standard deviation) and the average number of BC nodes for solving the instances to optimality using Concorde, first for the instance \({\varvec{c}}_0\) (random cost vector) and then for \({\varvec{c}}^*\) (the optimal solution of IH-OPT). Similarly to the results of the Hard-TSPLIB, Table 5 shows that we are able to generate very hard instances by solving the IH-OPT problem.

We run a second experiment to study the effect of generating 1000 random instances for \(n=20\). For each random instance \({\varvec{c}}_0^{(h)}\), with \(h = 1,\ldots , 1000\), we measure the average solution runtime of Concorde. Then, using the corresponding vertex \({\varvec{\bar{x}}}^{(h)}\) and solving IH-OPT\(({\varvec{\bar{x}}}^{(h)})\), we generate the harder instance, and we measure the runtime again. Figure 1a shows the runtime distribution for the random instances \({\varvec{c}}_0^{(h)}\), while Fig. 1b shows the runtime distribution for \({\varvec{c}}^*\). In the top plot, we have a runtime close to zero: we barely get close to 0.030 s. On the other hand, for the bottom plot, we have mean runtime of 10 s, with a maximum of around 50 s. In practice, if we sample a large number of vertices we are able to get very challenging small instances.

Table 5 Results for the Hard-TSP instances generated by sampling
Fig. 1
figure 1

Top: distribution of the computational times for \(n = 20\) of 1000 sampled TSP instances. Bottom: distribution of the computational times for \(n = 20\) of 1000 instances obtained by the IH-OPT procedure

Table 6 reports the average runtime (in seconds) to solve the IH-OPT problem on 1000 different vertices sampled for each n from 10 to 20. The solution to this problem exhibits a very large runtime variability. For instance, with \(n = 20\), the minimum runtime is of 2 s, while the maximum is of 3445 s, that is, three orders of magnitude larger. In addition, we have fitted a linear regression on the \(\log _{10}\) of the runtime, getting the following equation

$$\begin{aligned} t = 10^{0.146\cdot n -1.86}. \end{aligned}$$

Using this regression, we try to predict the runtime for each n. For example, with \(n = 40\), we would expect a runtime of around 3 h. However, we have not observed any significant correlation between the runtime for solving the IH-OPT problem and the average runtime for solving the hard instance with Concorde.

Table 6 Average computational time and standard deviation of 1000 \(\text {IH-OPT}\) on different \({\varvec{x}}\) at a fixed n number of nodes

5.4 A comparison with the 3D-rectilinear instances

To the best of our knowledge, the current hardest instances of the literature are the Rectilinear 3D instances provided by Zhong [38]. In this work, the author proposes a family of 3-paths instances with a different configuration of nodes on the three paths and suggests the node configuration that makes them hard to solve for Concorde. For this reason, we tried to generate, using our computational procedure, metric TSP instances as hard as those 3D Rectilinear instances. We have tried the following two strategies.

  1. (a)

    We massively sampled the metric polytope for small n with the algorithm described in Sect. 4 and then apply IH-OPT to the correspondent vertices, until we find an instance with an average runtime competitive with the 3D-Rectilinear instances. We call this found instance \({\varvec{c}}_n^S\).

  2. (b)

    We tried to use our computational procedure to generate hard instances starting from the optimal solution of the instances provided by Zhong [38]. Let \({\varvec{c}}^R_n\) the cost vector of a 3D rectilinear instance with n nodes [38], and let \({\varvec{x}}^R_n\) the corresponding optimal solution of SEP\(({\varvec{c}}^R_n)\). Similarly, let \({\varvec{c}}^{IH}_n = \arg \min \text{ IH-OPT } ({\varvec{x}}_n^R)\) and \({\varvec{x}}^{IH}_n = \arg \min \text{ SEP }({\varvec{c}}^{IH}_n)\).

We compared the average runtime of solving TSP\(({\varvec{c}}^R_n)\), TSP\(({\varvec{c}}^{IH}_n)\), and TSP\(({\varvec{c}}^S_n)\). We have observed from our preliminary tests that Concorde has great runtime variance in solving the same instance: See, for example, the standard deviation values in Table 2 and Table 5, where we performed 10 independent runs on each instance, using every time a different seed. This variability is typical of Mixed Integer Programming and branch and cut algorithm (see e.g [12, 23]). For this reason, the runtime comparison instance by instance is meaningless unless the difference is of at least one order of magnitude. Thus, we present the runtime comparison using linear regression on the logarithm (\(\log _{10}\)) of the runtime measured in seconds (again, using 10 independent runs for each instance).

Fig. 2
figure 2

Linear regression for the runtime \(t^R_n\), \(t^S_n\), and \(t^{IH}_n\), corresponding to 3D rectilinear instances, sampled instances, and instances which are solution of IH-OPT\(({\varvec{x}}_n^R)\), respectively

Figure 2 shows the three regression lines for comparing the runtime for solving (i) the 3D rectilinear instances, marked with R, (ii) the instances produced using the sampling procedure mentioned above in point (a), marked with S, and (iii) the instances used with the procedure described in the point (b), marked with IH. The three runtimes are denoted by \(t^{R}_n\), \(t_n^S\), \(t^{IH}_n\), respectively. The fitted regression lines are the following:

$$\begin{aligned} t^{R}_n = 10^{ 0.168\, n -2.209}, \quad t^S_n = 10^{ 0.171\, n -2.157 }, \quad t^{IH}_n = 10^{ 0.137\, n -1.794}. \end{aligned}$$

By looking at the regression, the instances that we have found by massively sampling the space are a bit harder than the Rectilinear 3-D instance. On the contrary, the instances directly obtained from vertices are easier. Interestingly, in the logarithm scale, the three instances have all the same order of magnitude: by looking at Fig. 2 it is possible to observe that the three families are competitive to each other, with the differences mostly related to the variability of Concorde. Noteworthy, by observing the structure of costs of the instances created by hand as [38] with respect to the one obtained by the two heuristic procedures, we note less regularity. This fact will be discussed wider in the next subsection. In terms of the integrality gap, we do not notice any significant evidence. For \(n = 12\), the instance we sample has an integrality gap of 1.164 while the Rectilinear 3-D instance has 1.193 and the two runtimes are competitive. On the contrary, for \(n = 20\), we sample an instance with an integrality gap of 1.242, while the Rectilinear 3-D instance has 1.240 and yet the two instances are competitive. Lastly, we run the computation as reported by Zhong, namely by multiplying the instances by 1000 and rounding to the nearest integer. We verified that this procedure might lead to non-metric instances as, due to rounding errors, triangular inequalities might not be satisfied.

Fig. 3
figure 3

Easy (top) and hard (bottom) support graph associated to one of the sampled vertex with \(n = 15\). The cost value on the edge of the support graph is also provided

5.5 Structure of small hard instances

The hard instances recently introduced in [19] and [38] are characterized, by construction, by half-integer solutions for SEP solution, and by support graphs having two triangles where each edge has a weight equal to \(\frac{1}{2}\). Hence, we have looked at the structure of the support graph of our hard instances. For some instances, such as bays29 and eil51, the optimal vertex moves from a complicated structure to a 3-path configuration with two triangles having \(\frac{1}{2}\)-vertices. For the instance gr48, in the original version, the vertices have entries of values in the set \(\{0.0, 0.25, 0.5, 0.75, 1.0\}\), while in the corresponding hard version, they have entries in the set \(\{0.0, 0.5, 1.0\}\).

In addition, for three small instances, namely gr24 and two sampled instances with \(n=15\) (\(s_{15}\)) and \(n=20\) (\(s_{20}\) ), we studied “by hand” the cost structure of the support graph of the optimal solution of SEP. We selected these 3 instances since they are challenging for Concorde. The motivation of this study is to investigate if the hard instances share some common cost patterns and/or structures of the support graph. Figures 3, 4, and 5 show the support graph of the easy and hard version of the two instances, where the easy refer to the original cost vector \({\varvec{c}}_0\) with its SEP vertex solution, and the hard to the optimal solution \({\varvec{c}}^*\) of IH-OPT. In the support graphs, the dotted edges correspond to the solution of SEP having value \(x_e = \frac{1}{2}\), while the solid edges correspond to \(x_e = 1\). The missing edges have \(x_e = 0\). The label of each edge gives its cost in the TSP instance.

Fig. 4
figure 4

Easy (top) and hard (bottom) support graph associated to one of the sampled vertex with \(n = 20\). The cost value on the edge of the support graph is also provided

Fig. 5
figure 5

Easy (top) and hard (bottom) support graph associated to the instance gr24 of the TSPLIB. The cost value on the edge of the support graph is also provided

We observe that the main difference between the easy and hard instances is in the pattern of the edge length. In the easy instances, the edge costs look randomly distributed, while in the hard instances obtained after solving IH-OPT there is a clear cost pattern. The edges lying along the same path have nearly the same cost, while the dashed edges connecting two distinct paths generally have a cost nearly equal to the sum of the costs of a single edge on a path. In addition, the overall length of the 3-paths are almost equal: for instance, in \(s_{15}\) (Fig. 3), in the easy version, we have on each path a sum of, respectively, 707, 48, 1633, while in the hard one, we find 144, 147, 147. Finally, notice that since in IH-OPT we have removed the slackness constraints introduced in OPT\(_h\), we do not have any guarantee that the optimal solution for the SEP associated to \({\varvec{c}}^0\) and \({\varvec{c}}^*\) is the same. However, in \(s_{15}\) we have the \({\varvec{x}}^0 = {\varvec{x}}^*\), while for \(s_{20}\) and gr24 we get \({\varvec{x}}^* \ne {\varvec{x}}^0\), as shown in Figs. 4 and  5.

6 Conclusions

In this work, we have introduced a computational procedure to generate metric TSP instances which have large integrality gaps and are challenging for Concorde, the state-of-the-art TSP solver. As a by-product, we have introduced the Hard-TSPLIB, a collection of small but challenging metric TSP instances, which are not generated explicitly exploiting specific cost structures, as in [19, 38]. Notice that, to the best of our knowledge, all the hard instances from the literature have half-integer optimal SEP solutions. On the contrary, the instances of the Hard-TSPLIB have a larger variety of fractional optimal vertices (i.e., they are not only half integers). We expect our new instances will serve as a benchmark for designing new exact and heuristic methods for solving the TSP problem.

Curiously, we have observed that the most challenging instances generated using our computational procedure have regular cost patterns, with several edges sharing the same costs, and several paths on the support graphs having the same length. These types of cost patterns are in common with the manually-generated hard instances recently introduced in [19] and [38]. Hence, we believe that our Hard-TSPLIB instances could help further studies in the cost structures of TSP instances.

We emphasize that our framework is really general and can be applied to any combinatorial optimization problem. There are several combinatorial optimization problems that have metric costs, such as, for instance, the metric Steiner tree [4] or the shortest Euclidean minimum spanning tree [11]. The main difference would be to change the TSP constraints into different inequalities for the corresponding combinatorial objects.