On the generation of metric TSP instances with a large integrality gap by branch-and-cut

This paper introduces a computational method for generating metric Travelling Salesman Problem (TSP) instances having a large integrality gap. The method is based on the solution of an integer programming problem, called IH-OPT, that takes as input a fractional solution of the Subtour Elimination Problem (SEP) on a TSP instance and computes a TSP instance having an integrality gap larger than or equal to the integrality gap of the first instance. The decision variables of IH-OPT are the entries of the TSP cost matrix, and the constraints are defined by the intersection of the metric cone with an exponential number of inequalities, one for each possible TSP tour. Given the very large number of constraints, we have implemented a branch-and-cut algorithm for solving IH-OPT. Then, by sampling cost vectors over the metric polytope and by solving the corresponding SEP, we can generate random fractional vertices of the SEP polytope. If we solve the IH-OPT problem for every sampled vertex using our branch-and-cut algorithm, we can select the generated TSP instance (i.e., cost vector), yielding the longest runtime for Concorde, the state-of-the-art TSP solver. Our computational results show that our method is very effective in producing challenging instances. As a by-product, we release the Hard-TSPLIB, a library of 41 small metric TSP instances which have a large integrality gap and are challenging in terms of runtime for Concorde.


Introduction
The Branch-and-Cut (B&C) algorithms are powerful tools to solve NP-hard problems.A core component of these algorithms is the solution of several Linear Programming (LP) relaxations that appear while searching for an integral optimal solution.Given an instance of an ILP model, its integrality gap for one linear relaxation is the ratio between the optimal solution of the integer problem and the corresponding LP relaxation.In practice, this is a measure of how much the relaxation is good to approximate the original problem.As a rule of thumb, given an Integer Linear Programming (ILP) model for an NP-hard problem, the larger is the integrality gap of the initial LP relaxation, the longer is the runtime for a B&C algorithm to prove that the best solution found is optimal.In practice, a large integrality gap at the root node very often implies that a great number of branch-and-bound nodes must be visited before proving the optimality of an integral solution.For this reason, a significant effort in designing efficient B&C algorithms is spent looking for tight LP relaxations [26].
In this paper, we introduce a new challenging integer programming problem, herein called the Integer Heuristic-OPT (IH-OPT) problem, whose optimal solution provides a Metric TSP instance with a large integrality gap.Our approach is purely computational, and we do not restrict to the generation of Euclidean or Rectilinear TSP instances as in [18,19,38], but we still require that the generated instance are metric.The decision variables of our problem are the entries of the symmetric cost matrix C C C, while the constraints are defined by the intersection of the metric cone [21] with an exponential number of inequalities, one for each possible permutation of V , that is, one for each possible solution of the TSP.For example, starting from an instance C 0 of the TSPLIB, taking an optimal solution of the LP relaxation of the corresponding subtour elimination model, and by solving the IH-OPT problem, we can generate a new TSP instance C * having an integrality gap larger than or equal to the integrality gap of the original instance C 0 (e.g., see Table 2).In addition, by integrating the solution of the IH-OPT problem into a sampling procedure, we can generate several TSP instances that have a large integrality gap and are challenging in terms of runtime for Concorde (e.g., see Table 5).As a by-product of our work, we introduce the Hard-TSPLIB, a collection of 41 small TSP instances (i.e., with n ≤ 76) which are very challenging for Concorde in terms of runtime and number of branch-and-bound nodes.
The outline of this paper is as follows.In Section 2, we review the background material, and we fix the notation.Section 3 formally introduces the IH-OPT problem and discusses how it is related to previous works.Section 4 presents the sampling procedure that we use to generate TSP instances which are hard for Concorde in terms of runtime.In Section 5, we present our extensive computational results and describe how we generated the instances that we have included in the Hard-TSPLIB.Finally, in Section 6, we conclude with a discussion on future works.

Background material
In this section, we review the main formulation for the symmetric TSP and we introduce the Subtour Elimination Problem polytope.We formally define the integrality gap for the SEP, presenting the work of [3], which is the foundation of our work.
An instance of the symmetric TSP can be completely defined by the symmetric matrix C C C. Otherwise, we can define a TSP instance using a complete undirected graph K n = (V, E) along with a cost vector c c c ∈ R |E| + , which is given by the upper triangular matrix of C C C (without the diagonal).While in this paper we focus on the general Metric TSP, and we denote it only with TSP, as defined in Section 1, two special cases that are relevant to compare our work with the literature: 1.The Euclidean TSP, where the input is a collection of n points in R d , whose reciprocal distances are computed using the Euclidean norm.These are the type of TSP instances used in [18,19], with d = 2.
2. The Rectilinear TSP, where the input is again a collection of n points in R d , but where the Manhattan norm (or city block or L 1 norm) is used to compute the distances.These are the type of TSP instances used in [38], with d = 3.
In both cases, using the collection of points given as input, and the corresponding distance function, it is possible to define a complete graph K n with the set of nodes V = {1, . . ., n} and the set of edges E = V ×V , with the corresponding cost vector c c c ∈ R |E| + .In the following, we use δ (S) with S ⊂ V , to denote the set of weighted edges e = {v, w}, v = w with either v ∈ S, w ∈ S or either w ∈ S, v ∈ S. We use later the two following collection of subsets of vertices: Given a connected graph G = (V, E) and the cost vector c c c ∈ R |E| + , the Travelling Salesman Problem, originally proposed in [9], is formulated as follows: ∑ e∈δ ({v}) ∑ e∈δ (S) The decision variable x e is equal to 1 if the edge e = {v, w} is part of an optimal tour.The objective function (1) minimizes the overall tour length.Constraints (2) state that each node v must have two incident edges.The Subtour Elimination Constraints (3) force the cut set δ (S) of every proper subset S of V to contain at least two edges.
If we relax the integrality constraints (5), we can define the Subtour Elimination Problem (SEP), that, given a cost vector c c c, provides a lower bound of the optimal solution.In the next sections, we will denote as polytope of the SEP the set If we optimize the objective function (1) over P SEP , we get exactly the Subtour Elimination Problem.
Let us consider the complete graph K n .Let us denote by T OUR(c c c) and SUBT (c c c), respectively the optimal value of the TSP and SEP instance defined by the cost vector c c c. Similarly to [3], we denote by α n the integrality gap of SEP for K n , that is, the largest possible ratio between T OUR(c c c) and SUBT (c c c): For n ≤ 5, we have that α n = 1.The exact value for 6 ≤ n ≤ 12 was computed in [3,7].Williamson's conjecture states that α n ≤4 3 for any n [35].Let's call α n (c c c), the integrality gap of a specific TSP instance on n nodes with cost vector c c c, that is We can thus write The families of instances introduced in [18,19,38] have all the properties that α n (c c c) < 4 3 and lim n→∞ α n (c c c) = 4 3 .However, the largest integrality gap for the instances in the TSPLIB is equal to 1.095 [30], which is largely less than Following the idea presented in [3], we can divide the cost vector c c c by the optimum tour value T OUR(c c c), obtaining a new cost vector c c c that still satisfies the triangle inequalities, and which leaves α n (c c c) unchanged.Hence, we can restrict ourselves to metric cost vectors c c c such that T OUR(c c c) = 1, and we can transform the maximization problem into a minimization: 1 Problem ( 8) can be formulated as a mixed integer quadratic problem, where the decision variables are both the cost vector c c c and the incidence vector x x x of vertices of P SEP .Unfortunately, despite the recent improvements in the implementation of commercial optimization solvers, the quadratic model is intractable even for small values of n.In [3], the authors propose a clever idea for bypassing the quadratic model using the vertex representation of P SEP .That is, they represent P SEP as the convex combinations of its vertices {x x x (1) , . . ., x x x (t) }, which are finitely many.Then, for each vertex x x x (h) of P SEP , they define an LP problem, called OPT(x x x (h) ), having an exponential number of variables and constraints.Theoretical results in [3,7] guarantee that only a subset of vertices is necessary to compute the integrality gap.By solving the OPT(x x x (h) ) subproblem on each vertex of the previously mentioned subset, they were able to compute α n for n ≤ 10 in [3], and for n ≤ 12 in [7].
Given the complete graph on n nodes K n = (V, E), we introduce one vector z z z ∈ R |E| for each π permutation of nodes, such that Let T n be the collection of the incidence vectors z z z ∈ R |E| of all the possible tours of K n .Given a vertex x x x (h) of P SEP , the OPT(x x x (h) ) problem is defined in [3] as Constraints (10) ensure that the optimal solution c c c * of OPT( x(h) ) for every x x x (h) is such that T SP(c c c * ) = 1, as discussed in [3].Herein, we call the inequalities (10) the TSP constraints.Constraints (11) and (12) ensure that the cost vectors represent a semi-metric.Constraints (13)- (15) are the dual constraints associated to the dual problem of (1)-(4).Constraints ( 16)- (18) ensure that the vertex x x x (h) remains the optimal solution of the SEP.If we denote by C * the set of the optimal solutions of OPT(x x x (h) ), the dual slackness constraints are introduced to guarantee ) Note that, in [3,7], the authors observed that, for 6 ≤ n ≤ 12, |C * | = 1.
Hence, given the list of the vertices {x x x (1) , . . ., x x x (t) } of P SEP , the integrality gap α n of K n is computed by solving 1 Part of the original contribution presented in [3] is to prove that we need to consider only a subset of the t vertices of P SEP , namely, such vertices which support graphs satisfy certain properties.We refer the reader to [3] for the details.
In the following section, we modify the single problem OPT(x x x (h) ) in order to introduce a new NP-hard problem that we use to generate metric TSP instances with a large integrality gap.

The Integer Heuristic-OPT Problem
The goal of our work is to devise an efficient computational procedure to generate instances with a large integrality gap for the Metric TSP.We are not interested in computing the exact value α n for a fixed n as in [3], but we focus on finding a heuristic solution to problem (20).
Notice that problem ( 9)-( 15) has an exponential number of variables due to the dual variables d S .Furthermore, the number of constraints ( 10) is equal to the number of tours, that is (n−1)! 2 , and the number of triangular inequalities constraints (11) is O(n 3 ).In practice, the exact solution of OPT(x x x (h) ) is intractable even for small values of n.Note that the computation of α n for n = 12 required around 24 days [7].In order to solve problem (20) heuristically, we introduce a new problem, called Heuristic-OPT (H-OPT), which is related to problem OPT(x x x (h) ), but can be solved for larger values of n.The two key ideas for introducing the new problem are: (i) To generate a TSP instance with a sufficiently large integrality gap it is unnecessary to enumerate all vertices {x x x (1) , . . ., x x x (t) } of P SEP .We can sample a subset of vertices and take the instance providing the largest integrality gap.
(ii) Since we do not perform exhaustive vertex enumeration, it is unnecessary to impose the complementary slackness constraints ( 13)- (15) to force that a given vertex x x x (h) remains the same vertex of P SEP .Hence, we can remove all the variables and constraints related to the slackness conditions.For these two reasons, given a vertex x x x (h) ∈ P SEP , we define the following LP problem: In practice, we have relaxed the problem OPT(x x x (h) ) by removing constraints ( 13)- (15).Notice that in H-OPT we have only |E| cost variables, (n−1)! 2 TSP constraints (22) (one for each tour), and O(n 3 ) triangles inequalities (23) that define the metric cone [21].We can solve this LP problem by cutting planes, by separating both families of constraints.
A critical point in the solution of the H-OPT problem by branch-and-cut is the separation of (maximally violated) TSP constraints (22).The separation problem SP is defined as follows.Given a cost vector c c c ∈ R |E| + , we look for a tour whose incidence vector z z z verify the following: Notice that any tour that satisfies the previous relation gives a violated TSP constraint.However, to prove that no tour violates the TSP constraint, we need to verify the following: Thus, to add a new TSP constraint, we solve a TSP instance for a specific cost vector c c c.In our implementation, we separate TSP constraints by first solving the TSP instance given by c c c * using the LK-H heuristic [16,17], and whenever the heuristic fails to find a violated tour, we solve (26) by embedding Concorde in the code.Section 5.5 describes the details of our implementation.Clearly, the TSP constraints make this problem challenging, as stated in the following lemma.Lemma 3.1.The H-OPT problem is NP-hard.
First of all, we prove the following Lemma.Lemma 3.2.Let n = |V |.If H-OPT can be solved in n O (1) time, then the separation problem SP can be solved in n O (1)  time.
Sketch.Let P be the polyhedron defined by equations ( 22)- (24).The proof is implicit in the proof of a well-known theorem of Grötschel et al. [15] which states that for well-described polyhedron P the optimization problem can be solved in polynomial time if and only if the separation problem SP can be solved in polynomial time.Note that the requirement of well-described polyhedral P is only used for obtaining a time complexity polynomial in the input size of the problem.For example, if the length L of the input needed to describe the polyhedral P does not satisfy n ≤ L, then even if the separation could be resolved in time n O (1) we could not claim that the separation problem SP can be resolved in polynomial time.
To see how the proof of this lemma is implicit in [15], it is sufficient to note that in [15] it is not asked to provide the input size as tight as possible.Actually, it is possible to make the encoding dimension L of a polyhedron P greater than n by adding, for example, a string with n zeros.This ensures n ≤ L, and the claim follows.
Lemma 3.1.The claim follows by showing that the Hamiltonian Cycle problem (HC) can be solved in polynomial time by an oracle machine with an oracle for H-OPT.More precisely, we reduce in polynomial time HC to the separation problem SP of H-OPT.Then, by Lemma 3.2 an oracle machine for H-OPT that takes n O (1) time implies that the separation problem SP (and therefore HC) can be solved in n O (1) time.
Consider the following sets: Let G = (V, E) be an undirected graph on n nodes, that is |V | = n.This graph defines an instance of the HC problem, namely the decision problem that searches for a cycle in a graph.We can perform a standard polynomial time reduction to get a Metric TSP and then normalize costs as follows: where 1 > β (n) > ε > 0, and the definition of β (n) will be clarified later in the proof of this theorem.
Note that this instance of the TSP is metric: identity and symmetry are obvious, and the triangle inequalities can be verified case-by-case.Note also that, the graph contains a Hamiltonian cycle if and only of the optimal solution of the TSP is 1 − ε 2 < 1, and thus such cost vector is in M \ T SP.On the opposite side, if the graph does not admit Hamiltonian cycle, then the TSP solution must contain at least one edge of length (2 − ε)/n.Then, the value of the optimal tour would be at least We observe that if ε < 2 1 + n , then (31) is greater than 1.Thus, we can set . Note that from now on we have proved that a graph G = (V, E) on n nodes admits a Hamiltonian cycle if and only if the solution of the associated TSP with cost vector ĉ c c is less than 1.By putting everything together, we see that the Hamiltonian Cycle problem (HC) can be solved in polynomial time by an oracle machine with an oracle for H-OPT as follows: 1. Let G = (V, E) be an undirected graph on n nodes.(30) to obtain a TSP instance with cost vector ĉ c c.

Use equation
3. By Lemma 3.2, an oracle machine for H-OPT that takes n O (1) time implies that we can decide whether ĉ c c ∈ M in n O (1) time.If ĉ c c ∈ M then G does not admit a Hamiltonian cycle.Otherwise, it admits a Hamiltonian cycle.
We remark that our proof only yields to NP-hardness under Turing reductions, as we have shown that there is an NP-complete decision problem, namely HC, that can be Turing-reduced to H-OPT.However, this also implies "hardness" for our problem, as an existence of a polynomial time algorithm for H-OPT would imply P = NP.
Note also that since we have removed from our problem the dual slackness constraints, the relation ( 19) is not necessarily satisfied when c c c * is the optimal solution of H-OPT(x x x (h) ).However, we can prove the following lemma which states that the solution of the H-OPT(x x x (h) ) problem provides a cost vector c c c * corresponding to a TSP instance with an integrality gap greater than or equal to any instance yielding x x x (h) .
Lemma 3.3.Let us consider a TSP instance c c c 0 such that T OUR(c c c 0 ) = 1, and let x x x (0) be an optimal solution of SEP(c c c 0 ).We define a second TSP instance by the cost vector c c c 1 = arg min H-OPT(x x x (0) ).
Then, the following relation holds T OUR(c c c 1 ) Proof.Since x x x (0) is a feasible solution of the SEP, we have that By definition of H-OPT(x x x (h) ), we have c c c 1 is the cost vector that realizes the minimum of c c c T x x x (h) among all the metric vectors such that T OUR(c c c) = 1.By hypothesis, T OUR(c c c 0 ) = 1 and this implies, c c c 0 z z z ≥ 1 for all z z z 0-1 incidence vector.Thus, c c c 0 is feasible for H-OPT(x x x (0) ) and it holds where the last equation holds by definition.Thus, .
From large integrality gaps to hard instances.The TSP solver Concorde [2] and the LK-H heuristic that we use for separating TSP constraints [16] can handle only integer costs.However, the H-OPT problem generates fractional cost vectors.Hence, we need to devise a method to transform the fractional costs into integer values, by, for example, multiplying for a large constant τ and then rounding to the nearest integer, that is, c i j = round(τc * i j ).Using this cost transformation, we cannot guarantee that the integrality gap for c i j remains the same of c * i j .The TSP constraints (22) have a right-hand side equal to 1 because we have divided the cost vector c c c by the minimum tour length T OUR(c c c).If we divide the cost vector by the quantity T OUR(c c c) ∆ , where ∆ > 0 is a large positive constant, we have to change the right hand side of (22) to ∆, while still getting an equivalent LP problem.If ∆ is large enough (see Section 5.5), we can also add the integrality constraint on the variable c i j .In practice, in order to generate integer cost vectors, we have introduced the following ILP problem: Clearly, IH-OPT is very challenging, as it is an integer program with as many constraints as TSP tours plus the number of triangle inequalities.We have computational evidence that the integrality gap of the integer problem could be smaller than those obtained by solving H-OPT.Surprisingly, our computational results show that the TSP instances obtained while solving the IH-OPT problem are very challenging for Concorde.
In the next section, we show how we use the IH-OPT problem to search for very challenging instances for Concorde.

A sampling procedure for generating hard instances
A standard procedure for searching heuristically for (suboptimal) solutions of an optimization problem is based on uniformly sampling points (i.e., solutions) of the feasible region [5].In our context, to find a heuristic solution for problem (20), we could sample a fixed number of vertices of P SEP and retain the vertex yielding the minimum value for IH-OPT, that is, yielding the largest integrality gap.However, directly sampling the vertices of the P SEP is impractical, due to its exponential number of subtour constraints.We could instead easily sample random cost vectors and generate vertices of P SEP by solving directly problem (1)-(4).
A possibility for generating random cost vectors consists of generating n random points in a Euclidean space, and then computing all the pairwise distances using a given distance (e.g., a distance induced by the Minkowski norm).As observed from our preliminary tests, this procedure leads very often to an integral, and hence useless, vertex of P SEP .In practice, this procedure takes a long time before returning a fractional vertex of P SEP .Furthermore, it only samples instances of the Euclidean TSP, implicitly excluding some remarkable metric TSP instances, such as the ones provided in [3].
To generate a random cost vector, we have designed a different approach.First, we sample a random point within the metric polytope [21] using the hit-and-run algorithm [34], which generates a random metric TSP instances c c c ∈ R |E| + .Second, we get a (random) vertex x x x (h) by solving SEP(c c c) via the simplex algorithm.Since the cost vector is a uniformly random point of the metric polytope, we expect that x x x (h) is a random vertex of P SEP .We also expect a good variety among the sampled vertices: for instance, with n = 15, after the sampling of 999 vertices, it took only less than three seconds to find one vertex both non-integer and not yet sampled.In the next paragraphs, we briefly review the hit-and-run algorithm, and we detail how we use the sampling procedure to generate hard metric TSP instances.
The hit-and-run algorithm The hit-and-run algorithm [34] is designed to sample from a bounded set P uniformly.The basic steps of the algorithm are: The points sampled with this procedure converge in total variation to a uniform distribution, as proven in [34].The time required to have a sample that effectively approximates the uniform distribution is polynomial in the dimension, as proven in [24].Later, a modification of this algorithm to sample a set of points that converges to an arbitrary target distribution was introduced in [32].The interested reader can find an in-depth review of hit-and-run algorithms in [37].
The random sampling algorithm As noted in Section 2, the constraints ( 11)-( 12) define the metric cone: ) If we add the perimeter inequality (also known as the homogeneous triangle inequality) we get the metric polytope ) Note that for every metric c c c ∈ C MET , there exist an γ such that γc c c ∈ P MET [21].For this reason, without loss of generality, we can sample from P MET instead of C MET .Sampling from P MET instead of C MET guarantees to remain in the framework described in [24], since P MET is a convex compact set and, hence, a convex body.Algorithm 1 presents our procedure for sampling metric TSP instances from the set P MET .Given the size of the metric space m = |E| and the number of required sampled points r, the algorithm initialized in Step 1 the empty set R. Then, until the number of sampled points is equal to r, steps 3-7 are iterated.Step 3 generates a uniformly distributed random cost vector c c c from the open set P MET using the hit-and-run algorithm.Step 4 generates an optimal vertex x x x for SEP(c c c), x x x ∈ P SEP .If the vertex x x x is fractional and does not belong to R, it is added to R. Finally, the procedure returns the set R of r vertices of P SEP of dimension m.

Generating the Hard-TSPLIB
The objective of our computational experiments is to generate the Hard-TSPLIB, a collection of small metric TSP instances having a large integrality gap and are challenging for the Concorde solver.The instances of the Hard-TSPLIB are generated by using a branch-and-cut solver for problem IH-OPT(x x x (h) ).
Algorithm 1: Sampling vertices of the P SEP by using the metric polytope.Input: m = |E|, the size of the TSP instance Input: r, the required number of vertices Output: R, a collection of vertices of In the following paragraphs, first, we present the hard instances generated starting from the TSPLIB.Second, we present the hard instances generated using the random sampling procedure presented in Section 4. Third, we compare the runtime of Concorde for solving our hard instances with the runtime required for solving the Rectilinear 3-Dimensional instances [38].In this work, the author compares the introduced instances with other works, namely [31,3,18,19], showing that author's instances are computatinoally harder.Thus, we compare ourselves with [38] and deduce other comparisons from the context.Then, we visually analyze the structure of the hard instances we have generated.Finally, we discuss the implementation details of our algorithms.
Note that all the computations described in the following paragraphs are executed on a single node of an HPC cluster running CentOS, having an Intel CPU with 32 physical cores working at 2.1 GHz, and 64 GB of RAM.We compiled our code with the GNU C++ compiler v8.3, with the following flags -O2 -D_REENTRANT -m64 -ffast-math -DNDEBUG -Wall -march=native.

Implementation details
Our computational procedure for generating hard metric TSP instances has two core algorithms: 1.The branch-and-cut algorithm for solving problem IH-OPT(x x x (h) ) (see Section 3).
2. The sampling procedure from the metric polytope (see Section 4).
In the following paragraphs, we first describe our implementation of the two algorithms.
Solving IH-OPT by branch-and-cut.The problem IH-OPT(x x x (h) ) is solved by branch-and-cut by using the Gurobi commercial solver and using the C++ programming language.In our implementation, we dynamically add both the triangle inequalities and the TSP constraints, in order to keep the core LP problem as small as possible.Note that a complete enumeration of triangle inequalities could exhaust the memory of a standard computer for medium values of n.
For the separation of triangular inequalities, we follow the strategies used in [14]: at each iteration, we add a fixed number k of the most violated triangular inequalities, until no more violated triangle inequalities exist.The separation of triangle inequalities is carried over both during the solution of the LP relaxation and when encountering a new incumbent integer solution.
The separation of TSP constraints is more challenging since it corresponds to the solution of a TSP instance, as shown in (26).As we only need a violated cut, we first separate the TSP constraints heuristically by solving the TSP instance using the Lin-Kernigan local search procedure [22], as implemented in Concorde 1 .Note that the LK-H heuristic only handles integer costs.Hence, we use the support of Gurobi for lazy constraints, which allows running our separation procedure only on incumbent integer solutions.Whenever the LK-H heuristic returns a TSP solution whose incidence vector z z z satisfies ∑ {i, j}∈E z i j c i j ≥ ∆, with ∆ as described in Section 3, we run a second exact TSP algorithm.For the exact separation of TSP constraints we use Concorde compiled using CPLEX 12.8 as LP solver, by using the default parameter settings.
The optimal solution of the LK-H heuristic is given as warm start to the TSP branch-and-cut implementation.
1 http://www.math.uwaterloo.ca/tsp/concorde/downloads/codes/src/co031219.tgz.Notice that before starting the solution of the integer problem IH-OPT(x x x (h) ), we solve its LP relaxation H-OPT(x x x (h) ) via cutting planes by only separating the TSP constraints with the LK-H heuristic (see, for instance, the computational results in Table 3).Once the LK-H heuristic does not find any TSP violated constraint, we stop, and we collect all the triangle inequalities violated by no more than a threshold τ = 0.05, and all the generated TSP constraints, to initialize the first pool of cut for our branch-and-cut algorithm.
Sampling the metric polytope by hit-and-run.The sampling procedure described in Algorithm 1 is implemented in Python 3.8.2.For the hit-and-run algorithm at Step 3, we have used the implementation provided by [13], which is based on the original algorithm introduced in [34].The solution of the SEP problem in Step 4 is implemented using the python wrapper of Gurobi.Since the sampling procedure is very fast, we did not port this procedure to C++.
Parameters tuning.In our implementation, we had only to decide a value for the parameter ∆ in (35).If we only consider the H-OPT formulation, namely the formulation without the integer costs, we can imagine to multiply each cost for a fixed quantity ω and obtain a tour that costs ω∆.However, once we move to integer costs, the parameter ∆ becomes more important because it is strongly related to the number of values that can be taken as cost coefficients.For instance, if ∆ = n, the solution having all c i j equal to 1 is optimal.This does not lead to an hard-to-solve instance.On the contrary, if ∆ is too big, we miss such "degeneracy" of the costs on some edges.In practice, we have observed that for different values of ∆ we get TSP instances of different (runtime) difficulty.Table 1 shows the computational results for solving IH-OPT over sampled TSP instances using different values of ∆.Since the hardest instances were generated while using ∆ = 1000, we fix this value in all of the tests reported in this paper.In future work, we plan to investigate further the impact of the parameter ∆.

Generating hard instances from the TSPLIB
The TSPLIB contains 20 instances with less than 76 nodes.Only 13 of them have a fractional solution for the SEP (i.e., the integrality gap is greater than 1).For 12 of these 13 instances, we have generated a corresponding hard instance by solving H-OPT(x x x (h) ), where x x x (h) is the fractional solution of SEP solved via the simplex algorithm.
If we denote by c c c 0 the cost vector of the TSPLIB instance, and by c c c * the optimal solution of H-OPT(x x x (h) ), by Lemma 3.3, we have that is, the TSP instance c c c * has an integrality gap larger than or equal to c c c 0 .If instead, we solve the problem IH-OPT(x x x (h) ), we cannot guarantee the previous relation, but in practice, we get more challenging instances with almost the same integrality gap.For this reason, all the following results are obtained by solving the IH-OPT problem.Table 2 reports the detailed results for the generation of hard instances from the TSPLIB.The table first reports the name and the dimension n = |V | of the original instance.The third and fourth columns report the integrality gap of c c c 0 (easy instance) and c c c * (hard instance).In the remaining six columns, the table shows the average runtime in seconds (with the standard deviations), and the average number of branch and bound nodes for solving with Concorde first the TSPLIB instance, and later the corresponding Hard-TSPLIB instance.The averages are computed over 10 independent runs of Concorde, using 10 different seeds.Finally, the last column reports whether the optimal SEP solution of c c c 0 is the same optimal solution of c c c * .
The results of Table 2 show that the small TSPLIB instances have a very small integrality gap and are extremely easy for Concorde.They are solved within a fraction of seconds at the root node of the branch-and-cut tree.On the contrary, the Hard-TSPLIB have a significantly larger integrality gap, and they require, on average, several seconds (or up to several hours) to be solved to optimality by Concorde.As expected, a larger integrality gap at the root node implies a larger branch-and-cut tree, that is, a larger number of nodes (column 'BC n.').However, this is not always true, because among the three instances with 48 nodes (att48, gr48, and hk48), the instance with the smaller integrality gap requires the largest number of branch-and-cut nodes.Indeed, visiting a larger search tree implies a longer runtime.We remark that the instance brazil58_hard is not solved by Concorde within a timeout of 24 hours.
Table 3 and Table 4 report the computational results for generating the Hard-TSPLIB instances while solving H-OPT(x x x (h) ) and IH-OPT(x x x (h) ), respectively.For each instance, the table reports the number of cuts generated and the runtime for the triangle inequalities separation (trian.), the TSP constraints separated by the LKH heuristic (LKH-cuts), and the TSP constraints separated by Concorde (TSP-cuts).For the solution of IH-OPT(x x x (h) ), the table gives also the total number of branch-and-bound nodes, the lower bounds (LB), and the upper bounds (UB): when the LB and UB are equal the instance is solved to optimality.Concerning the separation algorithms, the violated triangle inequalities are identified in a very short time, and they have almost no impact on the overall runtime.The heuristic separation of TSP constraints using the LKH heuristic is very effective, but the runtime begins to be important.The exact separation of TSP constraints is one of the two runtime bottlenecks for the generation of hard instances.For example, for the instance swiss42, most of the time is spent on the exact separation of a TSP cut.Finally, notice that for the instances pr76, eil76, rat99, kroB100, kroC100, the solution by branch-and-cut of IH-OPT(x x x (h) ) hits the time limit of 24 hours (86400 seconds).Among those instances, only for pr76, we do generate a hard TSP instances; in all other cases, we were not able to find an integer cost vector satisfying all triangle inequalities and all TSP constraints, that is, an optimal integer solution for problem IH-OPT(x x x (h) ).

Generating hard instances by sampling
We have also generated a collection of instances for the Hard-TSPLIB by using the random sampling procedure discussed in Section 4. First, we run Algorithm 1 to generate a random set R of vertices of P SEP , for a fixed size n of the TSP.When generating a random vertex x x x (h) in Algorithm 1, we also store the cost vector c c c (h) 0 sampled from the metric cone which yields the vertex x x x (h) .Hence, we can compute the integrality gap of the initial easy TSP instances.Later, for each random vertex x x x (h) in R, we solve the IH-OPT(x x x (h) ) problem to get an instance with a larger integrality gap.The table reports first the integrality gap of the sampled cost vector c c c 0 , of the optimal cost vector c c c * obtained after solving IH-OPT(x x x (h) ), and the gap α * n conjectured in [3], that is, to the best of our knowledge, the highest integrality gap available in the literature.Then, in the remaining columns, the table reports the average runtime (with the standard deviation) and the average number of BC nodes for solving the instances to optimality using Concorde, first for the instance c c c 0 (random cost vector) and then for c c c * (the optimal solution of IH-OPT).Similarly to the results of the Hard-TSPLIB, Table 5 shows that we are able to generate very hard instances by solving the IH-OPT problem.
We run a second experiment to study the effect of generating 1000 random instances for n = 20.For each random instance c c c (h) 0 , with h = 1, . . ., 1000, we measure the average solution runtime of Concorde.Then, using the corresponding vertex x x x (h) and solving IH-OPT(x x x (h) ), we generate the harder instance, and we measure the runtime again.Figure 1(a) shows the runtime distribution for the random instances c c c (h) 0 , while Figure 1(b) shows the runtime distribution for c c c * .In the top plot, we have a runtime close to zero: we barely get close to 0.030 seconds.On the other hand, for the bottom plot, we have mean runtime of 10 seconds, with a maximum of around 50 seconds.In practice, if we sample a large number of vertices we are able to get very challenging small instances.Table 6 reports the average runtime (in seconds) to solve the IH-OPT problem on 1000 different vertices sampled for each n from 10 to 20.The solution to this problem exhibits a very large runtime variability.For instance, with n = 20, the minimum runtime is of 2 seconds, while the maximum is of 3445 seconds, that is, three orders of magnitude larger.In addition, we have fitted a linear regression on the log 10 of the runtime, getting the following equation t = 10 0.146•n−1.86 .
Using this regression, we try to predict the runtime for each n.For example, with n = 40, we would expect a runtime of around 3 hours.However, we have not observed any significant correlation between the runtime for solving the IH-OPT problem and the average runtime for solving the hard instance with Concorde.

A comparison with the 3D-Rectilinear instances
To the best of our knowledge, the current hardest instances of the literature are the Rectilinear 3D instances provided by Zhong [38].In this work, the author proposes a family of 3-paths instances with a different configuration of nodes on the three paths and suggests the node configuration that makes them hard to solve for Concorde.For this reason, we tried to generate, using our computational procedure, metric TSP instances as hard as those 3D Rectilinear instances.We have tried the following two strategies.
(a) We massively sampled the metric polytope for small n with the algorithm described in Section 4 and then apply IH-OPT to the correspondent vertices, until we find an instance with an average runtime competitive with the 3D-Rectilinear instances.We call this found instance c c c S n .(b) We tried to use our computational procedure to generate hard instances starting from the optimal solution of the instances provided by Zhong [38].Let c c c R n the cost vector of a 3D rectilinear instance with n nodes [38], and let x x x R n the corresponding optimal solution of SEP(c c c R n ).Similarly, let c c c IH n = arg min IH-OPT(x x x R n ) and x x x IH n = arg min SEP(c c c IH n ).
We compared the average runtime of solving TSP(c c c R n ), TSP(c c c IH n ), and TSP(c c c S n ).We have observed from our preliminary tests that Concorde has great runtime variance in solving the same instance: See, for example, the standard deviation values in Table 2 and Table 5, where we performed 10 independent runs on each instance, using every time a different seed.This variability is typical of Mixed Integer Programming and branch and cut algorithm (see e.g [12], [23]).For this reason, the runtime comparison instance by instance is meaningless unless the difference is of at least one order of magnitude.Thus, we present the runtime comparison using linear regression on the logarithm (log 10 ) of the runtime measured in seconds (again, using 10 independent runs for each instance).Figure 2 shows the three regression lines for comparing the runtime for solving (i) the 3D rectilinear instances, marked with R, (ii) the instances produced using the sampling procedure mentioned above in point (a), marked with S, and (iii) the instances used with the procedure described in the point (b), marked with IH.The three runtimes are denoted by t R n , t S n , t IH n , respectively.The fitted regression lines are the following: t R n = 10 0.168 n−2.209 , t S n = 10 0.171 n−2.157 , t IH n = 10 0.137 n−1.794 .By looking at the regression, the instances that we have found by massively sampling the space are a bit harder than the Rectilinear 3-D instance.On the contrary, the instances directly obtained from vertices are easier.Interestingly, in the logarithm scale, the three instances have all the same order of magnitude: by looking at Figure 2 it is possible to observe that the three families are competitive to each other, with the differences mostly related to the variability of Concorde.Noteworthy, by observing the structure of costs of the instances created by hand as [38] with respect to the one obtained by the two heuristic procedures, we note less regularity.This fact will be discussed wider in the next subsection.In terms of the integrality gap, we do not notice any significant evidence.For n = 12, the instance we sample has an integrality gap of 1.164 while the Rectilinear 3-D instance has 1.193 and the two runtimes are competitive.On the contrary, for n = 20, we sample an instance with an integrality gap of 1.242, while the Rectilinear 3-D instance has 1.240 and yet the two instances are competitive.Lastly, we run the computation as reported by Zhong, namely by multiplying the instances by 1000 and rounding to the nearest integer.We verified that this procedure might lead to non-metric instances as, due to rounding errors, triangular inequalities might not be satisfied.

Structure of small hard instances
The hard instances recently introduced in [19] and [38] are characterized, by construction, by half-integer solutions for SEP solution, and by support graphs having two triangles where each edge has a weight equal to 1  2 .Hence, we have looked at the structure of the support graph of our hard instances.For some instances, such as bays29 and eil51, the optimal vertex moves from a complicated structure to a 3-path configuration with two triangles having 1  2 -vertices.For the instance gr48, in the original version, the vertices have entries of values in the set {0.0, 0.25, 0.5, 0.75, 1.0}, while in the corresponding hard version, they have entries in the set {0.0, 0.5, 1.0}.
In addition, for three small instances, namely gr24 and two sampled instances with n = 15 (s 15 ) and n = 20 (s 20 ), we studied "by hand" the cost structure of the support graph of the optimal solution of SEP.We selected these 3 instances since they are challenging for Concorde.The motivation of this study is to investigate if the hard instances share some common cost patterns and/or structures of the support graph.Figures 3, 4   hard version of the two instances, where the easy refer to the original cost vector c c c 0 with its SEP vertex solution, and the hard to the optimal solution c c c * of IH-OPT.In the support graphs, the dotted edges correspond to the solution of SEP having value x e = 1 2 , while the solid edges correspond to x e = 1.The missing edges have x e = 0.The label of each edge gives its cost in the TSP instance.
We observe that the main difference between the easy and hard instances is in the pattern of the edge length.In the easy instances, the edge costs look randomly distributed, while in the hard instances obtained after solving IH-OPT there is a clear cost pattern.The edges lying along the same path have nearly the same cost, while the dashed edges connecting two distinct paths generally have a cost nearly equal to the sum of the costs of a single edge on a path.In addition, the overall length of the 3-paths are almost equal: for instance, in s 15 (Figure 3), in the easy version, we have on each path a sum of, respectively, 707, 48, 1633, while in the hard one, we find 144, 147, 147.Finally, notice that since in IH-OPT we have removed the slackness constraints introduced in OPT h , we do not have any guarantee that the optimal solution for the SEP associated to c c c 0 and c c c * is the same.However, in s 15 we have the x x x 0 = x x x * , while for s 20 and gr24 we get x x x * = x x x 0 , as shown in Figure 4 and Figure 5.

Conclusions
In this work, we have introduced a computational procedure to generate metric TSP instances which have large integrality gaps and are challenging for Concorde, the state-of-the-art TSP solver.As a by-product, we have introduced the Hard-TSPLIB, a collection of small but challenging metric TSP instances, which are not generated explicitly exploiting specific cost structures, as in [19,38].Notice that, to the best of our knowledge, all the hard instances from the literature have half-integer optimal SEP solutions.On the contrary, the instances of the Hard-TSPLIB have a larger variety of fractional optimal vertices (i.e., they are not only half integers).We expect our new instances will serve as a benchmark for designing new exact and heuristic methods for solving the TSP problem.
Curiously, we have observed that the most challenging instances generated using our computational procedure have regular cost patterns, with several edges sharing the same costs, and several paths on the support graphs having the same length.These types of cost patterns are in common with the manually-generated hard instances recently introduced in [19] and [38].Hence, we believe that our Hard-TSPLIB instances could help further studies in the cost structures of TSP instances.

1 .
Pick a point x x x k ∈ P ⊂ R m .2. Generate a random direction d d d k uniformly distributed over the unitary hyper-sphere centered in x x x k .3. Generate a random point x x x k+1 = x x x k +λ d d d k uniformly distributed over the line set {x x x ∈ P | x x x = x x x k +λ d d d k , λ ∈ R}. 4. If a stopping criterion is met, stop (e.g., terminate after a fixed number of iterations).Otherwise, x x x k ← x x x k+1 and repeat from Step 2.

Figure 1 :
Figure 1: Top: distribution of the computational times for n = 20 of 1000 sampled TSP instances.Bottom: distribution of the computational times for n = 20 of 1000 instances obtained by the IH-OPT procedure.
, and 5 show the support graph of the easy and

Figure 3 :
Figure 3: Easy (top) and hard (bottom) support graph associated to one of the sampled vertex with n = 15.The cost value on the edge of the support graph is also provided.

Figure 4 :
Figure 4: Easy (top) and hard (bottom) support graph associated to one of the sampled vertex with n = 20.The cost value on the edge of the support graph is also provided.

Figure 5 :
Figure 5: Easy (top) and hard (bottom) support graph associated to the instance gr24 of the TSPLIB.The cost value on the edge of the support graph is also provided.

Table 1 :
Impact of the parameter ∆ on the instances generated by IH-OPT.

Table 2 :
Hard-TSPLIB instances generated from the TSPLIB.The columns give the instance name, number of nodes |V |, the integrality gap for the TSPLIB instance c c c 0 and the HardTSPLIB c c c * ; average runtime (and standard deviation) and number of BC nodes over 5 independent runs of Concorde, compiled with CPLEX 12.8.The last column reports if equation (19) holds.The last three instances marked with (*) are solved only once due to the large running time.The instance brazil58 reached a timeout of 24 hours.

Table 3 :
Computational results for the heuristic solution of the LP problem H-OPT(x x x (h) ) by cutting planes.At this stage, we only separate the TSP cuts with the LK-H heuristic.Columns 3 and 4 report the number of triangular inequality and TSP constraints added.Columns 5 and 6 report the runtime for separating those inequalities.The last column reports the overall runtime in seconds.

Table 4 :
Computational results for the solution of the integer problem IH-OPT(x x x (h) ) by branch-and-cut.Columns 3, 4, and 5 report the number of cuts added for each type.Columns 6, 7, and 8 report the runtime of the separation algorithms for triangular, LKH and exact TSP cuts.Column 9 reports the runtime and column 10 the number of branch-and-bound nodes required.The last two columns gives the lower (LB) and upper bounds (UB).

Table 5 :
Results for the Hard-TSP instances generated by sampling.Table 5 reports the three hardest instances we are able to provide for each n ∈ {10, 15, 20, 25, 30, 35, 40}, generating 10 random vertices for each value of n.

Table 6 :
Average computational time and standard deviation of 1000 IH-OPT on different x x x at a fixed n number of nodes.