1 Introduction

Convex integer nonlinear programs (CINLPs) are optimization problems in which the objective function is convex and the continuous relaxation of the feasible region is a convex set. Nonlinearities in CINLPs can appear in the objective function, the constraints, or both. Motivated by their numerous applications and their ability to generalize several well-known problem classes, CINLPs have been studied for decades. In this paper we focus on a specific class of CINLPs: the integer semidefinite programs (ISDPs). These problems can be formulated as:

$$\begin{aligned} \sup ~ \textbf{b}^\top \textbf{x} \quad \text {s.t.} \quad \textbf{C} - \sum _{i = 1}^m \textbf{A}_ix_i \succeq \textbf{0}, \quad \textbf{x} \in \mathbb {Z}^m, \end{aligned}$$
(1)

with \(\textbf{b} \in \mathbb {R}^m\), \(\textbf{C}, \textbf{A}_i \in \mathcal {S}^n\), where \(\mathcal {S}^n\) denotes the set of symmetric matrices of order n. The constraint \(\textbf{C} - \sum _{i = 1}^m \textbf{A}_ix_i \succeq \textbf{0}\) is referred to as a linear matrix inequality (LMI); it is the SDP analogue of a system of linear inequalities defining a polyhedron. Since integer linear programs belong to the family of ISDPs, problems of the form (1) are generally \(\mathcal{N}\mathcal{P}\)-hard to solve.
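To make the latter inclusion concrete, observe that a system of linear inequalities \(\textbf{D}\textbf{x} \le \textbf{d}\), with \(\textbf{D} \in \mathbb {R}^{n \times m}\) and \(\textbf{d} \in \mathbb {R}^n\), can be encoded as an LMI with diagonal data matrices. Taking \(\textbf{C}\) to be the diagonal matrix with diagonal \(\textbf{d}\) and \(\textbf{A}_i\) to be the diagonal matrix whose diagonal is the i-th column of \(\textbf{D}\), we obtain

$$\begin{aligned} \textbf{C} - \sum _{i = 1}^m \textbf{A}_ix_i \succeq \textbf{0} \quad \Longleftrightarrow \quad \textbf{D}\textbf{x} \le \textbf{d}, \end{aligned}$$

since a diagonal matrix is positive semidefinite if and only if all of its diagonal entries are non-negative. Hence, every integer linear program \(\sup \{ \textbf{b}^\top \textbf{x} \,: \, \, \textbf{D}\textbf{x} \le \textbf{d}, \, \textbf{x} \in \mathbb {Z}^m \}\) is an ISDP of the form (1).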

Although CINLPs have been studied extensively, see e.g., the survey of Bonami et al. [11], the special case of ISDPs has received attention only very recently. This is remarkable, as the combination of positive semidefiniteness and integrality naturally leads to a broad range of applications, e.g., in architecture [16, 71], signal processing [39, 55] and combinatorial optimization [40, 60]. For a more detailed overview of applications of ISDPs, we refer the reader to [40, 46].

Only a few solution approaches for solving SDPs with integrality constraints have been considered. Gally et al. [40] propose a general framework called SCIP-SDP for solving mixed integer semidefinite programs (MISDPs) using a branch-and-bound (B&B) procedure with continuous SDPs as subproblems. They show that strict duality of the relaxations is maintained in the B&B tree and study several solver components. Alternatively, Kobayashi and Takano [46] propose a cutting-plane algorithm that initially relaxes the positive semidefinite (PSD) constraint and solves a mixed integer linear programming problem, where the PSD constraint is imposed dynamically via cutting planes. This leads to a general branch-and-cut (B&C) algorithm for solving MISDPs. A third solver that handles general ISDPs is YALMIP [49]. However, the authors of [40, 46] note that the branch-and-bound ISDP solver in YALMIP is not yet competitive with the other two methods. Recently, Matter and Pfetsch [50] studied different presolving strategies for MISDPs for both the B&B and the B&C approach.

Apart from solution methods for solving general ISDPs or MISDPs, there are several other approaches in the literature that aim to solve integer problems by utilizing SDP relaxations in a B&B framework. Although these approaches are closely related to problems of the form (1), since they also combine semidefinite programs with a branching strategy, they differ in that the problem at hand is not necessarily formulated as a MISDP. Examples are the BiqCrunch solver for constrained binary quadratic problems [47] and the Biq Mac solver for unconstrained binary quadratic problems [60].

With the aim of improving the performance of the B&C algorithm of [46], we consider the exploitation of cutting planes for ISDPs. Practical algorithms for CINLPs have benefited greatly from the addition of strong cutting planes, see e.g., [4, 5, 8, 65], where many of these cutting-plane frameworks are based on generalizations from integer linear programming. Among the most well-known cutting planes for integer linear programs (ILPs) are the Chvátal–Gomory (CG) cuts [17, 43]. Gomory [43] introduced these cuts to design the first finite cutting-plane algorithm for ILPs. Chvátal [17] later generalized this notion and introduced the closure of all such cuts, which leads to a hierarchy of relaxations of the ILP with increasing strength. Chvátal [17] and Schrijver [62] prove that this hierarchy is finite for bounded real polyhedra and rational polyhedra, respectively. The CG procedure was later extended to more general convex sets, see e.g., [12, 21, 22, 25, 26]. In particular, Çezik and Iyengar [15] show how to generate CG cuts for CINLPs where the continuous relaxation of the feasible region is conic representable.

A leading application in this work is a combinatorial optimization problem that can be modelled as an ISDP: the quadratic traveling salesman problem (QTSP). Jäger and Molitor [45] introduce the QTSP as the problem of finding a Hamiltonian cycle in a graph that minimizes the total interaction costs among consecutive arcs. The problem is motivated by an important application in bioinformatics [35, 45], but also has applications in telecommunication, precision farming and robotics, see e.g., [1, 30, 68]. The QTSP is \(\mathcal{N}\mathcal{P}\)-hard in the strong sense and is currently considered one of the hardest combinatorial optimization problems to solve in practice.

Several papers have studied the QTSP. The polyhedral structure of the asymmetric and symmetric QTSP polytopes is discussed in [33, 34, 37]. Rostami et al. [61] provide several lower bounding procedures for the QTSP, including a column generation approach. Woods and Punnen [69] provide different classes of neighbourhoods for the QTSP, while Staněk et al. [63] discuss several heuristics for the quadratic traveling salesman problem in the plane. The linearization problem for the QTSP is studied in [56]. Fischer et al. [35, 36] introduce several exact algorithms and heuristics for the asymmetric QTSP, while Aichholzer et al. [2] consider exact solution methods for the minimization and maximization version of the symmetric QTSP.

1.1 Main results and outline

In this paper we consider the Chvátal–Gomory procedure for ISDPs from a theoretical as well as a practical point of view. On the theoretical side, we derive several results on the elementary closure of all CG cuts for spectrahedra. On the practical side, we show how to apply these cuts in a generic branch-and-cut algorithm for ISDPs that exploits both the positive semidefiniteness and the integrality of the problem. We extensively study the application of this new approach to the QTSP, which confirms the practical strength of the proposed method.

We start by reformulating a CG cut for a spectrahedron in terms of its data matrices in combination with the elements from the dual cone. This leads to a constructive description of the elementary closure of spectrahedra rather than the implicit description that is known for general convex sets. Analogous to the case of polyhedra, the elementary closure operation can be repeated, leading to a hierarchy of stronger approximations of the integer hull of the spectrahedron. For the case of bounded spectrahedra, we provide a compact proof of a homogeneity property for the elementary closure operation that is based on a theorem of alternatives and Dirichlet’s approximation theorem. We prove this property for the class of halfspaces that suffices to describe any compact convex set. Homogeneity is the cornerstone in showing that the elementary closure of a bounded spectrahedron is polyhedral. Although the latter result is known in the literature, our proof is significantly simpler than the general proofs given in [12, 22]. Finally, we exploit the recently introduced notion of total dual integrality for SDPs [13] to derive a closed-form expression for the elementary closure of spectrahedra defined by a totally dual integral linear matrix inequality. We additionally provide a characterization of bounded spectrahedra with this property and several more general sufficient conditions.

It is known that the practical strength of CG cuts in integer linear programming is mainly due to their application in branch-and-bound methods. In this vein, we propose a generic branch-and-cut (B&C) framework for ISDPs. Our algorithm initially relaxes the PSD constraint and solves a mixed integer linear program (MILP), where the PSD constraint is imposed iteratively via CG and/or strengthened CG cuts. To derive strengthened CG cuts, we use an approach similar to the one for rational polyhedra by Dash et al. [24]. Our B&C algorithm is an extension of the algorithm of [46], in which separation is only based on positive semidefiniteness without taking into account the integrality of the variables. Our approach also builds on the work by Çezik and Iyengar [15], in which the authors leave the separation of CG cuts for conic problems as an open problem and do not include these cuts in their computational study. We provide an example of our approach for a common class of binary SDPs that frequently appears in combinatorial optimization.
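The following minimal numerical sketch illustrates the basic outer-approximation loop just described; it is not the implementation developed in this paper, and it omits both the branching and the CG strengthening steps. All data (the matrices, the box bounds, the iteration limit and the tolerance) are illustrative choices.

```python
# Minimal sketch of the cutting-plane idea: relax the PSD constraint, solve a MILP
# over the cuts collected so far, and separate a violated linear cut from an
# eigenvector of the slack matrix.  Data and parameters are illustrative only.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

C = np.array([[3.0, 0.0], [0.0, 1.0]])
A = [np.array([[1.0, 0.0], [0.0, -1.0]]),
     np.array([[0.0, 1.0], [1.0, 0.0]])]          # the LMI is  C - x1*A[0] - x2*A[1]  PSD
b = np.array([1.0, 1.0])                          # maximize b^T x over integer x

cuts_lhs, cuts_rhs = [], []                       # accumulated cuts  a^T x <= rhs
for _ in range(50):
    cons = (LinearConstraint(np.array(cuts_lhs), ub=np.array(cuts_rhs))
            if cuts_lhs else None)
    res = milp(-b, constraints=cons, integrality=np.ones(2), bounds=Bounds(-10, 10))
    x = np.round(res.x)                           # integer MILP solution
    S = C - sum(xi * Ai for xi, Ai in zip(x, A))  # slack matrix of the LMI at x
    eigval, eigvec = np.linalg.eigh(S)
    if eigval[0] >= -1e-8:                        # slack matrix is PSD: x solves the ISDP
        break
    u = eigvec[:, 0]                              # eigenvector of most negative eigenvalue
    U = np.outer(u, u)                            # U is PSD, so sum_i <A_i,U> x_i <= <C,U> is valid
    cuts_lhs.append(np.array([np.trace(Ai @ U) for Ai in A]))
    cuts_rhs.append(np.trace(C @ U))
    # CG strengthening would additionally round the cut when its coefficients are
    # (scaled to be) integral; this step is omitted in this sketch.

print("incumbent:", x, "smallest eigenvalue of slack matrix:", eigval[0])
```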

In the third part of this paper we apply our results to a difficult-to-solve combinatorial optimization problem: the quadratic traveling salesman problem. We derive two ISDP formulations of this problem based on the notion of algebraic connectivity. To solve these models using our B&C algorithm, we propose several CG separation routines and show that several of these routines lead to well-known cuts for the QTSP. Computational results on a large set of benchmark QTSP instances show that the practical potential of our new method is twofold: it significantly outperforms the ISDP solvers from the literature, while it also provides results competitive with the state-of-the-art QTSP solution method of [35].

The paper is organized as follows. In Sect. 2 we study the Chvátal–Gomory procedure for spectrahedra. Section 3 provides a CG-based B&C framework for general ISDPs and presents specific CG separation routines for two classes of binary SDPs. In Sect. 4 we formally define the QTSP and present two ISDP formulations of this problem. Numerical results are given in Sect. 5.

1.2 Notation

A directed graph is given by \(G = (N,A)\), where N is a set of nodes and \(A \subseteq N \times N\) is a set of arcs. We use \(K_n\) to denote the complete directed graph on n nodes, i.e., a directed graph in which every ordered pair of distinct nodes is connected by an arc.

We denote by \(\textbf{0}_n \in \mathbb {R}^n\) the vector of all zeros, and by \(\textbf{1}_n \in \mathbb {R}^n\) the vector of all ones. The identity matrix and the matrix of ones of order n are denoted by \(\textbf{I}_{\textbf{n}}\) and \(\textbf{J}_{\textbf{n}}\), respectively. We omit the subscripts of these matrices when there is no confusion about the order. The i-th elementary vector is denoted by \(\textbf{e}_i\) and we define \(\mathbf{E_{ij}}:= \textbf{e}_i\textbf{e}_j^\top \). For any two matrices \(\textbf{A}\) and \(\textbf{B}\), the direct sum is defined as \(\textbf{A} \oplus \textbf{B} := \begin{bmatrix} \textbf{A} & \textbf{0} \\ \textbf{0} & \textbf{B} \end{bmatrix}\).

The set of integer numbers and non-negative integer numbers is denoted by \(\mathbb {Z}\) and \(\mathbb {Z}_+\), respectively. For any integer vector \({\varvec{c}} \in \mathbb {Z}^m\), we let \(\gcd ({\varvec{c}})\) denote the greatest common divisor of the entries in \({\varvec{c}}\). We define the floor (resp. ceil) operator \(\lfloor \cdot \rfloor \) (resp. \(\lceil \cdot \rceil \)) as the largest (resp. smallest) integer smaller (resp. larger) than or equal to the input number. For \(n \in \mathbb {Z}_+\), we define the set \([n]:= \{1, \ldots , n\}\). Also, for any \(S \subseteq [n]\), we let \(\mathbb {1}_{\textbf{S}}\) be the binary indicator vector of S.

We let \(\mathcal {S}^n\) be the set of all \(n \times n\) real symmetric matrices and denote by \(\textbf{X} \succeq \textbf{0}\) that a symmetric matrix \(\textbf{X}\) is positive semidefinite. We use \(\textbf{X} \succneqq \textbf{0}\) to denote that \(\textbf{X}\) is positive semidefinite, but not equal to the zero matrix. The cone of symmetric positive semidefinite matrices is defined as \(\mathcal {S}^n_+:= \{ \textbf{X} \in \mathcal {S}^n \,: \, \, \textbf{X} \succeq \textbf{0} \}\). The trace of a square matrix \(\textbf{X}=(x_{ij})\) is given by \(\text {tr}(\textbf{X})=\sum _{i}x_{ii}\). For any \(\textbf{X},\textbf{Y} \in \mathbb {R}^{n \times n}\) the trace inner product is defined as \(\langle \textbf{X}, \textbf{Y} \rangle := \text {tr}(\textbf{X}^\top \textbf{Y}) = \sum _{i = 1}^n\sum _{j = 1}^n x_{ij}y_{ij}\).

The operator \(\text {diag}: \mathbb {R}^{n \times n} \rightarrow \mathbb {R}^n\) maps a square matrix to a vector consisting of its diagonal elements. We denote by \(\text {Diag}: \mathbb {R}^n \rightarrow \mathbb {R}^{n \times n}\) its adjoint operator.

2 The Chvátal–Gomory procedure for ISDPs

In this section we study the extension of the cutting-plane procedure by Chvátal [17] and Gomory [43] for integer linear programs to the class of integer semidefinite programs. We show that several concepts, such as the Chvátal–Gomory closure and the Chvátal rank, can be generalized to ISDPs. We start by recalling the procedure for general convex sets.

2.1 The Chvátal–Gomory procedure

Let \(C \subseteq \mathbb {R}^m\) be a non-empty closed convex set and let \(C_I\) be its integer hull, i.e., \(C_I:= \text {Conv}(C \cap \mathbb {Z}^m )\). The Chvátal–Gomory cutting-plane procedure was introduced by Chvátal [17] and Gomory [43] and is regarded as one of the most celebrated results in integer programming. The CG procedure aims at systematically identifying valid inequalities for C that cut off non-integer solutions. By adding these new cuts to the relaxation and repeating this process, one obtains a hierarchy of stronger relaxations that converges to \(C_I\).

The CG procedure relies on the notion of rational halfspaces. A rational halfspace is of the form \(H = \{ \textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{c}^\top \textbf{x} \le d \}\) for some \(\textbf{c} \in \mathbb {Q}^m, d \in \mathbb {Q}\). It is known that all such halfspaces can be represented by \(\textbf{c} \in \mathbb {Z}^m\) such that the entries of \(\textbf{c}\) are relatively prime. If \(H = \{ \textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{c}^\top \textbf{x} \le d \}\) with \(\textbf{c} \in \mathbb {Z}^m\), \(\gcd (\textbf{c}) = 1\), then \(H_I = \{ \textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{c}^\top \textbf{x} \le \lfloor d \rfloor \}\).
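For example, the halfspace \(\{ \textbf{x} \in \mathbb {R}^2 \,: \, \, 4x_1 + 6x_2 \le 9 \}\) can equivalently be written with \(\textbf{c} = (2,3)^\top \), \(\gcd (\textbf{c}) = 1\), as \(\{ \textbf{x} \in \mathbb {R}^2 \,: \, \, 2x_1 + 3x_2 \le \frac{9}{2} \}\), and its integer hull is \(\{ \textbf{x} \in \mathbb {R}^2 \,: \, \, 2x_1 + 3x_2 \le 4 \}\).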

Definition 1

The elementary closure of a closed convex set C is the set

$$\begin{aligned} \text {cl}_{CG}(C) := \bigcap _{\begin{array}{c} (\textbf{c}, d) \in \mathbb {Q}^m \times \mathbb {Q} \\ C \subseteq H = \{\textbf{x} \, : \, \, \textbf{c}^\top \textbf{x} \le d \} \end{array}} H_I. \end{aligned}$$
(2)

Equivalently, the elementary closure of C can be written as:

$$\begin{aligned} \text {cl}_{CG}(C) = \bigcap _{\begin{array}{c} (\textbf{c}, d) \in \mathbb {Z}^m \times \mathbb {R} \\ C \subseteq \{\textbf{x} \, : \, \, \textbf{c}^\top \textbf{x} \le d \} \end{array}} \left\{ \textbf{x} \in \mathbb {R}^m \, : \, \, \textbf{c}^\top \textbf{x} \le \lfloor d \rfloor \right\} , \end{aligned}$$
(3)

and we will primarily use this form in this work. The inequalities that define \(\text {cl}_{CG}(C)\) in (3) are known as CG cuts [43]. One can verify that \(C_I \subseteq \text {cl}_{CG}(C)\). When C is compact, we can exploit the following proposition due to Dadush et al. [21] and De Carli Silva and Tunçel [13].

Proposition 1

If \(C \subseteq \mathbb {R}^m\) is a compact convex set, then

$$\begin{aligned} C = \bigcap _{\begin{array}{c} (\textbf{c}, d) \in \mathbb {Z}^m \times \mathbb {R} \\ C \subseteq \{\textbf{x} \, : \, \, \textbf{c}^\top \textbf{x} \le d \} \end{array}} \left\{ \textbf{x} \in \mathbb {R}^m \, : \, \, \textbf{c}^\top \textbf{x} \le d\right\} . \end{aligned}$$

It follows from Proposition 1 that for compact convex sets C we have \(\text {cl}_{CG}(C) \subseteq C\). We can now repeat the procedure by defining \(C^{(0)}:= C\) and \(C^{(k+1)}:= \text {cl}_{CG}(C^{(k)})\) for all integer \(k \ge 0\), where \(C^{(k)}\) is referred to as the kth CG closure of C. For any compact convex set C this leads to the hierarchy \(C_I \subseteq \ldots \subseteq C^{(k+1)} \subseteq C^{(k)} \subseteq \ldots \subseteq C^{(0)} = C\). The smallest k for which \(C_I = C^{(k)}\) is known as the Chvátal rank of C. In the same vein, the Chvátal rank of an inequality \(\textbf{c}^\top \textbf{x} \le d\) valid for \(C_I\) is defined as the smallest k such that \(C^{(k)} \subseteq \{ \textbf{x} \in \mathbb {R}^m \,: \,\, \textbf{c}^\top \textbf{x} \le d\}\).

Remark 1

Observe that for an unbounded closed convex set C, \(\text {cl}_{CG}(C) \subseteq C\) does not have to hold. For instance, the irrational halfspace \(\{ \textbf{x} \in \mathbb {R}^2 \,: \, \, x_1 + \sqrt{2} x_2 \le 0\}\) is not contained in any halfspace of the form \(\{\textbf{x} \in \mathbb {R}^2 \,: \, \, \textbf{c}^\top \textbf{x} \le d\}\) with \(\textbf{c} \in \mathbb {Z}^2\). Therefore, \(\text {cl}_{CG}(C)\) is the intersection over an empty set of halfspaces, resulting in \(\text {cl}_{CG}(C) = \mathbb {R}^2\).

The finiteness of the Chvátal rank has been proven in the literature for bounded real polyhedra [17], unbounded rational polyhedra [62] and conic representable sets in the 0/1-cube [15]. However, the Chvátal rank of unbounded real polyhedra can be infinite, as shown by Schrijver [62]. Schrijver also shows that the elementary closure of a rational polyhedron is a rational polyhedron. This result was later generalized to irrational polytopes [26], bounded rational ellipsoids [25], strictly convex bodies [21] and general compact convex sets [12, 22]. As a consequence, the Chvátal rank of these sets is also known to be finite.

2.2 The elementary closure of spectrahedra

We now apply the notions from Sect. 2.1 to integer semidefinite programming problems in standard primal and dual forms. In addition to the general definition given in the previous section, we derive alternative formulations of the elementary closure of spectrahedra.

Let \(\textbf{b} \in \mathbb {R}^m\), \(\textbf{C} \in \mathcal {S}^n\) and \(\mathbf{A_i} \in \mathcal {S}^n\) for all \(i \in [m]\). An ISDP in standard primal form is given by:

$$\begin{aligned} (P_{ISDP})&\left\{ \begin{aligned} \inf \quad&\langle \textbf{C}, \textbf{X} \rangle \\ \text {s.t.} ~~&\langle \mathbf{A_i} , \textbf{X} \rangle = b_i \quad \forall i \in [m], ~~\textbf{X} \succeq \textbf{0}, \,\, \textbf{X} \in \mathbb {Z}^{n \times n}, \end{aligned} \right. \end{aligned}$$
(4)

while an ISDP in standard dual form is given by:

$$\begin{aligned} (D_{ISDP})&\left\{ \begin{aligned} \sup \quad&\textbf{b}^\top \textbf{x} \\ \text {s.t.} \quad&\textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0}, ~~\textbf{x} \in \mathbb {Z}^m. \end{aligned} \right. \end{aligned}$$
(5)

Using standard techniques, one can syntactically rewrite an integer SDP from primal form to dual form and vice versa. Consistent with most of the literature, we mainly consider, but do not restrict ourselves to, ISDPs in dual form.

The continuous relaxation of the feasible set of (5) is defined as follows:

$$\begin{aligned} P := \left\{ \textbf{x} \in \mathbb {R}^m \, \, : \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i} x_i \succeq \textbf{0} \right\} . \end{aligned}$$
(6)

The set P is a spectrahedron that is closed, semialgebraic and convex, which we assume to be non-empty. Throughout the paper, we make the following non-restrictive assumption on the linear matrix inequality defining P. In case P is not full-dimensional, i.e., the subspace \(\mathcal {L}:= \text {Aff}(P)^\perp \) is nontrivial, we extend \(\textbf{C}\) and \(\mathbf{A_i}\), \(i \in [m]\), to

$$\begin{aligned} \textbf{C} \oplus \text {Diag}(\textbf{L}{} \mathbf{x_0}) \oplus -\text {Diag}(\mathbf{Lx_0}) \quad \text {and}\quad \mathbf{A_i} \oplus \text {Diag}(\mathbf{\ell _i}) \oplus -\text {Diag}(\mathbf{\ell _i})~\text {for all } i \in [m] \end{aligned}$$

where \(\textbf{L}:= [\mathbf{\ell _1}~\dots ~\mathbf{\ell _m}] \in \mathbb {R}^{\dim (\mathcal {L}) \times m}\) is a matrix whose rows form a basis for \(\mathcal {L}\) and \(\mathbf{x_0} \in P\). Observe that the resulting extended map has no effect on the spectrahedron P itself. We only include it to obtain a more proper algebraic representation, see also [51, 57]. We define the integer hull of P to be \(P_I:= \text {Conv}(P \cap \mathbb {Z}^m)\), i.e., the convex hull of the integral points in P. We briefly consider some illustrative examples of spectrahedra and their integer hulls.

Example 1

(Examples in \(\mathbb {R}^2\)) Let . Then, the induced spectrahedron P in the dual form (6) is the semialgebraic set of points in \(\mathbb {R}^2\) described by the quadratic inequality \(4x_1^2 + x_2^2 \le 15x_1 + 4\frac{1}{2}x_2 - 1\frac{1}{2}x_1x_2 - 9\). This spectrahedron is bounded and given in Fig. 1a.

Let Q be described by (6) with . The spectrahedron Q is the unbounded semialgebraic set \(\{\textbf{x} \in \mathbb {R}^2 \,: \, \, x_2 \ge \frac{1}{2}x_1^2\}\), see Fig. 1b.

Fig. 1: Spectrahedra P and Q defined in Example 1. Their corresponding integer hulls are given by the dark gradient areas.

Example 2

(Example in \(\mathbb {R}^3\)) Let and let P be the induced spectrahedron of the form  (6). Then, P is the semialgebraic set in \(\mathbb {R}^3\) described by the inequalities \(\frac{5}{4}x_1^2 + \frac{9}{100}x_2^2 + \frac{11}{2}x_3^2 \le -2 + 3x_1 + \frac{12}{5}x_2 + 10x_3 - \frac{9}{10} x_1x_2 + \frac{3}{5}x_2x_3 + \frac{3}{2}x_1x_3\), \(1 + x_1 + \frac{3}{5}x_2 - \frac{1}{2}x_3 \ge 0\), \(2 - x_1 + 3x_3 \ge 0\), \(-5 \le x_2\) and \(x_2 \le 5\), see Fig. 2.

Fig. 2: Spectrahedron P in \(\mathbb {R}^3\) defined in Example 2.

In the remaining part of this section we study the elementary closure, see Definition 1, of spectrahedra in primal and dual standard forms.

Using the fact that a matrix \(\textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i\) is positive semidefinite if and only if \(\langle \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i, \textbf{U} \rangle \ge 0\) for all \(\textbf{U} \in \mathcal {S}^n_+\), we can rewrite P as follows:

$$\begin{aligned} P&= \left\{ \textbf{x} \in \mathbb {R}^m \, : \, \, \langle \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i, \textbf{U} \rangle \ge 0 , \, \textbf{U} \in \mathcal {S}^n_+ \right\} \nonumber \\&= \bigcap _{\textbf{U} \in \mathcal {S}^n_+} \left\{ \textbf{x} \in \mathbb {R}^m \, : \, \, \sum _{i = 1}^m x_i \langle \mathbf{A_i}, \textbf{U} \rangle \le \langle \textbf{C}, \textbf{U} \rangle \right\} . \end{aligned}$$
(7)

Moreover, since P is a closed convex set, we can write P as the intersection of the halfspaces that contain it:

$$\begin{aligned} P = \bigcap _{\begin{array}{c} (\textbf{c},d) \in \mathbb {R}^{m+1} \\ P \subseteq \{\textbf{x} \, : \, \, \textbf{c}^\top \textbf{x} \le d \} \end{array}} \left\{ \textbf{x} \in \mathbb {R}^m \, : \, \, \textbf{c}^\top \textbf{x} \le d \right\} . \end{aligned}$$
(8)

It is clear that all halfspaces in the intersection of (7) are contained in the intersection (8). The converse statement is also true, as stated by the following theorem. This theorem is proven in [51] and the result is related to the algebraic polar studied in [57].

Theorem 1

[51, 57] Let \(P = \{\textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0} \}\) be a non-empty spectrahedron. Let \((\textbf{c},d) \in \mathbb {R}^{m+1}\) be such that \(P \subseteq \{ \textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{c}^\top \textbf{x} \le d \}\). Then there exists a matrix \(\textbf{U} \in \mathcal {S}^n_+\) such that \(\langle \mathbf{A_i}, \textbf{U} \rangle = c_i\) for all \(i \in [m]\) and \(\langle \textbf{C}, \textbf{U} \rangle \le d\).

Using the representation of P given by (7) and the result of Theorem 1, we now provide an alternative formulation of the elementary closure for spectrahedra of the form P. We have,

$$\begin{aligned} \text {cl}_{CG}(P) = \bigcap _{\begin{array}{c} \textbf{U} \in \mathcal {S}^n_+ \, \text {s.t.} \\ \langle \mathbf{A_i}, \textbf{U} \rangle \in \mathbb {Z}, ~i \in [m] \end{array}} \left\{ x \in \mathbb {R}^m \, : \, \, \sum _{i = 1}^m x_i \langle \mathbf{A_i}, \textbf{U} \rangle \le \lfloor \langle \textbf{C}, \textbf{U} \rangle \rfloor \right\} . \end{aligned}$$
(9)

Hence, any possible CG cut for a spectrahedron is constructed by a matrix \(\textbf{U} \in \mathcal {S}^n_+\) such that \(\langle \mathbf{A_i}, \textbf{U} \rangle \in \mathbb {Z}\) for \(i \in [m]\).
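As a small illustration of (9), with data chosen here purely for this purpose, let \(m = n = 2\) and

$$\begin{aligned} \textbf{C} = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}, \quad \mathbf{A_1} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad \mathbf{A_2} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \textbf{U} = \begin{bmatrix} 1.24 & 0.5 \\ 0.5 & 0.24 \end{bmatrix} \succeq \textbf{0}. \end{aligned}$$

Then \(\langle \mathbf{A_1}, \textbf{U} \rangle = 1 \in \mathbb {Z}\), \(\langle \mathbf{A_2}, \textbf{U} \rangle = 1 \in \mathbb {Z}\) and \(\langle \textbf{C}, \textbf{U} \rangle = 3.96\), so (9) yields the CG cut \(x_1 + x_2 \le 3\). This cut separates, for instance, the point \((1+\sqrt{2}, \sqrt{2})\), which lies in the induced spectrahedron and maximizes \(x_1 + x_2\) over it.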

A similar alternative definition of the elementary closure of spectrahedra in standard primal form can be obtained. Let \(Q \subseteq \mathcal {S}^n\) denote the continuous relaxation of the feasible set of (4), i.e.,

$$\begin{aligned} Q&= \left\{ \textbf{X} \in \mathcal {S}^n \, : \, \, \langle \mathbf{A_i}, \textbf{X} \rangle = b_i, i \in [m], \, \textbf{X} \succeq \textbf{0} \right\} \\&= \left\{ \textbf{X} \in \mathcal {S}^n \, : \, \, \langle \mathbf{A_i}, \textbf{X} \rangle = b_i, i \in [m], \, \langle \textbf{X}, \textbf{U} \rangle \ge 0, \, \textbf{U} \in \mathcal {S}^n_+ \right\} \\&= \left\{ \textbf{X} \in \mathcal {S}^n \, : \, \, \left\langle \textbf{X}, \textbf{U} + \sum _{i = 1}^m \mathbf{A_i} \lambda _i \right\rangle \ge \sum _{i = 1}^m b_i \lambda _i, \, \textbf{U} \in \mathcal {S}^n_+, \, \varvec{\lambda } \in \mathbb {R}^m \right\} , \end{aligned}$$

where the last equality follows from the fact that the choices \((\textbf{U}, \varvec{\lambda }) = (\textbf{0}, \textbf{e}_i )\) and \((\textbf{U}, \varvec{\lambda } ) = (\textbf{0}, -\textbf{e}_i)\) lead to the cuts \(\langle \mathbf{A_i}, \textbf{X} \rangle \ge b_i\) and \(\langle \mathbf{A_i}, \textbf{X} \rangle \le b_i\), respectively. Now, the elementary closure of Q can be described by the following intersection of CG cuts:

$$\begin{aligned} \text {cl}_{CG}(Q) = \bigcap _{\begin{array}{c} (\textbf{U},\varvec{\lambda }) \in \mathcal {S}^n_+ \times \mathbb {R}^m \, \text {s.t.} \\ \textbf{U} + \sum _{i =1}^m \mathbf{A_i} \lambda _i \in \mathbb {Z}^{n \times n} \end{array}} \left\{ \textbf{X} \in \mathcal {S}^n \, : \, \, \left\langle \textbf{X}, \textbf{U} + \sum _{i = 1}^m \mathbf{A_i} \lambda _i \right\rangle \ge \Bigg \lceil \sum _{i = 1}^m b_i \lambda _i \Bigg \rceil \right\} . \end{aligned}$$
(10)

For many SDPs arising in applications, the spectrahedra that define the feasible sets are contained in the cone of non-negative vectors or matrices. When \(P \subseteq \mathbb {R}^m_+\) or \(Q \subseteq \{\textbf{X} \in \mathbb {R}^{n \times n} \,: \, \, \textbf{X} \ge \textbf{0}\}\), alternative equivalent formulations of the elementary closure can be given, see also [15].

Theorem 2

Let \(P = \left\{ \textbf{x} \in \mathbb {R}^m_+ \,: \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0} \right\} \) be a non-empty spectrahedron. Then \(\text {cl}_{CG}(P)\) can equivalently be written as

$$\begin{aligned} \text {cl}_{CG}(P) = \bigcap _{\textbf{U} \in \mathcal {S}^n_+} \left\{ \textbf{x} \in \mathbb {R}^m \, : \, \, \sum _{i = 1}^m x_i \lfloor \langle \mathbf{A_i}, \textbf{U} \rangle \rfloor \le \lfloor \langle \textbf{C}, \textbf{U} \rangle \rfloor \right\} . \end{aligned}$$
(11)

Similarly, let \(Q = \left\{ \textbf{X} \in \mathcal {S}^n \,: \, \, \langle \mathbf{A_i}, \textbf{X} \rangle = b_i, i \in [m], \textbf{X} \succeq \textbf{0}, \textbf{X} \ge \textbf{0} \right\} \). Then \(\text {cl}_{CG}(Q)\) can equivalently be written as

$$\begin{aligned} \text {cl}_{CG}(Q) = \bigcap _{(\textbf{U},\varvec{\lambda }) \in \mathcal {S}^n_+ \times \mathbb {R}^m } \left\{ \textbf{X} \in \mathcal {S}^n \, : \, \, \left\langle \textbf{X}, \Bigg \lceil \textbf{U} + \sum _{i = 1}^m \mathbf{A_i} \lambda _i \Bigg \rceil \right\rangle \ge \Bigg \lceil \sum _{i = 1}^m b_i \lambda _i \Bigg \rceil \right\} . \end{aligned}$$
(12)

Proof

We prove the statement for the dual form (11). The proof for the primal form is similar.

Let \(\overline{\text {cl}_{CG}(P)}:= \bigcap _{\textbf{U} \in \mathcal {S}^n_+} \left\{ \textbf{x} \in \mathbb {R}^m \,: \, \, \sum _{i = 1}^m x_i \lfloor \langle \mathbf{A_i}, \textbf{U} \rangle \rfloor \le \lfloor \langle \textbf{C}, \textbf{U} \rangle \rfloor \right\} \) and let \(\text {cl}_{CG}(P)\) be as given in (9). The inclusion \(\overline{\text {cl}_{CG}(P)} \subseteq \text {cl}_{CG}(P)\) is obvious, as any halfspace in the intersection defining \(\text {cl}_{CG}(P)\) is also in the intersection defining \(\overline{\text {cl}_{CG}(P)}\). Now, consider a halfspace \(\bar{H} = \{ \textbf{x} \in \mathbb {R}^m \,: \, \, \sum _{i = 1}^m x_i \lfloor \langle \mathbf{A_i}, \textbf{U} \rangle \rfloor \le \lfloor \langle \textbf{C}, \textbf{U} \rangle \rfloor \}\) for some \(\textbf{U} \in \mathcal {S}^n_+\) that is included in the intersection defining \(\overline{\text {cl}_{CG}(P)}\). Since \(P \subseteq \mathbb {R}^m_+\), we know

$$\begin{aligned} P \subseteq \left\{ \textbf{x} \in \mathbb {R}^m_+ \, : \, \, \sum _{i = 1}^m x_i \langle \mathbf{A_i}, \textbf{U} \rangle \le \langle \textbf{C}, \textbf{U} \rangle \right\}&\subseteq \left\{ \textbf{x} \in \mathbb {R}^m_+ \, : \, \, \sum _{i = 1}^m x_i \lfloor \langle \mathbf{A_i}, \textbf{U} \rangle \rfloor \le \langle \textbf{C}, \textbf{U} \rangle \right\} \\&\subseteq \left\{ \textbf{x} \in \mathbb {R}^m \, : \, \, \sum _{i = 1}^m x_i \lfloor \langle \mathbf{A_i}, \textbf{U} \rangle \rfloor \le \langle \textbf{C}, \textbf{U} \rangle \right\} . \end{aligned}$$

Now we apply Theorem 1 to the latter halfspace. It follows that there exists a matrix \(\textbf{V} \in \mathcal {S}^n_+\) such that

$$\begin{aligned} \langle \mathbf{A_i}, \textbf{V} \rangle = \lfloor \langle \mathbf{A_i}, \textbf{U} \rangle \rfloor \quad \text {for all } i \in [m], \quad \text {and} \quad \langle \textbf{C}, \textbf{V} \rangle \le \langle \textbf{C}, \textbf{U} \rangle . \end{aligned}$$

We define the halfspace \(H:= \{ \textbf{x} \in \mathbb {R}^m \,: \, \, \sum _{i = 1}^m x_i \langle \mathbf{A_i}, \textbf{V} \rangle \le \lfloor \langle \textbf{C}, \textbf{V}\rangle \rfloor \}\). Since \(\lfloor \langle \textbf{C}, \textbf{V} \rangle \rfloor \le \lfloor \langle \textbf{C}, \textbf{U} \rangle \rfloor \), it follows that the halfspace \(\bar{H}\) contains the halfspace H, while H is contained in the intersection of \(\text {cl}_{CG}(P)\) given in (9). Since this construction can be repeated for all halfspaces in the intersection (11) defining \(\overline{\text {cl}_{CG}(P)}\), it follows that \(\text {cl}_{CG}(P) \subseteq \overline{\text {cl}_{CG}(P)}\). \(\square \)

Example 3

Let us reconsider the bounded spectrahedron P defined in Example 1. The elementary closure \(\text {cl}_{CG}(P)\) of this spectrahedron is the intersection of six rational halfspaces, represented by the dashed lines in Fig. 3. Each such halfspace is obtained from a rational halfspace \(\{\textbf{x} \in \mathbb {R}^2 \,: \, \, \textbf{c}^\top \textbf{x} \le d\}\) containing P, where d is shifted towards \(P_I\) until the corresponding hyperplane hits an integral point. The integer hull \(P_I\) is the intersection of only five halfspaces. Thus, for this example we have \(P_I \subsetneq \text {cl}_{CG}(P) \subsetneq P\).

Fig. 3: Spectrahedron P, its integer hull \(P_I\) and its elementary closure \(\text {cl}_{CG}(P)\).

In Sect. 2.4 we provide a polyhedral description of the elementary closure of spectrahedra that satisfy the notion of total dual integrality.

2.3 The Chvátal rank of bounded spectrahedra

In this section we derive several results on the sequence of relaxations resulting from the Chvátal–Gomory procedure. Although some of these results are already known for general compact convex sets, we provide simplified proofs for the case of bounded spectrahedra. Throughout this section we assume P to be a spectrahedron of the form (6) that is bounded. For unbounded sets it is in general not even clear whether \(C^{(k+1)} \subseteq C^{(k)}\).

It is known that the Chvátal rank of a compact convex set is finite, including the special case of bounded spectrahedra. This result follows from the polyhedrality result of Dadush et al. [22] combined with the result of Chvátal [17] that the Chvátal rank of a rational polytope is finite.

Proposition 2

([17, 22]) Let \(P = \left\{ \textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0} \right\} \) be bounded. Then, \(P^{(k)} = P_I\) for some finite k.

Next, we aim to prove a homogeneity property of the CG procedure for bounded spectrahedra, which states that the elementary closure operation commutes with taking the intersection with supporting hyperplanes. This property plays a key role in showing that the elementary closure of P is a rational polytope, following the proof of Braun and Pokutta [12]. We provide a simplified proof of this property for bounded spectrahedra, which can be seen as the conic analogue to a polyhedral result of Schrijver [62]. In the proof we restrict ourselves to halfspaces of the form \(\{\textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{w}^\top \textbf{x} \le d\}\) where \(\textbf{w} \in \mathbb {Z}^m\) and \(d \in \mathbb {R}\). It follows from Proposition 1 that these halfspaces are sufficient to describe a compact convex set.

Before we show the main theorem, we need a chain of intermediate results, starting with a proposition regarding the condition of Proposition 7.

Proposition 3

Let \(P = \left\{ \textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0} \right\} \) be a non-empty and bounded spectrahedron. Then there does not exist an \(\textbf{x} \in \mathbb {R}^m\) such that \(\sum _{i = 1}^m \mathbf{A_i}x_i \succneqq \textbf{0}\).

Proof

See “Appendix 1”. \(\square \)

We also need Dirichlet’s approximation theorem and its weakened version.

Proposition 4

(Dirichlet’s Approximation Theorem) Let \(d \in \mathbb {R}\) and \(N \ge 2\) be a positive integer. Then there exist integers p and q with \(1\le p \le N\) such that \(|pd - q| \le \frac{1}{N}\).

We now derive its one-sided variant below.

Corollary 1

(One-sided Approximation Theorem) Let \(d \in \mathbb {R}\) and let \(N \ge 2\) be a positive integer. Then there exists an integer \(p \in \mathbb {Z}_+\) such that \(pd - \lfloor p d \rfloor \le \frac{1}{N}.\)

Proof

See “Appendix 1”. \(\square \)
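For instance, for \(d = \sqrt{2}\) and \(N = 10\), one may take \(p = 5\): then \(pd = 5\sqrt{2} \approx 7.071\) and \(pd - \lfloor pd \rfloor \approx 0.071 \le \frac{1}{10}\).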

We are now ready to present a simplified proof of the homogeneity property of the elementary closure of bounded spectrahedra, a result due to Braun and Pokutta [12]; see also Proposition 1 in [22].

Theorem 3

(Homogeneity property of elementary closure) Let \(P = \{\textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0} \}\) be a bounded spectrahedron that is contained in a halfspace \(\{\textbf{x} \in \mathbb {R}^m: \textbf{w}^\top \textbf{x} \le d\}\) with \(\textbf{w} \in \mathbb {Z}^m\) and \(d \in \mathbb {R}\). Let \(K:= \{\textbf{x} \in \mathbb {R}^m: \textbf{w}^\top \textbf{x} = d\}\). Then \(\text {cl}_{CG}(P) \cap K = \text {cl}_{CG}(P \cap K)\).

Proof

See “Appendix 1”. \(\square \)

The result of Theorem 3 holds for any halfspace \(\{ \textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{w}^\top \textbf{x} \le d\}\) with \(\textbf{w} \in \mathbb {Z}^m\) containing P. In particular, it holds for all such halfspaces that support P, meaning that \(P \cap K \ne \emptyset \), where K is the corresponding hyperplane. In that case, the set \(P \cap K\) defines a face of the spectrahedron. It is known that all proper faces of spectrahedra are exposed, meaning that they can be obtained as the intersection of P with a supporting hyperplane. Note, however, that the supporting hyperplanes of a bounded spectrahedron cannot always be chosen such that the entries of \(\textbf{w}\) are integral, even if the data matrices describing the spectrahedron are rational (in contrast to the case of rational polyhedra).

Homogeneity plays a key role in Braun and Pokutta’s [12] proof for the polyhedrality of the elementary closure of compact convex sets. For the sake of completeness, we include this result here for the case of bounded spectrahedra.

Theorem 4

(Dadush et al. [22], Braun and Pokutta [12]) The elementary closure \(\text {cl}_{CG}(P)\) of a bounded spectrahedron P is a rational polytope.

From Theorem 4 and the fact that the elementary closure of a rational polytope is again a rational polytope [62], it follows that the finite sequence \( P = P^{(0)} \supseteq P^{(1)} \supseteq \ldots \supseteq P^{(k)} \supseteq P^{(k+1)} \supseteq \ldots \supseteq P_I \) consists of rational polyhedra from the first closure onwards. Observe that the boundedness assumption cannot be relaxed. Indeed, if P is unbounded, \(P_I\) need not even be a polyhedron, as the following example shows.

Example 4

Consider the spectrahedron Q in Example 1. The integer hull \(Q_I\) is the convex hull of the integer points in the epigraph of \(f(x_1) = \frac{1}{2}x_1^2\). This convex hull is not polyhedral. To verify this, observe that the recession cone of \(Q_I\) is contained in the recession cone of Q, which is \( \text {rec}(Q):= \{\textbf{x} \in \mathbb {R}^2 \,: \, \, x_2 \ge 0,~x_1 = 0\}\). Since \(Q_I\) is unbounded and \(\text {rec}(Q)\) has only one ray, the recession cone of \(Q_I\) must also be \(\text {rec}(Q)\). If \(Q_I\) were polyhedral, it would therefore be contained in a halfspace \(x_1 \le N\) for some finite value of N. However, this cannot be true, as \(Q_I\) contains integral points \((x_1, x_2) \in \mathbb {Z}^2\) with arbitrarily large \(x_1\).

One can verify that \(\text {cl}_{CG}(Q) = Q_I\). Namely, each facet of \(Q_I\) is induced by a line between the points \((2k, 2k^2), (2(k-1),2(k-1)^2) \in \mathbb {Z}^2\) for some \(k \in \mathbb {Z}\). Let such a line for a fixed k be described by \(x_2 = cx_1 + d\) with \(c,d \in \mathbb {Z}\). Then, the parallel line \(x_2 = cx_1 + d - 1\) lies strictly below Q. This implies that the halfspace \(x_2 \ge cx_1 + d - 1 + \epsilon \) for sufficiently small \(\epsilon > 0\) contains Q and that its integer hull is \(x_2 \ge cx_1 + d\). Therefore, all facet-defining inequalities of \(Q_I\) have Chvátal rank one and \(\text {cl}_{CG}(Q) = Q_I\). This shows that \(\text {cl}_{CG}(Q)\) is not a polyhedron.

2.4 The elementary closure of spectrahedra and total dual integrality

In this section we derive a class of spectrahedra for which we can find an explicit expression for the elementary closure. For rational polyhedra such an expression can be derived from a totally dual integral representation of the linear system [62]. It is therefore not surprising that a similar construction can be applied to bounded spectrahedra, albeit with a few more technicalities. After connecting total dual integrality for SDPs to the elementary closure, we derive a characterization and several sufficient conditions for a linear matrix inequality to be totally dual integral.

Recently, De Carli Silva and Tunçel [13] introduced a notion of total dual integrality for SDPs. The authors of [13] argue that the term integrality in SDPs should be defined with care. For instance, the rank-one property that is sometimes used in the literature as the notion of SDP integrality is proven to be primal-dual asymmetric and therefore not the favoured choice. Instead, the authors of [13] propose a notion of SDP integrality that is based on a set of integer generating matrices.

Definition 2

(Property \((\text {P}\mathbb {Z})_{\mathcal {V}}\)) Let \(\mathcal {V}:= \{\mathbf{V_1}, \ldots , \mathbf{V_k}\} \subseteq \mathcal {S}^n_+\) be a finite set of integer PSD matrices. A matrix \(\textbf{X} \in \mathcal {S}^n_+\) satisfies integrality property \((\text {P}\mathbb {Z})_{\mathcal {V}}\) if

$$\begin{aligned} \textbf{X} = \sum _{j = 1}^k \alpha _j \mathbf{V_j} \quad \text {for some } \alpha _1, \ldots , \alpha _k \in \mathbb {Z}_+. \end{aligned}$$

The authors of [13] restrict themselves to the set \(\mathcal {V} = \{ \mathbb {1}_S\mathbb {1}_S^\top \,: \, \, S \subseteq [n]\}\), which can be seen as a natural embedding for the combinatorial problems that are considered in [13]. One could argue, however, that this embedding is rather arbitrary. For that reason, we consider a general set of generating matrices. Note that the matrices \(\textbf{X}\) that satisfy property \((\text {P}\mathbb {Z})_{\mathcal {V}}\) are also integral in the sense that \(\textbf{X} \in \mathbb {Z}^{n \times n}\). To avoid confusion between these notions, we will always explicitly refer to property \((\text {P}\mathbb {Z})_{\mathcal {V}}\) if that notion is meant.

Now we present the definition of total dual integrality for SDPs, see also [13].

Definition 3

(Total dual integrality) Let \(Z \subseteq \mathbb {Z}^m\). A linear matrix inequality \(\textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0}\) is called totally dual integral (TDI) on Z if there exists some finite set of integer PSD matrices \(\mathcal {V}\) such that, for every \(\textbf{b} \in Z\), the SDP dual to \(\sup \left\{ \textbf{b}^\top \textbf{x} \,: \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq 0 \right\} \) has an optimal solution satisfying property \((\text {P}\mathbb {Z})_{\mathcal {V}}\) whenever it has an optimal solution.

A main difference with the original definition of total dual integrality for polyhedra, see e.g. [28], is that we restrict the objective vectors for which dual integrality should hold to a subset Z of \(\mathbb {Z}^m\). As explained in [13], this follows from the fact that semidefinite programs often follow from lifted formulations. For instance, Z could be the range of a linear lifting map, e.g., \(Z = \{ 0 \oplus \textbf{b}' \,: \, \, \textbf{b}' \in \mathbb {Z}^{m-1}\}.\)

Based on this restriction to vectors in Z, it makes sense to consider a relaxed version of the CG closure in which we take the intersection of halfspaces induced by coefficient vectors in Z. More precisely, we define the CG closure with respect to Z as

$$\begin{aligned} \text {cl}_{CG}(P, Z) := \bigcap _{\begin{array}{c} (\textbf{c}, d) \in Z \times \mathbb {R} \\ P \subseteq \{\textbf{x} \, : \, \, \textbf{c}^\top \textbf{x} \le d \} \end{array}} \left\{ \textbf{x} \in \mathbb {R}^m \, : \, \, \textbf{c}^\top \textbf{x} \le \lfloor d \rfloor \right\} . \end{aligned}$$
(13)

This relaxation of the CG closure is also considered in the literature, see e.g., [21, 22]. The standard CG closure \(\text {cl}_{CG}(P)\) that we considered so far equals \(\text {cl}_{CG}(P, \mathbb {Z}^m)\).

The following theorem shows that if a spectrahedron is defined by an LMI that is TDI on Z, its (relaxed) CG closure \(\text {cl}_{CG}(P,Z)\) can be explicitly defined.

Theorem 5

Let \(P = \left\{ \textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0} \right\} \) be such that the LMI \(\textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0}\) is TDI on Z and satisfies Slater’s condition. Let \(\mathcal {V} = \{\mathbf{V_1}, \ldots , \mathbf{V_k}\}\) denote the corresponding generating set of integer PSD matrices and suppose \(\begin{bmatrix} \langle \mathbf{V_j}, \mathbf{A_1} \rangle&\cdots&\langle \mathbf{V_j}, \mathbf{A_m} \rangle \end{bmatrix}^\top \in Z\) for all \(j \in [k]\). Define \(\textbf{B} \in \mathbb {Z}^{k \times m}\) and \(\textbf{d} \in \mathbb {Z}^{k}\) such that: \( B_{j,i}:= \left\langle \mathbf{A_i}, \mathbf{V_j} \right\rangle \text { and } d_j:= \left\lfloor \left\langle \textbf{C}, \mathbf{V_j} \right\rangle \right\rfloor , \) for all \(j \in [k]\) and \(i \in [m]\). Then, \(\text {cl}_{CG}(P, Z) = Q:= \left\{ \textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{B}{} \textbf{x} \le \textbf{d}\right\} .\)

Proof

To prove that \(\text {cl}_{CG}(P, Z) \subseteq Q\), observe that \(\mathbf{V_j} \succeq \textbf{0}\) with \(\begin{bmatrix} \langle \mathbf{V_j}, \mathbf{A_1} \rangle&\cdots&\langle \mathbf{V_j}, \mathbf{A_m} \rangle \end{bmatrix}^\top \in Z\) for all \(j \in [k]\). Consequently, we know that \(P \subseteq \left\{ \textbf{x} \in \mathbb {R}^m \,: \, \, \sum _{i = 1}^m x_i \langle \mathbf{A_i}, {\mathbf{V_j} }\rangle \le \langle \textbf{C}, {\mathbf{V_j} } \rangle \right\} \). It follows from (13) that \(\text {cl}_{CG}(P,Z) \subseteq \{ \textbf{x} \in \mathbb {R}^m \,: \, \, \sum _{i = 1}^m x_i \langle \mathbf{A_i}, {\mathbf{V_j} } \rangle \le \left\lfloor \langle \textbf{C}, {\mathbf{V_j} } \rangle \right\rfloor \}\). Since all inequalities in \(\textbf{Bx} \le \textbf{d}\) are of this form, it follows that \(\text {cl}_{CG}(P, Z) \subseteq Q\).

To prove the reverse direction, let \(H:= \left\{ \textbf{x} \in \mathbb {R}^m \,: \, \, { \textbf{b}}^\top \textbf{x} \le q\right\} \) be a halfspace containing P with \({\textbf{b}} \in Z\). Since \(P \subseteq H\), we have

$$\begin{aligned}&q \ge \sup _{\textbf{x}} \left\{ {\textbf{b}}^\top \textbf{x} \, : \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0} \right\} \nonumber \\&\quad = \inf _{\textbf{X}} \left\{ \langle \textbf{C}, \textbf{X} \rangle \, : \, \, \langle \mathbf{A_i}, \textbf{X} \rangle = {b_i}, \, i \in [m], \, \textbf{X} \succeq \textbf{0} \right\} , \end{aligned}$$
(14)

where strong duality in (14) holds since the former problem has a Slater feasible point. By the same argument, we know that the infimum in (14) is attained. Since \(\textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0}\) is TDI on Z, it follows that there exists an optimal solution \(\hat{\textbf{X}}\) satisfying property \((\text {P}\mathbb {Z})_{\mathcal {V}}\). In other words, there exists an \(\hat{\textbf{y}} \in \mathbb {Z}_+^{k}\) such that \({\hat{\textbf{X}} = \sum _{j \in [k]}\hat{y}_j \mathbf{V_j}},\) \(\langle \mathbf{A_i}, \hat{\textbf{X}} \rangle = {b_i}\) for all \(i \in [m],\) \(\hat{\textbf{X}} \succeq 0.\)

Consequently, we have \(\lfloor q \rfloor \ge \lfloor \langle \textbf{C}, \hat{\textbf{X}} \rangle \rfloor = \left\lfloor \sum _{j \in [k]} \hat{y}_j \left\langle \textbf{C}, \mathbf{V_j} \right\rangle \right\rfloor \ge \sum _{j \in [k]} \hat{y}_j \left\lfloor \left\langle \textbf{C}, \mathbf{V_j} \right\rangle \right\rfloor = \textbf{d}^\top \hat{\textbf{y}}.\) Now, consider the following linear optimization problem and its corresponding dual:

$$\begin{aligned} \max \{ {\textbf{b}}^\top \textbf{x} \, : \, \, \textbf{B}{} \textbf{x} \le \textbf{d} \} = \min \{ \textbf{d}^\top \textbf{y} \, : \, \, \textbf{y} \ge \textbf{0}, \textbf{y}^\top \textbf{B} = {\textbf{b}}^\top \}. \end{aligned}$$

Since \(\hat{\textbf{y}} \ge \textbf{0}\) and \((\hat{\textbf{y}}^\top \textbf{B} )_i = {\sum _{j \in [k]} \hat{y}_j \langle \mathbf{A_i}, \mathbf{V_j} \rangle } = \langle \mathbf{A_i}, \hat{\textbf{X}} \rangle = { b_i}\), the solution \(\hat{\textbf{y}}\) is feasible for the minimization problem above. This yields \(\max \{ {\textbf{b}}^\top \textbf{x} \,: \, \, \textbf{B}{} \textbf{x} \le \textbf{d} \} \le \textbf{d}^\top \hat{\textbf{y}} \le \lfloor q \rfloor .\) Hence, \(Q \subseteq \left\{ \textbf{x} \in \mathbb {R}^m \,: \, \, {\textbf{b}}^\top \textbf{x} \le \lfloor q \rfloor \right\} \). Since this holds for all halfspaces H induced by coefficient vectors in Z, it follows that \(Q \subseteq \text {cl}_{CG}(P, Z)\). \(\square \)

For the special case where \(Z = \mathbb {Z}^m\), Theorem 5 provides a closed-form expression for \(\text {cl}_{CG}(P)\). Observe that for that special case the condition that \(\begin{bmatrix} \langle \mathbf{V_j}, \mathbf{A_1} \rangle&\cdots&\langle \mathbf{V_j}, \mathbf{A_m} \rangle \end{bmatrix}^\top \in Z\) for all \(j \in [k]\) can be simplified to \(\langle \mathbf{A_i}, \mathbf{V_j} \rangle \in \mathbb {Z}\) for all \(i \in [m]\) and \(j \in [k]\).
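As a small computational illustration of this closed-form expression, consider the following sketch on an assumed toy instance (the data are ours and serve illustration only): the diagonal LMI below is TDI on \(\mathbb {Z}^2\) with generating set \(\mathcal {V} = \{\mathbf{E_{11}}, \mathbf{E_{22}}\}\), and the matrix \(\textbf{B}\) and vector \(\textbf{d}\) of Theorem 5 are obtained by taking inner products with the data matrices and rounding down the right-hand sides.

```python
# Minimal sketch of the closed-form closure of Theorem 5 on an assumed toy instance:
# the diagonal LMI  diag(2 - x1, 3/2 - x2) PSD  (i.e. x1 <= 2, x2 <= 3/2)
# is TDI on Z^2 with generating set V = {E_11, E_22}, so cl_CG(P) = {x : Bx <= d}.
import numpy as np

C = np.diag([2.0, 1.5])
A = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]      # A_1 = E_11, A_2 = E_22
V = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]      # generating set of integer PSD matrices

B = np.array([[np.trace(Ai @ Vj) for Ai in A] for Vj in V])   # B_{j,i} = <A_i, V_j>
d = np.floor([np.trace(C @ Vj) for Vj in V])                  # d_j = floor(<C, V_j>)

print(B)   # [[1. 0.] [0. 1.]]
print(d)   # [2. 1.]  ->  cl_CG(P) = {x : x1 <= 2, x2 <= 1} = P_I
```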

Besides providing a closed-form expression for \(\text {cl}_{CG}(P)\), Theorem 5 can be used to identify bounded spectrahedra for which \(P = P_I\). Namely, if the matrix \(\textbf{C}\) is such that \(\langle \textbf{C}, \mathbf{V_j} \rangle \in \mathbb {Z}\) for all \(j \in [k]\), then \(P \subseteq Q\). For spectrahedra that are bounded, this implies that the chain \(Q = \text {cl}_{CG}(P) \subseteq P \subseteq Q\) holds with equality, hence \(\text {cl}_{CG}(P) = P\). As \(P^{(k)} = P_I\) for some finite k for all bounded spectrahedra, we must have \(P = P_I\). De Carli Silva and Tunçel [13] show that this, for example, happens for the SDP formulation of the Lovász theta function when the underlying graph is perfect.

A natural question is under which conditions a linear matrix inequality is TDI on a certain set Z. Below we first derive a full characterization of LMIs that are totally dual integral on the full set \(\mathbb {Z}^m\). The characterization relates to the faces of the spectrahedron induced by the LMI. It is well-known that the faces of \(\mathcal {S}^n_+\) are associated with linear subspaces of \(\mathbb {R}^n\), see e.g., [7]. In the same vein, the facial structure of a spectrahedron can be characterized as follows.

Lemma 1

(Ramana and Goldman [57]) Let \(P = \{\textbf{x}\in \mathbb {R}^m \,: \, \, \textbf{C}- \sum _{i = 1}^m \textbf{A}_{\textbf{i}}x_i \succeq \textbf{0}\}\) be a spectrahedron and let \(F \subseteq P\) be a nonempty face of P. Then, there exists a subspace \(\mathcal {R}_F \subseteq \mathbb {R}^n\) such that

$$\begin{aligned} F = \left\{ \textbf{x}\in P \, : \, \, \mathcal {R}_F \subseteq \text {Nul}\left( \textbf{C}- \sum _{i = 1}^m \textbf{A}_{\textbf{i}}x_i \right) \right\} , \end{aligned}$$

where any point \(\textbf{x}\) in the relative interior of F satisfies \(\text {Nul}\big ( \textbf{C}- \sum _{i = 1}^m \textbf{A}_{\textbf{i}}x_i \big ) = \mathcal {R}_F\).

Lemma 1 implies that in the particular case where the face F of P is an extreme point \(\bar{\textbf{x}}\), we have \(\mathcal {R}_{\bar{\textbf{x}}} = \text {Nul}( \textbf{C}- \sum _{i=1}^m \textbf{A}_{\textbf{i}}\bar{x}_i)\). For any nonempty face F of P, we define the cone of objective vectors \(\textbf{b}\) for which the elements in F maximize \(\textbf{b}^\top \textbf{x}\) over P, i.e.,

$$\begin{aligned} K(F) := \left\{ \textbf{b} \in \mathbb {R}^m \, : \, \, \textbf{b}^\top \textbf{y} = \max \{\textbf{b}^\top \textbf{x}\, : \, \, \textbf{x}\in P \} \text { for all } \textbf{y} \in F \right\} . \end{aligned}$$
(15)

For any proper face \(F \subseteq P\), the cone K(F) is nonempty and equals the intersection over all normal cones of P at the points in F.

Next, we recall the definition of a so-called Hilbert basis.

Definition 4

A set \(\{\textbf{v}_1, \ldots , \textbf{v}_k\} \subseteq \mathbb {Z}^m\) is a Hilbert basis if every integral vector \(\textbf{x}\in \text {cone}(\{\textbf{v}_1, \ldots , \textbf{v}_k\})\) can be written as \(\textbf{x}= \sum _{j = 1}^k \alpha _j \textbf{v}_j\), \(\alpha _j \ge 0\), \(\alpha _j \in \mathbb {Z}\), for all \(j \in [k]\).
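For example, the set \(\{(1,0)^\top , (1,2)^\top \}\) is not a Hilbert basis of the cone it generates: the integral vector \((1,1)^\top \) belongs to \(\text {cone}(\{(1,0)^\top , (1,2)^\top \})\), but it cannot be written as a non-negative integer combination of the two generators. Adding \((1,1)^\top \) yields a Hilbert basis \(\{(1,0)^\top , (1,1)^\top , (1,2)^\top \}\) of this cone.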

By abuse of terminology, we will refer to an LMI whose solution set is bounded as a bounded LMI. The following theorem provides a full characterization of bounded LMIs that are TDI on the full set of integer vectors.

Theorem 6

Let the linear matrix inequality \(\textbf{C}- \sum _{i = 1}^m \textbf{A}_{\textbf{i}}x_i \succeq \textbf{0}\) be bounded and assume Slater’s condition holds. Then, \(\textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0}\) is totally dual integral on \(\mathbb {Z}^m\) if and only if there exists some finite set of integer PSD matrices \(\mathcal {V} = \{\mathbf{V_1}, \ldots , \mathbf{V_k}\}\) such that for each extreme point \(\bar{\textbf{x}}\) of the induced spectrahedron \(P = \{\textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0}\}\) with \(K(\bar{\textbf{x}}) \cap \mathbb {Z}^m \ne \emptyset \), the vectors

$$\begin{aligned} \mathbf{g_j} := \begin{bmatrix} \langle \mathbf{A_1}, \mathbf{V_j} \rangle&\dots&\langle \mathbf{A_m}, \mathbf{V_j} \rangle \end{bmatrix}^\top \quad \text {for } j \in J:= \{ j \in [k] \, : \, \, \text {Col}(\mathbf{V_j}) \subseteq \mathcal {R}_{\bar{\textbf{x}}} \} \end{aligned}$$

form a Hilbert basis of \(K(\bar{\textbf{x}})\).

Proof

Let \(\textbf{b} \in \mathbb {Z}^m\). Since P is bounded, the maximum of \(\textbf{b}^\top \textbf{x}\) over \(\textbf{x}\in P\) is attained at a face of P. Thus, there exists an extreme point \(\bar{\textbf{x}}\) of P with \(\textbf{b} \in K(\bar{\textbf{x}})\). As P contains a Slater feasible point, we have

$$\begin{aligned} \max _\textbf{x}\left\{ \textbf{b}^\top \textbf{x}\, : \, \, \textbf{x}\in P \right\} = \min _{\textbf{X}} \left\{ \langle \textbf{C}, \textbf{X}\rangle \, : \, \, \langle \textbf{A}_{\textbf{i}}, \textbf{X}\rangle = b_i,~i \in [m],~\textbf{X}\succeq \textbf{0}\right\} . \end{aligned}$$
(16)

The point \(\bar{\textbf{x}}\) is optimal for the maximization problem above. Complementary slackness then implies that any \(\textbf{X}\) optimal to the dual problem should satisfy \({(\textbf{C}- \sum _{i = 1}^m \textbf{A}_{\textbf{i}}\bar{x}_i) \textbf{X}= \textbf{0}}\), or equivalently, \(\text {Col}(\textbf{X}) \subseteq \text {Nul}(\textbf{C}- \sum _{i=1}^m \textbf{A}_{\textbf{i}}\bar{x}_i) = \mathcal {R}_{\bar{\textbf{x}}}\). To show that \(\mathbf{g_j}\) is contained in \(K(\bar{\textbf{x}})\) for \(j \in J\), we first observe that \(\mathbf{V_j}\) is feasible for the minimization problem

$$\begin{aligned} \min _{\textbf{X}} \left\{ \langle \textbf{C}, \textbf{X}\rangle \,: \, \, \langle \textbf{A}_{\textbf{i}}, \textbf{X}\rangle = (\mathbf{g_j})_i,~i \in [m],~\textbf{X}\succeq \textbf{0}\right\} . \end{aligned}$$

Then, since \(\text {Col}(\mathbf{V_j}) \subseteq \mathcal {R}_{\bar{\textbf{x}}}\), we know that \((\textbf{C}- \sum _{i = 1}^m \textbf{A}_{\textbf{i}}\bar{x}_i) \mathbf{V_j} = \textbf{0}\). Therefore, \(\bar{\textbf{x}}\) and \(\mathbf{V_j}\) are optimal solutions to \(\max _\textbf{x}\left\{ \mathbf{g_j}^\top \textbf{x}\,: \, \, \textbf{x}\in P \right\} \) and \(\min _{\textbf{X}} \{ \langle \textbf{C}, \textbf{X}\rangle \,: \, \, \langle \textbf{A}_{\textbf{i}}, \textbf{X}\rangle = (\mathbf{g_j})_i,~i \in [m],~\textbf{X}\succeq \textbf{0}\}\), respectively. This implies that \(\mathbf{g_j}\) is indeed contained in \(K(\bar{\textbf{x}})\) for \(j \in J\).

Now, suppose that the vectors \(\mathbf{g_j}\), \(j \in J\) form a Hilbert basis of \(K(\bar{\textbf{x}})\). Then, we have \({\textbf{b} = \sum _{j \in J}\alpha _j \mathbf{g_j}}\) for some \(\alpha _j \ge 0\), \(\alpha _j \in \mathbb {Z}\), \(j \in J\). Consequently, \({\textbf{X}:= \sum _{j \in J} \alpha _j \textbf{V}_j}\) is feasible for the minimization problem in (16) with \(\text {Col}(\textbf{X}) \subseteq \mathcal {R}_{\bar{\textbf{x}}}\). Since this establishes complementary slackness between \(\textbf{X}\) and \(\bar{\textbf{x}}\), it follows that \(\textbf{X}\) is a dual optimal solution that satisfies property \((\text {P}\mathbb {Z})_{\mathcal {V}}\).

Conversely, if the LMI is totally dual integral on \(\mathbb {Z}^m\), it follows that the dual problem in (16) has an optimal solution \(\textbf{X}\) satisfying property \((\text {P}\mathbb {Z})_{\mathcal {V}}\). Therefore, \(\textbf{X}= \sum _{j = 1}^k \alpha _j \mathbf{V_j}\) for some \(\alpha _j \ge 0\), \(\alpha _j \in \mathbb {Z}\), \(j \in [k]\). Now, let \(J^C:= [k] {\setminus } J\). Then, \( \textbf{X}= \sum _{j \in J} \alpha _j \mathbf{V_j} + \sum _{j \in J^C} \alpha _j \mathbf{V_j}. \)

By complementary slackness, we have \(\text {Col}(\textbf{X}) \subseteq \mathcal {R}_{\bar{\textbf{x}}}\), implying that \(\text {Col}( \sum _{j \in J^C} \alpha _j \mathbf{V_j}) = \text {Col}(\textbf{X}- \sum _{j \in J}\alpha _j \mathbf{V_j}) \subseteq \mathcal {R}_{\bar{\textbf{x}}}\). Since the \(\mathbf{V_j}\)’s are positive semidefinite, we also know that \(\text {Col}(\alpha _j \mathbf{V_j}) \subseteq \text {Col}( \sum _{j \in J^C} \alpha _j \mathbf{V_j}) \subseteq \mathcal {R}_{\bar{\textbf{x}}}\) for all \(j \in J^C\). However, by the definition of \(J^C\) we have \(\text {Col}(\mathbf{V_j}) \nsubseteq \mathcal {R}_{\bar{\textbf{x}}}\), so we must have \(\alpha _j = 0\) for all \(j \in J^C\). We conclude that \(\textbf{X}\) is a non-negative integer combination of the matrices \(\mathbf{V_j}\) with \(j \in J\). By the constraints of the minimization problem in (16), it finally follows that \(\textbf{b} = \sum _{j \in J} \alpha _j \mathbf{g_j}\). As the construction can be repeated for all \(\textbf{b} \in \mathbb {Z}^m\) in \(K(\bar{\textbf{x}})\), we conclude that \(\{ \mathbf{g_j} \,: \, \, j \in J\}\) indeed forms a Hilbert basis of \(K(\bar{\textbf{x}})\). The same holds for all other extreme points \(\bar{\textbf{x}}\) for which \(K(\bar{\textbf{x}}) \cap \mathbb {Z}^m \ne \emptyset \). \(\square \)

Theorem 6 has a significant implication for the structure of the spectrahedron induced by a bounded LMI that is TDI on \(\mathbb {Z}^m\).

Corollary 2

If a bounded LMI \(\textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0}\) that satisfies Slater’s condition is totally dual integral on \(\mathbb {Z}^m\), the spectrahedron \({P = \{ \textbf{x}\in \mathbb {R}^m \,: \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0}\}}\) is polyhedral.

Proof

Let \(h_P: \mathbb {R}^m \rightarrow \mathbb {R}\) denote the support function of P, i.e., \(h_P(\textbf{x}):= \sup _{\textbf{a} \in P} \{\textbf{x}^\top \textbf{a}\}\) and let \((\textbf{c},d) \in \mathbb {Z}^m \times \mathbb {R}\) be such that \(P \subseteq \{\textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{c}^\top \textbf{x} \le d\}\). Then, there exists an extreme point \(\bar{\textbf{x}}\) of P such that \(\textbf{c} \in K(\bar{\textbf{x}})\). By Theorem 6, it follows that there exists a subset \(J \subseteq [k]\) and \(\alpha _j \ge 0\), \(\alpha _j \in \mathbb {Z}\), \(j \in J\) such that \(\textbf{c} = \sum _{j \in J} \alpha _j \mathbf{g_j}\). Obviously, \(h_P(\textbf{c}) = \textbf{c}^\top \bar{\textbf{x}}\) and, since \(\mathbf{g_j} \in K(\bar{\textbf{x}})\), \(h_P(\mathbf{g_j}) = \mathbf{g_j}^\top \bar{\textbf{x}}\) for all \(j \in J\). Now, the conical combination of the inequalities \(\mathbf{g_j}^\top \textbf{x} \le h_P(\mathbf{g_j})\), each with weight \(\alpha _j\), results in \(\sum _{j \in J} \alpha _j \mathbf{g_j}^\top \textbf{x} \le \sum _{j \in J} \alpha _j h_P(\mathbf{g_j}) = \sum _{j \in J}\alpha _j\mathbf{g_j}^\top \bar{\textbf{x}} = \textbf{c}^\top \bar{\textbf{x}} \le d\). Since the left-hand side equals \(\textbf{c}^\top \textbf{x}\), the halfspace \(\{\textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{c}^\top \textbf{x} \le d\}\) is implied by the inequalities \(\mathbf{g_j}^\top \textbf{x} \le h_P(\mathbf{g_j})\), \(j \in [k]\).

Since this construction can be repeated for all halfspaces of the form \(\{\textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{c}^\top \textbf{x} \le d\}\) where \((\textbf{c},d) \in \mathbb {Z}^m \times \mathbb {R}\), and P equals the intersection of all such halfspaces, see Proposition 1, it follows that P is contained in the polyhedron induced by \(\mathbf{g_j}^\top \textbf{x} \le h_P(\mathbf{g_j})\), \(j \in [k]\). Since the converse inclusion is also true, P is polyhedral. \(\square \)

Corollary 2 implies that a bounded linear matrix inequality can only be TDI on \(\mathbb {Z}^m\) if the spectrahedron it induces can be described by a finite number of linear inequalities. This is the case, for instance, when the matrices \(\textbf{C}\) and \(\textbf{A}_{\textbf{i}}\), \(i \in [m]\), are diagonal or simultaneously diagonalizable. In general, it is \(\mathcal{N}\mathcal{P}\)-hard to decide whether a spectrahedron is polyhedral [58]. The following result provides a characterization of polyhedral spectrahedra that are full-dimensional. Observe that any spectrahedron can be transformed into a full-dimensional spectrahedron by restricting it to its affine hull.

Theorem 7

(Ramana [58]) Let \(P = \{ \textbf{x}\in \mathbb {R}^m \,: \, \, \textbf{C}- \sum _{i = 1}^m \textbf{A}_{\textbf{i}}x_i \succeq \textbf{0}\}\) be a full-dimensional spectrahedron. Then, P is polyhedral if and only if there exists a non-singular matrix \(M \in \mathbb {R}^{n \times n}\) and \(\textbf{d}, \mathbf{a_i} \in \mathbb {R}^{\ell }\), \(\textbf{C}', \textbf{A}_{\textbf{i}}' \in \mathcal {S}^{n - \ell }\), \(i \in [m]\), with \(\ell \le n\) such that for all \(\textbf{x}\in \mathbb {R}^m\) we have

$$\begin{aligned} M\left( \textbf{C}- \sum _{i = 1}^m \textbf{A}_{\textbf{i}}x_i \right) M^\top = \begin{bmatrix} \textbf{C}' - \sum _{i = 1}^m \textbf{A}_{\textbf{i}}' x_i &{} \textbf{0}\\ \textbf{0}&{} \text {Diag}(\textbf{d}) - \sum _{i = 1}^m \text {Diag}(\mathbf{a_i})x_i \end{bmatrix} \end{aligned}$$
(17)

with \(P = \{\textbf{x}\in \mathbb {R}^m \,: \, \, \text {Diag}(\textbf{d}) - \sum _{i = 1}^m \text {Diag}(\mathbf{a_i})x_i \succeq \textbf{0}\}\).

It is well-known that any rational polyhedron P can be described by a totally dual integral system of linear inequalities, see Giles and Pulleyblank [42]. Hence, if a spectrahedron P satisfies Theorem 7 with rational \(\textbf{d}, \mathbf{a_i}\) for all \(i \in [m]\), then P can be described by a linear matrix inequality that is totally dual integral on \(\mathbb {Z}^m\) with respect to the generating matrices \(\mathcal {V} = \{\text {Diag}(\mathbf{e_1}), \ldots , \text {Diag}(\mathbf {e_{\ell }})\} \subseteq \mathcal {S}^{\ell }_+\).

By relaxing the notion of total dual integrality to a strict subset Z of \(\mathbb {Z}^m\), it might be possible to identify other conditions for TDIness that go beyond polyhedrality. In that case, the best one can hope for is a description of \(\text {cl}_{CG}(P,Z)\), see Theorem 5.

As shown by Bhardwaj et al. [10], any full-dimensional spectrahedron P can be expressed by a linear matrix inequality in the form of (17), even if P is non-polyhedral. When the residual linear matrix form \(\textbf{C}' - \sum _{i = 1}^m \textbf{A}_{\textbf{i}}' x_i\) cannot be further diagonalized, the form on the right-hand side of (17) is called the normal form of the linear matrix inequality. Intuitively speaking, the bottom right block of (17) can be viewed as the polyhedral part of the spectrahedron. As an extension of the result by Giles and Pulleyblank [42], the following result shows that the polyhedral part of a spectrahedron can, under mild conditions, be made totally dual integral on an appropriate set Z.

Theorem 8

Let \(P = \{ \textbf{x}\in \mathbb {R}^m \,: \, \, \textbf{C}- \sum _{i = 1}^m \textbf{A}_{\textbf{i}}x_i \succeq \textbf{0}\}\) be a full-dimensional spectrahedron that can be written in the normal form (17) for some non-singular matrix \(M \in \mathbb {R}^{n \times n}\) and \(\textbf{d}, \mathbf{a_i} \in \mathbb {Q}^{\ell }\), \(\textbf{C}', \textbf{A}_{\textbf{i}}' \in \mathcal {S}^{n - \ell }\), \(i \in [m]\) with \( 1 \le \ell \le n\). Let \(Z \subseteq \mathbb {Z}^m\) be such that

$$\begin{aligned} \max _{\textbf{x}} \left\{ \textbf{b}^\top \textbf{x}\, : \, \, \textbf{x}\in P \right\} = \max _{\textbf{x}} \left\{ \textbf{b}^\top \textbf{x}\, : \, \, \text {Diag}(\textbf{d}) - \sum _{i = 1}^m \text {Diag}(\mathbf{a_i})x_i \succeq \textbf{0}\right\} \end{aligned}$$

for all \(\textbf{b} \in Z\). Then there exists a linear matrix inequality describing P that is totally dual integral on Z.

Proof

Let \(Q = \{\textbf{x}\in \mathbb {R}^m \,: \, \, \text {Diag}(\textbf{d}) - \sum _{i=1}^m \text {Diag}(\mathbf{a_i})x_i \succeq \textbf{0}\}\). Since \(\textbf{d}\) and \(\mathbf{a_i}\) are rational for all \(i \in [m]\), it follows from Giles and Pulleyblank [42] that there exists some totally dual integral representation of Q, i.e., \(Q = \{\textbf{x}\in \mathbb {R}^m \,: \, \, \hat{\textbf{A}}\textbf{x}\le \hat{\textbf{d}}\}\) for some \(\hat{\textbf{A}} \in \mathbb {Z}^{\ell ' \times m}, \hat{\textbf{d}} \in \mathbb {Q}^{\ell '}\) with \(\hat{\textbf{A}}\textbf{x}\le \hat{\textbf{d}}\) TDI. For all \(i \in [m]\), let \(\hat{\textbf{a}_i}\) denote the ith column of \(\hat{\textbf{A}}\). Then, P can be written as

$$\begin{aligned} P = \left\{ \textbf{x}\in \mathbb {R}^m \, : \, \, \begin{bmatrix} \textbf{C}' - \sum _{i = 1}^m \textbf{A}_{\textbf{i}}' x_i &{} \textbf{0}\\ \textbf{0}&{} \text {Diag}(\hat{\textbf{d}}) - \sum _{i = 1}^m \text {Diag}(\hat{\textbf{a}_i})x_i \end{bmatrix} \succeq \textbf{0}\right\} . \end{aligned}$$
(18)

We will show that the LMI in (18) is totally dual integral on Z. For any \(\textbf{b} \in Z\), we have that

$$\begin{aligned} \max _{\textbf{x}} \left\{ \textbf{b}^\top \textbf{x}\, : \, \, \textbf{x}\in P \right\} = \max _{\textbf{x}}\left\{ \textbf{b}^\top \textbf{x}\, : \, \, \textbf{x}\in Q \right\} = \min _{\textbf{y}} \left\{ \hat{\textbf{d}}^\top \textbf{y}\, : \, \, \textbf{y}\ge \textbf{0},~\textbf{y}^\top \hat{\textbf{A}} = \textbf{b}^\top \right\} . \end{aligned}$$

By construction, the minimization problem above has an optimal solution \(\hat{\textbf{y}} \in \mathbb {Z}_+^{\ell '}\). Now, we define

$$\begin{aligned} \hat{\textbf{X}} := \textbf{0} \oplus \text {Diag}(\hat{\textbf{y}}). \end{aligned}$$

It follows from above that \(\langle \textbf{C}' \oplus \text {Diag}(\hat{\textbf{d}}), \hat{\textbf{X}} \rangle = \hat{\textbf{d}}^\top \hat{\textbf{y}} = \max _{\textbf{x}}\{\textbf{b}^\top \textbf{x}\,: \, \, \textbf{x}\in P\}\) and \(\langle \textbf{A}_{\textbf{i}}' \oplus \text {Diag}(\hat{\textbf{a}_i}), \hat{\textbf{X}} \rangle = \hat{\textbf{a}_i}^\top \hat{\textbf{y}} = b_i\) for all \(i \in [m]\). Therefore, \(\hat{\textbf{X}}\) is optimal for the SDP dual to \(\max _{\textbf{x}}\{\textbf{b}^\top \textbf{x}\,: \, \, \textbf{x}\in P\}\). By construction, \(\hat{\textbf{X}}\) is an integer conical combination of matrices in the set \(\mathcal {V} = \{ \textbf{0}\oplus \text {Diag}(\mathbf{e_i}) \,: \, \, i \in [\ell ']\}\) of integer PSD matrices. We conclude that the LMI given in (18) is totally dual integral on Z. \(\square \)

Our final condition for total dual integrality on a set Z is not related to polyhedrality of the spectrahedron induced by the linear matrix inequality, but to polyhedrality of the feasible set of its corresponding dual problem. It is possible for a spectrahedron to be non-polyhedral while the feasible set of its dual problem is polyhedral. For instance, consider the non-polyhedral spectrahedron \(Q = \{\textbf{x}\in \mathbb {R}^2 \,: \, \, x_2 \ge x_1^2/2 \}\) from Example 1; for any \(\textbf{b} \in \mathbb {Z}^2_{-}\), its dual feasible set is polyhedral. Let us formalize the criterion of polyhedrality of the dual feasible set.

Definition 5

The set \(\{\mathbf{A_1}, \ldots , \mathbf{A_m}\}\) is called finitely generative on \(Z \subseteq \mathbb {Z}^m\) if there exists a finite set of integer PSD matrices \(\mathcal {V} = \{\mathbf{V_1}, \ldots , \mathbf{V_k}\}\) such that \( \left\{ \textbf{X} \,: \, \, \langle \mathbf{A_i}, \textbf{X} \rangle = b_i,~i \in [m],~\textbf{X} \succeq \textbf{0} \right\} \) is contained in \(\text {cone}(\mathcal {V})\) for all integer vectors \(\textbf{b} \in Z\).

The condition that the dual feasible set is polyhedral is also considered in recent work on SDP exactness [67]. Observe that if \(\{\mathbf{A_1}, \ldots , \mathbf{A_m}\}\) is finitely generative on Z, then \(\{\textbf{X} \,: \, \, \langle \mathbf{A_i}, \textbf{X} \rangle = b_i,~i \in [m],~\textbf{X} \succeq \textbf{0}\}\) is polyhedral for all \(\textbf{b} \in Z\): since \(\text {cone}(\mathcal {V}) \subseteq \mathcal {S}^n_+\), this set equals the intersection of the polyhedral cone \(\text {cone}(\mathcal {V})\) with an affine subspace. Moreover, if \(\{\mathbf{A_1}, \ldots , \mathbf{A_m}\}\) is finitely generative on Z, then \(\{t\mathbf{A_1}, \ldots , t\mathbf{A_m}\}\) is also finitely generative on Z for any scalar \(t > 0\).

As shown below, the constraint matrices being finitely generative and integer is a sufficient condition for the existence of a totally dual integral description of the spectrahedron.

Theorem 9

Let \(\textbf{C}- \sum _{i = 1}^m \textbf{A}_{\textbf{i}}x_i \succeq \textbf{0}\) be an LMI satisfying Slater’s condition with \(\{\mathbf{A_1}, \ldots , \mathbf{A_m}\} \subseteq \mathbb {Z}^{n \times n}\) finitely generative on Z. Then, the spectrahedron \(P = \left\{ \textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \succeq \textbf{0} \right\} \) can be described by a linear matrix inequality that is totally dual integral on Z.

Proof

Let \(\mathcal {V} = \{\mathbf{V_1}, \ldots , \mathbf{V_k}\}\) denote the finite set of integer PSD matrices corresponding to \(\{\mathbf{A_1}, \ldots , \mathbf{A_m}\}\) in Definition 5. Let \(\textbf{b} \in Z\) and let \(t > 0\) be a positive rational number. We consider the following semidefinite program and its dual:

$$\begin{aligned}&\sup _{\textbf{x}} \left\{ \textbf{b}^\top \textbf{x} \, : \, \, t\textbf{C} - \sum _{i =1}^m t\mathbf{A_i}x_i \succeq \textbf{0} \right\} \nonumber \\&\quad = \inf _{\textbf{X}} \left\{ \langle t\textbf{C}, \textbf{X} \rangle \, : \, \, \langle t\mathbf{A_i}, \textbf{X} \rangle = b_i,~i \in [m],~ \textbf{X} \succeq \textbf{0} \right\} . \end{aligned}$$
(19)

Based on the fact that \(\{\mathbf{A_1}, \ldots , \mathbf{A_m}\}\) is finitely generative, we know that the feasible set of the minimization problem in (19) is contained in \(\text {cone}(\mathcal {V})\). Since we also know the minimum is attained due to Slater’s condition, we can rewrite the dual problem as follows:

$$\begin{aligned}&\min _{\textbf{X}} \Big \{\langle t\textbf{C}, \textbf{X} \rangle \, : \, \, \langle t\mathbf{A_i}, \textbf{X} \rangle = b_i,~i \in [m],~ \textbf{X} = \alpha _1 \textbf{V}_1 + \dots + \alpha _k \mathbf{V_k},~ \varvec{\alpha }\ge \textbf{0} \Big \} \\&\quad = \min _{\textbf{X}} \left\{ \langle t \textbf{C}, \textbf{X} \rangle \, : \begin{aligned} \, \, t~\text {triu}(\mathbf{A_i})^\top \text {svec}(\textbf{X}) = b_i,~~ i \in [m],~~\varvec{\alpha } \ge \textbf{0} \\ ~\text {svec}(\textbf{X}) - \alpha _1 \text {svec}(\mathbf{V_1}) - \dots - \alpha _k \text {svec}(\mathbf{V_k})= 0 \end{aligned} \right\} \\&\quad = \min _{\textbf{X}} \left\{ \langle t \textbf{C}, \textbf{X} \rangle \, : \, \, \begin{bmatrix} t\mathbf{A'} &{} \textbf{0} \\ \textbf{I} &{} - \mathbf{V'} \end{bmatrix} \begin{bmatrix} \text {svec}(\textbf{X}) \\ \varvec{\alpha } \end{bmatrix} = \begin{bmatrix} \textbf{b} \\ \textbf{0} \end{bmatrix} ,~ \varvec{\alpha }\ge \textbf{0} \right\} , \end{aligned}$$

where \(\mathbf{A'}:= \begin{bmatrix} \text {triu}(\mathbf{A_1})&\dots&\text {triu}(\mathbf{A_m}) \end{bmatrix}^\top \), \(\mathbf{V'}:= \begin{bmatrix} \text {svec}(\mathbf{V_1})&\dots&\text {svec}(\mathbf{V_k}) \end{bmatrix}\), \(\text {triu}: \mathcal {S}^n \rightarrow \mathbb {R}^{\frac{1}{2}(n^2 + n)}\) is the operator that maps a matrix to a vector containing its upper-triangular entries and \(\text {svec}: \mathcal {S}^n \rightarrow \mathbb {R}^{\frac{1}{2}(n^2 + n)}\) is the symmetric vectorization operator that maps a matrix to a vector containing its upper-triangular part with weight two on the off-diagonal elements and weight one on the diagonal elements. The linear system in the dual problem above can be written as

$$\begin{aligned} t\begin{bmatrix} \mathbf{A'} &{} \textbf{0} \\ \textbf{I} &{} - \mathbf{V'} \end{bmatrix} \begin{bmatrix} \text {svec}(\textbf{X}) \\ \varvec{\alpha } \end{bmatrix} = \begin{bmatrix} \textbf{b} \\ \textbf{0} \end{bmatrix}, \quad \text {or equivalently,} \quad \begin{bmatrix} \mathbf{A'} &{} \textbf{0} \\ \textbf{I} &{} - \mathbf{V'} \end{bmatrix} \begin{bmatrix} \text {svec}(\textbf{X}) \\ \varvec{\alpha } \end{bmatrix} = \frac{1}{t}\begin{bmatrix} \textbf{b} \\ \textbf{0} \end{bmatrix}. \end{aligned}$$

Each basic feasible solution to this system with \(\varvec{\alpha } \ge \textbf{0}\) is the unique solution to one of its non-singular subsystems. Following the proof by Giles and Pulleyblank [42], it is possible to find a rational number \(t^*\) such that for all \(\textbf{b} \in Z\), there exists an optimal solution that satisfies \(\text {svec}(\textbf{X}) \in \mathbb {Z}^{\frac{1}{2}(n^2 +n)}\) and \(\varvec{\alpha } \in \mathbb {Z}^k\). When mapping \(\text {svec}(\textbf{X})\) back to \(\textbf{X} \in \mathcal {S}^n\), it follows that the SDP dual to \( \max \{\textbf{b}^\top \textbf{x} \,: \, \, t^* \textbf{C} - \sum _{i = 1}^m t^* \mathbf{A_i}x_i \succeq \textbf{0} \}\) has, for all \(\textbf{b} \in Z\), an optimal solution \(\textbf{X}\) satisfying \( \textbf{X} = \sum _{j \in [k]} \alpha _j \mathbf{V_j}\) with \(\alpha _j \in \mathbb {Z}\), \(\alpha _j \ge 0\), for all \(j \in [k]\). Hence, property \((\text {P}\mathbb {Z})_{\mathcal {V}}\) holds for \(\textbf{X}\). We conclude that \( t^* \textbf{C} - \sum _{i = 1}^m t^* \mathbf{A_i}x_i \succeq \textbf{0}\) is a linear matrix inequality describing P that is totally dual integral on Z. \(\square \)

2.5 Strengthened Chvátal–Gomory cuts

Dash et al. [24] consider a strengthening of the CG cuts for rational polyhedra. We briefly present their approach here, which can also be applied to general convex sets.

For all \(\textbf{c}\in \mathbb {Z}^m\) and \(d \in \mathbb {R}\) such that \(P \subseteq \{\textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{c}^\top \textbf{x} \le d\}\), the corresponding CG cut is \(\textbf{c}^\top \textbf{x} \le \lfloor d \rfloor \). The validity of this cut follows from the inequality \( \lfloor d \rfloor \ge \max \left\{ \textbf{c}^\top \textbf{x} \,: \, \, \textbf{c}^\top \textbf{x} \le d, \, \textbf{x} \in \mathbb {Z}^m \right\} , \) where equality holds if the entries in \(\textbf{c}\) are relatively prime. However, the gap between \(\lfloor d \rfloor \) and \(\max \{ \textbf{c}^\top \textbf{x} \,: \, \, \textbf{x} \in P \cap \mathbb {Z}^m\}\) can generally be very large. In order to reduce this gap, suppose that we know that \(P \cap \mathbb {Z}^m\) is contained in some set \(S \subseteq \mathbb {Z}^m\). Given a valid inequality \(\textbf{c}^\top \textbf{x} \le d\) for P, we define

$$\begin{aligned} \lfloor d \rfloor _{S,c} := \max \left\{ \textbf{c}^\top \textbf{x} \, : \, \, \textbf{c}^\top \textbf{x} \le d, \, \textbf{x} \in S \right\} . \end{aligned}$$
(20)

By construction, \(\textbf{c}^\top \textbf{x} \le \lfloor d \rfloor _{S,c}\) is valid for \(P \cap \mathbb {Z}^m\). We refer to this type of cut as an S-Chvátal–Gomory (S-CG) cut. These cuts are at least as strong as the standard CG cuts: taking \(S = \mathbb {Z}^m\) recovers the standard CG cut, while any smaller S containing \(P \cap \mathbb {Z}^m\) can only decrease the right-hand side. The geometric interpretation of an S-CG cut is that we shift the hyperplane \(\{\textbf{x} \in \mathbb {R}^m \,: \, \, \textbf{c}^\top \textbf{x} = d\}\) in the direction of \(P \cap \mathbb {Z}^m\) until it hits a point in S. An example for S is the set \(\{0,1\}^{m}\) in the case of binary optimization problems.
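To make the rounding operation (20) concrete, the following sketch computes \(\lfloor d \rfloor _{S,c}\) by brute-force enumeration for the binary case \(S = \{0,1\}^m\). It is only an illustration under our own assumptions (the function name s_cg_floor and the use of Python with NumPy are not part of the original text); practical separation routines would exploit problem structure instead of enumerating S.

```python
import itertools
import numpy as np

def s_cg_floor(c, d, m):
    """Brute-force evaluation of the S-CG right-hand side (20) for S = {0,1}^m.

    Returns max{ c^T x : c^T x <= d, x in {0,1}^m }, or -inf if no x in S
    satisfies c^T x <= d. Only meant for small m.
    """
    c = np.asarray(c, dtype=float)
    best = -np.inf
    for x in itertools.product((0, 1), repeat=m):
        val = float(c @ np.array(x))
        if val <= d and val > best:
            best = val
    return best

# Example: for c = (2, 3) and d = 4.5, the standard CG cut uses floor(4.5) = 4,
# while restricting to S = {0,1}^2 tightens the right-hand side to 3.
print(s_cg_floor([2, 3], 4.5, 2))   # 3.0
```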

3 A CG-based branch-and-cut algorithm for ISDPs

Solving ISDPs is a relatively new field of research for which only a few general-purpose solution approaches have been proposed. Gally et al. [40] present a B &B algorithm called SCIP-SDP for solving (M)ISDPs with continuous SDPs as subproblems. Alternatively, Kobayashi and Takano [46] propose a B &C algorithm that initially relaxes the PSD constraint and solves a mixed integer linear program (MILP), where the PSD constraint is imposed dynamically via cutting planes. Numerical results in [46] show that this B &C algorithm outperforms the B &B algorithm of [40]. The difference can be explained by the high performance of current MILP solvers compared to the much less robust conic interior point methods that are used in [40]. It has to be noted, however, that an older version of SCIP-SDP with DSDP [9] as SDP solver was used in the computational results of [46]. The authors of [50] also compare the two approaches and conclude that SCIP-SDP is much faster on average than the approach by Kobayashi and Takano. However, they use Mosek [54] as an SDP solver and an improved implementation of SCIP-SDP. Another project that can handle MISDPs is YALMIP [49], although its performance is inferior to that of the other two methods [40, 46].

In this section we present a generic B &C algorithm for solving ISDPs that exploits CG cuts of the underlying spectrahedron. This algorithm can be seen as an extension of the works of [15, 46]. In Sect. 3.1 we provide a general B &C framework for ISDPs which uses a cut generation routine based on S-CG cuts. A separation routine for the special class of binary SDPs is presented in “Appendix 2”.

3.1 Generic Branch-and-Cut framework

We start this section by presenting the B &C framework proposed by Kobayashi and Takano [46] for ISDPs in standard dual form, see (5). However, the approach can be extended to problems in primal form in a straightforward way. We define

$$\begin{aligned} \mathcal {F} := \left\{ \textbf{x} \in \mathbb {R}^m \, : \, \, \text {diag}\left( \textbf{C} - \sum _{i = 1}^m \mathbf{A_i}x_i \right) \ge \textbf{0} \right\} , \end{aligned}$$
(21)

which can be seen as the polyhedral part of the spectrahedron P, see (6). We assume that the problem of maximizing \(\textbf{b}^\top \textbf{x}\) over \(\mathcal {F}\) is bounded, which is a non-restrictive assumption whenever the original ISDP is bounded.

The B &C algorithm of [46] is based on a dynamic constraint generation known as a lazy constraint callback. The algorithm starts with optimizing over the set \(\mathcal {F}\cap \mathbb {Z}^m \), i.e.,

$$\begin{aligned} \max \left\{ \textbf{b}^\top \textbf{x} \, : \, \, \textbf{x} \in \mathcal {F} \cap \mathbb {Z}^m \right\} , \end{aligned}$$
(22)

which can be solved using a B &B algorithm. Whenever an integer point \(\hat{\textbf{x}}\) is found in the branching tree, it is verified whether \(\textbf{C} - \sum _{i = 1}^m\mathbf{A_i}\hat{x}_i \succeq \textbf{0}\) is satisfied. If so, the solution is feasible for \((D_{ISDP})\) and provides a possibly better lower bound to prune other nodes in the tree. If not, then \(\langle \textbf{C} - \sum _{i =1}^m \mathbf{A_i}\hat{x}_i, \textbf{dd}^\top \rangle < 0\) where \(\textbf{d}\) is a normalized eigenvector corresponding to the smallest eigenvalue of \(\textbf{C} - \sum _{i =1}^m \mathbf{A_i}\hat{x}_i\). This leads to the following valid constraint for \((D_{ISDP})\):

$$\begin{aligned} \left\langle \textbf{C} - \sum _{i =1}^m \mathbf{A_i}x_i, \textbf{dd}^\top \right\rangle \ge 0, \quad \text {or equivalently,} \quad \sum _{i=1}^m \langle \mathbf{A_i}, \textbf{dd}^\top \rangle x_i \le \langle \textbf{C}, \textbf{dd}^\top \rangle , \end{aligned}$$
(23)

which separates \(\hat{\textbf{x}}\) from P. Now, the algorithm adds to \(\mathcal {F}\) a cut of type (23) to cut off the current point and continues the branching scheme using this additional constraint. This process is iterated until the optimality of a solution for \((D_{ISDP})\) is guaranteed by the B &B procedure.

It follows from the Rayleigh principle that \(\langle \textbf{C} - \sum _{i = 1}^m \mathbf{A_i} \hat{x}_i, \textbf{U} \rangle \) is minimized over all matrices \(\textbf{U} = \textbf{dd}^\top \) with \(\Vert \textbf{d} \Vert _2 = 1\) by taking \(\textbf{d}\) as defined above. In that sense, the cut (23) is the strongest cut with respect to violation of the PSD constraint. However, this type of separator ignores the fact that an optimal solution is also integer. We now propose an alternative, stronger separator based on the CG procedure that exploits both the PSD and the integrality constraints.
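As an illustration of the eigenvalue cut (23), the following sketch returns the cut coefficients for a given integer candidate point; the function name, input conventions and tolerance are our own assumptions and the snippet is not taken from [46].

```python
import numpy as np

def eigenvalue_cut(C, A_list, x_hat, tol=1e-8):
    """Generate the eigenvalue cut (23) for an integer candidate x_hat, if violated.

    C, A_list : symmetric matrices defining the LMI  C - sum_i A_i x_i >= 0.
    Returns (lhs, rhs) such that the cut reads  lhs @ x <= rhs,  or None if
    S(x_hat) = C - sum_i A_i * x_hat_i is already positive semidefinite.
    """
    S = C - sum(x * A for x, A in zip(x_hat, A_list))
    eigvals, eigvecs = np.linalg.eigh(S)                 # eigenvalues in ascending order
    if eigvals[0] >= -tol:
        return None                                      # LMI satisfied, no cut needed
    d = eigvecs[:, 0]                                    # unit eigenvector of lambda_min
    U = np.outer(d, d)                                   # dual multiplier dd^T
    lhs = np.array([np.trace(A @ U) for A in A_list])    # coefficients <A_i, dd^T>
    rhs = float(np.trace(C @ U))                         # right-hand side <C, dd^T>
    return lhs, rhs
```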

Let \(S \subseteq \mathbb {Z}^m\) be a set containing the feasible set of \((D_{ISDP})\), with \(S = \mathbb {Z}^m\) in case of no prior knowledge about the problem. If \(\hat{\textbf{x}} \notin P\), and consequently \(\hat{\textbf{x}} \notin \text {cl}_{CG}(P)\), it follows from (9) that there exists a dual multiplier \(\textbf{U} \in \mathcal {S}^n_+\) with \(\langle \mathbf{A_i}, \textbf{U} \rangle \in \mathbb {Z}\) for all \(i \in [m]\), such that \(\sum _{i = 1}^m \langle \mathbf{A_i}, \textbf{U} \rangle \hat{x}_i > \lfloor \langle \textbf{C}, \textbf{U} \rangle \rfloor \). Taking such \(\textbf{U}\) and defining \(\mathbf{v(U)}:= \left( \langle \mathbf{A_1}, \textbf{U} \rangle , \ldots , \langle \mathbf{A_m}, \textbf{U} \rangle \right) ^\top \), we obtain the following S-CG cut:

$$\begin{aligned} \sum _{i = 1}^m \langle \mathbf{A_i}, \textbf{U} \rangle x_i&\le \lfloor \langle \textbf{C}, \textbf{U} \rangle \rfloor _{S,\mathbf{v(U)}}, \end{aligned}$$
(24)

see (20). The cut (24) exploits both the PSD and the integrality constraints in \((D_{ISDP})\) by separating \(\hat{\textbf{x}}\) from \(\text {cl}_{CG}(P)\) instead of only from P. As \(\text {cl}_{CG}(P) \subseteq P\) for bounded spectrahedra, this type of cut is possibly stronger than the eigenvalue cut (23) for all S containing \(P \cap \mathbb {Z}^m\). Figure 4 depicts a simplified example indicating the geometric difference between the cuts (23) and (24).

Fig. 4 Simplified example of the strengthened separation routine on the spectrahedron P from Example 1. The dotted line shows an eigenvalue cut (23) separating \(\hat{\textbf{x}}\) from P; the solid line shows a CG cut (24) separating \(\hat{\textbf{x}}\) from \(\text {cl}_{CG}(P)\), where \(S = \mathbb {Z}^m\)

It is not clear in general how to find an appropriate cut (24) separating \(\hat{\textbf{x}}\) from \(\text {cl}_{CG}(P)\). Indeed, this is closely related to the CG separation problem, which was proven to be \(\mathcal{N}\mathcal{P}\)-hard even for polytopes contained in the unit hypercube, see Cornuéjols et al. [18]. Fischetti and Lodi [38] show how to solve the separation problem for polyhedra using a mixed integer programming problem. Extending their procedure to the class of spectrahedra implies solving a MISDP. Instead, we can adopt problem-specific separation routines that are efficient and provide strong cuts. For instance, in “Appendix 2” we present a separation routine for binary SDPs in primal form. Moreover, we later provide various separation routines for cuts of the form (24) for the quadratic traveling salesman problem.

Alongside extending the approach of Kobayashi and Takano [46], our framework also builds on the work of Çezik and Iyengar [15]. In [15], CG cuts for binary conic programs are introduced. It is noted there that no method is known for separating CG cuts from fractional points, and consequently the CG cuts are not included in the numerical experiments of [15]. Since our approach separates on integer points only, we partly resolve this issue for certain classes of problems by exploiting the underlying structure of the programs. As a result, we present the first practical algorithm that utilizes CG cuts in conic problems.

We end this section by providing pseudocode of the B &C framework, see Algorithm 1. Suppose SeparationRoutine is a separation routine for constructing CG cuts of the form (24), where we assume this routine can generate multiple dual matrices at a time. In “Appendix 2” we present a separation routine for binary SDPs.

Algorithm 1 CG-based B &C algorithm for solving \((D_{ISDP})\)

4 The Chvátal–Gomory procedure for ISDP formulations of the QTSP

In this section we provide an in-depth study on solving the quadratic traveling salesman problem using our B &C approach. We formally define the QTSP in Sect. 4.1. In Sect. 4.2 we derive two ISDP formulations of the QTSP. Our first ISDP model exploits the algebraic connectivity of a directed tour. Our second formulation additionally exploits the distance-two matrix that arises as the product of a tour matrix with itself. Finally, in Sect. 4.3 we derive CG cuts for the two ISDPs and show that we can obtain various classes of well-known cuts in this way.

4.1 The quadratic traveling salesman problem

Let \(G = (N, A)\) be a directed simple graph on \(n:= |N|\) nodes and \(m:= |A|\) arcs. A directed cycle \(\mathcal {C}\) in G that visits all the nodes exactly once is called a directed Hamiltonian cycle or a directed tour in G. For the sake of simplicity, we often omit the adjective ‘directed’ in the sequel.

A tour in G can be represented by a binary matrix \(\textbf{X} = (x_{ij}) \in \{0,1\}^{n \times n}\) such that \(x_{ij} = 1\) if and only if arc (ij) is used in the tour. We refer to such a matrix as a tour matrix. The set of all tour matrices in G is defined as follows:

$$\begin{aligned} \mathcal {T}_n(G) := \left\{ \textbf{X}^\mathcal {C} \in \{0,1\}^{n \times n} \, : \, \, x^\mathcal {C}_{ij} = 1 \text { if and only if } (i,j) \in \mathcal {C} \text { for Hamiltonian cycle } \mathcal {C} \right\} . \end{aligned}$$
(25)

It follows from (25) that for all \(\textbf{X} \in \mathcal {T}_n(G)\) we have \(x_{ij} = 0\) if \((i,j) \notin A\). In particular, \(\text {diag}(\textbf{X}) = \textbf{0}_n\). Given a distance matrix \(\textbf{D} = (d_{ij}) \in \mathbb {R}^{n \times n}\), the (linear) traveling salesman problem (TSP) is the problem of finding a Hamiltonian cycle \(\mathcal {C}\) of G that minimizes \( \sum _{(i,j) \in \mathcal {C}}d_{ij}\). As G is directed and \(\textbf{D}\) is not necessarily symmetric, this version of the problem is sometimes referred to as the asymmetric traveling salesman problem. Using the set defined in (25), we can state the TSP as follows:

$$\begin{aligned} { TSP}(\textbf{D},G) := \min \left\{ \sum _{i = 1}^n \sum _{j = 1}^n d_{ij}x_{ij} \, : \, \, \textbf{X} \in \mathcal {T}_n(G) \right\} . \end{aligned}$$
(26)

We now define the quadratic version of the TSP, where the total cost is given by the sum of interaction costs between arcs used in the tour. In accordance with most of the literature, we assume that a quadratic cost is incurred only if two arcs are placed in succession on the tour, see e.g., [33,34,35, 45, 61]. To model this problem, we define the set of the so-called 2-arcs of G, i.e.,

$$\begin{aligned} \mathcal {A} := \left\{ (i,j,k) \, : \, \, (i,j), (j,k) \in A, |\{i,j,k\}| = 3 \right\} , \end{aligned}$$
(27)

which consists of all node triples of G that can be placed in succession on a cycle. Now let \(\textbf{Q} = (q_{ijk}) \in \mathbb {R}^{n \times n \times n}\) be a cost matrix such that \(q_{ijk}= 0\) if \((i,j,k) \notin \mathcal {A}\). Then the quadratic traveling salesman problem (QTSP) is formulated as:

$$\begin{aligned} { QTSP}(\textbf{Q},G) := \min \left\{ \sum _{i = 1}^n \sum _{j = 1}^n \sum _{k = 1}^n q_{ijk}x_{ij}x_{jk} \, : \, \, \textbf{X} \in \mathcal {T}_n(G) \right\} . \end{aligned}$$
(28)
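As a small illustration of the objective in (28), the following sketch (our own, not part of the original text; the helper name and the random instance are assumptions) evaluates the quadratic cost of a tour given as a cyclic node sequence.

```python
import numpy as np

def qtsp_cost(Q, tour):
    """Evaluate the QTSP objective (28) for a directed tour.

    Q    : n x n x n array with Q[i, j, k] = q_ijk (zero whenever (i, j, k) is not a 2-arc).
    tour : node sequence, e.g. [0, 2, 1, 3], interpreted cyclically.
    """
    n = len(tour)
    cost = 0.0
    for p in range(n):
        # arcs (i, j) and (j, k) are placed in succession on the tour
        i, j, k = tour[p], tour[(p + 1) % n], tour[(p + 2) % n]
        cost += Q[i, j, k]
    return cost

# Tiny usage example with random costs on the complete digraph on 5 nodes.
rng = np.random.default_rng(0)
Q = rng.integers(0, 10, size=(5, 5, 5)).astype(float)
print(qtsp_cost(Q, [0, 1, 2, 3, 4]))
```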

Since the in- and outdegree of each node on a Hamiltonian cycle is exactly one, we have \(\textbf{X}\textbf{1} = \textbf{1}\) and \(\textbf{X}^\top \textbf{1} = \textbf{1}\) for all \(\textbf{X} \in \mathcal {T}_n(G)\). The set of square binary matrices that satisfy this property is known as the set of permutation matrices \(\varPi _n\), i.e., \(\varPi _n:= \left\{ \textbf{X} \in \{0,1\}^{n \times n} \,: \, \, \textbf{X}{} \textbf{1} = \textbf{1}, \, \textbf{X}^\top \textbf{1} = \textbf{1} \right\} .\) The permutation matrices that additionally satisfy \(\text {diag}(\textbf{X}) = \textbf{0}_n\) induce a disjoint cycle cover in \(K_n\).

Similar to the definition of \(\mathcal {T}_n(G)\), we can also restrict \(\varPi _n\) to the entries induced by G. That is, \(\varPi _n(G)\) has a zero on position (ij) whenever \((i,j) \notin A\).

4.2 ISDP based on algebraic connectivity in directed graphs

Cvetković et al. [19] derive an ISDP formulation of the symmetric linear TSP based on algebraic connectivity. We now exploit the equivalent of this notion for directed graphs to derive two ISDP formulations of the QTSP. In contrast to our approach, there was no attempt in [19] to solve the ISDP itself; only its SDP relaxation is considered.

Let \(\mathbf{D_G}\) be an \(n \times n\) diagonal matrix that contains the outdegrees of the nodes of G on the diagonal. Moreover, let \(\mathbf{A_G}\) denote the adjacency matrix of G. That is, \((A_G)_{ij} = 1\) if there exists an arc from i to j in G, and \((A_G)_{ij} = 0\) otherwise. We define the directed out-degree Laplacian matrix of G as \(\mathbf{L_G}:= \mathbf{D_G} - \mathbf{A_G}\). The matrix \(\mathbf{L_G}\) can be asymmetric and has a zero eigenvalue with corresponding eigenvector \(\textbf{1}_n\). Observe that there also exist other ways of defining the directed graph Laplacian of G, see e.g., [14]. Wu [70] generalized Fiedler’s notion of algebraic connectivity of an undirected graph [32] to directed graphs by exploiting the out-degree Laplacian matrix.

Definition 6

The algebraic connectivity of a directed graph G is given by

$$\begin{aligned} a(G) := \min _{\textbf{x} \in S} \textbf{x}^\top \mathbf{L_G} \textbf{x} = \min _{\begin{array}{c} \textbf{x} \in \mathbb {R}^n \\ \textbf{x} \ne \textbf{0}, \textbf{x} \perp \textbf{1}_n \end{array}} \frac{\textbf{x}^\top \mathbf{L_G} \textbf{x}}{\textbf{x}^\top \textbf{x}} = \lambda _{\min } \left( \frac{1}{2} \textbf{W}^\top \left( \mathbf{L_G} + \mathbf{L_G}^\top \right) \textbf{W} \right) , \end{aligned}$$

where \(S:= \left\{ \textbf{x} \in \mathbb {R}^n \,: \, \, \textbf{x} \perp \textbf{1}_n \,, \, \, \Vert \textbf{x} \Vert _2 = 1 \right\} \) and \(\textbf{W} \in \mathbb {R}^{n \times (n-1)}\) is a matrix whose columns form an orthonormal basis for \(\textbf{1}_n^\perp \).

The last equality in Definition 6 follows from the Courant-Fischer theorem. Observe that a(G) is not necessarily equal to the second smallest eigenvalue of the directed Laplacian matrix, which is the definition of its undirected counterpart. The algebraic connectivity a(G) as defined in Definition 6 is a real number that can be negative.
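The following sketch evaluates a(G) directly from Definition 6; it is our own illustration (the function name is an assumption), constructing W as an orthonormal basis of \(\textbf{1}_n^\perp \) via a singular value decomposition.

```python
import numpy as np

def algebraic_connectivity(A):
    """Algebraic connectivity a(G) of a directed graph, following Definition 6.

    A : 0/1 adjacency matrix of a simple directed graph (no self-loops).
    """
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A              # out-degree Laplacian L_G = D_G - A_G
    U, _, _ = np.linalg.svd(np.ones((n, 1)))    # first column of U spans the all-ones direction
    W = U[:, 1:]                                # orthonormal basis of the complement of 1_n
    M = 0.5 * W.T @ (L + L.T) @ W
    return np.linalg.eigvalsh(M)[0]             # smallest eigenvalue

# For the directed 5-cycle this returns about 0.691 = 1 - cos(2*pi/5),
# in line with the value derived for directed Hamiltonian cycles below.
X = np.roll(np.eye(5), 1, axis=1)               # arcs (i, i+1 mod 5)
print(algebraic_connectivity(X))
```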

A directed graph is called balanced if for each node its indegree is equal to its outdegree. Let \(\textbf{B} \in \{-1,0,1\}^{n \times m}\) be the signed incidence matrix of G, i.e., \(B_{i,e} = -1\) if arc e leaves node i, \(B_{i,e} = 1\) if arc e enters node i and \(B_{i,e}=0\) otherwise. One can verify that G is balanced if and only if \(\mathbf{L_G} + \mathbf{L_G}^\top = \textbf{BB}^\top \). This implies that for balanced graphs the matrix \(\frac{1}{2}(\mathbf{L_G} + \mathbf{L_G}^\top )\) is positive semidefinite. Wu [70] observes that if G is balanced, then \(a(G) = \lambda _2 ( ( \mathbf{L_G} + \mathbf{L_G}^\top )/2) \ge 0.\) A directed graph is called strongly connected if for every pair of distinct nodes \(u, v \in N\) there exists a directed path from u to v in G. The balanced graphs that are strongly connected are characterized by their algebraic connectivity, see Proposition 5 below. Connectedness of directed graphs is also studied in [14, 66].

Proposition 5

(Wu [70]) Let a directed graph G be balanced. Then, \(a(G) >0\) if and only if G is strongly connected.

This characterization can be exploited to derive a certificate for a tour matrix via a linear matrix inequality. In order to do so, we consider the spectrum of a Hamiltonian cycle. Let \(\mathcal {C}\) be a Hamiltonian cycle in G corresponding to the tour matrix \(\textbf{X} \in \mathcal {T}_n(G)\), see (25). We then have \(\frac{1}{2}\left( \mathbf{L_{\mathcal {C}}} + \mathbf{L_{\mathcal {C}}}^\top \right) = \textbf{I}_{\textbf{n}} - \frac{1}{2}(\textbf{X} + \textbf{X}^\top )\). The matrix \(\textbf{X} + \textbf{X}^\top \) with \(\textbf{X} \in \mathcal {T}_n(G)\) has the same spectrum as the adjacency matrix of the standard undirected n-cycle. As a result, the spectrum of \(\frac{1}{2}(\textbf{X} + \textbf{X}^\top )\) is given by \(\cos \left( \frac{2 \pi j}{n}\right) \) for \(j \in [n]\), see e.g., [19]. From this, it follows that the spectrum of \(\frac{1}{2}\left( \mathbf{L_\mathcal {C}} + \mathbf{L_\mathcal {C}}^\top \right) \) is given by \(1 - \cos \left( {2 \pi j}/{n}\right) \) for \(j \in [n]\), and the algebraic connectivity of a directed Hamiltonian cycle \(\mathcal {C}\) is \(a(\mathcal {C}) = 1 - \cos (2 \pi / n)\). We define:

$$\begin{aligned} k_n := \cos \left( \frac{2\pi }{n}\right) \quad \text {and} \quad h_n := 1 - k_n. \end{aligned}$$
(29)

Next, we extend a result by Cvetković et al. [19] from undirected to directed Hamiltonian cycles.

Theorem 10

Let H be a spanning subgraph of a directed graph G where the in- and outdegree equals one for all nodes in H. Let \(\textbf{X}\) be its adjacency matrix and let \(\alpha , \beta \in \mathbb {R}\) be such that \(\alpha \ge h_n / n\) and \(k_n \le \beta < 1\), with \(k_n, h_n\) as defined in (29). Then, H is a directed Hamiltonian cycle if and only if

$$\begin{aligned} \textbf{Z}:= \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}\left( \textbf{X} + \textbf{X}^\top \right) \succeq \textbf{0}. \end{aligned}$$

Proof

Let \(\mathbf{L_H}\) be the Laplacian matrix of H and let \(\textbf{W}\) be as given in Definition 6. Then \(a(H) = \lambda _{\min } \left( \frac{1}{2} \textbf{W}^\top \left( \mathbf{L_H} + \mathbf{L_H}^\top \right) \textbf{W} \right) \). Let \(\textbf{Z} \succeq \textbf{0}\). This implies that \(\textbf{W}^\top \textbf{ZW} \succeq \textbf{0}\), i.e.,

$$\begin{aligned} \textbf{W}^\top \textbf{Z W}&= \textbf{W}^\top \left( \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}\left( \textbf{X} + \textbf{X}^\top \right) \right) \textbf{W}\\&= \beta \textbf{W}^\top \textbf{W} + \alpha \textbf{W}^\top \textbf{J}_{\textbf{n}} \textbf{W} - \frac{1}{2} \textbf{W}^\top \left( \textbf{X} + \textbf{X}^\top \right) \textbf{W} \\&= \beta \mathbf{I_{n-1}} - \frac{1}{2} \textbf{W}^\top \left( \textbf{X} + \textbf{X}^\top \right) \textbf{W} = (\beta - 1)\mathbf{I_{n-1}} + \frac{1}{2}{} \textbf{W}^\top \left( \mathbf{L_H} + \mathbf{L_H}^\top \right) \textbf{W} \succeq \textbf{0}, \end{aligned}$$

where we used the fact that \(\mathbf{J_n W} = \textbf{0}\) and \(\frac{1}{2}(\mathbf{L_H} + \mathbf{L_H}^\top ) = \textbf{I}_{\textbf{n}} - \frac{1}{2}(\textbf{X} + \textbf{X}^\top )\). The linear matrix inequality above can be rewritten as

$$\begin{aligned} \frac{1}{2} \textbf{W}^\top \left( \mathbf{L_H} + \mathbf{L_H}^\top \right) \textbf{W} \succeq (1 - \beta )\mathbf{I_{n-1}} \quad \Longrightarrow \quad a(H)&= \lambda _{\min } \left( \frac{1}{2} \textbf{W}^\top \left( \mathbf{L_H} + \mathbf{L_H}^\top \right) \textbf{W} \right) \\&\ge 1 - \beta . \end{aligned}$$

Since \(\beta < 1\), we have \(a(H) > 0\). Because H is balanced, it follows from Proposition 5 that H is strongly connected. Since every node of H has in- and outdegree one, H is thus a directed Hamiltonian cycle.

Conversely, let H be a directed Hamiltonian cycle. Then, \(a(H) = \lambda _{\min } \left( \frac{1}{2} \textbf{W}^\top \left( \mathbf{L_H} + \mathbf{L_H}^\top \right) \textbf{W} \right) = 1 - k_n\). Since \(\beta \ge k_n\), we have \( \frac{1}{2} \textbf{W}^\top \left( \mathbf{L_H} + \mathbf{L_H}^\top \right) \textbf{W} - (1 - \beta )\mathbf{I_{n-1}} \succeq \textbf{0} \Longleftrightarrow \textbf{W}^\top \textbf{Z W} \succeq \textbf{0}, \) following the same derivation as above. Now, let \(\textbf{x} \in \mathbb {R}^n\). Since the columns of \(\textbf{W}\) form a basis for \(\textbf{1}_n^\perp \), \(\textbf{x}\) can be written as \(\textbf{x} = \textbf{Wy} + \delta \textbf{1}_n\) for some \(\textbf{y} \in \mathbb {R}^{n-1}\) and \(\delta \in \mathbb {R}\). This yields:

$$\begin{aligned} \textbf{x}^\top \textbf{Z x}&= \textbf{y}^\top \textbf{W}^\top \textbf{Z W y} + 2 \delta \textbf{y}^\top \textbf{W}^\top \textbf{Z} \textbf{1}_n + \delta ^2 \textbf{1}_n^\top \textbf{Z} \textbf{1}_n \\&= \underbrace{\textbf{y}^\top \textbf{W}^\top \textbf{Z W} \textbf{y}}_{\ge 0} + \underbrace{2 \delta \textbf{y}^\top \textbf{W}^\top \left( (\beta -1) \textbf{1}_n + \alpha n \textbf{1}_n \right) }_{= 0} + \underbrace{\delta ^2 n \left( (\beta -1) + \alpha n \right) }_{\ge 0}, \end{aligned}$$

where we used the facts that \(\textbf{W}^\top \textbf{Z W} \succeq \textbf{0}, \textbf{W}^\top \textbf{1}_n = \textbf{0}\) and \(\beta -1 + \alpha n \ge k_n -1 + n\frac{1 - k_n}{n} = 0\). Thus, \(\textbf{Z} \succeq \textbf{0}\). \(\square \)

In order to present our first ISDP formulation of the QTSP, we derive an explicit expression for the set \(\mathcal {T}_n(G)\) and linearize the objective function. The former can be done using Theorem 10. The set \(\mathcal {T}_n(G)\) can be fully characterized by the permutation matrices that satisfy a linear matrix inequality. That is,

$$\begin{aligned} \mathcal {T}_n(G) = \varPi _n(G) \cap \left\{ \textbf{X} \in \mathbb {R}^{n \times n} \, : \, \, \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}(\textbf{X} + \textbf{X}^\top ) \succeq \textbf{0} \right\} , \end{aligned}$$
(30)

for all \(\alpha \ge h_n / n\) and \(k_n \le \beta < 1\). Recall that \(\varPi _n(G)\) is the set of permutation matrices implied by G, see Sect. 4.1.
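As a numerical sanity check of the characterization (30) (a sketch under our own assumptions; the function name and the example graphs are not from the original text), one can test the linear matrix inequality of Theorem 10 with the strongest parameters \(\alpha = h_n/n\) and \(\beta = k_n\):

```python
import numpy as np

def is_tour_certificate(X, tol=1e-9):
    """Check the LMI certificate of Theorem 10 / (30) for a 0/1 matrix X
    whose rows and columns all sum to one (a node-disjoint cycle cover)."""
    n = X.shape[0]
    k_n = np.cos(2 * np.pi / n)
    alpha, beta = (1 - k_n) / n, k_n            # strongest choice: h_n / n and k_n
    Z = beta * np.eye(n) + alpha * np.ones((n, n)) - 0.5 * (X + X.T)
    return np.linalg.eigvalsh(Z)[0] >= -tol

# A directed 6-cycle passes the certificate; a cover by two directed 3-cycles does not.
cycle6 = np.roll(np.eye(6), 1, axis=1)
two_cycles = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]:
    two_cycles[i, j] = 1
print(is_tour_certificate(cycle6), is_tour_certificate(two_cycles))   # True False
```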

To linearize the objective function, we follow the same construction as proposed by Fischer et al. [35]. For all two-arcs \((i,j,k) \in \mathcal {A}\), see (27), we define a variable \(y_{ijk}:= x_{ij}x_{jk}\). This equality can be guaranteed by the introduction of the following set of linear coupling constraints:

$$\begin{aligned} x_{ij} = \sum _{\begin{array}{c} k \in N : \\ (k,i,j) \in \mathcal {A} \end{array}} y_{kij} = \sum _{\begin{array}{c} k \in N : \\ (i,j,k) \in \mathcal {A} \end{array}} y_{ijk} \text { for all } (i,j) \in A \quad \text {and} \quad y_{ijk} \ge 0 \text { for all } (i,j,k) \in \mathcal {A}. \end{aligned}$$

We define the following set:

$$\begin{aligned} \mathcal {F}_1 := \left\{ (\textbf{y}, \textbf{X}) \in \{0,1\}^{\mathcal {A}} \times \varPi _n(G) \, : \, \, x_{ij} = \sum _{\begin{array}{c} k \in N : \\ (k,i,j) \in \mathcal {A} \end{array}} y_{kij} = \sum _{\begin{array}{c} k \in N : \\ (i,j,k) \in \mathcal {A} \end{array}} y_{ijk} \quad \forall (i,j) \in A \right\} . \end{aligned}$$
(31)

Now, our first ISDP formulation of the QTSP is as follows:

$$\begin{aligned} ({ ISDP}_1) \qquad \min _{\textbf{y}, \textbf{X}} \left\{ \sum _{(i,j,k) \in \mathcal {A}} q_{ijk}y_{ijk} \, : \, \, \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}\left( \textbf{X} + \textbf{X}^\top \right) \succeq \textbf{0}, \, \, (\textbf{y}, \textbf{X}) \in \mathcal {F}_1 \right\} , \end{aligned}$$

where \(\alpha \ge h_n /n\) and \(k_n \le \beta < 1\). One can verify that setting \(\alpha = h_n /n\) and \(\beta = k_n\) leads to the strongest linear matrix inequality among all possible values for \(\alpha \) and \(\beta \). Thus, we use these values in the computational results of Sect. 5.

Remark 2

In fact, we do not need to enforce integrality on \(\textbf{y}\) explicitly. Namely, if \(\textbf{X} \in \mathcal {T}_n(G)\), it follows from the integrality of \(\textbf{X}\) and the coupling constraints that \(y_{ijk} = 1\) if \((i,j,k) \in \mathcal {A}\) is used in the tour and 0 otherwise. Hence, when optimizing over \(\mathcal {F}_1\) using a B &B or B &C algorithm, we relax the integrality constraint on \(\textbf{y}\) and branch on \(\textbf{X}\) only.

In what follows, we further exploit properties of tour matrices to derive our second ISDP formulation of the QTSP. Let \(\textbf{X} \in \mathcal {T}_n(G)\) be a tour matrix and define \(\textbf{X}^{\mathbf{(2)}} = (x^{(2)}_{ij}):= \textbf{X}\cdot \textbf{X}\). For \(i,k \in N\) we have \(x^{(2)}_{ik}= \sum _{j=1}^n x_{ij}x_{jk} = \sum _{j \in N: (i,j,k) \in \mathcal {A}} y_{ijk}\), where the last equality follows from the definition of \(\textbf{y}\). Thus, \(\textbf{X}^{\mathbf{(2)}}\) is a binary matrix and \(x^{(2)}_{ik} =1\) if and only if the length of the shortest directed path from i to k in the subgraph induced by \(\textbf{X}\) is equal to two.

We can again characterize a tour matrix as in Theorem 10 by combining the variables \(\textbf{X}\) and \(\textbf{X}^{\mathbf{(2)}}\). Observe that the directed graph induced by \(\textbf{X}^{\mathbf{(2)}}\) is balanced with in- and outdegree one, and circulant (but not strongly connected for even n). Moreover, the circulant graph \(\mathcal {C}_2\) corresponding to \(\textbf{X}+\textbf{X}^{\mathbf{(2)}}\) is strongly connected and balanced with in- and outdegree two. The spectrum of \( \frac{1}{2}( (\textbf{X}+\textbf{X}^{\mathbf{(2)}}) +(\textbf{X}+\textbf{X}^{\mathbf{(2)}} )^\top )\) for any \(\textbf{X} \in \mathcal {T}_n(G)\) and \(\textbf{X}^{\mathbf{(2)}} = \textbf{X} \cdot \textbf{X}\) is given by

$$\begin{aligned} \cos \left( \frac{2 \pi j}{n}\right) + \cos \left( \frac{4 \pi j}{n}\right) \quad \text {for } j \in [n], \end{aligned}$$
(32)

which results in the algebraic connectivity of \(\mathcal {C}_2\) being \(a(\mathcal {C}_2) = 2 - ( \cos (2 \pi / n)+\cos (4 \pi / n))\). We define

$$\begin{aligned} k_n^{(2)} := \cos \left( \frac{2\pi }{n}\right) + \cos \left( \frac{4\pi }{n}\right) \quad \text {and} \quad h_n^{(2)} := 2 - k_n^{(2)}. \end{aligned}$$
(33)

Now, we are ready to state the following theorem.

Theorem 11

Let H be a spanning subgraph of a directed graph G where the in- and outdegree equals one for all nodes in H. Let \(\textbf{X}\) be its adjacency matrix and let \(\textbf{X}^{\mathbf{(2)}}:= \textbf{X}\cdot \textbf{X}\) be the distance two adjacency matrix. Let \(\alpha ^{(2)}, \beta ^{(2)} \in \mathbb {R}\) be such that \(\alpha ^{(2)} \ge h_n^{(2)} / n\) and \(k_n^{(2)} \le \beta ^{(2)} < 2\), with \(k_n^{(2)}\), \(h_n^{(2)}\) as defined in (33). Then H is a directed Hamiltonian cycle if and only if

$$\begin{aligned} \textbf{Z}:= \beta ^{(2)} \textbf{I}_{\textbf{n}} + \alpha ^{(2)} \textbf{J}_{\textbf{n}} - \frac{1}{2}\left( (\textbf{X} + \textbf{X}^{\mathbf{(2)}}) +(\textbf{X} + \textbf{X}^{\mathbf{(2)}} )^\top \right) \succeq \textbf{0}. \end{aligned}$$

Proof

See “Appendix 1”. \(\square \)

We define the set \(\mathcal {F}_2\) as follows:

$$\begin{aligned} \mathcal {F}_2 := \left\{ \left( \textbf{y}, \textbf{X}, \textbf{X}^{\mathbf{(2)}} \right) \in \mathcal {F}_1 \times \varPi _n(G^2) \, : \, \, x^{(2)}_{ik} = \sum _{ \begin{array}{c} j \in N : \\ (i,j,k) \in \mathcal {A} \end{array} } y_{ijk} \,\, \forall (i,k) \in A^2 \right\} , \end{aligned}$$
(34)

where

$$\begin{aligned} \varPi _n(G^2)&:=\left\{ \textbf{X}^{\mathbf{(2)}} \in \{0,1\}^{n \times n} \, : \, \, \textbf{X}^{\mathbf{(2)}} \textbf{1} = \textbf{1}, \, (\textbf{X}^{\mathbf{(2)}})^\top \textbf{1} = \textbf{1}, \, \text {diag}(\textbf{X}^{\mathbf{(2)}}) = \textbf{0}, \right. \\&\qquad \qquad \left. x^{(2)}_{ij} = 0 \,\, \forall (i,j) \notin A^2 \right\} , \end{aligned}$$

and \(A^2\) is the set of node pairs (ij) for which there exists a directed path from i to j of length 2. The set \(\mathcal {F}_2\) and the result of Theorem 11 lead to our second ISDP formulation of the QTSP:

$$\begin{aligned} ({ ISDP}_2) \qquad \min _{\textbf{y}, \textbf{X}, \textbf{X}^{\mathbf{(2)}}} \left\{ \sum _{(i,j,k) \in \mathcal {A}} q_{ijk}y_{ijk} \, : \, \begin{aligned}&\beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}\left( \textbf{X} + \textbf{X}^\top \right) \succeq \textbf{0}, \\&\beta ^{(2)} \textbf{I}_{\textbf{n}} + \alpha ^{(2)} \textbf{J}_{\textbf{n}} - \frac{1}{2}\left( (\textbf{X} + \textbf{X}^{\mathbf{(2)}}) + (\textbf{X} + \textbf{X}^{\mathbf{(2)}})^\top \right) \succeq \textbf{0}, \\&\left( \textbf{y}, \textbf{X}, \textbf{X}^{\mathbf{(2)}} \right) \in \mathcal {F}_2 \end{aligned} \right\} , \end{aligned}$$

where \(\alpha \ge h_n /n\), \(k_n \le \beta < 1\), \(\alpha ^{(2)} \ge h_n^{(2)} / n\) and \(k_n^{(2)} \le \beta ^{(2)} < 2\). Again the choice of \(\alpha , \beta , \alpha ^{(2)}\) and \(\beta ^{(2)}\) equal to their lower bounds provides the strongest continuous relaxation.

It follows from Theorem 11 that one can remove the first linear matrix inequality in \((ISDP_{2})\) and still obtain an exact formulation of the QTSP. However, the bound obtained from the SDP relaxation of \((ISDP_{2})\) dominates the bound obtained from the SDP relaxation of \((ISDP_{1})\). In that sense, the formulation (\(ISDP_{2}\)) can be seen as a level two formulation of the QTSP, whose continuous relaxation is stronger than that of the first level formulation. An additional advantage of the level two formulation is that both linear matrix inequalities may be used to generate CG cuts, as we show in the following section.

In the same vein, one can construct level k formulations of the QTSP for \(k = 3, \ldots , n\). This leads to a hierarchy of formulations, whose SDP relaxations are of increasing strength and complexity.

4.3 Chvátal–Gomory cuts for the ISDPs of the QTSP

In order to solve \((ISDP_1)\) and \((ISDP_2)\) using our B &C algorithm, we study various CG-based separation routines for the QTSP. We first derive a general CG cut generator for both formulations. Thereafter, we show how different types of well-known inequalities for the QTSP can be derived as CG cuts of \((ISDP_{1})\) and \((ISDP_{2})\).

Let us consider \((ISDP_{1})\). The set \(\mathcal {F}_1\), see (31), consists of all tuples \((\textbf{y},\textbf{X})\) where \(\textbf{X}\) represents a node-disjoint cycle cover in G. Our B &C algorithm starts with optimizing over the set \(\mathcal {F}_1\), where we are allowed to relax the integrality of \(\textbf{y}\) at no cost, see Remark 2. If an integer point \((\hat{\textbf{y}}, \hat{\textbf{X}}) \) is found in the branching tree, it is verified whether \(\lambda _{\min } \left( \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}\left( \hat{\textbf{X}} + \hat{\textbf{X}}^\top \right) \right) \ge 0\). If so, then \(\hat{\textbf{X}} \in \mathcal {T}_n(G)\) and we have found a possibly new incumbent solution. If not, then \(\hat{\textbf{X}}\) is the adjacency matrix of a node-disjoint cycle cover that is not a Hamiltonian cycle. Therefore we have to generate dual matrices that cut off the current point.

The first separation routine that we present is based on finding a set of integer eigenvectors corresponding to a negative eigenvalue of \(\beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}\left( \hat{\textbf{X}} + \hat{\textbf{X}}^\top \right) \).

Proposition 6

Let \(\textbf{X} \in \varPi _n(G)\) be the adjacency matrix of a directed node-disjoint cycle cover consisting of \(k \ge 2\) cycles. Let \(\{S_1, \ldots , S_k \}\) be the partition of the nodes implied by the cycle cover and define for each \(l \in [k]\) the vector

$$\begin{aligned} v^l_i := {\left\{ \begin{array}{ll} n - |S_l| &{} \text {if } i \in S_l \\ - |S_l| &{} \text {if } i \notin S_l. \end{array}\right. } \end{aligned}$$

Then \(\left\langle \mathbf{v^l}(\mathbf{v^l})^\top , \, \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}( \textbf{X} + \textbf{X}^\top ) \right\rangle < 0\) for all \(l \in [k]\).

Proof

The vectors \(\mathbf{v^l}\) are eigenvectors of \(\textbf{X}\) and \(\textbf{X}^\top \) corresponding to eigenvalue 1. Therefore we have:

$$\begin{aligned} \left( \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}( \textbf{X} + \textbf{X}^\top ) \right) \mathbf{v^l}&= \beta \mathbf{v^l} + \alpha \left( (n - |S_l|)\cdot |S_l| + (n-|S_l|)\cdot (-|S_l|) \right) \textbf{1} \\&\quad - \frac{1}{2}{} \mathbf{v^l} - \frac{1}{2}{} \mathbf{v^l} \\&= (\beta - 1)\mathbf{v^l}, \end{aligned}$$

from which it follows that \(\mathbf{v^l}\) is an eigenvector of \(\beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}( \textbf{X} + \textbf{X}^\top )\) corresponding to eigenvalue \(\beta - 1\). Since we assume \(\beta < 1\), this eigenvalue is negative. As \(\left\langle \mathbf{v^l}(\mathbf{v^l})^\top , \, \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}( \textbf{X} + \textbf{X}^\top ) \right\rangle = (\mathbf{v^l})^\top \left( \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} - \frac{1}{2}( \textbf{X} + \textbf{X}^\top ) \right) \mathbf{v^l} = (\beta - 1)\Vert \mathbf{v^l}\Vert ^2 < 0\), the conclusion follows. \(\square \)

The result of Proposition 6 can be used within our B &C algorithm in the following way. Let \(\{S_1, \ldots , S_k\}\) be the partition of the nodes implied by the current solution \(\hat{\textbf{X}}\) and let \(\mathbf{U^l}:= \mathbf{v^l}(\mathbf{v^l})^\top \) where \(\mathbf{v^l}\) is as defined in Proposition 6. Then for each \(l \in [k]\) we construct the following CG cuts:

$$\begin{aligned} \left\langle \mathbf{U^l}, \frac{1}{2}(\textbf{X} + \textbf{X}^\top )\right\rangle \le \left\lfloor \langle \textbf{U}^{\textbf{l}}, \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} \rangle \right\rfloor \quad \Longleftrightarrow \quad \left\langle \textbf{U}^{\textbf{l}}, \textbf{X} \right\rangle \le \left\lfloor \langle \textbf{U}^{\textbf{l}}, \beta \textbf{I}_{\textbf{n}} + \alpha \textbf{J}_{\textbf{n}} \rangle \right\rfloor , \end{aligned}$$
(35)

which cut off the current point. Observe that the choice \(\alpha = h_n /n\) and \(\beta = k_n\) leads to non-integer values for \(\alpha \) and \(\beta \), so the CG rounding step in (35) yields a strengthened eigenvalue cut.
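A minimal sketch of this separation step (our own illustration; the function name and interface are assumptions) constructs the matrices \(\mathbf{U^l}\) and the rounded right-hand sides of (35) from the subtours of the current cycle cover:

```python
import numpy as np

def subtour_cg_cuts(subtours, n, alpha=None, beta=None):
    """Build the CG cuts (35) for a node-disjoint cycle cover.

    subtours : list of node index lists S_1, ..., S_k of the cycle cover.
    Returns a list of pairs (U_l, rhs_l) encoding the cuts  <U_l, X> <= rhs_l.
    Defaults alpha = h_n / n and beta = k_n give the strengthened eigenvalue cut.
    """
    k_n = np.cos(2 * np.pi / n)
    alpha = (1 - k_n) / n if alpha is None else alpha
    beta = k_n if beta is None else beta
    cuts = []
    for S in subtours:
        v = np.full(n, -float(len(S)))
        v[list(S)] = n - len(S)                 # vector v^l from Proposition 6
        U = np.outer(v, v)
        rhs = np.floor(np.trace(U @ (beta * np.eye(n) + alpha * np.ones((n, n)))))
        cuts.append((U, rhs))
    return cuts

# For the 6-node cover by two 3-cycles, each cut has <U_l, X> = 54 > rhs_l = 27,
# so the infeasible cycle cover is indeed cut off.
cuts = subtour_cg_cuts([[0, 1, 2], [3, 4, 5]], 6)
print(cuts[0][1])   # 27.0
```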

Since the result of Proposition 6 can be repeated for the extended linear matrix inequality in Theorem 11, we also obtain the following CG cuts with respect to \((ISDP_2)\):

$$\begin{aligned} \left\langle \textbf{U}^{\textbf{l}}, \textbf{X} + \textbf{X}^{\mathbf{(2)}} \right\rangle \le \left\lfloor \langle \textbf{U}^{\textbf{l}}, \beta ^{(2)} \textbf{I}_{\textbf{n}} + \alpha ^{(2)} \textbf{J}_{\textbf{n}} \rangle \right\rfloor \qquad \forall l \in [k]. \end{aligned}$$
(36)

Next, we consider the class of subtour elimination constraints. It has been shown by Çezik and Iyengar [15] that the ordinary subtour elimination constraints defined by Dantzig et al. [23] can be obtained as CG cuts for the symmetric TSP, provided that \(\alpha \) and \(\beta \) equal their lower bounds. We extend the result from [15] and present five types of subtour elimination constraints that are in fact (strengthened) CG cuts of \((ISDP_1)\) and/or \((ISDP_2)\), see Table 1. Many of these constraints do not follow directly from the linear matrix inequalities, but require the addition of a positive multiple of a subset of the affine constraints. It is shown by Fischer [34] that the inequalities IV and V of Table 1 define facets of the asymmetric quadratic traveling salesman polytope.

Table 1 Five types of subtour elimination constraints for the QTSP that can be obtained as (strengthened) CG cuts of \((ISDP_1)\) and/or \((ISDP_2)\)

In “Appendix 3”, we explicitly derive these inequalities as (strengthened) CG cuts.

5 Computational results

In this section we test our ISDP formulations of the QTSP, see Sect. 4. We solve the ISDPs using various settings of our CG-based B &C framework, see Algorithm 1, where we include different sets of cuts from Sect. 4.3 in the separation routines. We compare the performance of our approach with the two other ISDP solvers from the literature.

5.1 Design of numerical experiments

In total we compare seven different approaches, two of which are from the literature and five of which are variants of our B &C approach. The first group consists of the following:

  • KT: The B &C algorithm of Kobayashi and Takano [46], see Sect. 3.1.

  • SCIP-SDP: The general ISDP solver of Gally et al. [40]. This approach is based on solving continuous SDPs in a B &B framework.

A third project that is known for its ability to solve ISDPs is YALMIP [49]. Preliminary experiments show, however, that the solver of [49] is significantly outperformed by the solvers from [40] and [46]. Therefore, we do not take the solver of YALMIP into account.

On top of the approaches from the literature, we consider five variants of our B &C procedure that differ in the initial feasible set and the type of cuts that we add in the separation routine:

  • CG1 In this setting we solve \((ISDP_1)\) where we initially optimize over \(\mathcal {F}_1\), see (31). In the separation routine we add the CG cut of the form (35) for each subtour present in the current candidate solution.

  • CG2 In this setting we solve the second QTSP formulation \((ISDP_2)\). We initially optimize over \(\mathcal {F}_2\), see (34), and in each callback iteration we add the CG cuts of the form (35) and (36) for each subtour in the current candidate solution.

  • SEC-simple In this setting we solve \((ISDP_1)\) by starting from optimizing over \(\mathcal {F}_1\), see (31). In the callback procedure, we add the ordinary subtour elimination constraints, see Type I in Table 1, for all subtours in the current candidate solution.

  • SEC This setting solves \((ISDP_2)\) with subtour elimination constraints of Type I, IV and V from Table 1. The latter type of constraint is added only for the subtours of size less than \(\frac{1}{2}n\). Since the order two variables \(\textbf{X}^{\mathbf{(2)}}\) in this setting do not appear directly in the cutting planes, we also eliminate them from the initial MILP, based on preliminary tests. That is, we start optimizing over \(\mathcal {F}_1\), see (31). Moreover, based on a result by Fischer et al. [35], we add additional cuts to forbid subtours on three nodes. For a triple i, j, k of distinct nodes, the following cut is valid for any tour: \(y_{ijk} + y_{kij} \le x_{ij}.\) We add this cut for all distinct \(i, j, k \in S\) in the separation routine whenever a subtour on S with \(|S| = 3\) is present in the current candidate solution. Observe that there are six such cuts for each triple of nodes.

  • SEC-CG This setting solves \((ISDP_2)\), starting from \(\mathcal {F}_2\), see (34). In the separation routines, we add all the cuts that are included in the previous setting SEC. Moreover, on top of that we also add the CG cuts (35) and (36) in the callback procedure.

Recall that the separation routines are only called at integer points, which represent cycle covers of G. Therefore, the separation of all mentioned cuts boils down to identifying the subtours in the cycle cover. Also, recall that the integrality of \(\textbf{y}\) is relaxed in all settings, see Remark 2.

The setting SEC is similar to the best exact QTSP solving strategy of Fischer et al. [35]. However, there are two main differences between the methods. First, our separation routine is only called on integer points, while the algorithm of [35] separates on fractional points. The separation on integer points is computationally very cheap compared to the fractional separation method applied by [35]. Consequently, the former separation can lead to superior behavior, as observed by Aichholzer et al. [2] for the symmetric QTSP. Second, our approach results from a more general B &C framework for solving integer SDPs, which is not limited to the QTSP.

Notice that the derived CG cuts of Type II and III from Table 1 are not added in the test settings. Preliminary experiments have shown that the cut-set subtour elimination constraints (Type II of Table 1) behave in practice similarly to the ordinary subtour elimination constraints. Also, preliminary tests show that adding one merged Type III cut instead of all separate Type I cuts leads to worse behaviour in terms of overall computation time. We expect this difference to be caused by the sparsity of the Type I cuts, compared to the very dense Type III cuts.

For our tests, we consider three types of instances:

  • Real instances from bioinformatics: Jäger and Molitor [45], Fischer [34] and Fischer et al. [35, 36] consider an important application of the QTSP in computational biology. In order to recognise transcription factor binding sites or RNA splice sites in a given set of DNA sequences, Permuted Markov (PM) models [29] or Permuted Variable Length Markov (PVLM) models [72] can be used. Finding the optimal order two PM or PVLM model boils down to solving a QTSP instance. We consider three classes of bioinformatics instances used in [33, 34], which are denoted by ‘bma’, ‘map’ and ‘ml’. Each class consists of 38 instances with \(n \in \{3, \ldots , 40\}\).

  • Reload instances: The reload instances are the same as the ones used by Rostami et al. [61] and De Meijer and Sotirov [53]. The reload model [68] is inspired by logistics and energy distribution, where a cost is incurred whenever the type of arc in a network changes, e.g., the means of transport. Let G be a directed graph where each arc (i, j) is present with probability p. Each arc in G is randomly assigned a color from a color set L with cardinality c. If two successive arcs e and f have colors s and t, respectively, the quadratic cost between e and f equals r(s, t), where \(r: L \times L \rightarrow \mathbb {R}\) is a reload cost function such that \(r(s,s) = 0\) for all \(s \in L\). We consider two types of reload classes:

    • Reload class 1 For each pair of distinct colors \(s,t \in L\) the reload cost equals \(r(s,t) = 1\);

    • Reload class 2 For each pair of distinct colors \(s,t \in L\), the reload cost r(st) is chosen uniformly at random from \(\{1, \ldots , 10\}\).

    For each class, we consider 10 distinct instances for each possible combination of \(n \in \{10, 15, 20, 25\}\), \(p \in \{0.5, 1\}\) and \(c \in \{5, 10, 20\}\), except for the combination of \(n = 25\) and \(p = 1\) due to extremely large computation times. Thus, in total we consider 420 reload instances.

  • Turn cost instances: The special case of the QTSP where the nodes are points in Euclidean space and the cost of a tour is the sum of the direction changes at these points is called the Angular-Metric Traveling Salesman Problem (Angle-TSP) [1]. The Angle-TSP is motivated by VLSI design and proven to be \(\mathcal{N}\mathcal{P}\)-hard [1]. In the literature, the problem is also known as the Minimum Bends Traveling Salesman Problem [64]. We consider two classes of this type:

    • TSPLIB instances: The TSP library (TSPLIB) [59] contains a broad set of TSP test instances, including a large number of Euclidean instances. We construct a corresponding QTSP instance as follows: given points \(v_1, \ldots , v_n\) in \(\mathbb {R}^2\), we let G be the complete graph on n vertices. For i, j, k with \(i \ne j\), \(j \ne k\), \(i \ne k\), we define \(q_{ijk}\) to be proportional to the angle between edges \(\{i, j\}\) and \(\{j, k\}\). More precisely,

      $$\begin{aligned} q_{ijk} := \Bigg \lceil 10 \cdot \left( 1 - \frac{1}{\pi }\arccos \left( \frac{(v_i - v_j)^\top (v_k - v_j)}{\Vert v_i - v_j \Vert \cdot \Vert v_j - v_k \Vert } \right) \right) \Bigg \rceil . \end{aligned}$$

      This cost structure is similar to the angle-distance costs considered in Fischer et al. [35] and De Meijer and Sotirov [52]; a small computational sketch of this cost function is given after this list of instances. In total, we consider 9 TSPLIB instances with n ranging from 15 to 70. Figure 5a depicts one of the TSPLIB instances including its optimal tour with respect to the defined quadratic cost structure.

    • Grid instances: Fekete and Krupke [30, 31] consider problems of computing optimal covering tours and cycle covers under a turn cost model, see also Arkin et al. [3]. These problems have many practical applications, such as pest control and precision farming. Following this line, we consider the Angle-TSP on grid graphs. We construct a 2D connected grid graph using the Type II instance generator of [31]. Given the vertex coordinate vectors \(v^1, \ldots , v^n \in \{0, \ldots , N_1\} \times \{0, \ldots , N_2\}\) for integers \(N_1, N_2\), we include an edge between vertices i and j if and only if (\(v^i_1 = v^j_1\) and \(|v^i_2 - v^j_2| = 1\)) or (\(v^i_2 = v^j_2\) and \(|v^i_1 - v^j_1| = 1\)). If two edges \(\{i, j\}\) and \(\{j, k\}\) are present, the quadratic costs are computed in the same way as for the TSPLIB instances. In total we consider 9 grid instances with \(N_1\) and \(N_2\) running from 20 to 80, corresponding to n ranging from 430 to 2646. An example of a grid instance including its minimum-bend tour is given in Fig. 5b.

    Both types of turn cost instances are in fact instances of the symmetric QTSP, as they are defined on undirected graphs. To account for this, we use symmetrized versions of \((ISDP_1)\) and \((ISDP_2)\) instead. We refer to “Appendix 4” for the construction of these formulations.
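For concreteness, the following Julia sketch (our reading of the definitions above; the function names are illustrative and not part of the instance generators) computes the angle-based cost \(q_{ijk}\) for points in the plane and the adjacency rule used for the grid instances.

using LinearAlgebra  # dot, norm

# Angle-based quadratic cost for the TSPLIB-derived instances (see the ceiling
# formula above); the clamp guards against rounding errors in the cosine.
function angle_cost(vi::Vector{Float64}, vj::Vector{Float64}, vk::Vector{Float64})
    a, b = vi - vj, vk - vj
    cosang = clamp(dot(a, b) / (norm(a) * norm(b)), -1.0, 1.0)
    return ceil(Int, 10 * (1 - acos(cosang) / pi))
end

# Grid instances: two vertices are adjacent iff their integer coordinates agree
# in one component and differ by exactly one in the other.
grid_adjacent(v, w) = (v[1] == w[1] && abs(v[2] - w[2]) == 1) ||
                      (v[2] == w[2] && abs(v[1] - w[1]) == 1)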

Fig. 5 Optimal tours of two turn cost instances: the TSPLIB instance ‘kn57’ (\(n = 57\)) and the grid instance ‘grid1’ (\(n = 430\)). Each square in Fig. 5b represents a vertex in the grid graph

All our algorithms, including the algorithm of Kobayashi and Takano [46], are implemented in Julia 1.5.3 using JuMP v0.21.10 [27] to model the mathematical optimization problems. In particular, we exploit the solver-independent lazy constraint callback option of JuMP to include the separation routines. Solving the underlying MILP in the subproblems is done using Gurobi v9.10 [44] with default settings, including built-in cuts. Experiments are carried out on a PC with an Intel(R) Core(TM) i7-8700 CPU, 3.20GHz, 8GB RAM. To run SCIP-SDP, we use SCIP-SDP version 3.2.0 on the NEOS Server [20], where the B&B framework of SCIP 7.0.0 [41] and the SDP solver Mosek 9.2 [54] are combined in the default configuration.
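To illustrate how the separation routines are attached to this toolchain, the following minimal JuMP sketch registers a lazy-constraint callback that adds ordinary subtour elimination cuts at integer candidates. The degree-constrained 0-1 model below is only a stand-in for our actual MILP relaxations, and subtours refers to the helper sketched earlier; this is an illustrative skeleton, not our exact implementation.

# Minimal lazy-constraint callback skeleton in JuMP (assumptions: toy model,
# subtours() as sketched earlier; not our exact code).
using JuMP, Gurobi
const MOI = JuMP.MOI

n = 10
model = Model(Gurobi.Optimizer)
set_optimizer_attribute(model, "LazyConstraints", 1)  # harmless if set automatically
@variable(model, x[1:n, 1:n], Bin)                                        # arc variables
@constraint(model, [i in 1:n], x[i, i] == 0)                              # no loops
@constraint(model, [i in 1:n], sum(x[i, j] for j in 1:n if j != i) == 1)  # out-degree
@constraint(model, [j in 1:n], sum(x[i, j] for i in 1:n if i != j) == 1)  # in-degree
# ... objective and further components of the relaxation go here ...

function sec_callback(cb_data)
    xval = callback_value.(Ref(cb_data), x)        # current integer candidate
    for S in subtours(xval)
        length(S) < n || continue                  # a Hamiltonian cycle needs no cut
        con = @build_constraint(sum(x[i, j] for i in S, j in S if i != j) <= length(S) - 1)
        MOI.submit(model, MOI.LazyConstraint(cb_data), con)  # add Type I cut lazily
    end
end

MOI.set(model, MOI.LazyConstraintCallback(), sec_callback)  # solver-independent callback
optimize!(model)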

Observe that an older version of SCIP-SDP with DSDP [9] as SDP solver was used in the numerical experiments of [46], which partly explains the poor behaviour of SCIP-SDP compared to the B&C algorithm of [46]. However, our computational study, in which SCIP-SDP uses the state-of-the-art SDP solver Mosek [54], also shows superior behaviour of the B&C algorithms.

We test all seven settings on the bioinformatics and reload instances. Since these instance classes give a clear and consistent overview of the superior approaches, we restrict ourselves to the best three settings for the turn cost instances. The maximum computation time for all our approaches is set to 8 h, which matches the maximum computation time allowed on the NEOS Server [20].

Table 2 Summary table of the performance on the bioinformatics instances per setting and per instance type
Table 3 Overview of average computation times for the reload instances

5.2 Comparison of approaches

Table 2 and Fig. 6 provide an overview of the performance on the instances from bioinformatics. For each setting, the average values in Table 2 are only computed over the instances that could be solved to optimality for that setting. An extended table with the results per instance can be found in “Appendix 5”. Observe that the percentage of instances solved is quite similar across the three instance classes. This indicates that it is mainly the size rather than the cost structure that determines whether a bioinformatics instance can be solved. It is clear that our B&C settings significantly outperform the other two ISDP solvers SCIP-SDP and KT, which can solve at most 60% of the instances to optimality. Since the separation routine of CG1 is based on the identification of an integer eigenvector corresponding to a negative eigenvalue, the settings KT and CG1 are almost identical apart from the CG rounding step. The large decrease in the number of branching nodes of CG1 compared to KT is remarkable. This indicates that the effect of deeper cuts, as shown in Fig. 4, is not solely theoretical, but also substantial from a practical point of view.

When comparing the five different separation routines of our B&C approach, we also see a clear pattern. The settings SEC and SEC-CG turn out to be superior, being able to solve all instances within short computation times. Although SEC generally provides the fastest algorithm, it sometimes happens that SEC-CG solves an instance faster, see Fig. 6 in “Appendix 5”, due to the smaller number of B&C nodes. This shows that the additional CG cuts can sometimes improve on the subtour elimination constraints. The two approaches are followed by SEC-simple, which is able to solve instances up to \(n = 35\) to optimality. This difference is mainly due to the strengthened subtour elimination cuts (Type IV and V in Table 1) that work well for the bioinformatics instances, as also noted by Fischer et al. [35]. Finally, the settings CG1 and CG2 are only able to solve instances up to \(n = 32\) and \(n = 27\), respectively. Although the distance two CG cuts (36) significantly reduce the number of branching steps needed, the overall computation time is larger due to the increase in the number of variables and constraints in CG2.

Next, we discuss the results on the set of reload instances. For both class 1 and class 2 and for each value of n, p and c we consider 10 randomly generated instances. The averaged results for each combination of parameters can be found in “Appendix 5”, see Tables 12, 13 and 14. In general, we see that the computation times increase with the number of nodes n and the graph density p. On the other hand, if the number of colors c increases, the instances become easier to solve, as the number of (optimal) solutions decreases. Table 3 shows a summary of the results aggregated over the number of colors c. Accordingly, Fig. 7 shows the spread of the computation times, where we also aggregate over both reload classes.

When comparing the different settings, we draw similar conclusions as before. Note that SCIP-SDP performs very poorly on the reload instances. The difference between KT and CG1 is not as significant as before, although CG1 is still preferable to KT on almost all instance types. The settings that involve the variables \(\textbf{X}^{\mathbf{(2)}}\) in the root node, i.e., CG2 and SEC-CG, are outperformed by SEC-simple and SEC. Apparently, the increase in the number of variables does not contribute much to the pruning of the branching tree. In fact, the results in “Appendix 5” even suggest that the number of branching nodes sometimes becomes larger. The large spread in computation times for these settings, see Fig. 7 in “Appendix 5”, also suggests that \((ISDP_2)\) leads to a less robust search process and that this effect becomes more visible as the instances become larger. However, the S-CG cuts resulting from \((ISDP_2)\) do contribute to the pruning of the tree, as is suggested by the strong performance of SEC. The settings SEC and SEC-simple perform best overall. Neither of the two algorithms outperforms the other in terms of computation time, even when the problem size goes up, see the additional numerical results in Table 14 of “Appendix 5”.

Finally, we consider the turn cost instances. From the bioinformatics and reload instance classes it is clear that the settings SEC-simple, SEC and SEC-CG generally perform best. Hence, we restrict the numerical results on the turn cost instances to these three settings. Tables 4 and 5 show the computation times and number of branching nodes for the TSPLIB and grid instances, respectively.

The TSPLIB graphs are complete graphs, and hence we can only solve instances up to \(n = 70\) for this instance type. We are able to solve all TSPLIB instances within a time span of 900 s. Since the grid instances are sparser, we can solve much larger instances to optimality. For this type, instances with up to 2646 nodes (!) can be solved to optimality within 15 s. These are currently the largest solved QTSP instances in the literature.

When comparing the three settings, we see that SEC-simple and SEC perform slightly better than SEC-CG on the turn cost instances. Since the different separation routines lead to different relaxations, the branching behaviour of the methods can differ. Not surprisingly, the best-performing setting is often the one with the smallest number of B&C nodes, regardless of the time per branching node. Taking both the TSPLIB and grid instances into account, this happens slightly more often for the setting SEC-simple.

Table 4 Computation times and number of branching nodes for the TSPLIB instances
Table 5 Computation times and number of branching nodes for the grid instances

6 Conclusions

In this work we study Chvátal–Gomory cuts for spectrahedra and their strength in solving integer semidefinite programs resulting from combinatorial optimization problems. Accordingly, this paper increases the theoretical understanding of integer semidefinite programming, which in turn contributes to new solution techniques for this type of problem.

In Sect. 2 we study the elementary closure of spectrahedra and the hierarchy obtained by iterating this procedure. Using an alternative formulation of the elementary closure, see (9), we provide simple proofs of several properties, including a homogeneity property for bounded spectrahedra, see Theorem 3. Although some of the results presented here are already known in the literature, the proofs we present are considerably simpler and are mainly based on concepts from mathematical optimization and number theory. We also present the polyhedral description of the elementary closure of spectrahedra whose defining linear matrix inequality is totally dual integral, see Theorem 5. To the best of our knowledge, this is the first such description for the elementary closure of a non-polyhedral set. A full characterization of bounded LMIs that are TDI on \(\mathbb {Z}^m\) is given in Theorem 6. Sufficient conditions for TDI-ness on an appropriate set \(Z \subseteq \mathbb {Z}^m\) are given in Theorems 8 and 9.

A generic B&C algorithm for ISDPs based on strengthened CG cuts is presented in Sect. 3, see Algorithm 1. Our algorithm is a refinement of the algorithm from [46], where the authors use eigenvector-based inequalities to separate infeasible integer points. Moreover, our work can be seen as an extension of [15], in which the authors introduce CG cuts for conic programs but leave the efficient separation of CG cuts as an open problem. Our numerical results indicate the effectiveness of the use of deeper CG cuts. We also provide a separation routine for binary SDPs originating from combinatorial optimization problems, see Sect. 2.

In Sect. 4 we extensively study the application of our approach to the quadratic traveling salesman problem. Based on a generalization of the notion of algebraic connectivity to directed graphs, we present two exact ISDP formulations of the QTSP, see (\(ISDP_{1}\)) and (\(ISDP_{2}\)). We show that the simplest CG separation routine boils down to finding integer eigenvectors of the adjacency matrix of a node-disjoint cycle cover, see Proposition 6. However, more intricate dual multipliers lead to some well-known families of cuts, e.g., the ordinary and strengthened versions of the subtour elimination constraints, see Table 1. We test several variants of our B&C procedure that involve different separation routines.

Numerical results on the QTSP show that our B&C algorithm significantly outperforms the two alternative ISDP solvers of [40, 46]. For the real instances from bioinformatics [35, 36], these solvers are able to solve instances up to only \(n = 15\) and \(n = 25\), respectively, whereas our method can solve all instances up to \(n = 40\) in a short timespan. As one would expect, the extension to CG inequalities leads to deeper cuts, which successfully reduces the size of the branching tree compared to [46]. Of all considered separation routines, the setting SEC, see page 24, turns out to be the most effective overall. This setting was able to solve almost all of the 552 tested QTSP instances to optimality within 5 min, where the largest instance contains \(m = 5172\) arcs. This is currently the largest solved QTSP instance in the literature.

Our work inspires several future research directions. It would be interesting to study the performance of our B&C algorithm when applied to other optimization problems that can be formulated as ISDPs. We expect the exploitation of CG cuts in the branching scheme to be effective for such ISDPs. Moreover, since many known classes of cuts for the QTSP turned out to be (strengthened) CG cuts with respect to the ISDP formulation, it would be interesting to know whether this also holds for other problems.