1 Introduction

A binary quadratic problem (BQP) is an optimization problem with binary variables, a quadratic objective function and linear constraints. BQPs are NP-hard. A wide range of combinatorial optimization problems, including the quadratic assignment problem (QAP), the quadratic shortest path problem (QSPP), the graph partitioning problem, the max-cut problem and clustering problems, can be modeled as BQPs.

We study here the linearization problem for binary quadratic problems. A binary quadratic optimization problem is said to be linearizable if there exists a cost vector such that the quadratic problem and the associated linear problem have equal costs for every feasible vector. The BQP linearization problem asks whether an instance of the BQP is linearizable. The linearization problem has been studied in the context of many combinatorial optimization problems. Kabadi and Punnen (2011) give a necessary and sufficient condition for an instance of the quadratic assignment problem (QAP) to be linearizable, and develop a polynomial-time algorithm to solve the corresponding linearization problem. The linearization problem for the Koopmans-Beckmann QAP is studied in Punnen and Kabadi (2013). Linearizable special cases of the QAP are studied in Adams and Waddell (2014), Çela et al. (2016) and Punnen (2001). In Ćustić et al. (2017) it is shown that the linearization problem for the bilinear assignment problem can be solved in polynomial time. The linearization problem for the quadratic minimum spanning tree problem was considered by Ćustić and Punnen (2018). Punnen et al. (2017) provide necessary and sufficient conditions under which a cost matrix of the quadratic traveling salesman problem is linearizable. Hu and Sotirov (2018) provide a polynomial-time algorithm that verifies whether a QSPP instance on a directed grid graph is linearizable. The authors of Hu and Sotirov (2018) also present necessary conditions for a QSPP instance on a complete digraph to be linearizable. These conditions are also sufficient when the complete digraph has only four vertices.

There are very few studies concerning applications of the linearization problem. Punnen et al. (2019) show how to derive equivalent representations of a quadratic optimization problem by using linearizable matrices of the problem. They show that equivalent representations may yield different bounds for the optimization problem.

In this paper, we present several interesting applications of the linearization problem. We propose a new lower bounding scheme that uses a simple certificate for a quadratic function to be non-negative on the feasible set. We call the resulting bounds linearization-based bounds. Each linearization-based bound requires a set of linearizable matrices as an input, and its quality depends on those matrices. To compute a particular linearization-based bound, one needs to solve one linear programming problem. The strongest linearization-based bound is the one that uses the full characterization of the set of linearizable matrices. Further, we show that bounds obtained from an iterative lower bounding strategy for BQPs, known as the Generalized Gilmore–Lawler (GGL) scheme, see e.g., Hahn and Grant (1998), Carraresi and Malucelli (1992), Rostami et al. (2018) and Rostami and Malucelli (2015), are also linearization-based bounds. Note that the well-known Gilmore–Lawler bound is the first bound within the Generalized Gilmore–Lawler bounding scheme. Furthermore, we prove that one of the linearization-based bounds, with a particular choice of linearizable matrices, dominates the GGL bounds. The same linearization-based bound is equivalent to the first level RLT relaxation by Adams and Sherali (1990) for BQPs where upper bounds on the vector of variables are implied by the constraints. Here, RLT stands for reformulation linearization technique. This result explains the relation between the Generalized Gilmore–Lawler bounds and the first level RLT bound, which was already observed in the context of the quadratic assignment problem (Frieze & Yadegar, 1983; Hahn et al., 1998) but not in general. Further, we extend the notion of linearizable matrices, which results in the extended linearization-based bounds.

Finally, we provide a polynomial-time algorithm for the linearization problem of the quadratic shortest path problem on directed acyclic graphs (DAGs). We solve the linearization problem for the QSPP on DAGs in \({{\mathcal {O}}}(nm^{3})\) time, where n is the number of vertices and m is the number of arcs in the given graph. Our algorithm also yields a characterization of the set of linearizable matrices, and thus provides the strongest linearization-based bound for the QSPP on DAGs.

The paper is organized as follows. In Sects. 2 and 3, we introduce the binary quadratic problem and its linearization problem, respectively. In Sect. 4, we show how to reformulate a binary quadratic minimization problem into an equivalent maximization problem that is suitable for deriving bounds. In Sect. 5, we introduce the linearization-based scheme. In Sect. 6.1, we show that the Generalized Gilmore–Lawler bounds are also linearization-based bounds. Section 6.2 relates different linearization-based bounds to the first level RLT bound, and Sect. 6.3 demonstrates the strength of the strongest linearization-based bound. Section 6.4 introduces the extended linearization-based bounds. In Sect. 7, we present a polynomial-time algorithm that verifies whether a QSPP instance on a directed acyclic graph is linearizable. Conclusions and suggestions for further research are given in Sect. 8.

2 Binary quadratic problems

In this section, we introduce binary quadratic problems and two of their special cases: the quadratic assignment problem and the quadratic shortest path problem.

Let K be the set of feasible binary vectors, i.e.,

$$\begin{aligned} K := \{ x \in {\mathbb {R}}^{m} \;|\; Bx=b, ~x \in \{0,1\}^{m} \}, \end{aligned}$$
(1)

where \(B\in {\mathbb {R}}^{n\times m}\) and \(b\in {\mathbb {R}}^{n}\). We are interested in binary quadratic problems of the form

$$\begin{aligned} \min _{x \in K} x^{\mathrm T}Qx, \end{aligned}$$
(2)

where \(Q \in {\mathbb {R}}^{m \times m}\) is the given quadratic cost matrix. Note that we allow Q to have negative elements. In the case that Q is a diagonal matrix, i.e., \(Q=\mathrm{Diag}(c)\), the objective is linear and we have the following linear optimization problem:

$$\begin{aligned} \min _{x \in K} c^{\mathrm T}x. \end{aligned}$$
(3)

The simple model (2) is notable for representing a wide range of combinatorial optimization problems, including the quadratic assignment problem and the quadratic shortest path problem.

The quadratic assignment problem is one of the most difficult combinatorial optimization problems. It was introduced by Koopmans and Beckmann (1957). It is well known that the QAP contains the traveling salesman problem as a special case and is therefore NP-hard in the strong sense. The QAP can be described as follows. Suppose that there are n facilities and n locations. The flow between each pair of facilities, say i and k, and the distance between each pair of locations, say j and l, are specified by \(a_{ik}\) and \(d_{jl}\), respectively. The problem is to assign all facilities to different locations with the goal of minimizing the sum of the distances multiplied by the corresponding flows. The quadratic assignment problem is given by:

$$\begin{aligned} \min \left\{ \sum _{i,j,k,l}a_{ik} d_{jl}x_{ij}x_{kl} : ~X=(x_{ij}), ~X\in \varPi _n \right\} , \end{aligned}$$

where \(\varPi _n\) is the set of \(n\times n \) permutation matrices. If \(A=(a_{ik})\), \(D=(d_{jl})\) and \(x = \text {vec}(X) \in {\mathbb {R}}^{n^{2}}\), then the objective can be written as \(x^{\mathrm T}(A \otimes D)x\). Here, the vec operator stacks the columns of the matrix X, and \(\otimes \) denotes the Kronecker product. The QAP is known as a generic model for various real-life problems, see e.g., Burkard et al. (2012).
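The Kronecker identity is easy to check numerically. The following small numpy snippet is our own illustration, not part of the original text; we use row-major stacking of X, for which the displayed sum is matched exactly, while column-wise stacking merely interchanges the roles of A and D.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.random((n, n))                 # flows a_ik
D = rng.random((n, n))                 # distances d_jl
X = np.eye(n)[rng.permutation(n)]      # a random permutation matrix
x = X.flatten()                        # row-major stacking of X

quad = sum(A[i, k] * D[j, l] * X[i, j] * X[k, l]
           for i in range(n) for j in range(n)
           for k in range(n) for l in range(n))
print(np.isclose(x @ np.kron(A, D) @ x, quad))   # True
```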

The quadratic shortest path problem has many important applications in transportation, see Murakami and Kim (1997), Rostami et al. (2018), Sen et al. (2001) and Sivakumar and Batta (1994). The QSPP is an NP-hard optimization problem, see Hu and Sotirov (2018) and Rostami et al. (2018), and it can be described as follows. Let \(G = (V,A)\) be a directed graph with n vertices and m arcs, and let s, t be two distinguished vertices in G. A path is a sequence of distinct vertices \({(v_{1},\ldots ,v_{k})}\) such that \((v_{i},v_{i+1})\in A\) for \(i=1,\ldots ,k-1\). An s-t path is a path \({P =(v_{1},v_{2},\ldots ,v_{k})}\) from the source vertex \(s = v_{1}\) to the target vertex \(t = v_{k}\). Let the interaction cost between two distinct arcs e and f be \(2q_{e,f}\), and the linear cost of an arc e be \(q_{e,e}\). The quadratic shortest path problem is given by:

$$\begin{aligned} \text {minimize} \big \{ \sum _{e,f \in A} q_{e,f}x_{e}x_{f} \;|\; x \in {\mathcal {P}} \big \}, \end{aligned}$$
(4)

where \({\mathcal {P}}\) is the set of characteristic vectors of all s-t paths in G.

3 The linearization problem of a BQP

We say that the binary quadratic optimization problem (2) is linearizable if there exists a cost vector c such that \(x^{\mathrm T}Qx = c^{\mathrm T}x\) for every \(x \in K\). If such a cost vector c exists, then we call it a linearization vector of Q. The cost matrix Q is said to be linearizable if its linearization vector c exists. The linearization problem of a BQP asks whether Q is linearizable and, if so, asks for its linearization vector c.

If Q is linearizable, then (2) can be equivalently formulated as (3), and the latter can be much easier to solve. For instance, in the case of the quadratic assignment problem or the quadratic shortest path problem on directed acyclic graphs, this boils down to solving a linear programming problem.

Let us define the spanning set of linearizable matrices for the given BQP.

Definition 1

Let \(\{ Q_{1},\ldots ,Q_{k} \}\) be a set of matrices such that a cost matrix Q is linearizable if and only if \(Q = \sum _{i=1}^{k}\alpha _{i}Q_{i}\) for some \(\alpha \in {\mathbb {R}}^{k}\). Then, we say that \(\{ Q_{1},\ldots ,Q_{k} \}\) span the set of linearizable matrices.

In Sect. 7, we show that the spanning set of linearizable matrices for the quadratic shortest path problem on directed acyclic graphs can be generated efficiently. Thus we have a complete characterization of the set of linearizable matrices for the QSPP on DAGs. This is also the case for the bilinear assignment problem, see Ćustić et al. (2017). In fact, the authors in Lendl et al. (2019) show that the set of linearizable cost matrices for combinatorial optimization problems with interaction costs can be characterized by the so-called constant value property, under certain conditions. For the list of non-trivial binary quadratic problems for which one can find the spanning set of linearizable matrices, see Lendl et al. (2019).

In general, it is not clear whether one can find a complete characterization of the set of linearizable matrices for a given BQP. However, it is not difficult to identify a subset of linearizable matrices. For instance, the sum matrix is often found to be linearizable. We say that a matrix \(M \in {\mathbb {R}}^{m\times n}\) is a sum matrix generated by vectors \(a \in {\mathbb {R}}^{m}\) and \(b \in {\mathbb {R}}^{n}\) if \(M_{i,j} = a_{i} + b_{j}\) for every \(i = 1,\ldots ,m\) and \(j= 1,\ldots ,n\). In the quadratic assignment problem, if A or D is a sum matrix, then the corresponding cost matrix is linearizable, see Burkard et al. (2012). In the quadratic shortest path problem, if every s-t path in the graph has the same length, then a sum matrix Q is always linearizable, see Hu and Sotirov (2018) and Punnen et al. (2019).

In the latter case, the condition for a matrix to be linearizable depends on the problem structure. Since we are interested in a lower bounding scheme for general binary quadratic problems, we need a condition for linearizability that is independent of the problem. The next result provides a universal sufficient condition for a matrix to be linearizable, and it is also the key to connecting the linearization problem with some of the existing bounds in the literature.

Lemma 1

(Punnen et al. 2019) Consider the BQP (2). For any \(Y \in {\mathbb {R}}^{n \times m}, z \in {\mathbb {R}}^{m}\), the matrix \(Q = B^{\mathrm T}Y + {{\,\mathrm{Diag}\,}}(z) \in {\mathbb {R}}^{m \times m}\) is linearizable with linearization vector \(c = Y^{\mathrm T}b + z \in {\mathbb {R}}^{m}\).

Proof

For any \(x \in K\), see (1), we have \(x^{\mathrm T}Qx = x^{\mathrm T}(B^{\mathrm T}Y + {{\,\mathrm{Diag}\,}}(z))x = (b^{\mathrm T}Y + z^{\mathrm T})x = c^{\mathrm T}x\). \(\square \)
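As a quick sanity check of Lemma 1 (a sketch of ours, not from the paper), one can pick an arbitrary binary point x, set \(b = Bx\) so that \(x \in K\), and compare the quadratic and linear costs:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 6
B = rng.integers(-2, 3, size=(n, m)).astype(float)
x = rng.integers(0, 2, size=m).astype(float)   # an arbitrary binary point
b = B @ x                                      # choose b so that Bx = b holds

Y = rng.random((n, m))
z = rng.random(m)
Q = B.T @ Y + np.diag(z)     # linearizable by Lemma 1
c = Y.T @ b + z              # its linearization vector

print(np.isclose(x @ Q @ x, c @ x))            # True
```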

The next result shows that adding redundant equality constraints to the system \(Bx=b\) does not generate more linearizable matrices than those given by Lemma 1.

Lemma 2

Consider the BQP (2). Assume the equation \(a^{\mathrm T}x = d\) is implied by the system \(Bx=b\), i.e., \(a = B^{\mathrm T}\alpha \in {\mathbb {R}}^{m}\) and \(d = b^{\mathrm T}\alpha \in {\mathbb {R}}\) for some \(\alpha \in {\mathbb {R}}^{n}\). For any \(Y \in {\mathbb {R}}^{n \times m}, y \in {\mathbb {R}}^{m}\), the matrix

$$\begin{aligned} Q= \left( \begin{matrix} B \\ a^{\mathrm T} \end{matrix} \right) ^{\mathrm T} \left( \begin{matrix} Y \\ y^{\mathrm T} \end{matrix} \right) \end{aligned}$$

is linearizable with linearization vector \(c = Y^{\mathrm T}b + dy \in {\mathbb {R}}^{m}\).

Proof

From

$$\begin{aligned} \left( \begin{matrix} B \\ a^{\mathrm T} \end{matrix} \right) ^{\mathrm T} \left( \begin{matrix} Y \\ y^{\mathrm T} \end{matrix} \right) = B^{\mathrm T}Y + ay^{\mathrm T} = B^{\mathrm T}Y + B^{\mathrm T}\alpha y^{\mathrm T} = B^{\mathrm T}(Y + \alpha y^{\mathrm T}) \in {\mathbb {R}}^{m \times m} \end{aligned}$$

it follows that \(B^{\mathrm T}(Y + \alpha y^{\mathrm T})\) is linearizable with linearization vector \(c = (Y + \alpha y^{\mathrm T})^{\mathrm T}b\). \(\square \)

A matrix \(M\in {\mathbb {R}}^{m \times m}\) is called a weak sum matrix if we can find vectors \(a, b\in {\mathbb {R}}^{m}\) such that \(M_{i,j} = a_{i} + b_{j}\) for every \(i\ne j\) and \(i,j = 1,\ldots ,m\). In particular, if M is a symmetric weak sum matrix, then we can assume \(a=b\) in the definition. If \(e^{\mathrm T}x = \sum _{i}x_{i}\) is constant for every \(x \in K\), then every weak sum matrix in the corresponding BQP is linearizable, see Punnen et al. (2019). In that case, it follows from Lemma 2 that symmetric weak sum matrices can be represented as symmetrized matrices from Lemma 1. In particular, we have the following corollary.

Corollary 1

Consider the BQP (2). Assume \(Bx=b\) implies that \(e^{\mathrm T}x\) is a constant. Then every symmetric weak sum matrix M can be written as \(M = B^{\mathrm T}Y + Y^{\mathrm T}B + {{\,\mathrm{Diag}\,}}(z)\) for some Y and z.

Proof

By assumption, we have \(e = B^{\mathrm T}\alpha \in {\mathbb {R}}^{m}\) for some \(\alpha \in {\mathbb {R}}^{n}\). Since M is a symmetric weak sum matrix, we can write \(M = ae^{\mathrm T} + ea^{\mathrm T} + {{\,\mathrm{Diag}\,}}(z)\in {\mathbb {R}}^{m \times m}\) for some vectors \(a\in {\mathbb {R}}^{m},z\in {\mathbb {R}}^{m}\) and all-ones vector \(e\in {\mathbb {R}}^{m}\). Let \(Y = \alpha a^{\mathrm T} \in {\mathbb {R}}^{n \times m}\). Then

$$\begin{aligned} B^{\mathrm T}Y + Y^{\mathrm T}B + {{\,\mathrm{Diag}\,}}(z)&= B^{\mathrm T}\alpha a^{\mathrm T} + a\alpha ^{\mathrm T}B + {{\,\mathrm{Diag}\,}}(z) \\&= e a^{\mathrm T} + ae^{\mathrm T} + {{\,\mathrm{Diag}\,}}(z) = M. \end{aligned}$$

\(\square \)

Erdoğan and Tansel (2011) identify an additively decomposable class of costs for the quadratic assignment problem that provides a set of linearizable matrices for the QAP. We verified numerically that the linearizable matrices from Erdoğan and Tansel (2011) can be written as \(B^{\mathrm T}Y + Y^{\mathrm T}B + {{\,\mathrm{Diag}\,}}(z)\) for some small instances with \(n \le 10\).

4 General bounding approaches

We present here an equivalent reformulation of the BQP (2) and list several possible bounding approaches for this reformulation. The new equivalent reformulation of the BQP is also the basis for deriving our bounding scheme in the next section.

Let \(Q_{1},\ldots ,Q_{k}\) be linearizable matrices for a given BQP with linearization vectors \(c_{1},\ldots ,c_{k}\), respectively. For example, linearizable matrices can be obtained from Lemma 1 for any binary quadratic problem, or from Proposition 5 for the quadratic shortest path problem. Define the linear operator \({\mathcal {A}}: {\mathbb {R}}^{k} \rightarrow {\mathbb {S}}^{m}\) by \({\mathcal {A}}(\alpha ) := \sum _{i=1}^{k}\alpha _{i}Q_{i}\) and \(C := \begin{bmatrix} c_{1},\ldots ,c_{k} \end{bmatrix} \in {\mathbb {R}}^{m \times k}\). Clearly, \(C\alpha \) is a linearization vector of the linearizable cost matrix \({\mathcal {A}}(\alpha )\), for any \(\alpha \in {\mathbb {R}}^{k}\).

Let \(f(x) = x^{\mathrm T}Qx\) and \(h_{\alpha ,\beta }(x) := x^{\mathrm T}{\mathcal {A}}(\alpha )x + \beta \), where \(\alpha \in {\mathbb {R}}^{k}\), \(\beta \in {\mathbb {R}}\). Let us reformulate the binary quadratic optimization problem (2) equivalently as

$$\begin{aligned} \begin{array}{ll} \sup _{\alpha ,\beta } &{}\quad \min _{x \in K} h_{\alpha ,\beta }(x) \\ {{\mathrm{s.t.}}} &{}\quad f -h_{\alpha ,\beta } > 0, \;\; \forall x \in K . \end{array} \end{aligned}$$
(5)

Theorem 1

The binary quadratic program (2) is equivalent to (5).

Proof

Let \(x^{*}\) be an optimal solution of (2) with optimal value \(f^{*}\). If \(\alpha = 0\) and \(\beta = f^{*} - \epsilon \) for \(\epsilon > 0\), then \((\alpha ,\beta )\) is feasible for (5) as \(f -h_{\alpha ,\beta } > 0\) for \(x \in K\). As \(f^{*} - \epsilon = \min _{x \in K} h_{\alpha ,\beta }(x)\), taking \(\epsilon \rightarrow 0\), we conclude that the optimal value of (5) is at least \(f^{*}\). Conversely, \(f(x) -h_{\alpha ,\beta }(x) > 0\) on K implies that \(\min _{x \in K} h_{\alpha ,\beta }(x)\) is at most \(f^{*}\). This shows the equivalence between (2) and (5). \(\square \)

We note that Theorem 1 holds for any choice of the linearizable matrices in the definition of \({\mathcal {A}}\). Let

$$\begin{aligned} {\bar{K}}=\{ x\ge 0: {B}x={b}\}. \end{aligned}$$
(6)

From now on we assume w.l.o.g. that the system \({B}x={b}\) also includes the upper bounds \(x_i\le 1\) for all i. For the inner minimization problem of (5) we have:

$$\begin{aligned} \underset{x \in {\mathbb {R}}^{m}}{\min }\{ h_{\alpha ,\beta }(x) \;|\; x \in K \}&= \underset{x \in {\mathbb {R}}^{m}}{\min } \{ (C\alpha )^{\mathrm T}x + \beta \;|\; x \in K \} \nonumber \\&\ge \underset{x \in {\mathbb {R}}^{m}}{\min } \{ (C\alpha )^{\mathrm T}x + \beta \;|\; x\in {\bar{K}} \} \\&= \underset{y\in {\mathbb {R}}^{n}}{\max } \{ b^{\mathrm T}y + \beta \;|\; B^{\mathrm T}y \le C\alpha \}. \nonumber \end{aligned}$$
(7)

Here, the first equality exploits the linearizability of \({\mathcal {A}}(\alpha )\), and the last equality follows from strong duality of linear programming. In the case that \({\bar{K}}\) is the convex hull of the feasible binary points, as for the linear assignment problem and the shortest path problem on directed acyclic graphs, the above inequality holds with equality. In general, however, the inequality in (7) can be strict. Nevertheless, the above leads to the following optimization problem:

$$\begin{aligned} \begin{array}{ll} \sup _{\alpha ,\beta ,y} &{}\quad b^{\mathrm T}y + \beta \\ {\mathrm{s.t.}} &{}\quad f -h_{\alpha ,\beta } > 0, \;\; \forall x \in K \\ &{}\quad B^{\mathrm T}y \le C\alpha , \end{array} \end{aligned}$$
(8)

that can be exploited to obtain bounds for the BQP (2). Moreover, several approaches may be used to compute bounds for the BQP (2) by exploiting (8). For instance, one can relax the condition \(f - h_{\alpha ,\beta }\ge 0\) on K by a sum of squares (SOS) decomposition of \(f -h_{\alpha ,\beta }\). For example, in Lasserre (2001), Nie and Schweighofer (2007) and Laurent (2009), the authors do not consider the linearization problem and construct hierarchies of approximations based on sum of squares decompositions for the following optimization problem:

$$\begin{aligned} \sup \{ \beta \;|\; f(x)-\beta \ge 0, \; \forall x \in K \}, \end{aligned}$$

where f(x) is a multivariate polynomial and K a closed semialgebraic set.

Buchheim and Traversi (2018) propose a semidefinite programming relaxation for binary quadratic programs. Their approach can be viewed as a special case of (5), where \({\mathcal {A}}(\alpha )\) is a diagonal matrix, and \(Q-{\mathcal {A}}(\alpha ) \succeq 0\) is the positivity certificate. Alternatively, one can use a simple positivity certificate to derive linear programming lower bounds like we do in the next section.

5 The linearization-based scheme

In this section, we consider a simple but efficient positivity certificate using the fact that both f and \(h_{\alpha ,\beta }\) are quadratic functions. This yields an efficient lower bounding scheme. Here, we use the same notation as in the previous section, namely \(f(x) = x^{\mathrm T}Qx\) and \(h_{\alpha ,\beta }(x) = x^{\mathrm T}{\mathcal {A}}(\alpha )x + \beta \).

Note that if \({\mathcal {A}}(\alpha )\le Q\) and \(\beta \le 0\), then \(f-h_{\alpha ,\beta } \ge 0\) for all \(x\in K\). This leads to the following relaxation for the BQP (2):

$$\begin{aligned} \begin{array}{ll}&\underset{\alpha ,y}{\max } \{ b^{\mathrm T}y \;|\; B^{\mathrm T}y \le C\alpha , \; {\mathcal {A}}(\alpha )\le Q\}, \end{array} \end{aligned}$$
(9)

where we have removed the redundant scalar variable \(\beta \). We call relaxation (9) the linearization-based relaxation. Thus, a linearization-based bound (LBB) is obtained by solving one linear programming problem. However, the quality of the bound obtained in this way depends on the choice of \({\mathcal {A}}(\alpha )\). For example, one can take linearizable matrices from Lemma 1 for any BQP. de Meijer and Sotirov (2020) choose a particular linearizable matrix for the quadratic cycle cover problem, which enables them to compute strong bounds fast.
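For illustration, relaxation (9) can be assembled and solved with a few lines of code. The following sketch is ours, not from the paper: the helper lbb, the random data and the choice of linearizable matrices generated via Lemma 1 are our own assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def lbb(Q, B, b, Qs, cs):
    """Linearization-based bound (9): max b^T y subject to B^T y <= C alpha
    and sum_i alpha_i Q_i <= Q entrywise; decision variables are (y, alpha)."""
    n, m = B.shape
    k = len(Qs)
    C = np.column_stack(cs)                                  # m x k
    A1 = np.hstack([B.T, -C])                                # B^T y - C alpha <= 0
    A2 = np.hstack([np.zeros((m * m, n)),
                    np.column_stack([Qi.flatten() for Qi in Qs])])
    res = linprog(np.concatenate([-b, np.zeros(k)]),         # maximize b^T y
                  A_ub=np.vstack([A1, A2]),
                  b_ub=np.concatenate([np.zeros(m), Q.flatten()]),
                  bounds=[(None, None)] * (n + k))
    return -res.fun if res.success else None                 # None: infeasible or unbounded

# example data: linearizable matrices obtained from Lemma 1
rng = np.random.default_rng(2)
n, m = 3, 6
B = rng.integers(0, 2, size=(n, m)).astype(float)
b = np.ones(n)
Q = rng.random((m, m))
pairs = [(rng.standard_normal((n, m)), rng.standard_normal(m)) for _ in range(10)]
Qs = [B.T @ Y + np.diag(z) for Y, z in pairs]
cs = [Y.T @ b + z for Y, z in pairs]
print(lbb(Q, B, b, Qs, cs))
```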

Billionnet et al. (2009) provide a method to reformulate a binary quadratic program into an equivalent binary quadratic program with a convex quadratic objective function. Their approach yields the tightest convex bound. However, the authors of Billionnet et al. (2009) need to solve a semidefinite programming relaxation in order to compute their bound, which makes their approach computationally more expensive than the approach we present here.

Next, we consider reformulations of the BQP and their influence on the linearization-based bounds. For any skew-symmetric matrix S and vector d, we can reformulate the objective function of (2) as \(f(x) = x^{\mathrm T}(Q+S+{{\,\mathrm{Diag}\,}}(d))x - d^{\mathrm T}x\) using the fact that \(x\in K\) is binary. This follows from the fact that \(x^{\mathrm T}Sx = 0\) and \(x_{i}^{2}=x_{i}\), see also Theorem 2.4 in Punnen et al. (2019). Thus, we obtain an equivalent representation of (2). Since we have an extra linear term \(d^{\mathrm T}x\), we underestimate f(x) by \(h_{\alpha ,\gamma }(x) := x^{\mathrm T}{\mathcal {A}}(\alpha )x + \gamma ^{\mathrm T}x\). So both f(x) and \(h_{\alpha ,\gamma }(x)\) now have an extra linear term. Then \(x^{\mathrm T}{\mathcal {A}}(\alpha )x + \gamma ^{\mathrm T}x \le f(x)\) if \({\mathcal {A}}(\alpha ) \le Q+S+{{\,\mathrm{Diag}\,}}(d)\) and \(\gamma \le - d\). Under this setting, we can derive a bound in the same way as in (9). In particular, we have

$$\begin{aligned}&\max _{\alpha ,\gamma } \left\{ \min _{x \in K} (C\alpha + \gamma )^{\mathrm T}x \;|\; {\mathcal {A}}(\alpha ) \le Q+S+{{\,\mathrm{Diag}\,}}(d),\; \gamma \le - d \right\} \nonumber \\&\quad = \max _{\alpha ,\gamma ,y} \left\{ b^{\mathrm T}y \;|\; B^{\mathrm T}y \le C\alpha + \gamma , \; {\mathcal {A}}(\alpha ) \le Q+S+{{\,\mathrm{Diag}\,}}(d), \; \gamma \le - d \right\} . \end{aligned}$$
(10)

Below we show that the bound (10) is invariant under reformulations of the objective function when \({\mathcal {A}}(\alpha )\) has a special form, i.e., its range contains every matrix of the form \(S+{{\,\mathrm{Diag}\,}}(d)\) with S skew-symmetric. Note that skew-symmetric matrices are linearizable.

Proposition 1

Assume that for any skew-symmetric matrix \(S \in {\mathbb {R}}^{m \times m}\) and vector \(d \in {\mathbb {R}}^{m}\) there exists \(\alpha \in {\mathbb {R}}^{k}\) such that \({\mathcal {A}}(\alpha ) = S + {{\,\mathrm{Diag}\,}}(d)\) in (10). For the binary quadratic optimization problem

$$\begin{aligned} \min _{x \in K} x^{\mathrm T}(Q+S+{{\,\mathrm{Diag}\,}}(d))x - d^{\mathrm T}x, \end{aligned}$$
(11)

the bound (10) does not depend on the choice of the skew-symmetric matrix S or the vector d.

Proof

Let \(S:=S_1\) and \(d:=d_1\) in the BQP (11), and \((\alpha ^{*},\gamma ^{*},y^{*})\) be a feasible solution of (10), whose objective value \(f^{*}\) is given by

$$\begin{aligned} \begin{array}{ll} f^{*} = \underset{y}{\max } \{ b^{\mathrm T}y \;|\; B^{\mathrm T}y \le C\alpha ^{*} + \gamma ^{*} \}. \end{array} \end{aligned}$$
(12)

Let \(S:=S_{2}\) and \(d:=d_{2}\) in the BQP (11). We now construct a feasible solution of (10) for (11) having the objective value \(f^{*}\). By assumption, we can find \({\hat{\alpha }} \in {\mathbb {R}}^{k}\) such that \({\mathcal {A}}({\hat{\alpha }}) = S_{2}-S_{1} + {{\,\mathrm{Diag}\,}}(d_{2}-d_{1})\). Let \(\alpha ^{**} = \alpha ^{*}+{\hat{\alpha }}\) and \(\gamma ^{**} = \gamma ^{*}-d_{2}+d_{1}\). Note that \(C\alpha ^{**} = C\alpha ^{*} + d_{2} - d_{1}\), and thus \(C\alpha ^{**} +\gamma ^{**} = C\alpha ^{*} +\gamma ^{*}\). Here we assume w.l.o.g. that the linearization vector of a diagonal matrix \({{\,\mathrm{Diag}\,}}(d)\) is given by d. To see that \((\alpha ^{**},\gamma ^{**},y^{*})\) is a feasible solution of (10), we check

$$\begin{aligned} \begin{aligned} B^{\mathrm T}y^{*}&\le C\alpha ^{*} + \gamma ^{*} = C\alpha ^{**} + \gamma ^{**},\\ {\mathcal {A}}(\alpha ^{**})&={\mathcal {A}}(\alpha ^{*}) + {\mathcal {A}}({\hat{\alpha }}) \le Q+S_{2}+{{\,\mathrm{Diag}\,}}(d_{2}), \;\\ \gamma ^{**}&= \gamma ^{*}-d_{2}+d_{1} \le - d_{2}. \end{aligned} \end{aligned}$$

Furthermore, the objective value of this solution is clearly \(f^{*} = b^{\mathrm T}y^{*}\). \(\square \)

We remark that if the range of \({\mathcal {A}}(\alpha )\) does not contain all skew-symmetric matrices, the linearization-based bound (10) may depend on S. Note that in Proposition 1 we do not specify the non-skew-symmetric part of \({\mathcal {A}}({\alpha })\).

Let us now restrict \({\mathcal {A}}(\alpha )\) to matrices of the form \(B^{\mathrm T}Y +{{\,\mathrm{Diag}\,}}(z)\), where \(Y \in {\mathbb {R}}^{n \times m}\) and \(z \in {\mathbb {R}}^{m}\), see Lemma 1. Now, for any skew-symmetric \(S \in {\mathbb {R}}^{m \times m}\) we have that \(B^{\mathrm T}Y + S+ {{\,\mathrm{Diag}\,}}(z)\) is linearizable with linearization vector \(Y^{\mathrm T}b + z\). This yields the following linearization-based relaxation:

$$\begin{aligned} \underset{Y,S,z,y}{\max } \{ b^{\mathrm T}y \;|\; B^{\mathrm T}y \le Y^{\mathrm T}b + z, \; B^{\mathrm T}Y + S+ {{\,\mathrm{Diag}\,}}(z) \le Q, \; S+S^{\mathrm T} = 0 \}. \end{aligned}$$
(13)

This relaxation satisfies the assumption of Proposition 1. Thus, we may assume without loss of generality that Q is symmetric in (13). By applying Proposition 1, the next result shows that the variable S can be eliminated from (13) to obtain an equivalent relaxation with fewer variables and constraints.

Proposition 2

Assume Q is symmetric. The relaxation (13) is equivalent to

$$\begin{aligned} \underset{Y,z,y}{\max } \{ b^{\mathrm T}y \;|\; B^{\mathrm T}y \le 2Y^{\mathrm T}b + z, \; B^{\mathrm T}Y + Y^{\mathrm T}B + {{\,\mathrm{Diag}\,}}(z) \le Q \}. \end{aligned}$$
(14)

Proof

Let \((Y,S,z,y)\) be a feasible solution of (13). Then \(\frac{1}{2}(B^{\mathrm T}Y + Y^{\mathrm T}B) + {{\,\mathrm{Diag}\,}}(z) \le Q\) as Q is symmetric, and thus \((\frac{1}{2}Y,z,y)\) is feasible for (14). Conversely, let \((Y,z,y)\) be feasible for (14). Let \(S= Y^{\mathrm T}B -B^{\mathrm T}Y\). Then S is skew-symmetric, and

$$\begin{aligned} 2B^{\mathrm T}Y + S + {{\,\mathrm{Diag}\,}}(z) = B^{\mathrm T}Y + Y^{\mathrm T}B + {{\,\mathrm{Diag}\,}}(z)\le Q. \end{aligned}$$

Therefore \((2Y,S,z,y)\) is a feasible solution for (13). \(\square \)

In what follows, we consider the following two cases. In both cases, we can assume Q is symmetric without loss of generality.

  • We replace \({\mathcal {A}}(\alpha )\) by \(B^{\mathrm T}Y + Y^{\mathrm T}B+ {{\,\mathrm{Diag}\,}}(z)\), and \(C\alpha \) by \(2Y^{\mathrm T}b + z\) in (9). Thus, we have the linearization-based relaxation (14); see the sketch after this list. The obtained lower bound, denoted by \(v_{LBB'}\), is the optimal value of (14).

  • If we know the spanning set of linearizable matrices \(\{Q_{1},\ldots ,Q_{k}\}\), then we denote by \(v_{LBB^{*}}\) the corresponding linearization-based bound.
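The following sketch of relaxation (14) is ours; the variable ordering, the helper name lbb_prime and the explicit row-by-row construction are our own choices, not prescribed by the paper.

```python
import numpy as np
from scipy.optimize import linprog

def lbb_prime(Q, B, b):
    """Relaxation (14): max b^T y  s.t.  B^T y <= 2 Y^T b + z  and
    B^T Y + Y^T B + Diag(z) <= Q entrywise; Q is assumed symmetric.
    Decision vector: (vec(Y) row-major, z, y)."""
    n, m = B.shape
    nv = n * m + m + n
    iY = lambda i, j: i * m + j          # position of Y[i, j]
    iz, iy = n * m, n * m + m            # offsets of z and y
    rows, rhs = [], []
    for p in range(m):                   # B^T y - 2 Y^T b - z <= 0
        r = np.zeros(nv)
        r[iy:iy + n] = B[:, p]
        for i in range(n):
            r[iY(i, p)] = -2 * b[i]
        r[iz + p] = -1.0
        rows.append(r); rhs.append(0.0)
    for p in range(m):                   # B^T Y + Y^T B + Diag(z) <= Q
        for q in range(m):
            r = np.zeros(nv)
            for i in range(n):
                r[iY(i, q)] += B[i, p]
                r[iY(i, p)] += B[i, q]
            if p == q:
                r[iz + p] = 1.0
            rows.append(r); rhs.append(Q[p, q])
    obj = np.zeros(nv); obj[iy:iy + n] = -b          # maximize b^T y
    res = linprog(obj, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * nv)
    return -res.fun if res.success else None

```

For instance, with the data from the previous sketch one would call lbb_prime((Q + Q.T) / 2, B, b), since (14) assumes a symmetric Q.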

6 The LBB and related bounds

In Sect. 6.1 we present the Generalized Gilmore–Lawler bounding scheme, which is a well-known iterative lower bounding scheme for binary quadratic problems. We show that the GGL bounds are dominated by our linearization-based bound \(v_{LBB'}\). In Sect. 6.2, we compare our linearization-based bounds with the bounds obtained from the first level RLT relaxation proposed by Adams and Sherali (1986, 1990). In Sect. 6.3 we show the strength of our strongest linearization-based bound, and in Sect. 6.4 we introduce the extended linearization-based bounds.

6.1 The generalized Gilmore–Lawler bounding scheme

The Generalized Gilmore–Lawler bounding scheme has been implemented for many optimization problems, including the quadratic assignment problem (Hahn & Grant, 1998; Carraresi & Malucelli, 1992), the quadratic shortest path problem (Rostami et al., 2018) and the quadratic minimum spanning tree problem (Rostami & Malucelli, 2015).

Let \(Q = [ {q}_{1},\ldots , {q}_{m}] \in {\mathbb {R}}^{m \times m}\) be the given quadratic cost matrix, see (2). Denote by \(I_{k}\) the k-th column of the identity matrix of size m. Let \({\bar{y}}_k \in {\mathbb {R}}^{n}\) and \({\bar{z}}_{k} \in {\mathbb {R}}\) be an optimal solution of the following linear program:

$$\begin{aligned} \underset{{y}_{k} \in {\mathbb {R}}^{n}, z_{k} \in {\mathbb {R}}}{\max }&\{ \; b^{\mathrm T} {y}_{k} + z_{k} \;|\; B^{\mathrm T} {y}_{k} + I_{k}z_{k}\le {q}_{k} \; \}, \end{aligned}$$
(15)

for each \(k = 1,\ldots ,m\). Collect all \({\bar{y}}_{k}\) in matrix \({\bar{Y}} \in {\mathbb {R}}^{n \times m}\), and all \({\bar{z}}_{k}\) in vector \({\bar{z}} \in {\mathbb {R}}^{m}\). Define

$$\begin{aligned} \begin{aligned} {\bar{c}}&= {\bar{Y}}^{\mathrm T}b + {\bar{z}} \in {\mathbb {R}}^{m}, \\ {\bar{Q}}&= B^{\mathrm T}{\bar{Y}} + {{\,\mathrm{Diag}\,}}({\bar{z}}) \in {\mathbb {R}}^{m \times m}. \end{aligned} \end{aligned}$$
(16)

From Lemma 1, we know that \({\bar{c}}\) is a linearization vector of \({\bar{Q}}\). The feasibility of (15) implies that \({\bar{Q}} \le Q\), and thus \(\min _{x \in {\bar{K}}} {\bar{c}}^{\mathrm T}x\), where \({\bar{K}}\) is given in (6), is a lower bound for the binary quadratic program (2). This bound is known as the Gilmore–Lawler (GL) type bound, and it has been implemented for many BQPs including the QAP and the QSPP. The GL bound was originally introduced for the QAP, see Gilmore (1962) and Lawler (1963).

We call the update \(Q\leftarrow Q - {\bar{Q}}\) of the objective the dual-update of Q. The dual-update can be applied iteratively, followed by some equivalent representation of the objective, in order to obtain an increasing sequence of lower bounds. This iterative bounding scheme is known as the Generalized Gilmore–Lawler bounding scheme, and it is an important lower bounding strategy for binary quadratic problems. We describe this bounding scheme below.

Algorithm 1: The Generalized Gilmore–Lawler bounding scheme

Note that the skew-symmetric matrix \(S_{i+1}\) in the algorithm yields an equivalent representation. For example, Frieze and Yadegar (1983) pick a skew-symmetric matrix S such that \(Q+S\) is upper triangular, while Rostami et al. (2018) keep \(Q+S\) symmetric. More sophisticated reformulations can also be found by exploiting the problem structure, see Hahn and Grant (1998) and de Meijer and Sotirov (2020). Bounds based on the dual-update are very competitive, see also Carraresi and Malucelli (1992), Assad and Xu (1985) and Hahn et al. (1998). However, the quality of the bounds depends on the choice of the skew-symmetric matrix.
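Since Algorithm 1 is not reproduced here, the following Python sketch records one possible reading of the scheme just described; the symmetrizing choice of the skew-symmetric matrix follows Rostami et al. (2018), while the function name, the fixed number of rounds and the assumption that all LPs (15) are feasible and bounded are ours.

```python
import numpy as np
from scipy.optimize import linprog

def ggl_bound(Q, B, b, rounds=5):
    """One reading of the GGL scheme: per round, solve the m 'local' LPs (15),
    accumulate the linearization vector (16), apply the dual-update
    Q <- Q - Q_bar and re-symmetrize (one possible choice of S)."""
    n, m = B.shape
    c_acc = np.zeros(m)
    for _ in range(rounds):
        Ybar, zbar = np.zeros((n, m)), np.zeros(m)
        for k in range(m):
            # LP (15): max b^T y + z  s.t.  B^T y + e_k z <= q_k
            res = linprog(-np.concatenate([b, [1.0]]),
                          A_ub=np.hstack([B.T, np.eye(m)[:, [k]]]),
                          b_ub=Q[:, k],
                          bounds=[(None, None)] * (n + 1))
            Ybar[:, k], zbar[k] = res.x[:n], res.x[n]   # assumes res.success
        c_acc += Ybar.T @ b + zbar                      # c_bar from (16)
        R = Q - (B.T @ Ybar + np.diag(zbar))            # dual-update, R >= 0
        Q = (R + R.T) / 2                               # keep Q symmetric
    # final GL-type bound: min c_acc^T x over the LP relaxation K_bar
    res = linprog(c_acc, A_eq=B, b_eq=b, bounds=[(0, None)] * m)
    return res.fun
```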

The key observation here is that each iteration of the Generalized Gilmore–Lawler bounding strategy is based on maximizing \({\bar{c}}_{k}\) over \(y_{k}\) and \(z_{k}\) in a ‘local’ way, i.e., by solving the m linear programs (15) independently. Then, after the dual-update, the quadratic cost matrix is reshuffled by some skew-symmetric matrix, and the procedure is repeated. Instead, the linearization-based bound \(v_{LBB'}\), see (14), is obtained by maximizing \(\min _{x \in {\bar{K}} } {\bar{c}}^{\mathrm T}x\) over Y, S and z in a ‘global’ way. This means that \(v_{LBB'}\) is optimal in terms of the Generalized Gilmore–Lawler bound. The following theorem, in which we show that \(v_{LBB'}\) dominates the Generalized Gilmore–Lawler bound \(v_{GGL}\), makes this observation precise.

Theorem 2

\(v_{GGL} \le v_{LBB'}\).

Proof

Let \({\bar{Q}}_{i} = B^{\mathrm T}{\bar{Y}}_{i} + {{\,\mathrm{Diag}\,}}({\bar{z}}_{i})\) \((i=0,\ldots ,k-1)\) be the sequence of linearizable matrices from Algorithm 1. Let \(S_{0}\) be the matrix of zeros. For convenience, denote by \(Q_{1},\ldots ,Q_{k-1}\) the cost matrices obtained right after line six of Algorithm 1. Thus, \(Q_{i} = Q_{i-1} +S_{i-1} - {\bar{Q}}_{i-1}\) in this notation. Define the function \(g(M) = (M+M^{\mathrm T})/2\). Then g(M) is a zero matrix whenever M is skew-symmetric.

From the feasibility of (15), it holds that \(Q_{k} = Q_{k-1}+S_{k-1} - {\bar{Q}}_{k-1} \ge 0\) and thus \(g(Q_{k}) \ge 0\). As \(Q_{k} = Q_{k-1} + S_{k-1} - {\bar{Q}}_{k-1} = Q_{0} + \sum _{i=0}^{k-1} (S_{i} - {\bar{Q}}_{i})\), we have

$$\begin{aligned} \begin{aligned} 0 \le g(Q_{k}) = g \big ( Q_{0} + \sum \limits _{i=0}^{k-1} (S_{i} - {\bar{Q}}_{i}) \big ) = g(Q_{0})- g\left( \sum \limits _{i=0}^{k-1}{\bar{Q}}_{i}\right) , \end{aligned} \end{aligned}$$

from where it follows \(g\big (\sum _{i=0}^{k-1}{\bar{Q}}_{i} \big ) \le g(Q_{0})\).

Let \({\tilde{Y}} := \frac{1}{2}\sum _{i=0}^{k-1}{\bar{Y}}_{i}\) and \({\tilde{z}} := \sum _{i=0}^{k-1}{\bar{z}}_{i}\). It is easy to see that

$$\begin{aligned} \begin{array}{ccc} g \left( \sum \limits _{i=0}^{k-1}{\bar{Q}}_{i} \right) = g \left( B^{\mathrm {T}}\sum \limits _{i=0}^{k-1}{\bar{Y}}_{i} +{{\,\mathrm{Diag}\,}}\left( \sum \limits _{i=0}^{k-1}{\bar{z}}_{i}\right) \right) = B^{\mathrm T}{\tilde{Y}}+{\tilde{Y}}^{\mathrm T}B+{{\,\mathrm{Diag}\,}}({\tilde{z}}). \end{array} \end{aligned}$$

Therefore \(B^{\mathrm T}{\tilde{Y}}+{\tilde{Y}}^{\mathrm T}B+{{\,\mathrm{Diag}\,}}({\tilde{z}}) \le g(Q)\). Note that \(Q = g(Q)\) in \(v_{LBB'}\), as Q is assumed to be symmetric there. Since \(c = \sum _{i=0}^{k-1}{\bar{c}}_{i} = 2{\tilde{Y}}^{\mathrm T}b + {\tilde{z}}\), it holds that

$$\begin{aligned} v_{LBB'} \ge \underset{y}{\max } \{ b^{\mathrm T}y \;|\; B^{\mathrm T}y \le 2{\tilde{Y}}^{\mathrm T}b + {\tilde{z}} \} = \min _{x \in {\bar{K}} } c^{\mathrm T}x = v_{GGL}. \end{aligned}$$

\(\square \)

In the next section we relate \(v_{LBB'}\) and the bound obtained from the first level RLT.

6.2 The first level RLT bound

The reformulation linearization technique proposed by Adams and Sherali (1986, 1990) generates a hierarchy of linear programming relaxations for binary quadratic programs. It is well documented that, in many applications, this hierarchy generates tight relaxations already at the first level.

We show here that the linearization-based bound \(v_{LBB'}\) coincides with the first level RLT bound for optimization problems where the constraint \(x \le e\) is redundant for \({\bar{K}}=\{ x\ge 0:~Bx=b \}\), see Lemma 3. If this is not the case, we establish a relation between those two bounds in Lemma 4.

The first level RLT relaxation for the binary quadratic problem (2) is given as follows:

$$\begin{aligned} \begin{array}{ll} v_{RLT_1}:= \underset{x \in {\mathbb {R}}^{m}, X \in {\mathcal {S}}^{m} }{\min } &{}\quad \langle Q,X \rangle \\ \qquad {\text {s.t. }} &{}\quad (x,X) \in {\mathcal {F}} \\ &{}\quad e - x \ge 0 \\ &{}\quad J - xe^{\mathrm T} - ex^{\mathrm T} + X \ge 0 \\ &{}\quad xe^{\mathrm T} - X \ge 0, \end{array} \end{aligned}$$
(17)

where J is the all-ones matrix, \({\mathcal {S}}^{m}\) denotes the set of symmetric matrices of order m, and

$$\begin{aligned} {\mathcal {F}}&: = \big \{ (x,X) \in ({\mathbb {R}}^{m},{\mathcal {S}}^{m}) \;|\; Bx=b ,\; BX = bx^{\mathrm T}, \; x = {{\,\mathrm{diag}\,}}(X), \end{aligned}$$
(18)
$$\begin{aligned}&\quad x \ge 0, \; X\ge 0 \big \}. \end{aligned}$$
(19)

Here the ‘diag’ operator maps an \(m\times m \) matrix to the m-vector given by its diagonal.

Depending on the specific problem structure, the constraint \(x\le e\) can be omitted without affecting the continuous relaxation. For instance, this is the case when the BQP under consideration is the QAP, the QSPP on directed acyclic graphs, the quadratic cycle cover problem, the quadratic minimum spanning tree problem, etc. In such cases, the first level RLT relaxation is given by

$$\begin{aligned} \begin{array}{cl} v_{RLT_1'} := \underset{x \in {\mathbb {R}}^{m}, X \in {\mathcal {S}}^{m} }{\min } &{}\quad \langle Q,X \rangle \\ \qquad {\text {s.t. }} &{}\quad (x,X) \in {\mathcal {F}}. \end{array} \end{aligned}$$
(20)

In general, the bound \(v_{RLT_1'}\) is weaker than \(v_{RLT_1}\). The next result shows that the linearization-based bound \(v_{LBB'}\) is equivalent to \(v_{RLT_1'}\).

Lemma 3

Let \(x \le e\) be redundant for \({\bar{K}}\). Then \(v_{LBB'} =v_{RLT_1'}=v_{RLT_1}\).

Proof

The proof follows directly from the dual of (14). The Lagrangian function of (14) is given by

$$\begin{aligned} \begin{array}{ll} {{\mathcal {L}}}(Y,z,y,x,X) &{}= b^{\mathrm T}y + \langle Q - B^{\mathrm T}Y - Y^{\mathrm T}B - {{\,\mathrm{Diag}\,}}(z), X \rangle + \langle 2Y^{\mathrm T}b + z - B^{\mathrm T}y, x \rangle \\ &{}= \langle Q,X \rangle + \langle b - Bx, y \rangle + \langle 2bx^{\mathrm T} -BX - BX^{\mathrm T}, Y \rangle \\ &{}\quad + \langle x - {{\,\mathrm{diag}\,}}(X),z \rangle , \end{array} \end{aligned}$$

where \(x \in {\mathbb {R}}^{m}\) and \(X \in {\mathcal {S}}^{m}\). Thus the dual of (14) is

$$\begin{aligned} \begin{aligned} \underset{x \in {\mathbb {R}}^{m}, X \in {\mathcal {S}}^{m} }{\min } \{ \langle Q,X \rangle \;|\; Bx=b ,\; x \ge 0 ,\; BX = bx^{\mathrm T}, X\ge 0, \; x = {{\,\mathrm{diag}\,}}(X) \}. \end{aligned} \end{aligned}$$
(21)

Since we assumed that Q is symmetric for the linearization-based bound \(v_{LBB'}\), the dual program of (14) is exactly the RLT relaxation (20). The equality \(v_{RLT_1'}=v_{RLT_1}\) follows from, e.g., Adams and Sherali (1986). \(\square \)

Remark 1

Note that we can strengthen the linearization-based bound \(v_{LBB'}\) by exploiting the sparsity pattern of the binary quadratic program. Let \({\mathcal {G}} = \{ (i,j) \;|\; x_{i}x_{j} =0 \; \forall x \in K \}\). Then, \({\mathcal {A}}(\alpha ) \le Q\) can be equivalently replaced by \(\big ({\mathcal {A}}(\alpha )\big )_{ij} \le Q_{ij}\) for every \((i,j) \notin {\mathcal {G}}\). For Lemma 3, this implies that the dual variable X in (21) has to satisfy \(X_{ij} = 0\) for every \((i,j) \in {\mathcal {G}}\). Thus, the strengthened linearization-based bound is equivalent to the first level RLT relaxation (20) with the extra sparsity constraints \(X_{ij} = 0\) for every \((i,j) \in {\mathcal {G}}\).

Theorem 2 proves that the Generalized Gilmore–Lawler bound \(v_{GGL}\) is dominated by the first level RLT relaxation for a BQP when the upper bound on x is redundant. A relation between these two bounds was previously studied only in the context of the quadratic assignment problem. In particular, Frieze and Yadegar (1983) show that the Gilmore–Lawler bound with a particular decomposition is weaker than the Lagrangian relaxation of their well-known mixed-integer linear programming formulation for the QAP. On the other hand, the linear programming relaxation of the mentioned mixed-integer linear programming formulation is known to be dominated by the first level RLT relaxation, see Adams and Johnson (1994). Adams and Johnson (1994) also show that the QAP bounds from Carraresi and Malucelli (1992) and Assad and Xu (1985) are dominated by the first level QAP-RLT bound. However, we show here more, since our proof is not restricted to a particular skew-symmetric matrix.

Note that if \(x\le e\) is not redundant for \({\bar{K}}\), we have that \(v_{LBB'}\) dominates \(v_{RLT_1'}\). In particular, we have the following result.

Lemma 4

Suppose that \(x \le e\) is not redundant for \({\bar{K}}\). Then

$$\begin{aligned} v_{RLT_1'} \le v_{LBB'} \le v_{RLT_1}. \end{aligned}$$

6.3 The \(\hbox {LBB}^{*}\) bound

The linearization-based bound \(v_{LBB^{*}}\) can be viewed as a strengthened \(v_{LBB'}\) bound. We show here by an example that \(v_{LBB^{*}}\) may be stronger than \(v_{RLT_1}\).

Let \(Q_{1},\ldots ,Q_{k}\) be linearizable matrices for a given BQP with linearization vectors \(c_{1},\ldots ,c_{k}\), respectively. In general, those matrices are not of the form given in Lemma 1. Combining the linearizable matrices \(Q_{1},\ldots ,Q_{k}\) with the symmetrized linearizable matrices from Lemma 1, we obtain the following linear programming relaxation:

$$\begin{aligned} \begin{aligned}&\underset{Y,z,\alpha ,y}{\max } \big \{ b^{\mathrm T}y \;|\; B^{\mathrm T}Y + Y^{\mathrm T}B+ {{\,\mathrm{Diag}\,}}(z) + \sum _{i=1}^{k} \alpha _{i} Q_{i} \le Q,\; \\&\quad B^{\mathrm T}y \le 2Y^{\mathrm T}b + z +C\alpha \big \}, \end{aligned} \end{aligned}$$
(22)

whose dual is given by

$$\begin{aligned} \begin{array}{cl} \underset{x \in {\mathbb {R}}^{m}, X \in {\mathcal {S}}^{m} }{\min } &{}\quad \langle Q,X \rangle \\ \text { s.t. }&{}\quad (x,X) \in {\mathcal {F}} \\ &{}\quad \langle Q_{i}, X \rangle = \langle c_{i} , x \rangle \quad \text { for } i = 1,\ldots ,k. \end{array} \end{aligned}$$
(23)

where \({\mathcal {F}}\) is given in (18). Thus, the linear program (23) is a strengthening of relaxation (20) with additional constraints of the form \(\langle Q_{i}, X \rangle = \langle c_{i} , x \rangle \). Indeed, if \(Q_{i}\) is linearizable with linearization vector \(c_i\), then \(x^{\mathrm {T}}Q_{i}x = c_{i}^{\mathrm T}x\) for every \(x \in K\). This yields the valid constraint \(\langle Q_{i}, X \rangle = \langle c_{i}, x \rangle \). In particular, when the matrices \(\{ Q_{1},\ldots ,Q_{k}\}\) span the set of linearizable matrices, the optimal value of relaxation (23) is just \(v_{LBB^{*}}\).

Recall that the first iteration of the GGL scheme yields the Gilmore–Lawler type bound, denoted by \(v_{GL}\). To the best of our knowledge, it was only known that \(v_{GL}\le v_{GGL} \le v_{RLT_1'}= v_{RLT_1}\) for the QAP. Moreover, \(v_{GGL}\) for the QAP was studied only in particular cases, not in general. We summarize below the relations between several of the mentioned bounds for any BQP.

Proposition 3

  (a) Let \(x \le e\) be redundant for \({\bar{K}}\), then

    $$\begin{aligned} v_{GL} \le v_{GGL} \le v_{LBB'} =v_{RLT_1'} = v_{RLT_1} \le v_{LBB^{*}}. \end{aligned}$$

  (b) Let \(x \le e\) be not redundant for \({\bar{K}}\), then

    $$\begin{aligned} v_{RLT_1'} \le v_{LBB'} \le v_{LBB^{*}}. \end{aligned}$$

Proof

Part (a) follows from Theorem 2, Lemma 3 and construction of \(v_{LBB^{*}}\). Part (b) follows from Lemma 4 and construction of \(v_{LBB^{*}}\). \(\square \)

It should be clear that in case (a) of the above proposition we have \(v_{RLT_1} = v_{LBB^{*}}\) whenever the spanning set of linearizable matrices can be characterized by linearizable matrices of the form \(B^{\mathrm T}Y + Y^{\mathrm T}B + {{\,\mathrm{Diag}\,}}(z)\). Unfortunately, we were not able to show that the inequality \(v_{RLT_{1}} \le v_{LBB^{*}}\) can be strict when \(x \le e\) is redundant for \({\bar{K}}\). On the other hand, we found examples where those two bounds are equal. By using Proposition 5 we computed the spanning set of linearizable matrices for the QSPP on tournament graphs and on GRID1, GRID2 and PAR-K graphs up to a certain size. We refer to Hu and Sotirov (2020) for the definition of these acyclic graphs. We have also computed the spanning set of linearizable matrices for the QAP for \(n \le 9\) by brute-force search. It turns out that the spanning sets of all mentioned instances can be expressed as \(B^{\mathrm T}Y + Y^{\mathrm T}B + {{\,\mathrm{Diag}\,}}(z)\). This indicates that the set of (symmetrized) linearizable matrices obtained from Lemma 1 is considerably rich.

Note that for case (b) in Proposition 3, we do not compare \(v_{LBB^{*}}\) and \(v_{RLT_1}\), as their relation is not clear in general. However, our example below shows that there are instances for which \(v_{RLT_1} < v_{LBB^{*}}\).

Example 1

Consider the quadratic shortest path problem on the complete symmetric digraph \(K^*_n\), that is, a digraph in which every pair of vertices is connected by a bidirectional edge. Assume the incoming arcs to s and the outgoing arcs from t are removed. We can find the spanning set of linearizable matrices as follows. If Q is linearizable with linearization vector c, then \(\langle xx^{T},Q - {{\,\mathrm{Diag}\,}}(c) \rangle = 0\) for every s-t path x. By enumerating all s-t paths, we obtain a linear system whose null space represents a set of linearizable matrices. That set, together with the set of diagonal matrices \(\{{{\,\mathrm{Diag}\,}}(e_{k}) \;|\; k = 1,\ldots ,m \}\), provides the spanning set of linearizable matrices for \(K^*_n\). Here, \(e_k\) denotes column k of the identity matrix.
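The enumeration just described is straightforward for small n. The following sketch is our own, not from the paper; vertex 0 plays the role of s and vertex n−1 the role of t. It builds one linear equation per s-t path in the unknowns (Q, c) and extracts a basis of linearizable pairs from the null space.

```python
import numpy as np
from itertools import permutations
from scipy.linalg import null_space

# complete symmetric digraph on {0,...,n-1}, with s = 0 and t = n-1;
# arcs into s and arcs out of t are removed
n = 4
arcs = [(u, v) for u in range(n) for v in range(n)
        if u != v and v != 0 and u != n - 1]
idx = {a: i for i, a in enumerate(arcs)}
m = len(arcs)

# enumerate all s-t paths (ordered selections of distinct inner vertices)
rows = []
inner = list(range(1, n - 1))
for r in range(len(inner) + 1):
    for mid in permutations(inner, r):
        path = [0, *mid, n - 1]
        x = np.zeros(m)
        for e in zip(path, path[1:]):
            x[idx[e]] = 1.0
        # <xx^T, Q - Diag(c)> = 0 gives the row [vec(xx^T), -x]
        rows.append(np.concatenate([np.outer(x, x).flatten(), -x]))

basis = null_space(np.array(rows))   # each column encodes a pair (Q, c)
print(basis.shape[1], "linearly independent linearizable (Q, c) pairs")
```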

For \(n = 5\), there are 85 linearly independent linearizable matrices in the spanning set. However, there are only 59 linearly independent linearizable matrices from the set \(\{ B^{\mathrm T}(e_ie_j^{\mathrm T}) + (e_je_i^{\mathrm T})B + {{\,\mathrm{Diag}\,}}(e_{k}) \;|\; i = 1,\ldots ,n \text { and } j,k = 1,\ldots ,m \}\), see Lemma 1. Here, B is the incidence matrix of the complete symmetric digraph with five vertices.

Now, we use these linearizable matrices as the cost matrix Q in the QSPP. For each linearizable matrix, we have \(v_{LBB^{*}} = 0\), as there exists an \(\alpha \) such that \({\mathcal {A}}(\alpha ) = Q\) in (9). However, \(v_{LBB'}\) is unbounded from below for some of the instances due to negative cycles in \(K^*_5\). Table 1 shows numerical results for the first level RLT bound \(v_{RLT_1}\) for seven of the mentioned instances. We also compute the strongest semidefinite programming bound \(SDP_{NL}\) from Hu and Sotirov (2020) for the mentioned instances. Optimal values are provided in the last column of the table.

Table 1 The bounds for QSPP instances on \(K^*_n\)

Table 1 shows that \(v_{LBB^{*}}\) dominates both \(v_{RLT_1}\) and \(v_{SDP_{NL}}\) for all instances. It is surprising that \(v_{LBB^{*}}\) also dominates the semidefinite programming bound.

6.4 The extended linearization-based bounds

In Sect. 5 we introduced the linearization-based bounding scheme by exploiting linearizable matrices. In this section, we extend the notion of linearizable matrices, which enables us to construct a linearization-type bound that turns out to be equivalent to the first level RLT bound also for BQPs where \(x\le e\) is not redundant for \({\bar{K}}\).

For a given BQP, we call the cost matrix Q extended linearizable if there exists a vector c such that \(c^{\mathrm T}x \le x^{\mathrm T}{Q}x\) for all \(x \in K\). An example of an extended linearizable matrix is given below.

Lemma 5

Let \(\varLambda \in {{\mathcal {S}}}_+^m\), \(\varOmega \in {\mathbb {R}}^{m \times m}_{+}\). Then \(\varLambda - \varOmega \) is extended linearizable, and

$$\begin{aligned} (2\varLambda e - \varOmega e)^{\mathrm T}x - e^{\mathrm T}\varLambda e \le x^{\mathrm T}(\varLambda - \varOmega )x \end{aligned}$$

for any binary vector \(x \in \{0,1\}^{m}\).

Proof

It is not difficult to see that \(-x^{\mathrm T}\varOmega e \le -x^{\mathrm T}\varOmega x \). It also holds that

$$\begin{aligned} 2e^{\mathrm T}\varLambda x - e^{\mathrm T}\varLambda e - x^{\mathrm T}\varLambda x = -(e-x)^{\mathrm T} \varLambda (e-x) \le 0. \end{aligned}$$

The statement follows by summing both inequalities. \(\square \)
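The inequality of Lemma 5 is easy to verify numerically. In the following sketch of ours we take \(\varLambda = W^{\mathrm T}W\) with \(W \ge 0\), which is positive semidefinite and entrywise nonnegative at the same time; this choice is our own assumption, made so that both readings of \({{\mathcal {S}}}_+^m\) are covered.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 6
W = rng.random((m, m))
Lam = W.T @ W                  # Lambda: PSD and entrywise nonnegative
Om = rng.random((m, m))        # Omega >= 0 entrywise
e = np.ones(m)
for _ in range(100):
    x = rng.integers(0, 2, size=m).astype(float)
    lhs = (2 * Lam @ e - Om @ e) @ x - e @ Lam @ e
    assert lhs <= x @ (Lam - Om) @ x + 1e-9
print("inequality of Lemma 5 verified on random binary points")
```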

Similarly to the linearization-based bound, we can use (extended) linearizable matrices to derive a lower bound for a given BQP. For any fixed \(Y\in {\mathbb {R}}^{n \times m},z \in {\mathbb {R}}^{m},\varLambda \in {{\mathcal {S}}}_+^m,\varOmega \in {\mathbb {R}}^{m \times m}_{+}\) such that \(B^{\mathrm T}Y + Y^{\mathrm T}B + {{\,\mathrm{Diag}\,}}(z)+ \varLambda - \varOmega \le Q\), the optimal value of the following problem

$$\begin{aligned} \min _{x\in {\bar{K}}} (2Y^{\mathrm T}b + z + 2\varLambda e - \varOmega e )^{\mathrm T}x, \end{aligned}$$

minus \(e^{\mathrm T}\varLambda e \), is a lower bound for the BQP. The dual of the above LP is

$$\begin{aligned} \underset{Y,z,y}{\max } \{ \; b^{\mathrm T}y \;|\; B^{\mathrm T}y \le 2Y^{\mathrm T}b + z + 2\varLambda e - \varOmega e \}. \end{aligned}$$

In order to obtain the strongest bound of this type, one has to solve the following maximization problem:

$$\begin{aligned} \begin{array}{ll} v_{ExLBB}:= \underset{Y,z,y,\varLambda = \varLambda ^{\mathrm T}, \varOmega }{\max } &{}\quad b^{\mathrm T}y - \langle J, \varLambda \rangle \\ &{}\quad B^{\mathrm T}Y + Y^{\mathrm T}B + {{\,\mathrm{Diag}\,}}(z)+ \varLambda - \varOmega \le Q \\ &{}\quad B^{\mathrm T}y \le 2Y^{\mathrm T}b + z + 2\varLambda e - \varOmega e \\ &{}\quad \varLambda \ge 0,\varOmega \ge 0. \end{array} \end{aligned}$$
(24)

We call the optimal value of (24) the extended linearization-based bound. We conclude this section by showing that \(v_{ExLBB}\) is equivalent to the first level RLT bound (17).

Theorem 3

Suppose that \(x \le e\) is not redundant for \({\bar{K}}\). Then \(v_{ExLBB} = v_{RLT_1}\).

Proof

The proof follows by verifying that (17) and (24) form a primal-dual pair. \(\square \)

The above result is interesting from a theoretical perspective. However, from a practical point of view, the linearization-based bounds are more attractive due to the smaller number of variables and constraints.

7 The QSPP linearization problem on DAGs

In this section, we first introduce several assumptions and definitions. Then, we derive necessary and sufficient conditions for an instance of the QSPP on a directed acyclic graph to be linearizable, see Theorem 4. We also show that those conditions can be verified in \({{\mathcal {O}}}(nm^{3})\) time. This result is a generalization of the corresponding results for the QSPP on directed grid graphs from Hu and Sotirov (2018).

We have the following assumptions in this section:

  (i) G is a directed acyclic graph;

  (ii) the vertices \(v_{1},\ldots ,v_{n}\) are topologically sorted, that is, \((v_{i},v_{j}) \in A \) implies \(i < j\);

  (iii) for each vertex v, there exists at least one s-t path containing v;

  (iv) the diagonal entries of the cost matrix Q are zeros.

It is well known that a topological ordering of a directed acyclic graph can be computed efficiently. We also note that assumptions (iii) and (iv) do not restrict generality. For instance, assumption (iv) is not restrictive, as Q is linearizable if and only if \(Q+{{\,\mathrm{Diag}\,}}(c)\) is linearizable for any cost vector c, see Lemma 4.1 in Hu and Sotirov (2018).

Here, we choose and fix an arbitrary labeling of the arcs. The vertices \(v_{2},\ldots ,v_{n-1}\), i.e., those between the source vertex \(v_{1}\) and the target vertex \(v_{n}\), are called the transshipment vertices. For each transshipment vertex, we pick the outgoing arc with the smallest index and call it a non-basic arc. The remaining arcs are basic. Thus, there are \(n-2\) non-basic arcs and \(m-n+2\) basic arcs.

Note that for a linearizable cost matrix, its linearization vector is not necessarily unique. However, we would like to restrict our analysis to linearization vectors that are in a unique, reduced form. For this purpose we introduce the following definitions, see also Hu and Sotirov (2018).

Definition 2

We say that the cost vectors \(c_{1}\) and \(c_{2}\) are equivalent if \(c_{1}^{\mathrm T}x = c_{2}^{\mathrm T}x\) for all \(x \in {\mathcal {P}}\).

Definition 3

The reduced form of a cost vector c is an equivalent cost vector R(c) such that \((R(c))_{e} = 0\) for every non-basic arc e.

The existence of the reduced form of the cost vector c follows from the following transformation. Let v be a transshipment vertex, and let f be the non-basic arc leaving v. Define \({\hat{c}}\) as follows:

$$\begin{aligned} {\hat{c}}_{e} := {\left\{ \begin{array}{ll} c_{e} - c_{f} &{}\quad \text {if } e \text { is an outgoing arc from vertex } v,\\ c_{e} + c_{f} &{}\quad \text {if } e \text { is an incoming arc to } v,\\ c_{e} &{} \quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(25)

It is not difficult to verify that \({\hat{c}}\) and c are equivalent: every s-t path through v uses exactly one incoming and one outgoing arc of v, so the added and subtracted terms \(c_{f}\) cancel. Furthermore, if we apply this transformation at each transshipment vertex v in reverse topological order, i.e., \(v_{n-1},\ldots ,v_{2}\), then the cost vector obtained after \(n-2\) transformations is in reduced form and equivalent to c. A direct transcription of this reduction is sketched below; afterwards we define critical paths, see also Hu and Sotirov (2018).
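In the following sketch (ours, for illustration), arcs is the list of arcs (u, w) and nonbasic is an assumed map from each transshipment vertex to the index of its non-basic arc; vertices 0 and n−1 play the roles of s and t.

```python
import numpy as np

def reduced_form(c, arcs, n, nonbasic):
    """Apply transformation (25) at the transshipment vertices in reverse
    topological order; afterwards every non-basic arc has cost zero."""
    c = np.asarray(c, dtype=float).copy()
    for v in range(n - 2, 0, -1):      # transshipment vertices v_{n-1},...,v_2
        cf = c[nonbasic[v]]
        for i, (u, w) in enumerate(arcs):
            if u == v:
                c[i] -= cf             # outgoing arcs from v (non-basic arc -> 0)
            elif w == v:
                c[i] += cf             # incoming arcs to v
    return c
```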

Definition 4

For a basic arc \(e = (u,v)\), the associated critical path \(P_{e}\) is an s-t path containing arc e, determined as follows. Choose an arbitrary s-u path \(P_{1}\), and take for \(P_{2}\) the unique v-t path that uses only non-basic arcs. Then, the critical path \(P_{e} = (P_{1},e,P_{2})\) is the concatenation of \(P_{1}\), the arc e and \(P_{2}\).

The uniqueness of \(P_{2}\) in Definition 4 follows from the fact that each transshipment vertex has exactly one outgoing arc that is non-basic and G is acyclic. Clearly, to each basic arc e we can associate one critical path \(P_{e}\) as given above.

The following result shows that for a linearizable cost matrix Q there exists a unique linearization vector in reduced form. The uniqueness is up to the choice of non-basic arcs and critical paths.

Proposition 4

Let \(Q\in {\mathbb {R}}^{m\times m}\) be a linearizable cost matrix for the QSPP, and \(c \in {\mathbb {R}}^{m}\) its linearization vector. Then, the reduced form of c, \(R(c) \in {\mathbb {R}}^{m}\), is uniquely determined by the costs of the critical paths in the underlying graph G.

Proof

Let M be a binary matrix whose rows correspond to the s-t paths in G and whose columns correspond to the basic arcs. In particular, \(M_{P,e} = 1\) if and only if the path P contains the basic arc e. Let b be the vector whose elements are the quadratic costs of the s-t paths. Let \({\hat{c}}_{B} \in {\mathbb {R}}^{m-n+2}\) be the subvector of R(c) composed of the elements corresponding to the basic arcs. Then, \({\hat{c}}_{B}\) satisfies the linear system \(M{\hat{c}}_{B} = b\). In order to show the uniqueness of \({\hat{c}}_{B}\), it suffices to prove that the rank of M equals \(m-n+2\), the number of basic arcs.

Let \({\bar{M}}\) be the square submatrix of M of size \((m-n+2) \times (m-n+2)\) whose rows correspond to the critical paths. Let \({\mathcal {C}}_{i}\) be the set of basic arcs emanating from vertex \(v_{i}\) for \(i = 1,\ldots ,n-1\). Since the sets \({\mathcal {C}}_{1}, \ldots , {\mathcal {C}}_{n-1}\) partition the set of basic arcs, they can be used to index the matrix \({\bar{M}}\). Upon rearrangement, \({\bar{M}}\) is a block matrix such that the (i, j)th block \({\bar{M}}^{ij}\) is the submatrix whose rows and columns correspond to \({\mathcal {C}}_{i}\) and \({\mathcal {C}}_{j}\), respectively. It is readily seen that every diagonal block \({\bar{M}}^{ii}\) is an identity matrix. Furthermore, the block \({\bar{M}}^{ij}\) is a zero matrix for \(i < j\). To see this, first recall that the vertices are topologically ordered. Then, note that for the critical path associated with a basic arc \(e = (v_{i},v_{j})\), all arcs visited after e are non-basic by construction. Thus, the rank of \({\bar{M}}\) is \(m-n+2\), which finishes the proof. \(\square \)

From the previous proposition, it follows that for a linearizable cost matrix Q, its linearization vector can be computed easily from the costs of the critical paths. However, the above calculation of the unique linear cost vector in reduced form can be performed even when the linearizability of Q is not known. Since the resulting vector does not have to be a linearization vector, we call it a pseudo-linearization vector. In particular, we have the following definition, see also Hu and Sotirov (2018).

Definition 5

The pseudo-linearization vector of the cost matrix \(Q \in {\mathbb {R}}^{m \times m}\) is the unique cost vector \(p \in {\mathbb {R}}^{m}\) in reduced form such that \(x^{\mathrm T}Qx = p^{\mathrm T}x\) for every critical path x.

Here, the uniqueness is up to the choice of non-basic arcs and critical paths. Recall that the pseudo-linearization vector can also be computed for a non-linearizable cost matrix. The following lemma shows that for linearizable Q its pseudo-linearization vector coincides with the linearization vector in reduced form.

Lemma 6

Let Q be linearizable. Then the corresponding linearization vector in reduced form and the pseudo-linearization vector are equal.

Proof

Let c be the linearization vector of Q in reduced form, and p the pseudo-linearization vector of Q. Since c is a linearization of Q, we have \(c^{\mathrm T}x = x^{\mathrm T}Qx = p^{\mathrm T}x\) for each critical path x. From Proposition 4 it then follows that \(c = p\), since both cost vectors are in reduced form. \(\square \)

If we change the input target vertex from t to another vertex v, some arcs and vertices have to be removed from G in order to satisfy assumption (iii). This results in a reduced QSPP instance. To simplify the presentation, we introduce the following notation.

Table 2 Notation with respect to target vertex v

\(A_{v}\): the arc set of the reduced QSPP instance with target vertex v
\(Q_{v}\): the cost matrix of the reduced QSPP instance with target vertex v
\(R_{v}(\cdot )\): the operator mapping a cost vector of the instance with target v to its reduced form
\(p_{v}\): the pseudo-linearization vector of \(Q_{v}\)

The next result from Hu and Sotirov (2018) establishes a relationship between the linearization vector of \(Q_{t}\) and the linearization vector of \(Q_{v}\) where \((v,t)\in A\).

Lemma 7

(Hu and Sotirov 2018) The cost vector \(c \in {\mathbb {R}}^{m}\) is a linearization of \(Q_{t} \in {\mathbb {R}}^{m \times m}\) if and only if the cost vector \(T_{e}(c) \in {\mathbb {R}}^{|A_{v}|}\) given by

$$\begin{aligned} \big (T_{e}(c)\big )_{e'} = {\left\{ \begin{array}{ll} c_{e'} - 2 \cdot q_{e,e'} &\quad \text {if } e' = (u,w) \in A_{v} \text { and } u \ne s, \\ c_{e'} - 2 \cdot q_{e,e'} + c_{e} &\quad \text {if } e' = (u,w) \in A_{v} \text { and } u = s, \end{array}\right. } \end{aligned}$$
(26)

is a linearization of \(Q_{v}\) for every vertex v such that \(e = (v,t)\in A\).
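For completeness, (26) translates directly into code. In the sketch below, `q[(e, ep)]` stands for the entry \(q_{e,e'}\) of \(Q_{t}\), `A_v` for the arc set of the reduced instance, and cost vectors are dicts keyed by arcs (all assumed names):

```python
def T(c, e, A_v, q, s):
    """The operator T_e of Lemma 7 / (26), for e = (v, t)."""
    c_e = c[e]
    c_new = {}
    for ep in A_v:                      # ep = e' = (u, w)
        u, w = ep
        c_new[ep] = c[ep] - 2 * q[(e, ep)]
        if u == s:                      # arcs leaving the source also absorb c_e
            c_new[ep] += c_e
    return c_new
```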

Now, we are ready to prove the main result in this section.

Theorem 4

Let \(R_{v}(\cdot )\) and \(p_{v}\) be defined as in Table 2, and \(T_{e}\) given as in Lemma 7. Then, the pseudo-linearization vector \(p_{t}\) is a linearization of \(Q_{t}\) if and only if \((R_{u} \circ T_{e})(p_{v}) = p_{u}\) for every \(e = (u,v) \in A\).

Proof

If \(Q_{t}\) is linearizable, then it follows from Lemmas 6 and 7 that for every vertex v the cost matrix \(Q_{v}\) is linearizable and \(p_{v}\) is its linearization vector in reduced form. For every arc \(e = (u,v) \in A\), the vector \(T_{e}(p_{v})\) is then a linearization of \(Q_{u}\) by Lemma 7, and its reduced form equals \(p_{u}\) by Lemma 6. Thus, \((R_{u} \circ T_{e})(p_{v}) = p_{u}\) for every arc \(e = (u,v) \in A\).

Conversely, assume that \(Q_{t}\) is not linearizable. Then, from Lemma 7, it follows that there exists an arc \(e = (v,t) \in A\) such that the vector \(T_{e}(p_{t})\) is not a linearization vector of \(Q_{v}\). Let us distinguish the following two cases:

  1. If \(Q_{v}\) is linearizable, then \(p_{v}\) is its linearization vector in reduced form, and \((R_{v} \circ T_{e})(p_{t}) \ne p_{v}\) by Lemma 6, since otherwise \(T_{e}(p_{t})\) would be equivalent to \(p_{v}\) and hence a linearization of \(Q_{v}\).

  2. If \(Q_{v}\) is not linearizable, then we repeat the whole argument for \(Q_{v}\), distinguishing the same two cases.

This recursive process must eventually terminate in case 1, since the number of vertices in the underlying graph decreases in each recursion step and every cost matrix on a graph with at most three vertices is linearizable. Thus, we obtain \((R_{u} \circ T_{e})(p_{v}) \ne p_{u}\) for some \(e = (u,v) \in A\). \(\square \)

Note that the iterative procedure from Theorem 4 provides an answer to the QSPP linearization problem. Moreover, it returns the linearization vector in reduced form if one exists. We present the pseudocode for the QSPP linearization problem below.

[Algorithm 2: pseudocode for the QSPP linearization problem on acyclic digraphs]
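Since the published pseudocode is rendered as a figure and not reproduced here, the following Python sketch shows one way to organize the test of Theorem 4. It assumes the helpers sketched earlier: `pseudo_lin(v)` computes \(p_{v}\) from the critical-path system, `R(u, c)` realizes \(R_{u}\), and `T(c, e)` realizes (26). It is a sketch of the procedure, not the paper's exact algorithm.

```python
import math

def agrees(c1, c2, tol=1e-9):
    """Entrywise comparison of two cost vectors (dicts keyed by arcs)."""
    return all(math.isclose(c1[e], c2[e], abs_tol=tol) for e in c1)

def qspp_linearization(vertices, arcs, t, pseudo_lin, R, T):
    """Return the linearization vector of Q_t in reduced form,
    or None if Q_t is not linearizable (Theorem 4)."""
    # pseudo-linearization p_v for every target vertex; trivial targets
    # (e.g. the source s) are assumed to be handled inside pseudo_lin
    p = {v: pseudo_lin(v) for v in vertices}
    for e in arcs:                              # e = (u, v)
        u, v = e
        if not agrees(R(u, T(p[v], e)), p[u]):  # the check of Theorem 4 fails
            return None
    return p[t]
```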

The quadratic cost of an s-t path can be computed in \({{\mathcal {O}}}(m^2)\) steps, and thus we need \({{\mathcal {O}}}(m^3)\) steps for the \(m-n+2\) critical paths. The pseudo-linearization of Q can then be obtained in \({{\mathcal {O}}}(m^2)\) steps by solving a linear system whose left-hand side is the lower triangular square matrix \({\bar{M}}\) of order \(m-n+2\), see Proposition 4. Since there are n vertices, computing all pseudo-linearizations requires \({{\mathcal {O}}}(nm^3)\) steps. The rest of the computation takes at most \({{\mathcal {O}}}(m^3)\) steps. Thus, the complexity of the algorithm given in Theorem 4 is \({{\mathcal {O}}}(nm^3)\).

We now show that the linearization algorithm also characterizes the set of linearizable matrices.

Proposition 5

Let G be an acyclic digraph. The matrices \(Q_{1},\ldots ,Q_{k}\) spanning the set of linearizable matrices can be computed in polynomial time.

Proof

Let \(v_{i}\) be a transshipment vertex, and let f be the non-basic arc emanating from \(v_{i}\). Let \(M_{i}\) be the matrix with ones on the diagonal and the following non-zero off-diagonal entries:

$$\begin{aligned} (M_{i})_{e,f} := {\left\{ \begin{array}{ll} -1 &\quad \text {if } e \text { is an outgoing arc from vertex } v_{i},\\ 1 &\quad \text {if } e \text { is an incoming arc to } v_{i}. \end{array}\right. } \end{aligned}$$
(27)

It is easy to see that for a given cost vector c, \({\hat{c}} = M_{i}c\) gives an alternative representation of (25). Thus, the reduced form \(R_{u}(c)\) of c is a linear transformation given by \(R_{u}(c)=(M_{n-1}M_{n-2}\cdots M_{2})c\). Similarly, the linear operator \(T_{e}\) from Lemma 7 and the pseudo-linearization vector \(p_{v}\) can be obtained from linear transformations. Therefore, we can find, in polynomial time, a matrix L such that \(L\text {vec}(Q) = 0\) if and only if Q is linearizable. If \(\text {vec}(Q_{1}),\ldots ,\text {vec}(Q_{k})\) span the null space of L, then Q is linearizable if and only if \(\text {vec}(Q) = \text {vec}(\sum _{i=1}^{k}\alpha _{i}Q_{i})\) for some \(\alpha \in {\mathbb {R}}^{k}\). \(\square \)

The previous proposition can be used to compute a spanning set of linearizable matrices for the QSPP on DAGs.
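As an illustration of that computation, once a matrix L with \(L\text {vec}(Q)=0\) exactly for linearizable Q has been assembled (from the linear maps \(M_{i}\), \(T_{e}\) and the pseudo-linearization, which we do not spell out here), a spanning set \(Q_{1},\ldots ,Q_{k}\) is just a basis of its null space. A short sketch using `scipy.linalg.null_space`:

```python
import numpy as np
from scipy.linalg import null_space

def spanning_linearizable(L, m):
    """Given L with L @ vec(Q) = 0 iff Q is linearizable, return matrices
    Q_1, ..., Q_k whose span is the set of linearizable matrices."""
    basis = null_space(L)                           # columns span ker(L)
    return [basis[:, i].reshape((m, m), order="F")  # undo column-stacking vec
            for i in range(basis.shape[1])]
```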

8 Conclusions

In this paper, we present several applications of the linearization problem for binary quadratic problems. In particular, we propose a new lower bounding scheme that follows from a simple certificate for a quadratic function to be non-negative. Each linearization-based relaxation depends on the chosen set of linearizable matrices, which allows us to compute a number of different lower bounds. One can obtain the best possible linearization-based bound when the complete characterization of the set of linearizable matrices is known.

In Theorem 2, we prove that the Generalized Gilmore–Lawler bound obtained from Algorithm 1, for any choice of a skew-symmetric matrix, is dominated by \(v_{LBB'}\), see (14). We also show that \(v_{LBB'}\) coincides with the first level RLT bound by Adams and Sherali (1990) when the upper bounds on variables are implied by the rest of the constraints, see Lemma 3. This also implies that all Generalized Gilmore–Lawler bounds are dominated by the first level RLT bound in the mentioned setting. A similar result was already observed in the context of the quadratic assignment problem, but it was not known for BQPs in general. For BQPs where upper bounds on variables are not implied by constraints, Lemma 4 establishes the relation between \(v_{LBB'}\) and \(v_{RLT_1}\). In Proposition 3, we relate all bounds presented here to the strongest linearization-based bound \(v_{LBB^{*}}\). Our Example 1 demonstrates the strength of that bound.

We also provide a polynomial-time algorithm that solves the linearization problem of the quadratic shortest path problem on directed acyclic graphs, see Algorithm 2. Our algorithm yields the complete characterization of the set of linearizable matrices for the QSPP on DAGs. Thus, we are able to compute the strongest linearization-based bound \(v_{LBB^*}\) for the QSPP on DAGs. Our numerical experiments show that \(v_{LBB^*}\) and \(v_{LBB'}\) coincide for all tested instances.

For future research, it would be interesting to further investigate the strength of linearization-based bounds for types of linearizable matrices that do not fall into the case of Lemma 1. Finally, let us note that the results from this paper (partially) address questions posed by Çela et al. (2018) related to computing good bounds by exploiting polynomially solvable special cases.