# BDD-based heuristics for binary optimization

Bergman, D., Cire, A.A., van Hoeve, W. et al. J Heuristics (2014) 20: 211. doi:10.1007/s10732-014-9238-1

## Abstract

In this paper we introduce a new method for generating heuristic solutions to binary optimization problems. The technique is based on binary decision diagrams, which we use to provide an under-approximation of the set of feasible solutions. We show that the proposed algorithm delivers solutions comparable to those of a state-of-the-art general-purpose optimization solver on randomly generated set covering and set packing problems.

### Keywords

Binary decision diagrams · Heuristics · Set covering · Set packing

## 1 Introduction

Binary optimization problems (BOPs) are ubiquitous across many problem domains. Over the last fifty years there have been significant advances in algorithms dedicated to solving problems in this class. In particular, general-purpose algorithms for binary optimization are commonly branch-and-bound methods that rely on two fundamental components: a relaxation of the problem, such as a linear programming relaxation of an integer programming model, and heuristics. Heuristics are used to provide feasible solutions during the search for an optimal one, which in practice is often more important than providing a proof of optimality.

Much of the research effort dedicated to developing heuristics for binary optimization has primarily focused on specific combinatorial optimization problems; this includes, e.g., the set covering problem (SCP) (Caprara et al. 1998) and the maximum clique problem (Grosso et al. 2008; Pullan et al. 2011). In contrast, general-purpose heuristics have received much less attention in the literature. The vast majority of the general techniques are embodied in integer programming (IP) technology, such as the *feasibility pump* (Fischetti et al. 2005) and the *pivot, cut, and dive* heuristic (Eckstein and Nediak 2007). A survey of heuristics for integer programming is presented by Glover and Laguna (1997a, b) and Berthold (2006). Local search methods for general binary problems can also be found in Aarts and Lenstra (1997) and Bertsimas et al. (2013).

We introduce a new general-purpose method for obtaining a set of feasible solutions for BOPs. Our method is based on an under-approximation of the feasible solution set using binary decision diagrams (BDDs). BDDs are compact graphical representations of Boolean functions (Akers 1978; Lee 1959; Bryant 1986), originally introduced for applications in circuit design and formal verification (Hu 1995; Lee 1959). They have been recently used for a variety of purposes in combinatorial optimization, including post-optimality analysis (Hadzic and Hooker 2006, 2007), cut generation in integer programming (Becker et al. 2005), and 0–1 vertex and facet enumeration (Behle and Eisenbrand 2007). The techniques presented here can also be readily applied to arbitrary discrete problems using multi-valued decision diagrams (MDDs), a generalization of BDDs for discrete-valued functions.

Our method is a counterpart of the concept of *relaxed* MDDs, recently introduced by Andersen et al. (2007) as an over-approximation of the feasible set of a discrete constrained problem. The authors used relaxed MDDs for the purpose of replacing the typical domain store relaxation used in constraint programming by a richer data structure. They found that relaxed MDDs drastically reduce the size of the search tree and allow much faster solution of problems with multiple all–different constraints, which are equivalent to graph coloring problems. Analogous methods were applied to other types of constraints in Hadzic et al. (2008) and Hoda et al. (2010).

Using similar techniques, Bergman et al. (2011) proposed the use of *relaxed BDDs* to derive relaxation bounds for binary optimization problems. The authors developed a general top–down construction method for relaxed BDDs and reported good results for structured set covering instances. Relaxed BDDs were also applied in the context of the maximum independent set problem, where the ordering of the variables in the BDD was shown to have a significant bearing on the effectiveness of the relaxation it provides (Bergman et al. 2012).

The contributions of this paper are threefold:

1. Introducing a new heuristic for BOPs;

2. Discussing the necessary ingredients for applying the heuristic to specific classes of problems;

3. Providing an initial computational evaluation of the heuristic on the well-studied set covering and set packing problems. We show that, on a set of randomly generated instances, the solutions produced by our algorithm are comparable to those obtained with state-of-the-art integer programming optimization software (CPLEX).

## 2 Binary decision diagrams

BOPs are specified by a set of binary variables \(X = \{x_1, \ldots , x_n\}\), an objective function \(f : \{0,1 \}^n \rightarrow \mathbb {R}\) to be minimized, and a set of \(m\) constraints \(C = \{C_1, \ldots , C_m\}\), which define relations among the problem variables. A *solution* to a BOP \(P\) is an assignment of values zero or one to each of the variables in \(X\). A solution is *feasible* if it satisfies all the constraints in \(C\). The set of feasible solutions of \(P\) is denoted by \(\mathrm{Sol}(P)\). A solution \(x^*\) is *optimal* for \(P\) if it is feasible and satisfies \(f(x^*) \le f(\tilde{x})\) for all \(\tilde{x} \in \mathrm{Sol}(P)\).

A binary decision diagram (BDD) \(B = (U,A)\) for a BOP \(P\) is a layered directed acyclic multi-graph that encodes a set of solutions of \(P\). The nodes \(U\) are partitioned into \(n+1\) layers, \(L_1, L_2, \ldots , L_{n+1}\), where we let \(\ell (u)\) be the layer index of node \(u\). Layers \(L_1\) and \(L_{n+1}\) consist of single nodes; the root \(r\) and the terminal \(t\), respectively. The *width* of layer \(j\) is given by \(\omega _j = |L_j|\), and the *width* of \(B\) is \(\omega (B) = \max _{j \in \{1, 2, \ldots , n\}} \omega _j\). The *size* of \(B\), denoted by \(|B|\), is the number of nodes in \(B\).

Each arc \(a \in A\) is directed from a node in some layer \(j\) to a node in the adjacent layer \(j+1\), and has an associated *arc-domain* \(d_a \in \{0,1\}\). The arc \(a\) is called a *1-arc* when \(d_a = 1\) and a *0-arc* when \(d_a = 0\). For any two arcs \(a, a'\) directed out of a node \(u\), \(d_a \ne d_{a'}\), so that the maximum out-degree of a node in a BDD is two, with each arc having a unique arc-domain. Given a node \(u\), we let \(a_0(u)\) be the 0-arc directed out of \(u\) (if it exists) and \(b_0(u)\) be the node in \(L_{\ell(u)+1}\) at its opposite end, and similarly for \(a_1(u)\) and \(b_1(u)\).

A BDD \(B\) represents a set of solutions to \(P\) in the following way. An arc \(a\) directed out of a node \(u\) represents the assignment \(x_{\ell(u)} = d_a\). Hence, for two nodes \(u\), \(u'\) with \(\ell(u) < \ell(u')\), a directed path \(p\) from \(u\) to \(u'\) along arcs \(a_{\ell(u)}, a_{\ell(u)+1}, \ldots, a_{\ell(u')-1}\) corresponds to the assignment \(x_j = d_{a_j}\), \(j = \ell(u), \ell(u)+1, \ldots, \ell(u')-1\). In particular, an \(r\)–\(t\) path \(p = (a_1, \ldots, a_n)\) corresponds to a solution \(x^p\), where \(x^p_j = d_{a_j}\) for \(j = 1, \ldots, n\). The set of solutions represented by a BDD \(B\) is denoted by \(\mathrm{Sol}(B) = \{x^p \mid p \text{ is an } r\text{–}t \text{ path}\}\). An *exact* BDD \(B\) for \(P\) is any BDD for which \(\mathrm{Sol}(B) = \mathrm{Sol}(P)\).

For two nodes \(u, u' \in U\) with \(\ell(u) < \ell(u')\), let \(B_{u,u'}\) be the BDD induced by the nodes that belong to some directed path between \(u\) and \(u'\). In particular, \(B_{r,t} = B\). A BDD is called *reduced* if \(\mathrm{Sol}(B_{u,u'})\) is unique for any two nodes \(u\), \(u'\) of \(B\). For a fixed variable ordering, the reduced BDD \(B\) is unique, and it is therefore the most compact representation, in terms of size, for that ordering (Wegener 2000).

Finally, for a large class of objective functions, e.g. additively separable functions, optimizing over the solutions represented by a BDD \(B\) can be reduced to finding a shortest path in \(B\). For example, given a real cost vector \(c\) and a linear objective function \(c^T x\), we can associate an *arc-cost* \(c(u,v) = c_{\ell(u)} d_{u,v}\) with each arc \(a = (u,v)\) in the BDD. This way, a shortest *r*–*t* path corresponds to a minimum cost solution in \(\mathrm{Sol}(B)\). If \(B\) is exact, then this shortest path corresponds to an optimal solution for \(P\).
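The shortest-path reduction above can be sketched as a single layer-by-layer pass. The node representation below (dicts with `'zero'`/`'one'` successor keys) is our own illustration, not the paper's notation; it assumes a linear objective with arc cost \(c_{\ell(u)} d_a\) as described.

```python
def bdd_shortest_path(layers, c):
    """Minimize c^T x over Sol(B). layers[j] is the list of nodes in layer
    L_{j+1}; each node is a dict whose optional 'zero'/'one' entries point
    to a node in the next layer. Returns (best cost, best solution)."""
    root = layers[0][0]
    # dist maps id(node) -> (cost of cheapest r-u path, partial solution)
    dist = {id(root): (0.0, [])}
    for j, layer in enumerate(layers[:-1]):
        for u in layer:
            if id(u) not in dist:
                continue  # node unreachable from the root
            cost_u, sol_u = dist[id(u)]
            for d, key in ((0, 'zero'), (1, 'one')):
                v = u.get(key)
                if v is None:
                    continue  # no d-arc out of u
                cost_v = cost_u + c[j] * d  # arc cost c_j * d_a
                if id(v) not in dist or cost_v < dist[id(v)][0]:
                    dist[id(v)] = (cost_v, sol_u + [d])
    terminal = layers[-1][0]
    return dist[id(terminal)]
```

Because the BDD is layered and acyclic, a single top-down sweep suffices; no priority queue is needed.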

*Example 1*

## 3 Exact BDDs

An exact reduced BDD \(B=(U,A)\) for a BOP \(P\) can be interpreted as a compact search tree for \(P\), where infeasible leaf nodes are removed, isomorphic subtrees are superimposed, and the feasible leaf nodes are merged into \(t\). In principle, \(B\) can be obtained by first constructing the branching tree for \(P\) and reducing it accordingly, which is impractical for our purposes.

We present here an efficient top–down algorithm for constructing an exact BDD \(B\) for \(P\). It relies on problem-dependent information for merging BDD nodes and thus reducing its size. If this information satisfies certain conditions, the resulting BDD is reduced. The algorithm is a *top–down* procedure since it proceeds by compiling the layers of \(B\) one-by-one, where layer \(L_{j+1}\) is constructed only after layers \(L_1, \ldots , L_j\) are completed.

Before presenting the algorithm, we introduce some definitions. Let \(x'\) be a *partial solution* that assigns a value to variables \(x_1, \ldots, x_j\). We define \(F(x') = \{x'' \in \{0,1\}^{n-j} : (x', x'') \in \mathrm{Sol}(P)\}\) as the set of *feasible completions* of \(x'\). We say that two distinct partial solutions \(x^1, x^2\) on variables \(x_1, \ldots, x_j\) are *equivalent* if \(F(x^1) = F(x^2)\).

The algorithm requires a method for establishing when two partial solutions are necessarily equivalent. If this is possible, then the last nodes \(u, u'\) of the BDD paths corresponding to these partial solutions can be merged into a single node, since \(B_{u,t}\) and \(B_{u',t}\) are the same. To this end, with each partial solution \(x'\) of dimension \(k\) we associate a *state function* \(\mathtt{s}: \{0,1\}^k \rightarrow S\), where \(S\) is a problem-dependent *state space*. The state of \(x'\) corresponds to the information necessary to determine if \(x'\) is equivalent to any other partial solution on the same set of variables.

Formally, let \(x^1, x^2\) be partial solutions on the same set of variables. We say that the function \(\mathtt {s}(x)\) is *sound* if \(\mathtt {s}(x^1) = \mathtt {s}(x^2)\) implies that \(F(x^1) = F(x^2)\), and we say that \(\mathtt {s}\) is *complete* if the converse is also true. The algorithm requires only a sound state function, but if \(\mathtt {s}\) is complete, the resulting BDD will be reduced.

For simplicity of exposition, we further assume that it is possible to identify when a partial solution \(x'\) cannot be completed to a feasible solution, i.e. \(F(x') = \emptyset \). It can be shown that this assumption is not restrictive, but rather makes for an easier exposition of the algorithm. We write \(\mathtt {s}(x') = \hat{0}\) to indicate that \(x'\) cannot be completed into a feasible solution. If \(x\) is a solution to \(P\), we write \(\mathtt {s}(x) = \emptyset \) if \(x\) is feasible and \(\mathtt {s}(x) = \hat{0}\) otherwise.

We now extend the definition of state functions to nodes of the BDD \(B\). Suppose that \(\mathtt {s}\) is a complete state function and \(B\) is an exact (but not necessarily reduced) BDD. For any node \(u\), the fact that \(B\) is exact implies that any two partial solutions \(x^1,x^2 \in \mathrm{Sol}(B_{r,u})\) have the same feasible completions, i.e. \(F(x^1) = F(x^2)\). Since \(\mathtt {s}\) is complete, we must have \(\mathtt {s}(x^1) = \mathtt {s}(x^2)\). We henceforth define the state of a node \(u\) as \(\mathtt {s}(u) = \mathtt {s}(x)\) for any \(x \in \mathrm{Sol}(B_{r,u})\), which is therefore uniquely defined for a complete function \(\mathtt {s}\).

We also introduce a function \(\mathtt {update}: S \times \{0,1\} \rightarrow S\). Given a partial solution \(x'\) on variables \(x_1, \ldots , x_j\), \(j < n\), and a domain value \(d \in \{0,1\}\), the function \(\mathtt {update}(\mathtt {s}(x'),d)\) maps the state of \(x'\) to the state of the partial solution obtained when \(x'\) is appended with \(d\), \(\mathtt {s}((x',d))\). This function is similarly extended to nodes: \(\mathtt {update}(s(u),d)\) represents the state of all partial solutions in \(\mathrm{Sol}(B_{r,u})\) extended with value \(d\) for a node \(u\).
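Algorithm 1 itself is not reproduced in this excerpt; the following minimal sketch (data layout and names are ours, not the paper's) illustrates the top-down compilation pattern it describes, parameterized by an initial state `s0` and an `update` function, with `None` playing the role of the infeasible state \(\hat{0}\). States must be hashable so that nodes with equal states can be merged.

```python
def compile_exact_bdd(n, s0, update):
    """Top-down compilation sketch: builds layers L_1, ..., L_{n+1} of
    nodes {'state', 'arcs'}, creating at most one node per distinct state
    in each new layer. update(state, d) returns the successor state, or
    None when the extended partial solution has no feasible completion."""
    root = {'state': s0, 'arcs': {}}
    layers = [[root]]
    for _ in range(n):
        next_by_state = {}  # merge nodes whose states coincide
        for u in layers[-1]:
            for d in (0, 1):
                s_new = update(u['state'], d)
                if s_new is None:  # infeasible: omit the d-arc
                    continue
                v = next_by_state.get(s_new)
                if v is None:
                    v = {'state': s_new, 'arcs': {}}
                    next_by_state[s_new] = v
                u['arcs'][d] = v  # the d-arc out of u
        layers.append(list(next_by_state.values()))
    return layers
```

In this simplified form the last layer may hold several nodes; merging all feasible complete solutions into a single terminal \(t\) amounts to mapping them to one common state in the final call of `update`.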

*Example 2*

**Theorem 1**

Let \(\mathtt {s}\) be a sound state function for a binary optimization problem \(P\). Algorithm 1 generates an exact BDD for \(P\).

*Proof*

Consider the first iteration. We start with the root \(r\) and \(\mathtt {s}(r) = s_0\), which is the initial state corresponding to not assigning any values to any variables. \(r\) is the only node in \(L_1\). When \(d = 0,\) if there exists no feasible solution with \(x_1= 0\), no new node is created. Hence no solutions are introduced into \(B\). If otherwise there exists at least one solution with \(x_1 = 0\), we create a new node, add it to \(L_2\), and introduce a 0-arc from \(r\) to the newly created node. This will represent the partial solution \(x_1 = 0\). This is similarly done for \(d=1\).

Consider the end of iteration \(j\). Each solution \(x' = (x'', d)\) that belongs to Sol\((B_{r,u})\) for some node \(u \in L_{j+1}\) must go through some node \(u' \in L_j\) with \(b_d(u') = u\). By induction, \(x''\) is a feasible partial solution with \(\mathtt{s}(u') = \mathtt{s}(x'') \ne \hat{0}\). But when the arc \(a_d(u')\) is considered, we must have \(\mathtt{update}(\mathtt{s}(u'), d) \ne \hat{0}\), for otherwise this arc would not have been created. Therefore, each solution in \(\mathrm{Sol}(B_{r,u})\) is feasible. Since \(u \in L_{j+1}\) was chosen arbitrarily, only feasible partial solutions exist in \(\mathrm{Sol}(B_{r,u})\) for all nodes \(u \in L_{j+1}\).

What remains to be shown is that all feasible partial solutions exist in \(\mathrm{Sol}(B_{r,u})\) for some \(u \in L_{j+1}\). This is trivially true for the partial solutions \(x_1 = 0\) and \(x_1 = 1\). Take now any partial feasible solution \(x' = (x'', d)\) on the first \(j\) variables, \(j \ge 2\). Since \(x'\) is a partial feasible solution, \(x''\) must also be a partial feasible solution. By induction, \(x''\) belongs to \(\mathrm{Sol}(B_{r,u}),\) for some \(u \in L_j\). When Algorithm 1 examines node \(u\), \(\mathtt {update}(\mathtt {s}(u),d)\) must not return \(\hat{0}\) because \(F(x') \ne \emptyset \). Therefore, the \(d\)-arc directed out of \(u\) is created, ending at some node \(b_d(u) \in L_{j+1}\), as desired. \(\square \)

**Theorem 2**

Let \(\mathtt {s}\) be a complete state function for a binary optimization program \(P\). Algorithm 1 generates an exact reduced BDD for \(P\).

*Proof*

By Theorem 1, \(B\) is exact. Moreover, for each \(j\), each node \(u \in L_j\) will have a unique state because of line 9. Therefore, any two partial solutions \(x',x''\) ending at unique nodes \(u', u'' \in L_j\) will have \(F(x') \ne F(x'')\). \(\square \)

**Theorem 3**

Let \(B = (U,A)\) be the exact BDD output by Algorithm 1 for a BOP \(P\) with a sound state function \(\mathtt{s}\). Algorithm 1 runs in time \(O(|U|K)\), where \(K\) is the time complexity of each call of the \(\mathtt{update}\) function.

*Proof*

Algorithm 1 performs two calls of \(\mathtt {update}\) for every node \(u\) added to \(B\). Namely, one call to verify if \(u\) has a \(d\)-arc for each domain value \(d \in \{0,1\}\). \(\square \)

Theorem 3 implies that, if \(\mathtt {update}\) can be implemented efficiently, then Algorithm 1 runs in polynomial time in the size of the exact BDD \(B\). Indeed, there are structured problems for which one can define complete state functions with a polynomial time-complexity for \(\mathtt {update}\) (Andersen et al. 2007; Bergman et al. 2011, 2012). This will be further discussed in Sect. 5.

## 4 Restricted BDDs

Constructing exact BDDs for general binary programs using Algorithm 1 presents two main difficulties. First, the update function may take time exponential in the input of the problem. This can be circumvented by not requiring a complete state function, but rather just a sound state function. The resulting BDD is exact according to Theorem 1, but perhaps not reduced. This poses only a minor difficulty, as there exist algorithms for reducing a BDD \(B\) that have a polynomial worst-case complexity in the size of \(B\) (Wegener 2000). A more confining difficulty, however, is that even an exact reduced BDD may be exponentially large in the size of the BOP \(P\). We introduce the concept of *restricted BDDs* as a remedy for this problem. These structures provide an under-approximation, i.e. a subset, of the set of feasible solutions to a problem \(P\). Such BDDs can therefore be used as a generic heuristic procedure for any BOP.

More formally, let \(P\) be a BOP. A BDD \(B\) is called a restricted BDD for \(P\) if \(\mathrm{Sol}(B) \subseteq \mathrm{Sol}(P)\). Analogous to exact BDDs, optimizing additively separable objective functions over \(\mathrm{Sol}(B)\) reduces to a shortest path computation on \(B\) if the arc weights are assigned appropriately. Thus, once a restricted BDD is generated, we can readily extract the best feasible solution from \(B\) and provide an upper bound to \(P\).

We will focus on *limited-width* restricted BDDs, in which we limit the size of the BDD \(B\) by requiring that \(\omega (B) \le W\) for some pre-set maximum allotted width \(W\).

*Example 3*

Consider the BOP from Example 1. Figure 2 shows a width-2 restricted BDD. There are eight paths in the BDD, which correspond to eight feasible solutions. Assigning arc costs as in Example 1, a shortest path from the root to the terminal corresponds to the solution \((0,1,0,0,1)\) with an objective function value of \(-6\). The optimal value is \(-8\).

Limited-width restricted BDDs can be easily generated by performing a simple modification to Algorithm 1. Namely, we insert the procedure described in Algorithm 2 immediately after line 3 of Algorithm 1. This procedure is described as follows. We first verify whether \(\omega _j = |L_j| > W\). If so, we delete a set of \(|L_j| - W\) nodes in the current layer, which is chosen by a function \(\mathtt {node\_select}(L_j)\). We then continue building the BDD as in Algorithm 1.

Theorem 4 describes how the time complexity of Algorithm 1 is affected by the choice of the maximum allotted width \(W\).

**Theorem 4**

The modified version of Algorithm 1 for width-\(W\) restricted BDDs has a worst-case time complexity of \(O(nL + nWK)\), where \(L\) and \(K\) are the time complexity for each call of the \(\mathtt {node\_select}\) and \(\mathtt {update}\) functions, respectively.

*Proof*

Because the function \(\mathtt{node\_select}\) is called once per layer, it contributes \(O(nL)\) to the overall time complexity. The \(\mathtt{update}\) function is called twice for each BDD node. Since there will be at most \(O(nW)\) nodes in a width-\(W\) restricted BDD, the theorem follows. \(\square \)

The selection of nodes in \(\mathtt {node\_select}(L_j)\) can have a dramatic impact on the quality of the solutions encoded by the restricted BDD. In fact, as long as we never delete the nodes \(u_1, \ldots , u_n\) that are traversed by some optimal solution \(x^*\), we are sure to have the optimal solution in the final BDD.

We observed that the following \(\mathtt{node\_select}\) procedure yields restricted BDDs with the best-quality solutions in our computational experiments. We assume a minimization problem; a maximization problem can be handled in an analogous way. Each node \(u \in L_j\) is first assigned a value \(lp(u) = \min \{f(x') : x' \in \mathrm{Sol}(B_{r,u})\}\), where \(f\) is the objective function of \(P\). This can be easily computed for a number of objective functions by means of a dynamic programming algorithm; for example, for linear cost functions whose arc weights are as described in Sect. 2, \(lp(u)\) is the shortest-path value from \(r\) to \(u\). The \(\mathtt{node\_select}(L_j)\) function then deletes the nodes in \(L_j\) with the largest \(lp(u)\) values. We henceforth use this heuristic for \(\mathtt{node\_select}\) in the computational experiments of Sect. 6. It can be shown that the worst-case complexity of this particular heuristic is \(O(W \log W)\).
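The \(\mathtt{node\_select}\) rule just described can be sketched as follows; the node representation and the `lp` map are illustrative assumptions of ours. A full sort is used here for clarity, whereas a selection-based variant gives the tighter bound mentioned in the text.

```python
def node_select(layer, lp, W):
    """Keep the W nodes of `layer` with the smallest lp(u) values
    (cheapest r-u paths under a minimization objective), deleting the
    rest. `lp` maps id(node) -> shortest-path value from the root."""
    if len(layer) <= W:
        return layer  # layer already within the width limit
    return sorted(layer, key=lambda u: lp[id(u)])[:W]
```

For maximization one would instead keep the nodes with the largest longest-path values, mirroring the analogous treatment noted above.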

## 5 Applications

We now describe the application of restricted BDDs to two fundamental problems in binary optimization: the set covering problem and the set packing problem (SPP). For both applications, we describe the problem and provide a sound state function. We then present the \(\mathtt {update}\) operation based on this state function which can be used by the modified version of Algorithm 1.

### 5.1 The set covering problem

In the set covering problem (SCP), we are given a 0–1 \(m \times n\) matrix \(A\) and a nonnegative cost vector \(c\), and we seek a minimum-cost subset of the variables \(x_1, \ldots, x_n\) such that every constraint \(i \in \{1, \ldots, m\}\) contains some chosen variable \(j\) with \(a_{i,j} = 1\); that is, the chosen set of columns *covers* \(\{1, \ldots, m\}\).

#### 5.1.1 State function

In addition, the following Lemma shows that \(\mathtt {s}\) is a sound state function for the SCP.

**Lemma 1**

Let \(x^1, x^2\) be two partial solutions on variables \(x_1, \ldots , x_j\). Then, \(\mathtt {s}(x^1) = \mathtt {s}(x^2)\) implies that \(F(x^1) = F(x^2)\).

*Proof*

Let \(x^1, x^2\) be two partial solutions with dimension \(j\) for which \(\mathtt {s}(x^1) = \mathtt {s}(x^2) = s'\). If \(s' = \hat{0}\) then both have no feasible completions, so it suffices to consider the case when \(s' \ne \hat{0}\). Take any completion \(\tilde{x} \in F(x^1)\). We show that \(\tilde{x} \in F(x^2)\).

*Example 4*

*Example 5*

There are several ways to modify the state function to turn it into a complete one (Bergman et al. 2011). Strengthening \(\mathtt{s}\) in this way requires only polynomial time per partial solution, but nonetheless comes at an additional computational cost. Sect. 6 therefore reports results for the simpler (sound) state function presented above.
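The definition of the SCP state function is elided in this excerpt. One natural sound state, in the spirit of Bergman et al. (2011), tracks the set of constraints not yet covered; the sketch below is our own illustration under that assumption, with `None` playing the role of \(\hat{0}\).

```python
def scp_update(state, j, d, cover, latest):
    """One update step for the SCP after assigning x_j = d (0-based j).
    state: frozenset of constraints not yet covered.
    cover[j]: set of constraints covered by variable x_j.
    latest[i]: largest variable index that covers constraint i.
    Returns the new state, or None if some still-uncovered constraint
    can no longer be covered by the remaining variables."""
    new = state - cover[j] if d == 1 else state
    if any(latest[i] <= j for i in new):
        return None  # constraint i has no covering variable left
    return frozenset(new)
```

With this choice, a partial solution reaching the empty state is feasible no matter how it is completed, matching the convention \(\mathtt{s}(x) = \emptyset\) for feasible solutions in Sect. 3.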

### 5.2 The set packing problem

#### 5.2.1 State function

As the following lemma shows, if the states of two partial solutions on the same set of variables are the same, then the set of feasible completions for these partial solutions are the same, thus proving that this state function is sound.

**Lemma 2**

Let \(x^1, x^2\) be two partial solutions on variables \(x_1, \ldots , x_j\). Then, \(\mathtt {s}(x^1) = \mathtt {s}(x^2)\) implies that \(F(x^1) = F(x^2)\).

*Proof*

Let \(x^1, x^2\) be two partial solutions for which \(\mathtt{s}(x^1) = \mathtt{s}(x^2) = s'\). If \(s' = \hat{0}\) then both have empty sets of feasible completions, so it suffices to consider the case when \(s' \ne \hat{0}\). Take any completion \(\tilde{x} \in F(x^1)\). We show that \(\tilde{x} \in F(x^2)\).

First suppose that \(\sum \nolimits _{k=j+1}^n a_{i^*,k} \, \tilde{x}_k = 1\). By (6), \(\sum \nolimits _{k=1}^j a_{i^*,k} \, x^1_k = 0\). This implies that \(\mathtt{s}(x^1)\) contains \(i^*\), since no variable in \(C_{i^*}\) is set to one and there exists \(\ell \in C_{i^*}\) with \(\ell > j\). Therefore \(\mathtt{s}(x^2)\) also contains \(i^*\), implying that no variable in \(C_{i^*}\) is set to one in the partial solution \(x^2\). Hence \(\sum \nolimits _{k=1}^j a_{i^*,k} \, x^2_k = 0\), contradicting (5).

Now suppose that \(\sum \nolimits _{k=j+1}^n a_{i^*,k} \, \tilde{x}_k = 0\). Then \(\sum \nolimits _{k=1}^j a_{i^*,k} \, x^2_k > 1\), contradicting the assumption that \(s' = \mathtt{s}(x^2) \ne \hat{0}\). \(\square \)

*Example 6*

*Example 7*

There are several ways to modify the state function above to turn it into a complete one. For example, one can reduce the SPP to an independent set problem and apply the state function defined in Bergman et al. (2012). We only consider the sound state function in this work.

## 6 Computational experiments

In this section, we perform a computational study on randomly generated set covering and set packing instances. We evaluate our method by comparing the bounds provided by a restricted BDD with those obtained via state-of-the-art IP technology. We acknowledge that a procedure solely geared toward constructing heuristic solutions for BOPs has, in principle, an advantage over general-purpose IP solvers. Nonetheless, we maintain that this is still a meaningful comparison, as modern IP solvers are the best-known general bounding technique for 0–1 problems due to their advanced features and overall performance. This method of testing new heuristics for BOPs was employed in Bertsimas et al. (2013), and we provide a similar study here to evaluate the effectiveness of our algorithm.

CPLEX parameters:

| Parameter (CPLEX internal name) | Value |
| --- | --- |
| Version | 12.4 |
| Number of explored nodes (NodeLim) | 0 (only root) |
| Parallel processes (Threads) | 1 |
| Cuts (Cuts, Covers, DisjCuts, \(\ldots\)) | \(-1\) (off) |
| Emphasis (MIPEmphasis) | 4 (find hidden feasible solutions) |
| Time limit (TiLim) | 3,600 |

The instances we generate have a constraint matrix with a limited *bandwidth*. The bandwidth of a matrix \(A\) is defined as

\[
b_w(A) = \max_{i \in \{1, \ldots, m\}} \left\{ \max_{j, j' \,:\, a_{i,j} = a_{i,j'} = 1} \{\, j - j' \,\} \right\},
\]

and the *minimum bandwidth problem* seeks a variable ordering that minimizes the bandwidth (Corso and Manzini 1999; Feige 2000; Gurari and Sudborough 1984; Martí et al. 2001, 2008; Piñana et al. 2004; Saxe 1980). This underlying structure, when present in \(A\), can be captured by BDDs, resulting in good computational performance.

### 6.1 Problem generation

A matrix generated with a sufficiently small bandwidth has the *consecutive ones property* and is totally unimodular (Fulkerson and Gross 1965), in which case IP finds the optimal solution for the set packing and set covering instances at the root node. Similarly, we argue that an \((m+1)\)-width restricted BDD is an exact BDD for both classes of problems, hence also yielding an optimal solution when this structure is present. Indeed, we show that if \(A\) has the consecutive ones property, then the state of a BDD node \(u\) is always of the form \(\{j, j+1, \dots, m\}\) for some \(j \ge \ell(u)\) during top–down compilation.

To see this, consider the SCP. We claim that for any partial solution \(x'\) that can be completed to a feasible solution, \(\mathtt{s}(x') = \{ i(x'), i(x')+1, \ldots, m \}\) for some index \(i(x')\), or \(\mathtt{s}(x') = \emptyset\) if \(x'\) satisfies all of the constraints when completed with 0's. Let \(j' \le j\) be the largest index with \(x'_{j'} = 1\). Because \(x'\) can be completed to a feasible solution, for each \(i \le b_w + j' - 1\) there is a variable \(x_{j_i}\), \(j_i \le j\), with \(a_{i,j_i} x'_{j_i} = 1\); every remaining constraint has index at least \(b_w + j'\). Therefore \(\mathtt{s}(x') = \{ b_w + j', b_w + j' + 1, \ldots, m \}\), as desired. Hence, the state of every partial solution must be of the form \(\{i, i+1, \ldots, m\}\) or \(\emptyset\). Because there are at most \(m+1\) such states, the size of any layer cannot exceed \(m+1\). A similar argument works for the SPP.

Increasing the bandwidth \(b_w\), however, destroys the totally unimodular property of \(A\) and the bounded width of \(B\). Hence, by changing \(b_w\), we can test how sensitive IP and the BDD-based heuristics are to the staircase structure dissolving.

We note here that generating instances of this sort is not restrictive. Once the bandwidth is large, the underlying structure dissolves and each element of the matrix becomes randomly generated. In addition, as mentioned above, algorithms to solve the minimum bandwidth problem exactly or approximately have been investigated. To any SCP or SPP one can therefore apply these methods to reorder the matrix and then apply the BDD-based algorithm.

### 6.2 Relation between solution quality and maximum BDD width

We first analyze the impact of the maximum width \(W\) on the solution quality provided by a restricted BDD. To this end, we report the generated bound versus maximum width \(W\) obtained for a set covering instance with \(n=1{,}000\), \(k=100\), \(b_w=140\), and a cost vector \(c\) where each \(c_j\) was chosen uniformly at random from the set \(\{1,\dots ,nc_j\}\), where \(nc_j\) is the number of constraints in which variable \(j\) participates. We observe that the reported results are common among all instances tested.

### 6.3 Set covering

First, we report the results for two representative classes of instances for the SCP. In the first class, we studied the effect of \(b_w\) on the quality of the bound. To this end, we fixed \(n=500\), \(k=75\), and considered \(b_w\) as a multiple of \(k\), namely \(b_w \in \{\lfloor 1.1k\rfloor , \lfloor 1.2k\rfloor , \dots , \lfloor 2.6k\rfloor \}\). In the second class, we analyzed if \(k\), which is proportional to the density of \(A\), also has an influence on the resulting bound. For this class we fixed \(n=500\), \(k \in \{25,50,\dots ,250\}\), and \(b_w = 1.6k\). In all classes we generated 30 instances for each triple \((n,k,b_w)\) and fixed \(500\) as the restricted BDD maximum width.

It is well-known that the objective function coefficients play an important role in the bound provided by IP solvers for the set covering problem. We considered two types of cost vectors \(c\) in our experiments. The first is \(c=\mathbf {1}\), which yields the *combinatorial* SCP. For the second cost function, let \(nc_j\) be the number of constraints that include variable \(x_j\), \(j=1,\ldots ,n\). We chose the cost of variable \(x_j\) uniformly at random from the range \([0.75nc_j, 1.25nc_j]\). As a result, variables that participate in more constraints have a higher cost, thereby yielding harder SCPs to solve. This cost vector yields the *weighted* SCP.

The feasible solutions are compared with respect to their *optimality gap*. The optimality gap of a feasible solution is obtained by first taking the absolute difference between its objective value and a lower bound to the problem, and then dividing this by the solution’s objective value. In both BDD and IP cases, we used the dual value obtained at the root node of CPLEX as the lower bound for a particular problem instance.
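The gap computation described above is a one-line formula; the helper below states it explicitly for the minimization case (the function name is ours).

```python
def optimality_gap(objective_value, lower_bound):
    """Optimality gap of a feasible solution (minimization): the absolute
    difference between its objective value and a lower bound, divided by
    the solution's objective value."""
    return abs(objective_value - lower_bound) / objective_value
```

For set packing (a maximization problem), the same formula is used with an upper bound in place of the lower bound, as described in Sect. 6.4.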

Next, we compare solution quality and time as the number of variables \(n\) increases. We generated random instances with \(n \in \{250, 500, 750, \ldots , 4{,}000\}\), \(k = 75\), and \(b_w = 2.2k = 165\) to this end. The choice of \(k\) and \(b_w\) was motivated by Fig. 6b, corresponding to the configuration where IP outperforms BDD with respect to solution quality when \(n=500\). As before, we generated 30 instances for each \(n\). Moreover, only weighted set covering instances are considered in this case.

The *y*-axis in Fig. 8b is in logarithmic scale. For \(n > 500\), we observe that the restricted BDDs yield better-quality solutions than the IP method, and as \(n\) increases this gap remains constant. IP times, however, grow at a much faster rate than restricted BDD times. In particular, with \(n=4{,}000\), the BDD times are approximately two orders of magnitude shorter than the corresponding IP times.

### 6.4 Set packing

We extend the same experimental analysis of the previous section to set packing instances. Namely, we initially compare the quality of the solutions by means of two classes of instances. In the first class we analyze variations of the bandwidth by generating random instances with \(n=500\), \(k=75\), and setting \(b_w\) in the range \(\{\lfloor 1.1k\rfloor , \lfloor 1.2k\rfloor , \ldots , \lfloor 2.5k\rfloor \}\). In the second class, we analyze variations in the density of the constraint matrix \(A\) by generating random instances with \(n=500\), \(k \in \{25,50,\dots ,250\}\), and with a fixed \(b_w = 1.6k\). In all classes, we created 30 instances for each triple \((n,k,b_w)\) and set \(500\) as the restricted BDD maximum width.

The quality is also compared with respect to the optimality gap of the feasible solutions, which is obtained by dividing the absolute difference between the solution's objective value and an upper bound to the problem by the solution's objective value. We use the dual value at CPLEX's root node as the upper bound for each instance.

Similarly to the SCP, experiments were performed with two types of objective function coefficients. The first, \(c=\mathbf {1}\), yields the *combinatorial* set packing problem. For the second cost function, let \(nc_j\) again denote the number of constraints that include variable \(x_j\), \(j=1,\dots ,n\). We chose the objective coefficient of variable \(x_j\) uniformly at random from the range \([0.75nc_j, 1.25nc_j]\). As a result, variables that participate in more constraints have a higher cost, thereby yielding harder set packing problems since this is a maximization problem. This cost vector yields the *weighted* SPP.
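The weighted cost vector described above can be sampled as follows; this is a sketch under the stated distribution, with `A` assumed to be a dense 0/1 constraint matrix given as a list of rows (a representation we choose for illustration, not the paper's):

```python
import random

def weighted_costs(A):
    """Draw cost c_j uniformly at random from [0.75*nc_j, 1.25*nc_j],
    where nc_j is the number of constraints (rows of A) in which
    variable x_j appears."""
    n = len(A[0])
    # nc[j]: number of rows with a nonzero entry in column j
    nc = [sum(row[j] for row in A) for j in range(n)]
    return [random.uniform(0.75 * nc[j], 1.25 * nc[j]) for j in range(n)]
```

Variables appearing in many constraints thus receive proportionally larger costs, which makes the maximization problem harder.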

Next, we proceed analogously to the set covering case and compare solution quality and time as the number of variables \(n\) increases. As before, we generated random instances with \(n \in \{250, 500, 750, \ldots , 4{,}000\}\), \(k = 75\), and \(b_w = 2.2k = 165\), with 30 instances per configuration. Only weighted set packing instances are considered.

## 7 Conclusion

In contrast to problem-specific heuristics, general-purpose heuristics for BOPs have received much less attention in the literature. The latter are often incorporated into integer programming software, much of which has dozens of such heuristics at its disposal. With each heuristic likely to be better suited for BOPs with different mathematical structures, IP solvers typically run many of them at the root node, as well as during search, hoping to find strong primal bounds to help with node pruning and variable fixing. It is therefore important for these heuristics to produce high-quality solutions quickly.

We introduce a new structure, restricted BDDs, and describe how they can be used to develop a new class of general-purpose heuristics for BOPs. A restricted BDD is a limited-size directed acyclic multigraph that represents an under-approximation of the feasible set. A first advantage of representing BOPs with BDDs is that finding the best feasible solution for any separable objective function reduces to solving a shortest-path problem. A second advantage is that adapting a generic restricted BDD to a particular problem type is simple; it amounts to defining two criteria used while building the BDD: how to delete nodes from layers that grow beyond the maximum allowed width, and how to combine equivalent nodes in a given layer. Our empirical observations indicate that a good rule of thumb for the first criterion is to keep the nodes whose paths to the root of the BDD are shortest for minimization objectives, or longest for maximization objectives. The second criterion is more problem-specific, as detailed in Sect. 5, but is still often easy to implement.
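The rule of thumb for the first criterion can be sketched as follows. This is an illustrative fragment, not the paper's implementation: we represent a BDD layer simply as a mapping from node states to the value of the best root-to-node path, and keep the `max_width` most promising nodes.

```python
import heapq

def restrict_layer(layer, max_width, minimize=True):
    """Restrict a BDD layer to at most `max_width` nodes, keeping those
    whose best root-to-node path value is most promising: shortest paths
    for minimization objectives, longest for maximization objectives.
    `layer` maps node states to their best root-to-node path value
    (an illustrative representation chosen for this sketch)."""
    if len(layer) <= max_width:
        return dict(layer)
    select = heapq.nsmallest if minimize else heapq.nlargest
    kept = select(max_width, layer.items(), key=lambda kv: kv[1])
    return dict(kept)
```

Nodes discarded here may cut off feasible solutions, which is precisely why the resulting BDD under-approximates the feasible set rather than representing it exactly.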

To test its effectiveness, we apply our restricted-BDD approach to randomly generated set covering and set packing instances, and compare its performance against the heuristic solution-finding capabilities of the state-of-the-art IP solver CPLEX. Our first empirical observation is that, among all instances tested, the quality of the solution obtained by the restricted BDD approaches the optimal value with a super-exponential-like convergence in the value of the maximum BDD width \(W\), whereas the time to build the BDD and calculate the solution only grows linearly in \(W\). For both the set covering and set packing problems we consider combinatorial instances, which have all costs equal to one, as well as weighted instances, which have arbitrary costs.

For the SCP, solutions obtained by the restricted BDD can be up to 30 % better on average than solutions obtained by CPLEX. This advantage progressively decreases as either the bandwidth of the coefficient matrix \(A\) increases, or the sparsity of \(A\) decreases. In general, the BDD performs better on weighted instances. In terms of execution time, the BDD approach has a slight advantage over the IP approach on average, and can sometimes be up to twice as fast.

For the SPP, the BDD approach exhibits even better performance on both the combinatorial and weighted instances. Its solutions can be up to 70 % better on average than the solutions obtained by CPLEX, with the BDD performing better on weighted instances than on combinatorial instances once again. Unlike what happened in the set covering case, on average, the BDD solutions were always at least as good as the ones produced by CPLEX. In addition, the BDD’s performance appears to improve as the bandwidth of \(A\) increases. As the sparsity of \(A\) changes, the BDD’s performance is good for sparse instances, drops at first as sparsity starts to increase, and tends to slowly increase again thereafter. In terms of execution time, the BDD approach can be up to an order of magnitude faster than CPLEX.

In summary, our results indicate that restricted BDDs can become a useful addition to the existing library of heuristics for binary optimization problems. Several aspects of our algorithm may still need to be further investigated, including the application to broader classes of problems and how BDDs can be incorporated into existing complete or heuristic methods. For example, they could be used as an additional primal heuristic during a branch-and-bound search. Moreover, restricted BDDs could also be applied to problems for which no strong linear programming relaxation is known, since they can accommodate constraints of arbitrary form.