A branch and bound algorithm for robust binary optimization with budget uncertainty

Since its introduction in the early 2000s, robust optimization with budget uncertainty has received a lot of attention. This is due to the intuitive construction of the uncertainty sets and the existence of a compact robust reformulation for (mixed-integer) linear programs. However, despite its compactness, the reformulation performs poorly when solving robust integer problems due to its weak linear relaxation. To overcome the problems arising from the weak formulation, we propose a bilinear formulation for robust binary programming, which is as strong as theoretically possible. From this bilinear formulation, we derive strong linear formulations as well as structural properties for robust binary optimization problems, which we use within a tailored branch and bound algorithm. We test our algorithm’s performance together with other approaches from the literature on a diverse set of “robustified” real-world instances from the MIPLIB 2017. Our computational study, which is the first to compare many sophisticated approaches on a broad set of instances, shows that our algorithm outperforms existing approaches by far. Furthermore, we show that the fundamental structural properties proven in this paper can be used to substantially improve the approaches from the literature. This highlights the relevance of our findings, not only for the tested algorithms, but also for future research on robust optimization. To encourage the use of our algorithms for solving robust optimization problems and our instances for benchmarking, we make all materials freely available online.


Introduction
Dealing with uncertainties is inevitable when considering real-world optimization problems. A classical approach to include uncertainties into the optimization process is robust optimization, where different realizations of the uncertain parameters are modeled via an uncertainty set. A robust optimal solution remains feasible for all considered scenarios in the uncertainty set and minimizes the worst-case cost occurring under these scenarios. The concept was first introduced by Soyster [40] in the early 1970s, was later considered for combinatorial optimization problems and discrete uncertainty sets by Kouvelis and Yu [30] in the 1990s, and was analyzed in detail by Ben-Tal and Nemirovski [10][11][12] and Bertsimas and Sim [15,16] at the beginning of this century. An overview on robust optimization is given in [9,13,21]. The approach by Bertsimas and Sim has proven to be the most popular, with the introductory paper [16] being the most cited document on robust optimization in the literature databases Scopus and Web of Science (search for robust optimization in title, keywords, and abstract). The approach's popularity is primarily based on the intuitive definition of the uncertainty set and the existence of a compact reformulation for the robust counterpart. However, instances from practice can still pose a considerable challenge for modern MILP solvers, even if the non-robust problem is relatively easy to solve, as observed e.g. by Kuhnke et al. [31]. In this paper, we address this challenge by studying the structure of robust binary problems and proposing a new branch and bound algorithm. Thereby, we restrict ourselves to problems with uncertain objective functions. However, most of our results carry over to general robust optimization.
We start by formally defining a standard, so called nominal, binary program NOM min with an objective vector c ∈ R n , a constraint matrix A ∈ R m×n , and a right-hand side b ∈ R m . Instead of assuming the objective coefficients c i to be certain, we consider uncertain coefficients c i that lie in an interval c i ∈ c i , c i +ĉ i and can deviate from their nominal value c i by up to the deviationĉ i . In the worst-case, all coefficients c i are equal to their maximum value c i +ĉ i , as this maximizes the optimal solution value. However, for practical problems it is in general very unlikely that all coefficients deviate to their maximum value. Bertsimas and Sim [16] propose a robust counterpart to NOM, with an adjustable level of conservatism, by defining a budget Γ ∈ [0, n] on the set of considered uncertain scenarios. For this robust counterpart, we do not consider all scenarios, but only those in which at most Γ coefficients c i deviate to their maximum c i +ĉ i and one coefficient deviates by a fraction of (Γ − Γ ). The robust counterpart can be written as . . , n}. The above problem is non-linear and thus impractical, but can be reformulated by dualizing the inner maximization problem, as shown by Bertsimas and Sim [16]. This results in the compact robust problem Unfortunately, solving ROB as an MILP may require much more time than solving the nominal problem NOM. For example, we observed in our computational study that Gurobi [26] already struggles to solve robust knapsack instances with only 100 items within a time limit of an hour (see Sect. 8.5). This is because the integrality gap of the formulation P ROB may be arbitrarily large, even if the integrality gap of the corresponding nominal problem is zero (see Sect. 2). This is problematic, since a large integrality gap implies that optimal solutions to the linear relaxation are most likely far from being integer feasible, i.e., many variables that should be integer take fractional values. However, primal heuristics in MILP solvers, like the feasibility pump [19], perform better for solutions that are nearly integer feasible. Furthermore, even if we find an optimal solution, we probably have to spend much more time proving that it is indeed optimal.
There exist several approaches and studies in the literature on how to solve ROB in practice. Bertsimas et al. [14] as well as Fischetti and Monaci [20] evaluate whether it is more efficient to solve ROB over the compact reformulation P ROB or using a separation approach over an alternative formulation with exponentially many inequalities, all of which correspond to a scenario from the uncertainty set. Although the alternative formulation is exponentially large, it is theoretically as strong, or weak respectively, as the compact reformulation. Atamtürk [5] addresses the issue of the weak formulation P ROB and proposes four different strong, although considerably larger, formulations for solving ROB. Atamtürk even proves that the strongest of the four formulations describes the convex hull of the set of robust solutions if the linear relaxation P NOM = x ∈ [0, 1] n |Ax ≥ b} is the convex hull of the set of nominal solutions. Another approach for solving ROB is to resort to its nominal counterpart. Bertsimas and Sim [15] show that there always exists an optimal solution (x, p, z) to ROB such that z ∈ ĉ 0 ,ĉ 1 , . . . ,ĉ n , withĉ 0 = 0. Note that the ideal choice for p i is always ĉ i − z + x i , with (a) + := max {a, 0} for a ∈ R. When fixing z ∈ ĉ 0 ,ĉ 1 , . . . ,ĉ n , the term ĉ i − z + x i becomes linear, and thus ROB can be written as an instance of its nominal counterpart Hence, solving ROB reduces to solving up to ĉ 0 ,ĉ 1 , . . . ,ĉ n ≤ n + 1 nominal subproblems NOS (z), implying that the robust counterpart of polynomially solvable nominal problems is again polynomially solvable. However, if the number of distinct deviations ĉ 0 , . . . ,ĉ n is large then solving all nominal subproblems may require too much time. Hence, it is beneficial to discard as many non-optimal choices for z as possible. For Γ ∈ Z, Álvarez-Miranda et al. [4] as well as Park and Lee [38] showed independently that there exists a subset Z ⊆ ĉ 0 , . . . ,ĉ n containing an optimal choice for z with |Z | ≤ n + 2 − Γ , or |Z | ≤ n + 1 − Γ respectively. This result was later improved by Lee and Kwon [33], who prove that Z can be chosen such that |Z | ≤ n−Γ 2 + 1 holds. Hansknecht et al. [27] propose a divide and conquer approach for the robust shortest path problem that also aims to reduce the number of nominal subproblems to be solved. Their algorithm, which can as well be used to solve general problems ROB, successively divides the set of deviations ĉ 0 , . . . ,ĉ n into intervals and chooses in each iteration a value z from the most promising interval for solving the nominal subproblem NOS (z). After each iteration, given the optimal objective values of the previously considered subproblems, non-optimal choices of z are identified and discarded by using a relation between the optimal objective values of NOS (z) for different z.
Roughly summarized, there are two general directions for solving ROB in the literature: strong formulations on the one hand and fixing z on the other hand. In this paper, we take a middle course between these directions by proposing a branch and bound algorithm that combines restrictions on z with strong formulations. The general idea of the branch and bound paradigm, which was first proposed by Land and Doig [32], for solving general optimization problems min v (x) x ∈ X is to partition (branch) the set of feasible solutions X = k i=1 X i and then solve the corresponding subproblems min v (x) x ∈ X i recursively. In order to avoid a complete enumeration, an easy to obtain dual bound v (X ) ≤ min v (x) x ∈ X is computed for every considered X ⊆ X and compared with a primal bound, which is the value of the so far best known solution. In our case, we partition the set of solutions X = P ROB ∩ (Z n × R n × Z ), where Z contains an optimal choice for z, into subsets P ROB ∩ (Z n × R n × Z ) with Z ⊆ Z . For the corresponding robust subproblems ROB (Z ), we introduce improved formulations P (Z ) and prove structural properties, from which we derive strong dual bounds on the optimal objective value v (ROB (Z )). This enables us to prune subsets Z ⊆ Z containing non-optimal values for z. Furthermore, once the not yet pruned sets Z are sufficiently small, our findings enable us to solve ROB (Z ) efficiently as an MILP, sparing us from considering many nominal subproblems NOS (z) separately.
The fourfold contribution of this paper is summarized in the following.
-We propose a branch and bound algorithm to solve ROB and show in an extensive computational study that it outperforms all existing approaches from the literature by far. The code of all tested algorithms is available online [23]. -For developing the branch and bound algorithm, we first introduce different strong formulations and prove several structural properties for ROB. -We show that these structural properties can as well be used to improve existing approaches from the literature substantially, highlighting the relevance of our findings also for future research. -To conduct the computational study, we carefully construct a set of hard robust instances on the basis of real-world nominal problems from MIPLIB 2017 [25]. We make these instances freely available online for future benchmarking in the field of robust optimization [24].
Outline Before we introduce the basic framework of our branch and bound algorithm, we provide the theoretical foundations in Sects. 2 and 3. In Sect. 2, we discuss the weakness of P ROB and propose a bilinear formulation P BIL for ROB, which is as strong as theoretically possible. Although the bilinearity limits the practical use of this formulation, P BIL will play a critical role in the design of our branch and bound algorithm. Based on the bilinear formulation, we introduce the strong linear formulations P (Z ) for restricted z ∈ Z in Sect. 3. Using formulation P (Z ), we present a basic framework of our branch and bound algorithm in Sect. 4, which will then be improved in the subsequent sections by gaining more insight in the structure of ROB. In Sect. 5, we show how to improve the formulations by using cliques in the so-called conflict graph of the nominal problem. In Sect. 6, we characterize optimal choices of p and z, establishing the theoretical background for many components of the branch and bound algorithm, which we describe in detail in Sect. 7. Finally, in Sect. 8 we conduct our computational study.

A strong bilinear formulation
To better understand why formulation P ROB is problematic, we start by considering an example showing that the integrality gap of ROB can be arbitrarily large, even if the integrality gap of the corresponding nominal problem is zero.
Example 1 Consider the trivial task of choosing the smallest out of n elements whose integrality gap is zero for all c ∈ R n . Now, consider an instance of the uncertain counterpart ROB with c = 0,ĉ = 1, and Γ = 1 Let v (ROB) be the optimal objective value of ROB and v R (ROB) be the optimal value of the linear relaxation. For the above problem, we have v (ROB) = 1. However, The example shows that choosing fractional values of x in the linear relaxation enables us to meet the constraints p i + z ≥ĉ i x i with a relatively low value of z, which marginalizes the influence of the deviations on the objective value. To overcome these problems, we will discuss alternative formulations for modeling ROB.
Formally, we call P ⊆ R n 1 +n 2 a formulation for the problem min c T x x ∈ X with a set of solutions X ⊆ Z n 1 × R n 2 if P ∩ (Z n 1 × R n 2 ) = X holds [42]. Using a formulation, we can solve the original problem by solving min c T x x ∈ P and branching on the integer variables. Additionally, P ⊆ R n+n is called an extended formulation for a problem if its projection proj P ⊆ R n into the original solution space is a formulation for that problem. For two formulations P 1 and P 2 with P 1 ⊆ P 2 , we say that P 1 is at least as strong as P 2 . When considering extended formulations, we compare their projections instead.
To the best of our knowledge, the only results directly targeting the weakness of P ROB are presented by Atamtürk [5], who proposes four problems RP1 -RP4 that are equivalent to ROB, using different (extended) formulations P RP1 , . . . , P RP4 . The theoretical strength of the four formulations exceeds the one of P ROB by far. More precisely, we have proj P RP4 proj P RP1 = P RP2 = proj P RP3 P ROB for non-trivial cases. The problem over the strongest formulation with [n] 0 := {0, . . . , n}, is especially interesting. For every vertex (x, p, z, ω, λ) of the polyhedron P RP4 , it holds λ k * = 1 for a k * ∈ [n] 0 and λ k = 0 for k = k * .
Choosing λ in such a way reduces RP4 to solving the nominal subproblem NOS ĉ k * . Thus, RP4 essentially combines the nominal subproblems NOS (z) that are solved in the Bertsimas and Sim approach for all possible values z ∈ ĉ 0 , . . . ,ĉ n into one problem. Formulation P RP4 is not only the strongest proposed by Atamtürk, but can be seen as the strongest possible polyhedral formulation overall. This is because it preserves the integrality gap of the nominal problem [5]. However, the disadvantage of all formulations P RP1 , . . . , P RP4 is that they may become too large for practical purposes, as we will see in the computational study in Sect. 8.
To deal with this issue, we introduce a smaller, although bilinear, formulation for ROB. For this, we multiply z in the constraints p i +z ≥ĉ i x i of the original formulation P ROB with x i for all i ∈ [n]. The resulting constraint p i + zx i ≥ĉ i x i is valid for all solutions of ROB, since the inequality becomes p i ≥ 0 for x i = 0 and is equivalent to the original inequality for x i = 1. The new bilinear formulation is at least as strong as P RP4 , as stated in the following theorem.
Theorem 1 It holds P BIL ⊆ proj P RP4 .
Although formulation P BIL is strong and compact, its bilinearity is rather hindering when solving instances in practice. To understand how we can still make practical use of it, we first consider P BIL with z restricted to a fixed value. The formulation becomes not only linear, but it also holds p i = ĉ i − z + x i for all i ∈ [n] in an optimal (fractional) solution (x, p, z). Hence, the problem of optimizing over the set which is the linear relaxation of the nominal subproblem NOS (z). This is noteworthy, since this equivalence does not hold for P ROB ∩ R 2n × {z} . The strength of the linearization for fixed z suggests that we may also derive strong linearizations of P BIL for general restrictions on z, that is z ∈ Z ⊆ ĉ 0 , . . . ,ĉ n . In the next section, we introduce such a linearization, which will be a key component of our branch and bound algorithm.

Strong linear formulations for bounded z
Consider a subset Z ⊆ ĉ 0 , . . . ,ĉ n and let z = min (Z ) and z = max (Z ) for the remainder of this paper. Assuming that there exists an optimal solution (x, p, z) to ROB with z ∈ Z , we can restrict ourselves to the domain R 2n × z, z . We use this to obtain a linear relaxation of the restricted bilinear formulation P BIL ∩ R 2n × z, z .

Lemma 1 The linear constraints
and are valid for all (x, p, z) ∈ P BIL ∩ R 2n × z, z .
Proof Since p i + zx i ≥ĉ i x i and p i ≥ 0 hold for all (x, p, z) ∈ P BIL , the restriction z ≤ z implies Furthermore, due to z ≤ z, we obtain Note that the Constraints (1) and (2) strictly dominate the inequalities p i + z ≥ĉ i x i and p i ≥ 0 of P ROB in the case of z > 0 andĉ i > z respectively. Both constraints address the problem of the original formulation, which is that one can decrease x i in a fractional solution down to x i ≤ ẑ c i in order to choose p i = 0, even if we haveĉ i > z. Given a lower bound z ≥ z, Constraint (1) reduces the benefit of decreasing x i , as the right-hand side only decreases with the factor ĉ i − z + instead ofĉ i . For an upper bound z ≤ z, Constraint (2) guarantees that p i is not zero forĉ i > z and x i > 0 by using the fact that the value of p i is at leastĉ i − z if we haveĉ i > z and x i = 1. Using these strengthened constraints, we obtain the robust subproblem As shown in Lemma 1, P (Z ) is a relaxation of the restricted bilinear formulation P BIL ∩ R 2n × z, z . Note that P (Z ) becomes stronger, the narrower the bounds of Z are, i.e., for Z , Z with z, z z , z it holds P (Z ) P Z ∩ R 2n × z, z for non-trivial cases. The following statement shows that P (Z ) is even as strong as P BIL in the case where z equals one of the bounds z, z.
and for z = z, we have Hence, (x, p, z) ∈ P BIL and thus P (Z )∩ R 2n × z, z ⊆ P BIL ∩ R 2n × z, z . The statement follows together with Lemma 1.
Note that the improved formulation P (Z ) comes with the cost of a larger constraint matrix compared to P ROB , as we have This can be hindering in practice, as smaller constraint matrices tend to be computationally beneficial. We overcome this issue by substituting p i = p i + ĉ i − z + x i and z = z + z. We then obtain the equivalent substituted problem The substituted problem ROB S (Z ) is also interesting from a theoretical point of view.
Since z ≤ z − z holds for all optimal solutions, ROB S (Z ) is equivalent to ROB for an instance with objective coefficients and an added constant Γ z. This will be useful in subsequent sections, since properties that we prove for ROB carry over directly to ROB S (Z ) and ROB (Z ).
In the next section, we show how to use formulation P (Z ) in a branch and bound algorithm for solving ROB.

The basic branch and bound framework
The general idea of our branch and bound framework, which is sketched in Algorithm 1, is to solve ROB by branching the set ĉ 0 , . . . ,ĉ n of possible values for z into subsets Z ⊆ ĉ 0 , . . . ,ĉ n , for which we then consider the robust subproblem ROB (Z ). For each considered subset Z , we store a dual bound v (Z ) based on the linear relaxation value v R ROB S Z for a superset Z ⊇ Z using the strong formulation from the previous section. If the dual bound v (Z ) is greater than or equal to the current primal bound v then we can prune Z . If Z cannot be pruned, we first asses the strength of formulation P S (Z ), which converges towards the strength of P BIL ∩ R 2n × z, z and achieves equality at latest for |Z | = 1 according to Proposition 1. If P S (Z ) is almost as strong as P BIL ∩ R 2n × z, z then we may directly solve the substituted robust subproblem ROB S (Z ), sparing us from considering further subsets of Z . Otherwise, if P S (Z ) is too weak, we continue solving the linear relaxation and branching into subsets Note that the framework given in Algorithm 1 only serves for getting a basic intuition, as many components are described vaguely. For example, we leave open for now how to evaluate whether we can stop branching Z due to P S (Z ) being "strong enough". We will describe all components of our algorithm in detail in Sect. 7. There, we will not only discuss whether and how we should branch Z (Sect. 7.5) and how to choose the next Z ∈ N (Sect. 7.4), but also improve on the computation of dual bounds (Sect. 7.1) and primal bounds (Sect. 7.2) and discuss an efficient pruning strategy (Sect. 7.3).
Before doing so, we first establish some theoretical background in the following two sections that will be crucial for the design of our algorithm.

Algorithm 1: The Basic Branch and Bound Framework
Input: An instance of ROB Output: An optimal solution x * , p * , z * of value v 1 Initialize N = ĉ 0 , . . . ,ĉ n 2 Set dual bound v ĉ 0 , . . . ,ĉ n = −∞ and primal bound v = ∞ 3 while N = ∅ do 4 Choose Z ∈ N and remove N ← N \ {Z } 5 if v (Z ) < v then 6 if P S (Z ) is "strong enough" then 7 Solve ROB S (Z ) and update x * , p * , z * and v if a new incumbent is found 8 else 9 Compute optimal solution x, p , z ∈ P S (Z ) with value v x, p , z

A reformulation using cliques in conflict graphs
In this section, we propose a stronger formulation for ROB that depends on so-called conflicts between variables and can also be used to solve the robust subproblems ROB (Z ). We already considered the concept of extended formulations in Sect. 2. We now propose a reformulation that is also not in the original variable space, but not an extended formulation. To generalize the concept, we call a problem v = min c T x x ∈ X over a polyhedron P ⊆ R n 1 +n 2 with P ∩ Z n 1 × R n 2 = X a reformulation in a different variable space of v = min c T x x ∈ X , if both have the same optimum objective value, i.e., v = v , and there exists a polynomially time computable, cost preserving mapping φ : P → R n 1 +n 2 with φ X ⊆ X . Then, instead of solving the original problem, we can solve the problem over X and map an optimal solution x ∈ X to an optimal solution φ x ∈ X . To generalize the concept of strong formulations, we say that P 1 is at least as strong as P 2 if φ 1 P 1 ⊆ φ P 2 holds.
Here, we reformulate ROB in a different variable space by aggregating variables p in a tailored preprocessing step. Preprocessing routines, which aim to reduce the size and improve the strength of a given problem formulation, are key components of modern MILP solvers and critical to their performance [1,3,17]. One of these preprocessing routines involves the search for logical implications between binary variables, e.g., x i = 1 ⇒ x j = 0 for every solution x. These implications can be modeled within a so-called conflict graph, consisting of a node for every binary variable x i and its complement x i = (1 − x i ) [6]. There exists an edge between two nodes in the conflict graph if there exists no solution where the corresponding literals are both equal to one. Since every solution to the original problem corresponds to an independent set within the conflict graph, all valid inequalities for the independent set problem on the conflict graph are also valid for the original problem. An interesting type of valid inequalities are set-packing constraints, which are defined by cliques in the conflict graph, i.e., subsets of nodes forming a complete subgraph. For a clique x i 1 , . . . , x i q ∪ x j 1 , . . . , x j q , at most one of the literals can be equal to one, which yields the corresponding set-packing constraint q k=1 x i k + q k=1 1 − x j k ≤ 1. Here, we are less interested in adding set-packing constraints to our formulations, as they are already used in modern MILP solvers. Instead, we focus on the structural implications of set-packing constraints consisting of positive literals x i on the variables p and robustness constraints p i + z ≥ x i . To ease notation, we call a subset Q ⊆ [n] a clique if the variables x i i ∈ Q form a clique in the conflict graph. The following proposition shows that we can use a partitioning Q of [n] into cliques to obtain a stronger reformulation of ROB in a smaller variable space.

Proposition 2 Let Q be a partitioning of [n] into cliques. Then the problem
is a reformulation in a different variable space of ROB that is at least as strong as P ROB .
Proof First, note that if p Q = i∈Q p i for all Q ∈ Q holds then (x, p, z) and x, p , z have the same objective value for their respective problems. Hence, in order to show v (ROB) = v (ROB (Q)), we construct corresponding solutions fulfilling this property.
Let (x, p, z) be a solution to ROB and consider x, p , z with p ∈ R Q ≥0 such that p Q = i∈Q p i . For all cliques Q ∈ Q, we have i∈Q x i ≤ 1 and thus there exists an index j ∈ Q such that x i = 0 for all i ∈ Q\ { j}. It follows It remains to show that every x, p , z ∈ P ROB (Q) ∩ Z n × R |Q|+1 has a corresponding solution φ x, p , z ∈ P ROB ∩ Z n × R n+1 of the same cost. Note that such a mapping φ : P ROB (Q) → R 2n+1 already implies v (ROB (Q)) ≥ v (ROB), and thus v (ROB) = v (ROB (Q)). We define the image of x, p , z ∈ P ROB (Q) as φ x, p , z = (x, p, z) and consider two different cases for the definition of p ∈ R n .
For cliques Q ∈ Q with j∈Qĉ j x j > 0, we define p i =ĉ Then p i + z ≥ĉ i x i holds, since we have For cliques Q ∈ Q with j∈Qĉ j x j = 0, we can choose p i arbitrarily as long as p Q = j∈Q p j , since p i + z ≥ 0 =ĉ i x i holds for any p i ≥ 0. This shows not only that φ P ROB (Q) ∩ Z n × R |Q|+1 ⊆ P ROB ∩ Z n × R n+1 holds, but also proves the strength of ROB (Q), because we did not use the integrality of x and thus have φ P ROB (Q) ⊆ P ROB .
Reconsider Example 1 from Sect. 2 to see that reformulation ROB (Q) is not only equal, but actually stronger. In the example, [n] is a clique and we thus have n . As mentioned in Sect. 3, the improvement of ROB can also be applied to ROB S (Z ). Given a clique partitioning Q of [n] we obtain the stronger reformulation Obviously, in order to obtain these strong reformulations, we first have to compute a conflict graph and a clique partitioning Q of [n]. Ideally, this partitioning contains few cliques that are as large as possible. However, finding a partitioning of minimum cardinality is equivalent to computing a minimum clique cover, which was shown to be N P-hard by Karp [28]. Moreover, building the whole conflict graph itself is also N P-hard [18]. Consequently, we have to restrict ourselves to a subgraph of the whole conflict graph. If our algorithm was natively implemented in a MILP solver, we could use the conflict graph that is computed during the solver's preprocessing without spending additional time searching for conflicts. Unfortunately, we cannot access the conflict graph in Gurobi [26], the solver we use for our implementation. Thus, we implement our own heuristics in which we check for each constraint of the nominal problem whether it implies conflicts between variables. Afterwards, we use these conflicts to partition [n] greedily into cliques. As the construction of conflict graphs and clique partitionings are not the focus of this paper, we refer to Appendix A for a detailed description of our implementation. For related work on the construction and handling of conflict graphs, we refer to Achterberg et al. [1], Atamtürk et al. [6], as well as Brito and Santos [18]. Note that our approach of aggregating constraints and variables depends on the variable z being shared by all constraints p i + z ≥ x i . Atamtürk et al. [7] propose for the mixed vertex packing problem a similar approach for aggregating constraints containing conflicting binary variables and a common continuous variable. Their mixed clique inequalities are analogous to our clique inequalities and their strengthened star inequalities can be adapted for generalizing these. For now, we leave the adaptation for future research and stick to using clique inequalities depending on clique partitionings, as we otherwise cannot benefit from the reduced number of variables. We will see in our computational study in Sect. 8 that using clique partitionings already yields a substantially stronger reformulation for many instances and improves the performance of our branch and bound algorithm. Before describing the branch and bound algorithm in detail in Sect. 7, we further establish some theoretical background in the next section by characterizing optimal solutions of ROB. Note that although we solve ROB S (Z , Q) in practice, for the sake of simplicity, we mostly refer to the equivalent problem ROB (Z ) in the remainder of this paper and only refer to ROB S (Z , Q) when necessary, e.g., when considering its linear relaxation or the strength of the formulation P S (Z , Q).

Characterization of optimal values for p and z
The central idea of our branch and bound algorithm for solving ROB is to restrict the value of z and trying to find an optimal corresponding nominal solution x ∈ P NOM . In this section, however, we want to consider the opposite direction. Given a nominal solution x ∈ P NOM , what are the optimal values for p and z? The answer to this question will deepen our understanding of the structural properties of ROB and is of practical use in many ways. First, we will generalize the result of Lee and Kwon [33], who showed for Γ ∈ Z that there exists a subset Z ⊆ ĉ 0 , . . . ,ĉ n , with |Z | ≤ n−Γ 2 + 1, containing an optimal choice for z. This reduction is relevant for our branch and bound algorithm, as we only have to consider subsets Z ⊆ Z . Second, given a choice of z, we will be able to restrict our search for a corresponding nominal solution x ∈ P NOM to those for which the chosen z is optimal. We will extensively use this idea within our branch and bound algorithm, especially in Sect. 7.1 where we describe further dual bounding strategies. Third, as we prove the characterization for (potentially fractional) solutions within P BIL , we can compute for any x ∈ P NOM the corresponding objective value for the optimization problem over P BIL . This provides an upper bound on the optimal objective value over P BIL , which we compare to v R ROB S (Z , Q) in order to obtain an indicator of the strength of P S (Z , Q). We use this indicator in our branch and bound algorithm to decide whether ROB S (Z , Q) should be solved directly as an MILP or whether Z needs to be shrunk further, as explained in Sect. 7.5. The following theorem states the characterization of optimal values for p and z.
the optimal values satisfying (x, p, z) ∈ P BIL and minimizing Γ z + n i=1 p i . For integer solutions x ∈ P NOM , the theorem states that z should be large enough such that there are at most Γ indices i ∈ [n] with x i = 1 andĉ i > z. Otherwise, we could increase z while simultaneously decreasing p i for more than Γ indices, leading to an improvement of the objective value. Conversely, z should be small enough such that there exist at least Γ indices i ∈ [n] with x i = 1 andĉ i ≥ z. Otherwise, we could decrease z and would have to increase p i for less than Γ indices, also yielding an improvement of the objective value. Obviously, if Γ is so large that i∈ [n] x i < Γ holds then we need to choose z as small as possible, i.e., z = 0.
Before proving Theorem 2, we characterize the bounds z (x) and z (x) in an additional way. The proof of the following lemma can be found in Appendix B.
Using the above lemma, we are able to prove Theorem 2.

Proof of Theorem 2
First, note that the interval z (x) , z (x) is well-defined, since i∈[n]:ĉ i >z x i ≤ Γ is a weaker requirement than i∈[n]:ĉ i >z x i < Γ and we thus is optimal for a given x and z, as we minimize and have p i ≥ ĉ i − z x i and p i ≥ 0 for all (x, p, z) ∈ P BIL . Now, let z ≥ z (x) and consider another value z > z together with an appropriate p such that x, p , z ∈ P BIL . By definition of z (x), it holds i∈[n]:ĉ i >z Hence, the objective value is non-decreasing for z ≥ z (x). Moreover, for z (x) < ∞, we even have i∈[n]:ĉ i >z x i < Γ in the case of z = z (x) by Eq. (4). Then ( * ) is a proper inequality and it follows that all choices z > z (x) are non-optimal. Now, let z ≤ z (x) and consider z < z. This implies z (x) > 0 and together with the definition of z (x), we obtain i∈[n]:ĉ i ≥z

It follows
Therefore, the objective value is non-increasing for z ≤ z (x), which shows that all is again a proper inequality and all choices z < z (x) are non-optimal.
As already mentioned, Lee and Kwon [33] showed for Γ ∈ Z that the number of different values for z to be considered can be reduced from n + 1 to n−Γ 2 + 1. To see this, it is helpful to sort the deviationsĉ i . Therefore, for the remainder of this paper, we assume without loss of generality thatĉ 0 ≤ · · · ≤ĉ n holds. The first observation leading to the reduction of Lee and Kwon is that the values z ∈ ĉ n+1−Γ , . . . ,ĉ n are no better than the value z =ĉ n−Γ , i.e.,ĉ n−Γ ≥ z (x) for all solutions x ∈ P NOM . The second observation is that if the value z =ĉ i is optimal then z ∈ ĉ i−1 ,ĉ i+1 also contains an optimal choice. To put it in terms of Theorem 2: . . ,ĉ n−Γ contains an optimal choice for z. The following statement generalizes the first observation to Γ ∈ R ≥0 . Furthermore, both observations are strengthened by using conflicts and a clique partitioning, which we already compute to obtain the strengthened formulations from Sect. 5, to reduce the set Z .

Proposition 3 Let Q be a partitioning of [n] into cliques and q : [n] → Q be the mapping that assigns an index j ∈ [n] its corresponding clique Q
, E) be a conflict graph for ROB and Γ ∈ Z. Furthermore, let Z ⊆ ĉ 0 , . . . ,ĉ i max such thatĉ i max ∈ Z and for every i ∈ i max − 1 0 it holds Then there exists an optimal solution (x, p, z) to ROB with z ∈ Z .
A proof of the above proposition and an algorithm for computing a set Z meeting the required criteria can be found in Appendix C. Note that the second part of the proposition only holds for Γ ∈ Z. This is because the statement relies on the fact that forĉ i ∈ z (x) , z (x) and Γ ∈ Z, it also holdsĉ k ∈ z (x) , z (x) . However, for Γ / ∈ Z, we always have z (x) = z (x), which implies thatĉ i always needs to be contained in Z .
After paving the way with the theoretical results of the previous sections, we now describe the components of our branch and bound algorithm in detail in the next section.

The branch and bound algorithm
In the following sections, we will describe our approach for computing dual and primal bounds, our pruning rules as well as our node selection and branching strategies. A summary of the components, merged into one algorithm, is given in Sect. 7.6. An overview on different strategies regarding the components of branch and bound algorithms is provided by Morrison et al. [37].
For the remainder of this paper, Z ⊆ ĉ 0 , . . . ,ĉ n will be a set of possible values for z, as constructed by Algorithm 6 from Appendix C. To ease notation, we will refer to the considered subsets Z ⊆ Z as nodes in a rooted branching tree, where Z is the root node and Z is a child node of Z if it emerges directly via branching. Furthermore, we denote with N ⊆ 2 Z the set of active nodes, that are the not yet pruned leaves of our branching tree, which are still to be considered.

Dual bounding
The focus of this paper is primarily on the computation of strong dual bounds v (Z ). We already paved the way for these in the previous sections by introducing the strong reformulation ROB S (Z , Q) , yielding dual bounds v (Z ) = v R ROB S (Z , Q) . In the following, we show that we can obtain even better bounds by restricting ourselves to solutions fulfilling the optimality criterion in Theorem 2.

Deriving dual bounds from ROB (Z)
Imagine that we just solved a robust subproblem ROB (Z ), using the equivalent problem ROB S (Z , Q), and observed that the optimal objective value v (ROB (Z )) is significantly higher than the current primal bound v. Furthermore, imagine that there exists a yet to be considered value z in an active node Z ∈ N that is very close to one of the just considered values z ∈ Z . Note that the objective function of the nominal subproblem NOS (z), arising from fixing z, differs only slightly in its objective function gests that the objective value v NOS z is probably not too far from v (NOS (z)).
Since v (ROB (Z )) is higher than v and also a dual bound on v (NOS (z)), we might be able to prune z without considering ROB Z if we are able to carry over some information from NOS (z) to NOS z . In fact, Hansknecht et al. [27] showed that there exists a relation between the optimal solution values v (NOS (z)) for different values z.
Accordingly, in addition to the dual bound v Z for a node Z ∈ N , we can also maintain individual dual bounds v z on the optimal objective value v NOS z The dual bound for a node Z is then the combination of the linear relaxation value v R ROB S Z , Q and the minimum of all individual bounds min v z z ∈ Z , i.e., we have While this already strengthens the dual bounds in our branch and bound algorithm, we can improve the results of Hansknecht et al. even more by using the optimality criterion from Theorem 2 and the clique partitioning Q from Sect. 5. Since we are solely interested in optimal solutions to ROB, it is sufficient to only consider solutions to NOS z that fulfill the optimality criterion, i.e., solutions x with z ∈ z x , z x . If an optimal solution to NOS z does not fulfill this property then z is no optimal choice in the first place and can therefore be pruned. Accordingly, we establish an improved bound that is not a dual bound on v NOS z , but a dual bound on the objective value of all solutions to NOS z fulfilling the optimality criterion.
Let x be such a solution to NOS z with objective value v . Note that x is also a feasible solution to NOS (z) and let v ≥ v (NOS (z)) be the corresponding objective value.
on the objective value v . Note that the decrease by δ dec is taken into account in the estimation of Lemma 3, but the increase δ inc is not. Obviously, δ inc can be zero if we have x i = 0 for all i ∈ [n] withĉ i > z . However, if x fulfills the optimality criterion then we know from Theorem 2 that there exist at least Γ indices withĉ i ≥ z and x i = 1. Assuming that there do not exist Γ indices withĉ i = z , there must exist at least one i ∈ [n] withĉ i > z and x i = 1, yielding a positive lower bound on δ inc . Taking conflicts between variables x i into account, we might even deduce that there must exist some indices with x i = 1 and very highĉ i , which improves the bound on δ inc . Note that for z > z, Lemma 3 provides no bound on NOS z , although we can apply similar arguments to this case. Observe that Inequality (5) still holds, with δ inc ≤ 0 and δ dec < 0. Unfortunately, if we have x i = 1 for all i ∈ [n] withĉ i > z then δ inc < 0 might have a large absolute value, leading to a weak estimation. However, if x fulfills the optimality criterion then we know from Theorem 2 that there exist at most Γ indices withĉ i > z and x i = 1. From this, we can again deduce a lower bound on δ inc , which can also be improved by taking conflicts between variables x i into account.
Theorem 3 Let Q be a partitioning of [n] into cliques, z, z ∈ R ≥0 , and x be an arbitrary solution to NOS z of value v that satisfies z ∈ z (x) , z (x) . Then we have v ≥ v (NOS (z)) − δ z z , where the estimator δ z z is defined as Proof For z = 0, the statement follows from Lemma 3. Otherwise, we obtain an estimation as in Inequality (5) We obtain where the last equality holds since Q is a partitioning of [n]. Now, let 0 < z < z. Since z ≤ z x holds, Theorem 2 implies i∈[n]:ĉ i ≥z We obtain which concludes the proof.
The above statement now enables us not only to compute bounds for z > z, but also stronger bounds for 0 < z < z. Note that for z = 0, we have to use the dual bound from Lemma 3, since Theorem 2 provides no statement on the required structure of x in this case.
In our branch and bound algorithm, we use the estimators δ z z for all z < z and δ z z for z > z after solving ROB (Z ). Accordingly, we define for Z ⊆ Z the estimators The improved bounds v (ROB (Z )) − δ Z z come with the cost of a higher computational effort compared to the bounds from Lemma 3. However, the additional overhead is marginal, as we can solve the involved maximization problems in linear time and compute all estimators δ z z , or δ z z respectively, simultaneously. Algorithm 2 describes our approach for computing the estimators for a set Z ⊆ Z of remaining values z .
Algorithm 2: Procedure for computing estimators δ z z .
, and a corresponding mapping q : [n] → Q Output: Estimators δ z z for z < z and δ z z for z > z Set δ z z j = δ 25 return δ z and δ z We first compute δ z z for z ∈ z ∈ Z z > z (lines 1 to 9). For computing δ z z j , we consider all deviationsĉ k with z <ĉ k ≤ z j (line 4) and add the corresponding valueĉ k − z (line 8). Furthermore, we mark the clique q (k) containing k as considered by adding it to the set Q and we associate the clique q (k) with the index k by maintaining a mapping q −1 : Q → [n] (line 7). However, if q (k) is already contained within Q then we considered an index k = q −1 (q (k)) with q (k) = q k before k and counted the valueĉ k −z towards δ z z . Hence, eitherĉ k −z orĉ k −z has to be subtracted, as we only count the highest value per clique. Since we iterate over the deviations in a non-decreasing order, it holdsĉ k −z ≥ĉ k −z, which is why we subtractĉ k − z (line 6). Note that we do not have to consider all values ĉ k z <ĉ k ≤ z j for computing δ z z j if we already considered the values ĉ k z <ĉ k ≤ z j−1 for δ z z j−1 . Instead, we construct δ z z j on the basis of δ z z j−1 and only iterate The computation of δ z z for z ∈ z ∈ Z z < z is almost analogous (lines 10 to 24). The difference here is that we only consider up to Γ values z −ĉ k . Hence, we not only maintain the set Q and the mapping q −1 , but also a list containing the indices of currently added values z −ĉ k . The list is updated every time we subtract (line 18) or add (line 20) a value z −ĉ k . Furthermore, since we iterate reversely over ĉ k z j ≤ĉ k < z , the list is ordered non-decreasing with respect to z −ĉ k . Hence, before assigning δ z ĉ i j , we check whether L contains more than Γ elements and, if necessary, remove the first Γ − |L| indices together with their value z −ĉ k and their clique q (k) (lines 21 to 23).

Optimality-cuts
Consider a node Z ⊆ Z of our branching tree and assume that ( We know from Theorem 2 that it is needless to consider x for the subset Z , as there is a different set x is part of a globally optimal solution. Nevertheless, it is possible that (x, p, z) is an optimal solution to ROB (Z ), resulting in an unnecessarily weak dual bound v (Z ).
Using the following theorem, we are able to strengthen our formulations such that we and in the case of c > 0 also i∈[n]:ĉ i ≥c We first show that z (x) ≤ c holds if and only if x fulfills Inequality (6). We know from Theorem 2 that i∈[n]: and thus Inequality (6) due to x being binary. Additionally, x cannot fulfill Inequality (6) if we have c < z (x), as this contradicts the minimality in the definition of and thus Inequality (6). Additionally, x cannot fulfill Inequality (7) if z (x) < c holds, as this contradicts the maximality in the definition of z (x).
In our branch and bound algorithm, we add the above Inequalities (6) and (7), with c = z and c = z, as optimality-cuts to the formulation P S (Z , Q) when solving the corresponding linear problem. However, the optimality-cuts can cause several problems when added to a robust subproblem ROB (Z ), especially with respect to the dual bounds of the last section. Let ROB Z , c, c be the corresponding problem with added optimality-cuts for bounds c ≤ z and z ≤ c. Note that in the proof of Theorem 3, we require x , the solution to NOS z , to be feasible for NOS (z) in order to show that v (NOS (z)) − δ z z is a dual bound. Analogously, we require x to be a feasible solution to ROB Z , c, c in order to derive a dual bound from v ROB Z , c, c . That is, if x does not meet the optimality-cuts then we can not derive any dual bounds from v ROB Z , c, c . However, if z x , z x ∩ c, c = ∅ holds then x is according to Theorem 4 a feasible solution to ROB Z , c, c , leading to the following generalization of Theorem 3.

Corollary 1
Let Z ⊆ R ≥0 and c ≤ c with Z ⊆ c, c . Furthermore, let z ∈ c, c and x be an arbitrary solution to NOS z of value v satisfying z ∈ z x , z x .
Accordingly, there is a trade-off in the choice of c, c. On the one hand, the optimal objective value v ROB Z , c, c , and thus the derived dual bounds for other z ∈ c, c , increases if the bounds c, c are close together. On the other hand, we want to derive dual bounds for as many z as possible. Furthermore, the optimality-cuts can be hindering for finding good primal bounds. We resolve this trade-off by adding loose optimality-cuts, corresponding to wide bounds c, c, in the beginning and gradually strengthening them as we consider more robust subproblems. Let Z * ⊆ Z be the union of all nodes Z * ⊆ Z for which we already solved a robust subproblem ROB Z * , c * , c * and let Z = Z ∈N Z be the union of all active nodes. In our branch and bound algorithm, for a node Z ∈ N , we choose c, c ∈ Z as wide as possible around Z such that there exists no z * ∈ Z * in between, i.e., In order to see that it is not reasonable to expand the interval c, c , consider a value z ∈ Z \ c, c . By definition, we already considered a subproblem ROB Z * , c * , c * with z ∈ c * , c * for a node Z * containing a value z * with z < z * < z or z < z * < z . Since δ Z * z ≤ δ z * z < δ Z z holds, we have already computed a dual bound v z = v ROB Z * , c * , c * − δ Z * z that is probably better than a potential dual bound derived from ROB Z , c, c . Thus, expanding c, c tends to be useless for obtaining new dual bounds. Now, assume that there exists a nominal solution x with z (x) , z (x) ∩ c, c = ∅ such that (x, p, z) is feasible for ROB (Z ) and also defines an improving primal bound v. In this case, it would be beneficial to expand c, c such that x fulfills the optimality-cuts and we obtain a new incumbent. However, we have seen in the proof of Theorem 2 that the objective value of (x, p, z) is non-increasing for z ≤ z (x) and non-decreasing for z ≥ z (x) with the appropriate p = ĉ i − z + x i . Using the arguments from above, we should have already found a solution (x, p * , z * ) that is at least as good as (x, p, z) for a previous subproblem ROB Z * , c * , c * . Accordingly, expanding c, c is also uninteresting for obtaining new primal bounds.
In the next section, we show what else we can do except for choosing appropriate bounds c, c in order to guide the branch and bound algorithm in the search for primal bounds.

Primal bounding
We already stated in the introduction that the potentially large optimality gap of ROB can cause problems for MILP solvers when trying to compute feasible solutions.
Hence, we have to provide guidance for the solver in order to consistently obtain strong primal bounds. As the focus of this paper is on the robustness structures of ROB, and not the corresponding nominal problem NOM, we implement no heuristics that explicitly compute feasible solutions x to NOM. Nevertheless, our branch and bound algorithm naturally aids in the search for optimal solutions by quickly identifying nonpromising values of z. This allows us early on to focus on nodes Z ⊆ Z containing (nearly) optimal choices for z, for which solving ROB Z , c, c is much easier, using the equivalent problem ROB S Z , Q, c, c , and yields (nearly) optimal solutions to ROB.
Furthermore, even when considering ROB Z , c, c for a node Z ⊆ Z that contains no optimal choice for z, we can potentially derive good primal bounds or even optimal solutions to ROB. In many cases, an optimal solution (x, p, z) to ROB Z , c, c does not meet the optimality criterion z ∈ z (x) , z (x) , which leaves potential for improving the primal bound provided by v ROB Z , c, c . Since z (x) is easily computable, we can obtain a better primal bound v (x) provided by the solution value of x, p , z (x) , with p i = ĉ i − z (x) + x i . Moreover, we can not only compute v (x) for an optimal solution x to ROB Z , c, c , but any feasible solution the solver reports while solving ROB Z , c, c . This increases the chance of finding good primal bounds, as an improved sub-optimal solution may provide an even better bound than an optimal solution to ROB Z , c, c . We will see in our computational study that our branch and bound algorithm quickly finds optimal solutions to ROB, often while considering the very first robust subproblem. Additionally, the possibility to derive strong primal bounds from sub-optimal solutions, which may be found early on while solving ROB Z , c, c , will be relevant for our pruning strategy in the next section.

Pruning
In theory, a problem is solved to optimality if the primal bound v is equal to a proven dual bound v. In practice, however, it is neither always necessary to prove v = v, nor is it always possible due to numerical issues. Instead, one considers a problem to be solved if v is sufficiently close to v, that is, it either holds v − v ≤ t abs or v−v |v| ≤ t rel , where t abs > 0 is the absolute tolerance and t rel > 0 is the relative tolerance. The concept of "sufficiently solved" problems is also applied to the pruning of nodes Z ⊆ Z within our branching tree. More specifically, we prune Z not only if v (Z ) ≥ v holds, but as soon as we In our computational study, we choose t abs = 10 −10 and t rel = 10 −4 , which are the default tolerances used by Gurobi [26]. Note that for v = 0 and v > v, the relative gap v−v |v| is defined to be ∞. If v ≥ v = 0 holds then the relative gap does not matter, since we have v − v ≤ t abs . To simplify notation, we define prn v, v = 1 if the dual and primal bounds v, v are strong enough for pruning and prn v, v = 0 otherwise. Recall that the dual bound v (Z ) for Z ⊆ Z is the maximum of the linear relaxation value v R ROB S Z , Q, z, z and the worst individual bound min v (z) z ∈ Z from Sect. 7.1.1. Even if v (Z ) is too weak for pruning, i.e., prn v (Z ) , v = 0, it may hold prn v (z) , v = 1 for a value z ∈ Z . Therefore, we apply a further pruning step in addition to the pruning of the whole node Z . Every time we consider a node Z , we check for all z ∈ Z whether z can be pruned according to its individual dual bound v (z). This is beneficial, as we obtain a stronger formulation for the resulting subset of Z . Furthermore, before solving a robust subproblem ROB Z , c, c , we check for all remaining z ∈ Z ∈N Z whether prn v z , v = 1 holds, so that the bounds c, c, as chosen in Sect. 7.1.2, are as narrow as possible.
Once we consider a robust subproblem ROB Z , c, c , we let the MILP solver manage the pruning itself, as the otherwise necessary interference into its solving process would lead to a performance degradation. However, we can monitor the best known dual bound v ROB Z , c, c and terminate the subproblem as soon as we have prn v ROB Z , c, c , v = 1. This is especially important for robust subproblems ROB Z , c, c corresponding to nodes Z containing values that are far from being optimal. In this case, we are usually aware of a primal bound v that is substantially smaller than the optimal solution value v ROB Z , c, c , allowing for a fast termination. Such a primal bound can either come from a previously considered robust subproblem or from a solution to ROB Z , c, c that we improved as described in the previous section.
Unfortunately, terminating ROB Z , c, c prematurely is problematic regarding the dual bounds v ROB Z , c, c − δ Z z computed in Sect. 7.1.1. Note that in practice, we do not necessarily know the optimal solution value v ROB Z , c, c and thus use the best known dual bound v ROB Z , c, c instead. Hence, there is a trade-off in saving time by terminating ROB Z , c, c early and generating strong dual bounds v ROB Z , c, c − δ Z z . We resolve this trade-off by computing the estimators δ Z z before solving ROB Z , c, c and constantly evaluating whether improving v ROB Z , c, c can potentially lead to the pruning of additional values z . Let Z ∩ c, c be the remaining values of z for which we computed the estimators δ Z z . Furthermore, let v ROB Z , c, c be the currently best known primal bound for ROB Z , c, c . For evaluating whether z ∈ Z ∩ c, c can potentially be pruned, we consider three different cases. If Case 1 applies then z is irrelevant to the question whether we should terminate ROB Z , c, c early, as it will be pruned anyway. In contrast, it is unlikely that z will be pruned if Case 3 applies. We have already stated in the previous section that our branch and bound algorithm usually finds (nearly) optimal solutions to ROB while solving the first robust subproblem. Hence, most of the time, the primal bound v will not be improved, leaving little chance for z to be pruned. Accordingly, in our implementation, we continue solving ROB Z , c, c as long as there exists a value z ∈ Z ∩ c, c for which Case 2 applies. However, since closing the gap between v ROB Z , c, c and v ROB Z , c, c can potentially waste much time, we use an additional termination criterion. In our implementation, we also terminate ROB Z , c, c if no z ∈ Z ∩ c, c switched to Case 1 within the last 10 s. That is, raising the dual bound did not lead to a pruning of an additional z within this time. Of course, this criterion is highly arbitrary, but it leads to an improvement of our algorithm's performance in our computational study. As heavy engineering is beyond the scope of this paper, we leave a detailed analysis of this component and its potential for future research.

Node selection
The node selection strategy determines the order in which we explore nodes within our branching tree, and thus directly impacts the number of nodes we consider before finding an optimal solution. Hence, a good node selection strategy is critical to the performance of any branch and bound algorithm, as finding an optimal (or at least good) solution quickly enables us to prune more efficiently. A review of different strategies in the context of mixed integer programming is given by Linderoth and Savelsbergh [34]. A survey on machine learning for node selection is given by Lodi and Zarpellon [35].
Two basic strategies, from which many other strategies emerge as a combination, are depth-first and best-first search. Depth-first search is based on the last-in-firstout principle and thus follows a path down the branching tree until a prunable node is reached. In contrast, best-first search ranks the nodes of the branching tree by assigning a value to each node and always picking a node with the best value. Here, we consider the case where the ranking value is equal to the node's dual bound. In this case, bestfirst search is also called best-bound search. Naturally, both strategies, depth-first and best-bound search, have advantages and disadvantages, as discussed by Linderoth and Savelsbergh [34]. An advantage of depth-first search is that it requires less memory, as the number of active nodes |N | in the branching tree is relatively small. It also allows for a fast reoptimization after branching, since the optimal dual solution to the parent node's subproblem is readily available to warm start the, typically similar, subproblem of the child node. Furthermore, depth-first search usually finds feasible solutions quickly, as integer feasible solutions tend to be located deep in the branching tree, where many variables are fixed due to branching. An obvious drawback, however, is that depth-first search may explore many unnecessary nodes and can get stuck in unpromising subtrees if the current primal bound is far from the optimal solution value. In contrast, best-bound search tends to minimize the number of nodes in the branching tree. This is because best-bound search will never select a node whose dual bound is worse than the optimal solution value. However, the drawback of best-bound search is that it may require more memory, as the number of active nodes in the branching tree grows large if there exist many nodes with similar bounds. This can also prevent the algorithm from finding feasible solutions early, since deeper levels of the branching tree are explored late. Furthermore, the reoptimization is hindered, as sequentially considered subproblems are less related compared to the depth-first search.
The strategy for our branch and bound algorithm can be seen as a hybrid of depthfirst and best-bound search. Note that our algorithm switches back and forth between two phases. In phase one, we branch the set Z into subsets Z and obtain dual bounds v R ROB S Z , Q, z, z from solving linear subproblems. In phase two, we stick to a node Z ⊆ Z and solve the robust subproblem ROB S Z , Q, c, c . Phase two can be seen as a leaning towards depth-first search, since we focus on the chosen values in Z until the problem ROB S Z , Q, c, c is either solved to optimality or terminated as described in the previous section. Since ROB S Z , Q, c, c is potentially a hard problem, it would be beneficial to only solve it for promising nodes Z , presumably leading to good solutions. We use the dual bound v (Z ) of a node Z ⊆ Z as an indicator for the node's potential to contain good solutions (x, p, z) with z ∈ Z , and thus perform a best-bound search in phase one. In detail, for the set of active tree nodes N , we always process a node Z ∈ N for which the current dual bound v (Z ) is minimum among all nodes, i.e., Z ∈ argmin v (Z ) Z ∈ N .
Fortunately, the drawbacks of best-bound search described above are not critical in our case. The number of active nodes is at most |N | ≤ |Z | ≤ n−Γ 2 + 1. Hence, memory consumption should be no limiting factor in phase one. This also allows us to store a solution basis for each node to warm start the simplex algorithm after branching. However, while warm starting usually accelerates the solving process, we observed that it leads to less consistent results in our computational study, as it disables Gurobi's LP presolve [26]. Therefore, we do not consider warm starts for the evaluation of our branch and bound approach in Sect. 8.

Branching
Much research has been devoted to the question of how to branch efficiently in integer linear programming, see, e.g., Achterberg et al. [2] or Linderoth and Savelsbergh [34]. However, the main question that is addressed there is on which integer infeasible variable to branch. Obviously, this question is uninteresting in our case, since we solely branch on the variable z and hand the robust subproblems to the chosen MILP solver, which manages the branching on its own. Instead, we have to address the question how to divide a node Z ⊆ Z so that the branching is efficient. Furthermore, we want to discuss how to decide whether a node Z should be branched at all or whether we solve ROB S Z , Q, c, c directly as an MILP.
To answer the latter, let (x, p, z) ∈ P Z , Q, z, z be an optimal solution to the linear relaxation of ROB Z , Q, z, z . Here, we consider ROB Z , Q, z, z instead of the equivalent ROB S Z , Q, z, z for simplicity. Since x meets the optimality-cuts, there exists a value z ∈ z, z ∩ z (x) , z (x) . The bilinear solution x, p , z ∈ P BIL , with p i = ĉ i − z + x i , provides an upper bound on the optimal objective value over all solutions in P BIL fulfilling the optimality-cuts for c = z and c = z. This upper bound is easily computable, as we have  ((x, p, z)) and v x, p , z are nearly identical. Since P BIL is the strongest possible formulation for ROB, there is not much potential for improving the integrality gap of ROB Z , Q, z, z via branching. While this does not necessarily imply for c, c z, z that the integrality gap of ROB Z , Q, c, c is also small enough, we use the relation between the objective values v ((x, p, z)) and v x, p , z as an indicator and stop branching Z once they are sufficiently close to each other. In our implementation, we consider the two values to be suf-ficiently close, if their gap is in the relative tolerance or absolute tolerance, that is prn v ((x, p, z)) , v x, p , z = 1, as defined in Sect. 7.3. However, we do not solve ROB S Z , Q, c, c right away, but first reinsert the node Z into the set of active nodes N and mark Z to be considered for a robust subproblem by storing a value sol (Z ) = 1. This is because Z was not selected with respect to the just computed dual bound v R ROB S Z , Q, z, z , but a dual bound based on the linear relaxation value of a parent node. Once Z is chosen again with respect to its new (potentially significantly improved) dual bound, we solve the robust subproblem ROB S Z , Q, c, c directly as an MILP. Now, assume that we have decided otherwise and want to branch the node Z ⊆ Z further into subnodes Z 1 , Z 2 . Obviously, Z 1 , Z 2 should form "intervals", that is z 1 , z 1 ∩ z 2 , z 2 = ∅, as otherwise, the bounds on z would be unnecessarily wide, leading to weaker formulations. Hence, we search for a branching-point θ ∈ z, z defining Z 1 = z ∈ Z z ≤ θ and Z 2 = z ∈ Z z > θ . Another desirable property of Z 1 , Z 2 would be that the computed optimal solution (x, p, z) ∈ P Z , Q, z, z is neither contained in P Z 1 , Q, z 1 , z 1 , nor in P Z 2 , Q, z 2 , z 2 . We can achieve this by choosing θ = z. First, note that θ < max (Z ) holds, since we did not stop branching and thus have (x, p, z) / ∈ P BIL , which implies z < max (Z ) due to Proposition 1. Furthermore, if z / ∈ Z holds then it is trivial that (x, p, z) is not feasible for any child node. In the case of z ∈ Z , we have z = z 1 , and thus (x, p, z) ∈ P Z 1 , Q, z 1 , z 1 would again imply (x, p, z) ∈ P BIL due to Proposition 1.
Unfortunately, the computed value z is in practice often near to one of the bounds z, z, leading to an unbalanced branching, where the optimal linear relaxation value v R ROB Z i , Q, z i , z i for one child node rises significantly, while the other remains nearly unchanged. This problem is also observed in the context of spatial branch and bound, which is a common approach for solving non-linear optimization problems. In spatial branch and bound, a convex relaxation of the non-linear formulation is considered to obtain lower bounds on the optimal objective value. This relaxation is then strengthened via branching on (continuous) variables occurring in non-linear terms, similar to the branching we perform on z to obtain stronger relaxations P (Z ) of the bilinear formulation P BIL . A common choice for the branching-point in spatial branch and bound is a convex combination of the variable's value in the current solution and the middle point of the variable's domain [41]. In our case, this translates to choosing αz + (1 − α) z + z /2 with α ∈ [0, 1]. This value is then often projected into a subinterval to ensure that θ is not at the boundaries of its domain, i.e., θ ∈ z + β z − z , z − β z − z with β ∈ [0, 0.5]. In summary, the branching point is chosen as Obviously, the parameters α and β leave room for engineering and differ between solvers. For example, the solvers SCIP [22] and COUENNE [8] choose θ with default values α = 0.25 and β = 0.2, while ANTIGONE (α = 0.75, β = 0.1) and BARON (α = 0.7, β = 0.01) choose a significantly higher value for α, according to [41]. Note that β has actually no effect on θ for any of these choices. Again, we don't dive too deep into the engineering of our branch and bound algorithm in this paper and simply take a middle course by choosing α = 0.5 and β = 0. However, in some cases, this leads to a branching where (x, p, z) is still feasible for one of the child nodes. Hence, we check if the branching is effective, by evaluating for both child nodes Z 1 , Z 2 if (x, p, z) / ∈ P Z i , Q, z i , z i holds. If not, we update θ by choosing the middle value between z and θ . We do this until (x, p, z) is infeasible for both nodes, which is guaranteed to happen, as our branching point converges to z.

Summary and implementation
In this section, we summarize the components of our branch and bound approach and merge them into one algorithm, as described in Algorithm 3. We also discuss some details regarding the implementation of the algorithm, which is written in Java and uses Gurobi [26] as an MILP and LP solver.
Algorithm 3 starts with the preparation for the branch and bound by computing the conflict graph and clique partitioning (line 1), which are then used to compute Z (line 2). Afterwards, the set of active nodes N is initialized with the root node Z , which is marked with sol (Z ) = 0, since we don't know whether ROB S Z , Q, c, c should be solved directly as an MILP (line 3). Afterwards, we initialize the primal and dual bounds (line 4), as well as the set of already considered values z ∈ Z for subproblems ROB S Z , Q, c, c with z ∈ Z (line 5). Note that we manage the whole branching tree outside of Gurobi, as it does not provide all callbacks to perform the necessary branching and node-selection [26].
After the initialization, our algorithm starts processing the nodes Z within the set of active nodes N until no node remains, and thus the problem is solved to optimality (line 6). In accordance with Sect. 7.4, we choose a node among those having the lowest dual bound v (Z ) (line 7). Afterwards, we check whether ROB S Z , Q, c, c is marked to be solved as an MILP (line 8). If so, we try to prune all remaining values z in active nodes (lines 9 and 10) in order to reduce Z as much as possible and allow for a choice of tighter bounds c, c for the optimality-cuts, as described in Sect. 7.1.2 (line 11). We then compute the estimators δ Z z for the remaining values in c, c (line 12), which we need for our termination strategy of ROB S Z , Q, c, c and also for updating the dual bounds v z . We then construct the problem ROB S Z , Q, c, c and pass it to the solver (line 13). While constructing the robust subproblem in practice, we have to avoid some pitfalls regarding numerical issues. Since the deviationsĉ i , and thus the values z ∈ Z , can be arbitrarily close to each other, it is possible that our subproblem contains constraints p Q + z ≥ i∈Q min ĉ i , z − z + x i for which the coefficients on the right-hand side are very small. Such constraints may not only be troublesome for the solver's performance, but also irrelevant in practice, since Gurobi considers per default all constraints that are violated by less than the feasibility tolerance 10 −6 as satisfied [26]. Hence, we only add the constraint p Q + z ≥ i∈Q min ĉ i , z − z + x i if min ĉ i , z − z > 10 −6 holds for at least one i ∈ Q. Once the subproblem is passed to the solver, we monitor the solution process via callbacks. Using these, the solver allows us to access the current best solution of the subproblem, and improve it as in Sect. 7.2, every time a new incumbent is found (line 14). Furthermore, we can query

Algorithm 3: The Branch and Bound Algorithm
Input: An instance of ROB Output: An optimal solution x * , p * , z * of value v 1 Compute conflict graph and clique partitioning Q, as in Appendix A 2 Compute possible optimal values Z ⊆ ĉ 0 , . . . ,ĉ n , as in Appendix Compute estimators δ Z z for all z ∈ Z ∩ c, c , as in Algorithm 2 13 Solve ROB S Z , Q, c, c with the following callbacks 14 Update v and x * , p * , z * for all solutions found, as in Sect. 7. Update if P S Z , Q, z, z = ∅ then 23 Compute optimal solution x, p , z ∈ P S Z , Q, z, z 24 if x, p , z is integer feasible then 25 Update v and x * , p * , z * Branch Z into Z 1 , Z 2 , as in Sect. 7.5

31
Set the current primal and dual bounds of the subproblem at every node of its branching tree in order to decide whether the subproblem can be terminated, as in Sect. 7.3 (line 15). After the subproblem is solved or terminated, we remove Z from the set of active nodes, add the values in Z to the set of already considered values Z * (line 16), and update dual bounds using the estimators δ Z z (lines 17 and 18). If we don't solve ROB S Z , Q, c, c directly as an MILP, we remove Z from the set of active nodes, as it will either be pruned or branched (line 20). In order to obtain a formulation that is as strong as possible, we try to prune all values z ∈ Z , using their individual dual bound v (z) (line 21). Afterwards, we let the solver solve the linear relaxation over P S Z , Q, z, z , also avoiding numerical issues as above. If there exists no solution to the linear relaxation then Z can be pruned and there is nothing left to do (line 22). Otherwise, we check whether the optimal solution x, p , z is integer feasible and potentially update the best known solution (x * , p * , z * ) (lines 23 to 25). If the solution is not integer feasible, we check whether Z can be pruned using the new dual bound v x, p , z (line 26). If this is not the case, then we decide whether Z should be branched further, as in Sect. 7.5 (line 27). If we decide to solve ROB S Z , Q, c, c directly, we reinsert Z into N and mark sol (Z ) = 1 (line 28). Otherwise, we branch Z into subsets Z 1 , Z 2 , as described in Sect. 7.5 (line 30), compute dual bounds for both child nodes (line 31), and insert them into the set of active nodes (line 32).
After Z is processed, either by solving the robust subproblem or its linear relaxation, we check whether the potentially obtained new primal and dual bounds allow for a pruning of some active nodes (line 33). If any active nodes remain, we continue with choosing the next node, otherwise we report the optimal solution (x * , p * , z * ).
Obviously, it will not always be possible to solve ROB to optimality within a given time limit. Hence, in practice, we also keep track of a dual bound v (ROB) in order to evaluate the quality of the best solution found. We do this by initializing v (ROB) = ∞ and updating it every time a node Z is pruned, using the corresponding dual bound, i.e., v (ROB) ← min v (ROB) , v (Z ) . After the algorithm is terminated, we update v (ROB) ← min v (ROB) , v (Z ) for all remaining active nodes Z ∈ N . Doing so, we make sure that the dual bound v (ROB) is equal to the minimum dual bound v (Z ) of all leaves Z of our branching tree.
The above summary of our branch and bound algorithm closes the theoretical part of this paper. In the next section, we perform an extensive computational study to evaluate the performance of our approach.

Computational study
In this section, we first carefully construct a set of hard robust problems, which we then use to experimentally compare our branch and bound algorithm with other approaches from the literature. Afterwards, we perform several tests on the robust knapsack problem to further demonstrate different trends and effects of our branch and bound algorithm. All experiments have been implemented in Java 11 and are performed on a single core of a Linux machine with an Intel ® Core TM i7-5930K CPU @ 3.50GHz, with 4 GB RAM reserved for each calculation. All LPs and MILPs are solved using Gurobi version 9.1.0 [26] in single thread mode and all other settings at default.
All implemented algorithms [23] and benchmark instances [24] are freely available online for further use.

Instance generation
In order to avoid a bias towards certain combinatorial problems, we generate robust instances on the basis of the diverse MIPLIB 2017 [25]. To transform a given nominal problem from the MIPLIB 2017 into a robust problem, we have to decide which objective coefficients c i are uncertain, that isĉ i > 0, how large the corresponding deviationsĉ i are, and what our robustness budget Γ is. In real-world applications, a coefficient is uncertain if, for example, it is the result of a forecast or a measurement. In [12,14,16], a coefficient is expected to be a result of such procedures, and thus uncertain, if it is an "ugly" number. In particular, integer values are considered "nonugly" and are assumed to be certain. However, since many MIPLIB instances only contain integer values, treating all integer objective coefficients as certain would leave us with few instances for our study. Therefore, we take a middle course by considering c i to be certain only if we have c i ∈ {−1, 0, 1} in the nominal instance, since it is unlikely that c i is the result of a forecast or measurement in this case. Coefficients c i ∈ {−1, 1} usually do not represent a numerical objective value for x i , but are for counting the number of chosen variables. Moreover, c i = 0 suggests that the choice of x i has no direct effect on the objective at all. Regarding the choice of the deviations, in [12,14,16,20] a fixed percentage of the absolute nominal coefficient is considered, i.e.,ĉ i = ξ |c i | for uncertain objective coefficients, where ξ ranges from from 0.01% to 2% across the different studies. Furthermore, the robustness budget Γ is chosen from a predefined set of arbitrarily fixed values [14,16,20].
Note that the aforementioned studies not only consider uncertain objective coefficients, but also uncertainties in the constraints. Bearing this in mind, the above choices may be appropriate in the respective settings for illustrating the effect of uncertainty [12,16] and the creation of sufficiently hard instances [14,20]. Nevertheless, we advocate for a different choice ofĉ i and Γ in order to construct instances with which we can test our algorithms to their limits. In the following, we study the impact ofĉ i and Γ on the integrality gap of ROB to evaluate how they should be chosen to obtain hard instances.
Just like in the literature, we define our deviationsĉ i = ξ i |c i | with respect to the nominal coefficients. However, the factor ξ i is chosen independently for each uncertain coefficient from an interval ξ , ξ . In order to see whether a strong correlation between c i and c i raises the integrality gap, we test different ranges ξ , ξ with a fixed middle value ξ + ξ /2. We also test much higher values ξ i , compared to the values chosen in [12,14,16,20], since large deviations result in more difficult problems and deviations of even more than 100% are relevant in practice, as observed in [29].
The choice of Γ must be made especially carefully. For a problem ROB, let n ROB be the number of uncertain variables contributing to an arbitrary optimal solution. If Γ = 0 or Γ ≥ n ROB holds then either none or all coefficients of the chosen uncertain variables deviate to their maximum. This not only leads the idea of budgeted robust optimization to absurdity, but also results in a relatively small integrality gap. Hence, Γ should be somewhere between 0 and n ROB to obtain a difficult instance. Accordingly, choosing Γ from a fixed set of values for all instances is not appropriate for our purpose, as, e.g., Γ = 100 may be suitable for large instances, while it is way too high for the smaller ones. Obviously, we cannot choose Γ with respect to n ROB , as we do not know the exact value in advance. Furthermore, in contrast to a practitioner solving a real problem, we have no insight into the structure of the diverse problems from the MIPLIB 2017. Hence, our best bet is to solve the nominal problem first, count the number n NOM of uncertain variables appearing in the obtained optimal solution, and choose Γ relative to n NOM . We will see in the following that for the choice Γ = γ n NOM , there is a correlation between γ and the integrality gap of ROB.
Before determining the integrality gap of ROB for different choices ofĉ i and Γ , we have to select the nominal instances to be transformed into robust problems. Naturally, not all instances from the MIPLIB 2017 are suitable for this transformation. Of the available 1065 instances, we consider the ones that are labeled to be feasible, have an objective function, and consist only of binary variables. Furthermore, we only consider instances that have the "easy" label, as we cannot expect to solve the robust counterpart of hard instances. After this first selection, we try to solve the remaining 123 nominal instances within one hour using Gurobi. Of the instances that could be solved, we select those whose computed optimal solution contains at least ten uncertain variables, i.e., n NOM ≥ 10. This ensures that variables with uncertain coefficients have an impact on the optimal solution. From the remaining instances, we also had to exclude pb-fit2d and supportcase11 due to numerical issues. After this final selection, we are left with 67 nominal instances for our computational study.
Here, we choose ξ , ξ ∈ {[10%, 90%] , [30%, 70%] , [45%, 55%] , {50%}}. For each resulting robust problem, we solve the linear relaxation, try to compute an optimal integer solution using our branch and bound algorithm, and determine the integrality gap. For a fair comparison of the integrality gap with respect to different choices of γ and ξ , ξ , we only consider the 44 underlying nominal instances for which we were able to compute an optimal solution for all combinations of γ and ξ , ξ . As we are interested in the impact of γ and ξ , ξ , and not of the nominal instance, we normalize the integrality gaps by dividing each gap by the maximum gap over all combinations of γ and ξ , ξ for the respective instance. Figure 1 shows for all combinations of γ and ξ , ξ the mean of the normalized integrality gaps over all considered instances.
For all choices of ξ , ξ , the mean integrality gap first increases monotonically in γ , peaks at latest at γ = 100% and decreases afterwards. This suggests that, at least for most problems, the maximum integrality gap is achieved for a value Γ somewhere in 0, n NOM . We take this into account by choosing γ ∈ {10%, 40%, 70%, 100%} in our computational study. Note that γ = 100% is most likely way too conservative for a practical problem. However, we are not interested in constructing meaningful Regarding the deviations, the integrality gap is higher for narrow intervals ξ , ξ , suggesting that a strong correlation betweenĉ i and c i results in hard robust instances. Although choosing ξ = ξ seems to be beneficial in this regard, we chose ξ = ξ for our computational study, since fixing ξ i may result in structural properties that lead to a biased performance of the tested algorithms. For example, Monaci and Pferschy [36] showed that an adaptation of the classical greedy heuristic for the binary knapsack problem has a better worst-case performance for the robust knapsack if max ξ i /ξ j i, j ∈ [n] is small. Moreover, our branch and bound algorithm would particularly benefit from choosing ξ = ξ , as this usually provides a smaller set Z of possible values for z. Therefore, we choose ξ , ξ = [45%, 55%] in our computational study and additionally take smaller and larger deviations into account by also considering ξ , ξ ∈ {[5%, 15%] [95%, 105%]}.

Impact of components of the branch and bound algorithm
Before comparing the branch and bound algorithm with approaches from literature, we first evaluate the different components described in the previous sections. More precisely, we disable components to test their impact on the performance, leading to the following variants of our branch and bound algorithm.

BnB
is our branch and bound algorithm as in Algorithm 3.

BnB-Clique
does not compute the conflict graph and cliques from Sect. 5.

BnB-Estimators
does not derive estimators from one ROB (Z ) to another as in Sect. 7.1.1.

BnB-CutLP
does not use optimality-cuts for solving linear relaxations as in Sect. 7.1.2.

BnB-CutMILP
does not use optimality-cuts for robust subproblems as in Sect. 7.1.2.

BnB-Cut
does not use optimality-cuts at all.

BnB-Primal
does not improve primal bounds as in Sect. 7.2.

BnB-Terminate
does not terminate robust subproblems as in Sect. 7.3, but solves them to optimality.

BnB-Branching
does not choose the branching point θ as in Sect. 7.3, but chooses θ = z.
Note that disabling some components can also have an effect on other components. The computation of cliques not only prevents us from using reformulation ROB (Q), but also worsens the filtering of Z and the estimators δ z z . Disabling estimators allows us to terminate robust subproblems ROB Z , c, c more aggressively, as raising the dual bound v ROB Z , c, c past the current primal bound v is no longer beneficial. Furthermore, disabling optimality-cuts for robust subproblems allows us to use estimators δ z z for z / ∈ c, c . We use the ten variants above to solve the 67·3·4 = 804 robust instances (67 nominal instances, 3 different ξ , ξ , 4 different γ ) constructed in the previous section within a time limit of 3600 s, including preprocessing, construction of subproblems, etc. Detailed results per instance and algorithm are provided in a supplementary electronic file.
The plots in Fig. 2 give an indicator of the performances on an aggregate level by showing for all variants the proportion of instances that could be solved within a specific number of seconds. Figure 2a suggests that disabling the filtering of the set Z of possible values for z barely makes a difference for our branch and bound algorithm. In contrast, disabling cliques or estimators heavily affects the algorithm's performance. After 3600 s, the default algorithm solves around 5% instances more than these two variants.
Surprisingly, Fig. 2b shows that disabling optimality-cuts (especially for linear programs) slightly improves the performance for the tested instances. A detailed look at the computational results shows that the impact of optimality-cuts can differ highly between instances. This is partially because the different variants solve different robust subproblems, whose complexity can vary significantly. However, there is also a synergy between our branching strategy, described in Sect. 7.5, and the disabling of optimality-cuts for linear relaxations. Although it seems unintuitive, the addition of optimality-cuts for linear relaxations often results in an increase in the number of nodes in our branching tree, which is partially due to the following reason. Consider a node Z ⊆ Z such that all solutions x obeying the optimality-cuts for z, z are far off from being optimal. Furthermore, let (x, p, z) ∈ P (Z , Q) be an optimal solution for the linear relaxation without optimality-cuts. We observed for many instances that in this case z ∈ z, z holds and thus the linear formulation is as strong as the bilinear formulation according to Proposition 1. Hence, we stop branching the node due to our branching strategy, and directly solve ROB S (Z , Q) as an MILP, which usually leads to a quick pruning of Z . In contrast, when adding optimality cuts, z usually lies in the inner of the interval z, z which often results in further unnecessary branching. This observation suggests that the current branching strategy has potential for further In the current state of the algorithm, it is reasonable to test whether using optimality-cuts is useful when solving a practical problem. Figure 2c shows that terminating robust subproblems prematurely improves the algorithm's performance and results in solving around 2% instances more. Choosing the branching point θ as a convex combination of the solution value z and the middle of the interval z + z /2 seems to have a marginal positive effect. The same holds for the improvement of the primal bound via computing optimal z for incumbent solutions, which suggests that our approach for selecting nodes Z ⊆ Z containing promising z is very effective. We already stated in Sect. 7.2 that most of the time, we already have a (nearly) optimal solution after solving the very first robust subproblem. For our test instances, the relative gap between the best known solution value after the first robust subproblem and the primal bound after 3600 s is below 10 −2 for 99% of all instances and even below 10 −4 for 87.8% of all instances. When disabling the improvement of incumbent solutions, this still holds for more than 94.5% (below 10 −2 ) and 69% (below 10 −4 ) of all instances respectively.
Another noteworthy setting is the one in which estimators and improvement of primal bounds are disabled together. This version of BnB still solves 76.7% of all instances, and thus outperforms all approaches from literature, as we will see in the next section. The setting is of particular interest, since it only relies on results that are also generalizable to uncertain constraints with budget uncertainty. Here, the j-th constraint i∈[n] a ji ≥ b j of the constraint matrix Ax ≥ b would become i∈[n] a ji − p ji − Γ j z j ≥ b with additional constraints z j + p ji ≥â ji x i for deviationsâ ji and a constraint specific budget Γ j . Since the additional constraints have the same structure as for the uncertain objective function, we can branch on the variables z j , use clique reformulations, filter possible values for z j , and add optimality cuts. Estimators and the improvement of primal bounds cannot be generalized, since these rely on the fact that a feasible solution for a fixed z has a corresponding feasible solution for a different z , which does not apply when fixing z j to different values. The observation that the generalizable results yield a well-performing branch and bound algorithm for uncertain objective functions suggests that our approach and the theoretical results in this paper might also be relevant for robust optimization with uncertain constraints.

Comparing algorithms from the literature
We now evaluate our branch and bound approach by comparing BnB-CutLP with the following eight algorithms.

ROB
is the MILP over the standard formulation P ROB . SEP is the cutting-plane approach separating scenarios from the uncertainty set, as described in [14].
The approaches ROB, SEP, and BS are widely known and studied, and can thus be considered as the current state-of-the-art approaches. In contrast, DnC has so far only been considered for robust shortest path problems and was not evaluated for general robust optimization problems [27]. To the best of our knowledge, we also present the first study that evaluates the reformulations RP1,...,RP4 on a set of instances based on real-world problems.
In our implementation, we slightly adapt RP2 and RP3 compared to [5]. Reformulation RP2 consists of an exponential number of valid inequalities that are separated in O n 2 by searching for negative weighted paths in an acyclic directed graph. Atamtürk shows that a subset of these inequalities is sufficient to define the convex hull of It is easy to see that the graph constructed in [5] can be modified by deleting some arcs, such that there exists a one-to-one correspondence between paths in the graph and inequalities defining the convex hull. We use this reduced graph for our implementation. Reformulation RP3 incorporates the valid inequalities of RP2 by adding n + 2 additional variables and O n 2 constraints. Similar to the graph for the separation problem some of these constraints can be omitted, resulting in a smaller formulation.  Figure 3 shows for all algorithms the proportion of instances that were solved in a specific number of seconds. It is evident that our branch and bound algorithm outperforms all existing approaches from the literature by far, solving 83.8% of all instances in 3600 s. Among all other algorithms, DnC solves the most instances, ending at 55.3% after 3600 s. In comparison, BnB-CutLP only needs 220 s to solve 55.3% of all instances. ROB solves less problems (53.1%) than DnC but is faster in the beginning, solving more problems in shorter time. SEP solves only 31.6% of all instances and thus performs significantly worse than ROB. Interestingly, this is in contrast to the findings of Bertsimas et al. [14], who observed no clear winner between these two approaches for robust problems with uncertain constraints. BS solves more instances (45.9%) than SEP but is still clearly worse than ROB and DnC, supporting our claim from the introduction that BS is not practical if the number of different deviations ĉ 0 , . . . ,ĉ n is large. Of the four reformulations of Atamtürk, RP2 is the only practicable one, solving 49.1% of all instances. This is because RP1, RP3, and RP4 are simply too large for most practical problems. RP1 exceeds the memory limit of 4 GB for 29.9% of all instances, RP3 for 20.9% and RP4 even for 36.1%. Even if the models can be build obeying the memory limit, they are most of the time still too large for Gurobi to solve them. RP1 was not even able to solve the linear relaxation of the root node for 5.8% of all instances. For RP3 and RP4, this was the case for 44.8% and 12.1% of all instances respectively.
Obviously, many of the robust instances are very difficult to solve. Thus, in practice, one might also be satisfied obtaining a nearly optimal solution. Figure 4 shows the proportion of instances that are solved up to a specified relative optimality gap within the time limit of 3600 s. Our branch and bound algorithm also clearly outperforms the other algorithms in this regard. BnB-CutLP solves 94% of all instances to the optimality gap of 1% and 98.8% to the gap of 10%. In contrast, ROB, which is the second best performing algorithm, only solves 78.9% of all instances to the optimality gap of 10%. Note that the line corresponding to BS is nearly horizontal, as the dual bound computed by BS equals the minimum dual bound over all nominal subproblems, and Fig. 4 Proportion of instances solved within a specific relative optimality gap for different approaches from the literature is thus negative infinity before all z ∈ ĉ 0 , . . . ,ĉ n have been considered. Therefore, we usually either solve an instance to the optimality gap of 10 −4 or report an infinite gap. The small increase in instances solved before 1% is due to an error of Gurobi occurring for robust instances emerging from the nominal instance neos-1516309. For these problems, Gurobi solves all nominal subproblems within the time limit, but reports for some a dual bound that is too low. For a fair comparison, we still consider these instances to be solved to optimality.
In order to evaluate the impact of the deviationsĉ i and the robustness budget Γ on the algorithms' performances, we report in Table 1 the proportion of instances solved for every combination of ξ , ξ and γ . Our branch and bound approach solves consistently for every setting more instances than any other algorithm and has a relatively stable performance for the different choices of ξ , ξ . This is in contrast to ROB, which performs significantly worse for higher deviations, as these weaken the formulation. ROB also performs worse for γ around 40% and 70%, which supports our claim from Sect. 8.1 that choosing Γ somewhere within 0, n NOM results in hard problems. For BnB-CutLP, the performance increases for higher γ , as the set of possibly optimal values Z for z decreases. Higher γ are also beneficial for BS, as for these, the optimal choice for z is smaller and thus found faster when iterating over ĉ 0 , . . . ,ĉ n . Hence, we find a good primal bound early on, which can be used to terminate the remaining nominal subproblems once their respective dual bounds are high enough.

Improving algorithms from literature
We close our computational study on the MILIB instances by showing that the theoretical results from this paper can also be used to significantly improve most of the algorithms considered in the previous section. An obviously improvement for ROB is to compute a partitioning into cliques, determineĉ i max , the highest possible optimal value for z as in Sect. 6, and solve the reformulation ROB S 0,ĉ i max , Q instead of the original one. In the following, we call this approach ROB+. For improving RP4, recall that the reformulation combines the nominal subproblems NOS (z) for all z ∈ ĉ 0 , . . . ,ĉ n into one problem. However, it is sufficient to define RP4 for the filtered set Z of possible optimal values for z, instead of the set of all deviations ĉ 0 , . . . ,ĉ n . An analogous reduction can be applied for the reformulation RP1, which is quite similar to RP4. In the following, we call the approaches using these reduced reformulations RP1+ and RP4+.
The algorithm that can be improved the most is the DnC of Hansknecht et al. [27]. DnC chooses specific values z ∈ ĉ 0 , . . . ,ĉ n , solves the corresponding nominal subproblems NOS (z), and computes dual bounds for other z ∈ ĉ 0 , . . . ,ĉ n on the basis of Lemma 3 until all z are either pruned or considered for a nominal subproblem. To improve DnC, we only consider values for z within the filtered set Z and use the estimators from Theorem 3 instead of the ones from Lemma 3. Furthermore, we apply optimality-cuts for the nominal subproblems, with c, c chosen analogously as in Sect. 7.1.2. We also improve incumbent solutions by computing the corresponding optimal z and terminate nominal subproblems prematurely, analogously to Sects. 7.2 and 7.3. The improved DnC is called DnC+ in the following.
While the approaches SEP, RP2, and RP3 cannot be improved using our theoretical results, BS could be enhanced similarly to DnC. However, we do not consider an improved version of BS, since it is essentially a strictly weaker algorithm compared to DnC.
We again report results per instance and algorithm in a supplementary electronic file and show aggregate results in Fig. 5. We see that computing a partitioning into cliques andĉ i max pays off, as the improved formulation used for ROB+ enables us to solve 57.2% instances, compared to 53.1% for ROB. The filtering of Z is also effective for reducing the size of the reformulations RP1 and RP4. RP1+ exceeds the memory limit for 19.4% of all instances, instead of 29.9%. For RP4+, this reduction is from 36.1% to 29.4%. Both approaches also solve more instances within the time limit. RP1+ solves 19.7% instead of 17.9%, while RP4+ even solves 20.5% instead of 11.9%. Nevertheless, both reformulations are still way too large for most problems and cannot compete with the other approaches. This is especially true in comparison with DnC+, which performs significantly better than DnC, solving 83% of all instances instead of 55.3%. In fact, DnC+ performs similar to our branch and bound approach. While  DnC+ solves more instances early on, BnB-CutLP solves slightly more instances within 3600s (83.8%). A glance at the proportion of instances solved within a specific relative optimality gap, shown in Fig. 6, indicates that our branch and bound is indeed slightly better than DnC+ in solving very hard instances, as BnB-CutLP still solves clearly more instances (94%) to the optimality gap of 1% than DnC+ (89.4%).
We close this section with an evaluation of the improvements applied to DnC+. We do so by disabling the components individually, analogously to Sect. 8.2. Figure 7a shows that disabling the filtering of Z and the computation of clique partitionings lead to a slight degradation in performance. Both variants solve 82.5% of all instances instead of 83%. While the effect of disabling the filtering is similar for BnB and DnC+, the partitioning into cliques is much more important for BnB than it is for DnC+. This is because DnC+ only uses the cliques for improving the filtering and the estimators of Theorem 3, but BnB also relies on the strengthened clique reformulation. Using the estimators from Lemma 3 instead of Theorem 3 significantly worsens the algorithm, solving only 78.9% of all instances. Figure 7b reveals that disabling the improvement of primal bounds, the termination of robust subproblems or optimality-cuts is also hindering. In contrast to BnB, which chooses nodes based on the linear relaxation, DnC+ selects many values for z that are not promising. Accordingly, improving the incumbent solutions for the corresponding subproblems and terminating them early is much more important to DnC+ compared to BnB. Disabling the improvement of primal bounds results in solving 81.5% of all instances. Disabling the termination of robust subproblems even leads to solving only 77.6% of the instances. Surprisingly, the addition of optimality-cuts has the largest impact on DnC+, although we observed that they slightly worsen BnB in the current implementation. With optimality-cuts disabled, we only solve 75.6% of all instances. This shows the potential of the optimality-cuts and raises hope that they can also be a helpful addition to our branch and bound approach with further engineering.

When to use branch and bound or divide and conquer
We have seen that both our branch and bound algorithm and our improved version of the divide and conquer perform exceptionally well compared to the other approaches from literature, with BnB-CutLP solving slightly more instances and DnC+ being faster on the easier ones. We close our computational study with a more detailed comparison of these algorithms, providing some guidance on which algorithm to use in different practical settings.
A major strength of our branch and bound is that we only need to solve very few robust subproblems ROB (Z ). Considering only the instances that were solved by both BnB-CutLP and DnC+, BnB-CutLP solves on average 7.9 subproblems, while DnC+solves 13.9 subproblems NOS (z). Furthermore, as already stated in Sect. 8.2, our branch and bound finds (nearly) optimal solutions within the very first subproblem, resulting in a fast termination of the following ones. Despite this advantage, DnC+ solves many instances faster than BnB-CutLP. The reason for this is that solving the linear relaxations over P S (Z , Q) can require surprisingly much time. In fact, considering only the solved instances, BnB-CutLP spends on average 33.1% of its time solving LPs, although we only consider 45.9 of these on average. To further demonstrate the relevance of the LPs for the computation time and better understand the underlying effects, we provide an additional computational study on the robust knapsack problem.
In order to construct hard instances of the robust knapsack problem for a number of items n ∈ N, we first select independent and uniformly distributed weights a i ∈ [10,000] for all i ∈ [n]. Afterwards, we choose profits c i = ζ i a i , where ζ i ∈ [0.95, 1.05] is an independent and uniformly distributed random vari-able. This choice is based on the observation that the profits and weights should be correlated, as otherwise many items can be excluded easily due to domination [39]. Similar to Sect. 8.1, we choose deviationsĉ i = ξ i c i , where the random variable ξ i ∈ [0.45, 0.55] is again independent and uniformly distributed. The capacity b and robustness budget Γ depend on the number of items n. We choose b = min n 2 , 1000 · 5000, i.e., we can store min n 2 , 1000 items of average weight into the knapsack. Note that we choose the minimum of n 2 and 1000 such that the capacity doesn't become too large for the sake of numerical stability. Finally, we choose Γ = min{ n 2 ,1000} 2 , i.e., approximately half of the included items will deviate from their nominal weight, which results in hard instances according to our observations regarding Table 1.
In Table 2, we show computational results for different numbers of items ranging from n = 50 to n = 1,000, 000. For every n, we generate 10 different instances to test the algorithms ROB, ROB+, DnC, DnC+, and BnB-CutLP. As before, all algorithms are given a time limit of 3600 s. For ROB and ROB+, Table 2 shows the number of instances that could not be solved to optimality within the time limit and the mean time in seconds used for the instances that could be solved to optimality. The generated instances are apparently quite hard, as ROB and ROB+ already fail to solve nine out of ten instances with n = 100 items. Note that ROB+ has a noticeable advantage over ROB although there exist no conflicts between items which ROB+ could make use of. The advantage depends solely on filtering the possible values Z and solving the problem ROB (Z ).
Still, the performance of ROB+ is not even close to the performance of the other three algorithms. As DnC, DnC+, and BnB-CutLP are able to solve all instances, we omit the timeout column for these algorithms in Table 2. The mean computation time in seconds reveals that DnC+ performs especially well. This is partially due to the nominal knapsack subproblems NOS (z) being quite easy to solve. Furthermore, DnC+ considers on average only 25.3 subproblems for n = 1,000,000 items, compared to an average of 165.6 subproblems that are solved by DnC.
BnB-CutLP even solves only one robust subproblem ROB (Z ) for each knapsack instance and on average 24.2 linear relaxations over P S (Z ) for n = 1,000,000 items. Nevertheless, its performance degrades for higher n as solving the LPs becomes more and more challenging. For n = 1,000,000, BnB-CutLP spends on average 93.17% of its time solving linear relaxations. This is due to the up to n + 1 additional variables p , z and n additional constraints Interestingly, the very first LP corresponding to the root node Z = Z is by far the hardest one and requires on average 1062 s for n = 1, 000, 000 items. After branching on z, the LPs become easier, since the tighter bounds z, z lead to many constraints p i + z ≥ min ĉ i , z − z + x i becoming redundant for z ≥ĉ i or at least less impactful for small z − z. In fact, the single robust subproblem ROB (Z ) that we solve for each knapsack instance only requires 74.8 s on average for n = 1,000,000, and is thus significantly easier than the LP of the root node. These observations suggest that our branch and bound could benefit drastically from further research on how to solve the linear relaxation of the robust problem. As of now, the branch and bound seems to have an advantage for problems where the

Conclusion
In this paper, we considered robust binary optimization problems with budget uncertainty in the objective, which are tractable in theory, but often hard to solve in practice. We identified that the standard formulation for solving these problems is weak and that the variable z is critical in this regard. To address this issue, we proposed a compact bilinear formulation that is as strong as theoretically possible. To benefit from the formulation's strength in practice, we derived a strong linear formulation for the case where z is bounded. Building upon this linear formulation and many structural properties of the robust problem, we proposed a branch and bound algorithm in which we obtain bounds on z via branching.
To test the algorithms strength, we compared it to other sophisticated algorithms from the literature within a comprehensive computational study. For this, we carefully generated a set of hard robust instances based on real-world problems from the MIPLIB 2017, which we made available online [24] together with the implemented algorithms [23]. The computational results show that our algorithm outperforms all existing approaches by far. Furthermore, we showed that the structural properties shown in this paper can be used to substantially improve the divide and conquer approach by Hansknecht et al. [27], providing us with two potent algorithms for robust optimization.
For future research, the different components of our approach leave much room for engineering to further enhance the branch and bound algorithm. Additionally, it would be interesting to test our approach for robust optimization with uncertain constraints. We already mentioned that most theoretical results can be generalized to this case and showed that our algorithm performs well relying only on these general results.
Code availibility The branch and bound algorithm and all other tested approaches have been implemented in Java and are available on GitHub, see [23].

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

A Implementation of conflict graph and clique partitioning
We restrict ourselves here to detecting conflicts between positive literals x i of uncertain variables, as these are the only ones we need for our clique reformulation. In order to determine if there exists a conflict between two uncertain variables x i 1 and x i 2 , we have to evaluate whether x ∈ X x i 1 = x i 2 = 1 is empty. Since this is in general N P-hard, we resort to a simpler problem. Let n i=1 a ji x i ≤ b j be a row of the constraint matrix Ax ≤ b and let l i ≤ x i ≤ u i be bounds on variable x i for all i ∈ [n]. If min n i=1 a ji x i x ∈ [l, u] , x i 1 = x i 2 = 1 > b j (8) holds then x i 1 and x i 2 cannot both be equal to one and we can add an edge x i 1 , x i 2 to the conflict graph. In order to evaluate Inequality (8)  a ji x i x ∈ [l, u] + max a ji 1 , 0 + max a ji 2 , 0 , and thus it is sufficient to evaluate whether b j < max a ji 1 , 0 + max a ji 2 , 0 holds. In order to find conflicting variables x i 1 , x i 2 having this property, we use Algorithm 4, which is similar to the one presented by Brito and Santos [18]. Instead of performing a pairwise evaluation, Algorithm 4 directly searches for subsets h ⊆ {x 1 , . . . , x n } forming a clique in the conflict graph. This is not only faster than a pairwise evaluation, but also beneficial for storing the conflict graph. Assume that a row of the constraint matrix implies conflicts of a large clique in the conflict graph. Atamtürk et al. [6] already mentioned that storing these conflicts as a set of edges or using adjacency lists consumes too much memory. Instead, one should use a separate structure to store these conflicts. Here, we store conflicts implied by cliques h with |h| = 2 in adjacency lists, while cliques with |h| > 2 are stored directly as a list of variables. To guarantee fast access to the cliques, we maintain for each variable a list of references to cliques in which it is contained. Thus, the memory requirement of adding a clique h is only O (|h|) instead of O |h| 2 . In the following, we think of the conflict graph as a hyper graph G = (V , H ), with V = {x 1 , . . . , x n } being the variables and H ⊆ 2 V being a set of hyper edges representing cliques in the conflict graph. Let l * = argmin l ∈ [k − 1] max a ji l , 0 + max a ji l+1 , 0 > b j 10 Add H ← H ∪ x i l * , . . . , x i k 11 for p ∈ l * − 1 do 12 Let l = argmin l ∈ [k] max a ji p , 0 + max a ji l , 0 > b j 13 Add H ← H ∪ x i p , x i l , . . . , x i k Algorithm 4 constructs this hyper graph by iterating over all constraints n i=1 a ji x i ≤ b j . For each constraint, we first compute b j (line 3) and the highest coefficient a * occurring in the constraint (line 4). If we have 2 max {a * , 0} ≤ b j then the constraint will imply no conflicts, which allows for a fast skipping of uninteresting constraints (line 5). Otherwise, we construct a list of candidates for which we have max {a * , 0} + max a ji l , 0 > b j , that is the variable x i l either defines a * or has a conflict with the variable defining a * (line 6). If the list consists of more than one variable, i.e., the variable defining a * , then we have found a conflict (line 7). We sort the list of candidates (line 8) and find the lowest index l * such that x i l * and x i l * +1 are in conflict (line 9). Due to the ordering of the list, all variables x i l * , . . . , x i k are in conflict with each other and can thus be added as a hyper edge to the conflict graph (line 10). For all variables x i p that do not belong to the hyper edge, we search for the lowest index l such that x i p and x i l are in conflict (line 12). Such an index exists, since x i p is in conflict with x i k . Again, due to the ordering of the list, the variables x i p , x i l , . . . , x i k are in conflict and can be added to our graph (line 13).
Note that the hyper edge added in line 13 consists of a subset of the first hyper edge added to the constraint, starting at an index l , and an additional variable x i p . Brito and Santos [18] pointed out that this can be used to store the graph more efficiently. Instead of storing the whole hyper edge x i p , x i l , . . . , x i k , we only store the variable x i p , a reference to the hyper edge x i l * , . . . , x i k and the starting index l , from which we can reconstruct the hyper edge x i p , x i l , . . . , x i k . After constructing the conflict graph, we use Algorithm 5 to compute a partitioning of V into cliques. Algorithm 5 is a greedy heuristic that builds on the idea that a minimum clique cover consists without loss of generality only of cliques that are maximal with respect to inclusion. We first initialize a set of cliques Q = ∅ and remaining nodes V = V (line 1). Until there are nodes left for partitioning (line 2), we iteratively select an arbitrary remaining node v ∈ V (line 3) and construct a maximal clique Q ⊆ V containing v . We initialize Q as the largest hyper edge on the remaining nodes h ∩V containing v (line 4 and 5). This speeds up the construction of Q, since we do not have to check whether the variables in h are in conflict. We then expand Q by iteratively adding remaining nodes that are contained in the neighborhood of all current clique members (lines 6 to 10). Afterwards, we add this clique to our partitioning (line 11), remove the contained nodes from the remaining nodes (line 12), and proceed until no nodes are left.

B Proof of Lemma 2
Lemma 2 For x ∈ R n , we have contradicting the definition of z (x).
We now prove Equality (4 contradicting the definition of z (x).

C Filtering possible values for z
We start with proving the proposition.

Proposition 3 Let Q be a partitioning of [n] into cliques and q :
[n] → Q be the mapping that assigns an index j ∈ [n] its corresponding clique Q ∈ Q with j ∈ Q. For it holdsĉ i max ≥ z (x) for all solutions x ∈ P NOM ∩ {0, 1} n and there exists an optimal solution (x, p, z) to ROB with z ∈ ĉ 0 , . . . ,ĉ i max . Now, let G = ([n] , E) be a conflict graph for ROB and Γ ∈ Z. Furthermore, let Z ⊆ ĉ 0 , . . . ,ĉ i max such thatĉ i max ∈ Z and for every i ∈ i max − 1 0 it holds -ĉ i ∈ Z or -there exists an index k < i withĉ k ∈ Z and for all j ∈ {k + 1, . . . , i − 1} there exists an edge { j, i} ∈ E in the conflict graph G.
The above statement already determines the structure of Algorithm 6, which we use to compute a minimal set Z satisfying the requested properties. We start by computing the index i max (lines 1 to 5). Afterwards, we add the mandatory deviationĉ 0 (line 6) and check whether Γ ∈ Z holds (line 7). If so, we evaluate for all i ∈ {1, . . . , i max − 1} if deviationĉ i has to be added according to Proposition 3 (lines 8 to 13). Lastly, we add deviationĉ i max , which always needs to be considered (line 14). Note that the second part of Proposition 3 only holds for Γ ∈ Z, as otherwise we have z (x) = z (x), which makes it impossible to skip deviations. Hence, in the case of Γ / ∈ Z, we only use the first part (line 16).