An oracle-based framework for robust combinatorial optimization

We propose a general solution approach for min-max-robust counterparts of combinatorial optimization problems with uncertain linear objectives. We focus on the discrete scenario case, but our approach can be extended to other types of uncertainty sets such as polytopes or ellipsoids. Concerning the underlying certain problem, the algorithm is entirely oracle-based, i.e., our approach only requires a (primal) algorithm for solving the certain problem. It is thus particularly useful in case the underlying problem is hard to solve, or only defined implicitly by a given software addressing the certain case. The idea of our algorithm is to solve the convex relaxation of the robust problem by a simplicial decomposition approach, the main challenge being the non-differentiability of the objective function in the case of discrete or polytopal uncertainty. The resulting dual bounds are then used within a tailored branch-and-bound framework for solving the robust problem to optimality. In a computational evaluation, we show that our method outperforms straightforward linearization approaches on the robust minimum spanning tree problem. Moreover, using the Concorde solver for the certain oracle, our approach computes much better dual bounds for the robust traveling salesman problem in the same amount of time.


Introduction
Robust optimization has become a wide and active research area in the last decades. Its aim is to address optimization problems with uncertain data. Unlike stochastic optimization, which usually aims at optimizing expected values, the robust optimization paradigm tries to optimize the worst case. While stochastic optimization requires full knowledge of the probability distributions of all uncertain problem data, robust optimization only asks for a so-called uncertainty set containing all scenarios that need to be taken into account. While generally leading to computationally easier problems than stochastic optimization, it is well known that robust counterparts of tractable combinatorial optimization problems usually turn out to be NP-hard for most types of uncertainty sets; see, e.g., [18] or the recent survey [10] and the references therein.
In this paper, we address robust counterparts of general combinatorial optimization problems of the type

min c^⊤x + c_0 s.t. x ∈ X, (P)

where X ⊆ {0, 1}^n is any set of binary vectors describing the feasible solutions of the problem at hand. The objective function coefficients (c_0, c) ∈ R^{n+1} are considered uncertain. The robust counterpart of (P) is then given by

min max_{(c_0,c) ∈ U} c^⊤x + c_0 s.t. x ∈ X, (R)

where U ⊆ R^{n+1} is the so-called uncertainty set, collecting all likely scenarios. Note that allowing an uncertain constant c_0 makes the approach slightly more general, even though the latter is not relevant in the deterministic problem (P). With respect to the considered type of uncertainty set, our approach is rather general, but we will concentrate our exposition on the so-called discrete uncertainty case, where U is given as a finite set. Other classes of uncertainty sets often considered in the literature include polytopal or ellipsoidal sets. While many approaches devised in the literature consider special classes of combinatorial structures X, our aim is to devise an entirely oracle-based approach. We thus assume that we have at our disposal an algorithm that solves Problem (P) for any given objective c, but we do not pose any restrictions on how this algorithm works. Our approach is thus particularly well suited in situations where the certain problem is already NP-hard but well studied, such as, e.g., the traveling salesman problem, or where the underlying problem is not a classical textbook optimization problem, but given by some sophisticated and possibly opaque solution software. Our approach does not require any knowledge about the underlying problem.
As mentioned above, robust counterparts are often NP-hard even in cases where the underlying problem (P) is tractable. Consequently, in order to solve (R), it cannot suffice to call the oracle a polynomial number of times. This is even true without assuming P ≠ NP [8]. Instead, we propose a branch-and-bound approach, where the main ingredient is the computation of the lower bound given by the straightforward convex relaxation of (R), namely

min max_{(c_0,c) ∈ U} c^⊤x + c_0 s.t. x ∈ conv(X). (C)

This problem is well-defined and convex as long as U is any compact set. While ellipsoidal uncertainty leads to a smooth objective in (C), which can be exploited algorithmically [12, 11], the discrete and the polytopal uncertainty cases lead to piecewise linear objective functions, requiring different solution methods.
In our approach, Problem (C) is solved by an inner approximation algorithm; see, e.g., [5] and the references therein. It belongs to the class of Simplicial Decomposition (SD) methods. First introduced by Holloway in [16] and then further studied in [15, 22, 24, 25], SD methods currently represent a standard tool in convex optimization. Our SD method makes use of two different oracles: the first one is an algorithm for solving the convex relaxation over an inner approximation of conv(X), namely the convex hull of a subset X′ of X. It is important to notice that this subroutine implicitly defines the uncertainty set U, while the rest of our algorithm is independent of U. The second oracle is the one described above, which implicitly defines the set X and hence also conv(X). Our approach can thus be seen as an oracle-based version of a generalized SD algorithm; see, e.g., [5, 6] for further details about generalized SD. The proposed method performs a two-step optimization process by handling an ever-expanding inner approximation of the relaxed feasible set conv(X). At a given iteration, the method first builds up a reduced problem (whose feasible set is given by the inner approximation) and solves it by means of the first oracle. It then feeds the second oracle with the information coming from the first step, in the hope of generating new extreme points that refine the inner approximation. If no new point can be found, then the solution of the last reduced problem is optimal. The way the refinement step is carried out is crucial to guarantee finite convergence of our method.
Dropping rules (i.e., rules that allow to get rid of useless points in the inner approximation) are often used in simplicial-decomposition-like algorithms to keep the computational cost deriving from the first oracle small enough; see, e.g., [5, 7, 25]. As pointed out in [6], defining suitable dropping rules for a generalized simplicial decomposition, while guaranteeing finite convergence of the method, is a challenging task. We propose a simple dropping rule and analyze it in depth both from a theoretical and a computational point of view. Some other oracle-based algorithms for robust combinatorial optimization with objective function uncertainty have been devised in the literature. In particular, tailored column generation approaches for dealing with the continuous relaxation of the given combinatorial problem are studied in [9, 17]. When considering Problem (C), those column generation algorithms turn out to be closely related to Kelley's cutting plane approach for the problem

max_{(c_0,c) ∈ U} min_{x ∈ conv(X)} c^⊤x + c_0,

which is equivalent to (C) in case of convex U by the minimax theorem. Another interesting approach to handle the relaxation (C) is described in [20], where the author proposes a projected subgradient method that approximately solves the projection problem at each iteration by the classical Frank-Wolfe algorithm. This approach is related to gradient-sliding methods, see, e.g., [21] and the references therein, and hence differs from the one described in this paper.
When aiming at general approaches that do not exploit specific characteristics of the underlying problem (P), the main alternative to oracle-based algorithms are approaches based on an IP formulation of (P). For discrete uncertainty, the non-linear objective in (C) can easily be linearized, and this approach can be extended to infinite uncertainty sets U using a dynamic generation of worst-case scenarios, provided that a linear optimization oracle over U is given; see [23] for a general analysis and [14] for an experimental comparison with reformulation-based approaches. The scenario generation method is still applicable when having only a separation algorithm for conv(X) at hand. In the experimental evaluation presented in this paper, we compare our SD approach to such a separation-oracle-based approach for the discrete uncertainty case, using CPLEX to solve the resulting integer linear problems.
In the subsequent section, we describe our SD approach in more detail, concentrating on the discrete uncertainty case and with a particular focus on dropping rules. In Section 3, we explain how we embed this approach into a branch-and-bound framework. An experimental evaluation is presented in Section 4. Section 5 concludes.

Computation of lower bounds
The main ingredient in our approach is the computation of the lower bound given by the convex relaxation of the robust counterpart (R). Setting P := conv(X) and f(x) := max_{(c_0,c) ∈ U} c^⊤x + c_0, the problem we address is thus given as

min f(x) s.t. x ∈ P. (CR)

It is easy to see that the objective function f in (CR) is convex for any uncertainty set U; however, it is not necessarily differentiable. E.g., in case of a finite set U, differentiability is guaranteed only in points x where the scenario (c_0, c) ∈ U maximizing c^⊤x + c_0 is unique. In the following, we first describe the general idea of the simplicial decomposition approach applied to the potentially non-differentiable problem (CR); see Section 2.1. Afterwards, we investigate a variant of the approach where vertices are dropped in case they are not needed to define the current simplex. This however requires to deal with the issue of cycling; see Section 2.2.
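In the discrete uncertainty case, evaluating f amounts to taking a maximum over finitely many affine functions. The following minimal Python sketch illustrates this; the function name and the row layout of the scenario matrix are our own illustrative choices, not taken from the authors' implementation:

```python
import numpy as np

def f(x, scenarios):
    """Worst-case objective f(x) = max over scenarios (c0, c) of c^T x + c0.

    `scenarios` is an (m, n+1) array whose rows are (c0, c_1, ..., c_n).
    """
    c0 = scenarios[:, 0]          # uncertain constants
    C = scenarios[:, 1:]          # scenario cost vectors, one per row
    return float(np.max(C @ x + c0))

# two scenarios: (c0, c) = (0, (1, 2)) and (1, (2, 0))
U = np.array([[0.0, 1.0, 2.0],
              [1.0, 2.0, 0.0]])
print(f(np.array([1.0, 0.0]), U))  # max(1, 3) -> 3.0
```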

General approach
We now describe the two oracles that we embed in our SD framework. The first oracle SIM-O essentially minimizes f over a simplex given by the convex hull of a finite set V ⊂ R^n. Beyond the optimal solution x*, we also need coefficients yielding x* as a convex combination of points in V, and a subgradient c of f in x* such that −c belongs to the normal cone of conv(V) in x*. The existence of such a c is a necessary and sufficient condition for optimality of x*.
The second oracle, namely Oracle LIN-O, is the main oracle defining the underlying problem. It takes as input an objective vector c and returns a minimizer of min_{x ∈ X} c^⊤x, which is the same as solving Problem (P).
Using these oracles, Algorithm SD works as follows (see the pseudo-code below): the set V^0 is initialized as the singleton {x^0}, where x^0 is an arbitrary element of X. Then, we enter a loop. At each iteration k, oracle SIM-O is first called, in order to calculate a minimizer x^k of f over conv(V^k) and a subgradient c^k of f in x^k with −c^k ∈ N_{conv(V^k)}(x^k). Then, oracle LIN-O is called, giving as output a minimizer x̄^k of (c^k)^⊤x over x ∈ X. Note that both x^k and x̄^k belong to P = conv(X), but not necessarily to X. From the definition of N_{conv(V^k)}(x^k), we have that (c^k)^⊤x ≥ (c^k)^⊤x^k for all x ∈ conv(V^k). This means that as long as (c^k)^⊤x̄^k < (c^k)^⊤x^k, we can go further in the minimization of f over P by including the point x̄^k in the set V^k. Otherwise, if (c^k)^⊤x̄^k ≥ (c^k)^⊤x^k, we can stop our algorithm, as x^k is a minimizer of f over P and f(x^k) is a lower bound for Problem (R). See Fig. 1 for an illustration.

Figure 1: Illustration of Algorithm SD with X = {0, 1}^2.
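The loop just described can be sketched in a few lines of Python. Here `sim_o` and `lin_o` stand for the two oracles, and all names are illustrative, not taken from the authors' implementation; for simplicity, the toy `sim_o` below handles only a linear f (a single scenario), in which case the minimum over conv(V) is attained at a vertex:

```python
import numpy as np

def simplicial_decomposition(sim_o, lin_o, x0, tol=1e-9, max_iter=1000):
    """Oracle-based SD loop (sketch). sim_o(V) must return a minimizer x of f
    over conv(V) and a subgradient c of f at x with -c in the normal cone of
    conv(V) at x; lin_o(c) must return a minimizer of c^T x over X."""
    V = [np.asarray(x0, dtype=float)]
    for _ in range(max_iter):
        x, c = sim_o(V)                # minimize over inner approximation
        x_bar = lin_o(c)               # linear minimization oracle over X
        if c @ x_bar >= c @ x - tol:   # stopping criterion of Algorithm SD
            return x, V                # x minimizes f over conv(X)
        V.append(np.asarray(x_bar, dtype=float))
    raise RuntimeError("iteration limit reached")

# toy instance: X = {0,1}^2 and a single scenario c = (1, 1), so f is linear
c_fixed = np.array([1.0, 1.0])
def sim_o(V):
    return min(V, key=lambda v: c_fixed @ v), c_fixed.copy()
def lin_o(c):
    return (c < 0).astype(float)       # componentwise minimization over {0,1}^n

x_opt, V = simplicial_decomposition(sim_o, lin_o, np.array([1.0, 1.0]))
print(x_opt)  # [0. 0.]
```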
We claim that Algorithm SD terminates after finitely many iterations with a correct result. For showing this, first observe

Lemma 1. At every iteration k of Algorithm SD, a lower bound for Problem (CR) is given by f(x^k) + (c^k)^⊤(x̄^k − x^k).

Indeed, since c^k is a subgradient of f in x^k, we have f(x) ≥ f(x^k) + (c^k)^⊤(x − x^k) for all x ∈ P, and by the choice of x̄^k, we obtain (c^k)^⊤x ≥ (c^k)^⊤x̄^k for all x ∈ P, which together yields the claim. We obtain

Theorem 1. Algorithm SD terminates after a finite number of iterations with a correct result.
Proof. Correctness immediately follows from Lemma 1, since f(x^k) is clearly an upper bound for Problem (CR) and the algorithm only terminates when (c^k)^⊤x̄^k ≥ (c^k)^⊤x^k, in which case the lower bound of Lemma 1 agrees with f(x^k). Moreover, from −c^k ∈ N_{conv(V^k)}(x^k), we have (c^k)^⊤x ≥ (c^k)^⊤x^k for all x ∈ conv(V^k). This means that in case Algorithm SD does not terminate at iteration k, i.e., (c^k)^⊤x̄^k < (c^k)^⊤x^k, the point x̄^k ∈ X does not belong to V^k, so that V^{k+1} is a strict extension of V^k. The result then follows from the finiteness of X.
Note that this proof of convergence relies on our general assumption that X is a finite set and on the fact that we never eliminate vertices of V^k. The situation is more complicated when such an elimination is allowed, as discussed in Section 2.2 below.
In the remainder of this subsection, we concentrate on the important special case that U consists of a finite number of scenarios {c^1, c^2, ..., c^m} ⊆ R^{n+1}, where we write c^j = (ĉ_j, c_j), with the uncertain constant being ĉ_j. In this case, the oracle SIM-O can be realized as follows: first note that we essentially need to solve the problem

min_{x ∈ conv(V^k)} max_{j=1,...,m} c_j^⊤x + ĉ_j. (1)

In the following, we denote by x^k the minimizer of (1), adopting the same notation used within Algorithm SD. As mentioned in [6], having a finite number of scenarios is one of the special cases where a subgradient c^k ∈ ∂f(x^k) ∩ (−N_{conv(V^k)}(x^k)) can be obtained as a byproduct of the solution of (1). For sake of completeness, we report how the subgradient c^k is derived. Problem (1) can be rewritten as

min z s.t. z ≥ c_j^⊤x + ĉ_j, j = 1, ..., m, x ∈ conv(V^k). (2)

From the optimality conditions of (2), we have that the optimal solution (x^k, z^k), together with the dual optimal variables λ_j^k ≥ 0, satisfies Σ_{j=1}^m λ_j^k = 1 and, from Lagrangian optimality,

−Σ_{j=1}^m λ_j^k c_j ∈ N_{conv(V^k)}(x^k). (3)

It can be shown (see [4], p. 199) that the vector c^k := Σ_{j=1}^m λ_j^k c_j is a subgradient of f at x^k, and (3) implies that −c^k belongs to the normal cone of conv(V^k) at x^k, so that we indeed have c^k ∈ ∂f(x^k) ∩ (−N_{conv(V^k)}(x^k)). Summarizing, when the set U is finite, an oracle SIM-O suited for our purposes can be implemented by any linear programming solver able to address Problem (2), rewritten with x = Σ_{v ∈ V^k} α_v^k v and the coefficients α_v^k as variables. In this way, x^k is obtained as the convex combination with coefficients α_v^k. In case of a differentiable function f, the choice of c^k is unique. In case of finite U, one may ask whether there is some freedom in the choice of c^k, which could potentially be exploited in order to find particularly promising search directions. However, it turns out that even in the discrete uncertainty case, the subgradient c^k is unique with high probability when the scenarios are chosen (or perturbed) randomly.
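For finite U, the LP reformulation of (2) in the variables (z, α) and the recovery of the subgradient c^k from the dual multipliers λ^k can be sketched with SciPy's `linprog` (the HiGHS backend exposes the duals as marginals). Function and variable names are our own illustrative choices, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import linprog

def sim_o_discrete(V, scenarios):
    """SIM-O for finite U: solve min z s.t. z >= c_j^T x + c_j0 over
    x = sum_v alpha_v v, sum_v alpha_v = 1, alpha >= 0, and recover the
    subgradient c^k = sum_j lambda_j c_j from the dual multipliers."""
    V = np.asarray(V, dtype=float)          # p x n matrix of vertices
    c0 = scenarios[:, 0]                    # uncertain constants of the scenarios
    C = scenarios[:, 1:]                    # m x n matrix of scenario vectors
    m, p = C.shape[0], V.shape[0]
    # variable order: (z, alpha_1, ..., alpha_p)
    A_ub = np.hstack([-np.ones((m, 1)), C @ V.T])       # c_j^T x - z <= -c_j0
    A_eq = np.hstack([np.zeros((1, 1)), np.ones((1, p))])
    res = linprog(c=np.r_[1.0, np.zeros(p)],
                  A_ub=A_ub, b_ub=-c0, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0.0, None)] * p, method="highs")
    alpha = res.x[1:]
    lam = -res.ineqlin.marginals            # lambda_j >= 0, sum lambda_j = 1
    return V.T @ alpha, C.T @ lam, alpha    # x^k, subgradient c^k, coefficients
```

Note the sign flip when reading off λ: for a minimization, `linprog` reports non-positive marginals for ≤-constraints.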
Theorem 2. Assume that all scenarios in U = {c^1, ..., c^m} are perturbed by any continuously distributed random vector in R^{m(n+1)} with full-dimensional support. Then, with probability one, the set ∂f(x^k) ∩ (−N_{conv(V^k)}(x^k)) contains at most one element.

Proof. By definition, there exist z^k and α^k such that (z^k, x^k, α^k) is a basic optimal solution of the LP reformulation of (2) in the variables z and α. With probability one, the sum of the dimensions of ∂f(x^k) and N_{conv(V^k)}(x^k) is then at most n. Again with probability one, it follows that ∂f(x^k) and −N_{conv(V^k)}(x^k) intersect in at most one point.

Vertex dropping rule
The running time of an iteration of Algorithm SD strongly depends on the size of V^k. The overall performance could thus benefit from a dropping rule for elements of V^k. A straightforward idea is to eliminate vertices not needed to define the minimizer of f over conv(V^k). We thus consider the following modified update rule:

V^{k+1} := {v ∈ V^k : α_v^k > 0} ∪ {x̄^k}. (drop)

In the following, we will refer to Algorithm SD where V^k is updated according to (drop) as Algorithm SD-DROP. In case of a non-differentiable function f, Algorithm SD-DROP may cycle, as shown in the following example.
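The update (drop) keeps only the vertices that are active in the convex combination defining x^k. A minimal sketch (names illustrative):

```python
def update_with_drop(V, alpha, x_bar, eps=1e-12):
    """Rule (drop): keep the vertices v of V^k with alpha_v > 0 and add the
    new point x_bar returned by LIN-O."""
    return [v for v, a in zip(V, alpha) if a > eps] + [x_bar]
```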
Example 1. Let us consider Problem (CR) with the two scenarios (0, 1, −1)^⊤ and (0, −1, 1)^⊤. Starting from x^1 = (0, 0)^⊤, Algorithm SD-DROP performs iterations such that, at iteration k = 3, we get V^4 = V^2, and the algorithm cycles. See Fig. 2 for an illustration.
Considering this example, two questions may arise. Firstly, the solution x^1 is actually optimal, so that choosing a better subgradient (namely zero) would have stopped the algorithm immediately. Secondly, the scenarios (0, 1, −1)^⊤ and (0, −1, 1)^⊤ defining the uncertainty set U contain negative entries. The following example shows that neither of the two features causes the cycling:

Example 2. Let us consider a problem of the same type with non-negative scenario entries. Starting from x^1 = (1, 1)^⊤, Algorithm SD-DROP again cycles.

It is easy to see that cycling cannot occur when f is differentiable. In fact, in this case, if x^k is not optimal for Problem (CR), we have f(x^{k+1}) < f(x^k), since −c^k is a descent direction. In particular, Algorithm SD-DROP terminates after a finite number of iterations.
For the case of finite U , differentiability is not given, and cycling can occur as seen in the examples above.However, we can still show a weaker result: we will prove that a small random perturbation of the scenario entries can avoid cycling (with probability one).For this, we first need the following observation.
Lemma 2. Let L be an affine subspace of R^n and assume that f|_L, the restriction of f to L, is differentiable in x ∈ L. Then every y ∈ L with c^⊤(y − x) < 0 for some subgradient c ∈ ∂f(x) yields a descent direction y − x of f in x.

Lemma 3. Consider an iteration k in which Algorithm SD-DROP does not terminate. Let L be an affine subspace of R^n containing x^k such that (i) f|_L is differentiable in x^k, (ii) dim(L ∩ aff(V^{k+1})) ≥ 1, and (iii) c^k is not orthogonal to L ∩ aff(V^{k+1}). Then f(x^{k+1}) < f(x^k).
Proof. By (ii), there exists some y ∈ L ∩ aff(V^{k+1}) with y ≠ x^k. Since L and aff(V^{k+1}) are affine spaces both containing x^k, we may choose y such that (c^k)^⊤(y − x^k) ≤ 0. By (iii), we may even assume that (c^k)^⊤(y − x^k) < 0. Using Lemma 2 and (i), we thus derive that y − x^k is a descent direction of f in x^k. It thus remains to show that x^k + ε(y − x^k) ∈ conv(V^{k+1}) for some ε > 0, which implies that some x ∈ conv(V^{k+1}) has a strictly smaller objective value than x^k and hence f(x^{k+1}) < f(x^k). As x^k belongs to the relative interior of conv({v ∈ V^k : α_v^k > 0}), there exists ε > 0 such that x^k + ε(y − x^k) ∈ conv(V^{k+1}).

Theorem 3. Assume that all scenarios in U = {c^1, ..., c^m} are perturbed by any continuously distributed random vector in R^{m(n+1)} with full-dimensional support. Then, with probability one, Algorithm SD-DROP terminates after finitely many iterations.
Proof. Consider any iteration k in which Algorithm SD-DROP does not terminate. Then it suffices to show that f(x^{k+1}) < f(x^k) with probability one. For this, let L be the maximal affine space such that x^k ∈ L and f|_L is differentiable in x^k. By the definition of f, the epigraph epi(f) is a polyhedron, and L is obtained by projecting the minimal face of epi(f) containing (x^k, f(x^k)) onto R^n and taking the affine hull of the projection. Thus L is an affine subspace of R^n containing x^k, depending continuously on the perturbation of c^1, ..., c^m. We claim that the conditions (ii) and (iii) of Lemma 3 are then satisfied with probability one, so that the result follows. Indeed, using the same notation as in the proof of Theorem 2, we obtain dim(L) + dim(aff V^k) = n with probability one, and hence dim(L) + dim(aff V^{k+1}) ≥ n + 1, so that dim(L ∩ aff V^{k+1}) ≥ 1. Thus (ii) holds with probability one. For showing (iii), we use again that dim(L ∩ aff V^{k+1}) ≥ 1 with probability one. This implies that the probability of the fixed vector c^k being orthogonal to L ∩ aff V^{k+1} is zero.
Theorem 3 shows that cycling can be avoided by applying small random perturbations to the scenarios c^1, ..., c^m, e.g., by choosing each perturbed entry (c̃^j)_i ∈ [(c^j)_i − ε, (c^j)_i + ε] uniformly at random for some ε > 0, independently for all j = 1, ..., m and i = 0, ..., n. As X is finite, the optimal solution of the perturbed problem (R) will agree with an optimizer of the unperturbed problem if ε is small enough (even though this is not true for the relaxation (CR)). Note that Theorem 3 requires that also the constant in the objective function is perturbed.
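The perturbation used in Theorem 3 is easy to implement; a minimal sketch with uniform noise on every entry, including the constant (function name illustrative):

```python
import numpy as np

def perturb_scenarios(scenarios, eps, seed=None):
    """Perturb every entry (c_j)_i, i = 0, ..., n, independently and
    uniformly in [-eps, eps]; by Theorem 3 this rules out cycling of
    Algorithm SD-DROP with probability one."""
    rng = np.random.default_rng(seed)
    return scenarios + rng.uniform(-eps, eps, size=scenarios.shape)
```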
Remark 1. In practice, the perturbation applied in Theorem 3 is not necessary, because small numerical errors arising in the optimization process have the same effect. In our experiments described in Section 4, we do not explicitly apply any perturbation.
Note that Theorem 2 also holds when eliminating vertices. In particular, this implies that, when starting from the same set V^k, the next subgradient c^k is the same with or without elimination (with high probability). However, in a later iteration, the set V^k and hence the optimal solution x^k may differ in the two cases, and thus also the subgradients. In our experiments, we observed that vertices being eliminated were sometimes re-generated in subsequent iterations.

Embedding SD into a branch-and-bound scheme
In order to solve Problem (R) to optimality, we embed Algorithm SD into a branch-and-bound scheme, which we denote by BB-SD. In this branch-and-bound scheme, we can exploit some specific features of the SD algorithm. Firstly, all binary vertices generated at some node of the branch-and-bound tree can be reused in the child nodes. Indeed, if we branch on fractional variables, each such vertex must be feasible in one of the child nodes, and can thus be inherited. This initial set of vertices enables us to warmstart the SD algorithm at every child node. Moreover, thanks to Lemma 1, every iteration of SD yields a valid lower bound on the optimal value of the convex relaxation considered, meaning that early pruning can be performed. In other words, at every node we do not necessarily need to solve the convex relaxation to optimality: the node can be pruned as soon as SD computes a lower bound greater than the current upper bound.
As emphasized above, we assume that Problem (P) can be accessed only through an optimization oracle. Therefore, even when dealing with specific combinatorial problems, we do not exploit any structure to define primal heuristics within BB-SD. Nevertheless, at every node of BB-SD, we easily obtain an upper bound by evaluating the objective function on all the generated extreme points and taking the minimal value among them.
Having no problem-specific heuristics, we need an enumeration strategy that provides primal solutions quickly. Thus, we adopt a depth-first search (DFS). Moreover, the branching rule implemented within BB-SD branches on variables that are fractional in the continuous relaxation, by means of the canonical disjunction. More precisely, we branch on the variable with fractional part closest to one and produce two child nodes: in the node considered first, we fix the branching variable to 1; in the other node, we fix it to 0. This choice, combined with DFS, typically allows us to quickly find integer solutions, which are sparse for many combinatorial problems. Note that all nodes in BB-SD remain feasible. Indeed, regardless of how we select the fractional variable x_i to branch on, the oracle must have already returned solutions with both x_i = 0 and x_i = 1.
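The branching rule described above (pick the fractional variable with fractional part closest to one, then explore the 1-branch first under DFS) can be sketched as follows; the function name is our own illustrative choice:

```python
import numpy as np

def pick_branching_variable(x, tol=1e-6):
    """Return the index of the fractional variable whose fractional part is
    closest to one, or None if x is (numerically) integral."""
    frac = x - np.floor(x)
    candidates = [(1.0 - f, i) for i, f in enumerate(frac) if tol < f < 1.0 - tol]
    return min(candidates)[1] if candidates else None

# child order for DFS: fix the chosen variable to 1 first, then to 0
i = pick_branching_variable(np.array([0.2, 0.9, 1.0]))
print(i)  # 1
```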

Numerical results
To test the performance of our algorithm SD and of the branch-and-bound scheme BB-SD, we considered instances of Problem (R) with two different underlying problems: the Spanning Tree problem (Section 4.1) and the Traveling Salesman problem (Section 4.2). The standard models for these problems use an exponential number of constraints that can be separated efficiently. In the case of the Spanning Tree problem, this exponential set of constraints yields a complete linear formulation, while this is not the case for the NP-hard Traveling Salesman problem. For the robust Minimum Spanning Tree Problem, we report a comparison between BB-SD and the MILP solver of CPLEX [3]. For the robust Traveling Salesman Problem, we focus on the continuous relaxations, thus reporting a comparison of the bounds obtained at the root node of the branch-and-bound tree.
In the implementation of SD, for both the robust Minimum Spanning Tree Problem (r-MSTP) and the robust Traveling Salesman Problem (r-TSP), oracle LIN-O is defined according to the underlying problem: for the r-MSTP, we implemented the standard Kruskal algorithm [19], a well-known polynomial-time algorithm. For the r-TSP, we used the solver Concorde [1]. Since the TSP is NP-hard, the computational times needed to call the linear oracles differ significantly between the two problems, as seen later in the numerical experiments.
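For the r-MSTP, LIN-O amounts to running Kruskal's algorithm on the scenario-dependent costs. The following compact sketch with union-find is our own illustration (not the authors' code); it returns the 0/1 edge incidence vector expected by SD:

```python
def kruskal_oracle(n, edges, c):
    """LIN-O for the MST: Kruskal's algorithm with union-find.

    `edges` is a list of (u, v) pairs over nodes 0..n-1 and `c` gives one
    cost per edge; returns the 0/1 incidence vector of a minimum spanning
    tree with respect to c."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    x = [0] * len(edges)
    for i in sorted(range(len(edges)), key=lambda i: c[i]):
        u, v = edges[i]
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge joins two components
            parent[ru] = rv
            x[i] = 1
    return x
```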
Except for the oracle LIN-O, we used exactly the same implementation for both problems. In particular, we applied the same oracle SIM-O for both r-MSTP and r-TSP. Problem (2) is rewritten by expanding the condition x ∈ conv(V^k). By using the LP formulation (4) and eliminating the x variables, we obtain the following equivalent formulation:

min z s.t. z ≥ Σ_{v ∈ V^k} α_v c_{v,j} + ĉ_j, j = 1, ..., m, Σ_{v ∈ V^k} α_v = 1, α ≥ 0, (5)

where c_{v,j} := c_j^⊤v for every v ∈ V^k. Problem (5) is solved with CPLEX. Note that the number of constraints depends on the number of scenarios, while the number of variables corresponds to the cardinality of V^k and thus increases at every iteration of our approach SD. The dropping rule implemented in SD-DROP may reduce the size of this problem, thus potentially leading to practical improvements in the running time.

Spanning Tree Problem
Given an undirected weighted graph G = (N, E), a minimum spanning tree is a subset of edges that connects all vertices, without any cycles and with minimum total edge weight. We use a formulation of the Robust Minimum Spanning Tree problem with an exponential number of connectivity constraints. The objective function can easily be linearized by introducing a new variable z ∈ R and constraints z ≥ c^⊤x for all c ∈ U. In this model, the number of inequalities is exponential in the input size; hence, we use a separation algorithm within CPLEX.
For our experiments, we consider a benchmark of randomly generated instances of r-MSTP. We build complete graphs of five different sizes (from 20 to 60 nodes). The nominal costs are real numbers randomly chosen in the interval [1, 2]. For each size, we randomly generate 10 different nominal cost vectors. The scenarios c ∈ U are generated by adding to the vector of nominal costs a random unit vector, multiplied by a scalar factor β. We consider the three factors 1, 2, and 3, and three different numbers of scenarios (#sc), namely 10, 100, and 1000. In total, we thus have a benchmark of 450 instances.
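The scenario construction just described (nominal costs plus a random unit vector scaled by β) can be sketched as follows; drawing the direction from a standard normal and normalizing is our own illustrative choice of how to obtain a random unit vector:

```python
import numpy as np

def generate_scenarios(nominal, num_scenarios, beta, seed=None):
    """Each scenario is the nominal cost vector plus beta times a random
    unit vector (an illustrative reading of the construction above)."""
    rng = np.random.default_rng(seed)
    d = rng.standard_normal((num_scenarios, nominal.shape[0]))
    d /= np.linalg.norm(d, axis=1, keepdims=True)   # rows are unit vectors
    return nominal + beta * d
```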
In a first experiment, we analyze different dropping rules within Algorithm SD. Then, we show how performing a warmstart along the branch-and-bound iterations leads to a significant reduction in the number of iterations compared to a cold start. Finally, our branch-and-bound method BB-SD is compared to the MILP solver of CPLEX. Within CPLEX, we apply a dynamic separation algorithm using a callback adding lazy constraints, adopting a simple implementation based on the Ford-Fulkerson algorithm.

Comparison of dropping rules
In this section, we focus on the performance of Algorithm SD for solving the continuous relaxation of our r-MSTP instances. In particular, we compare the performance of SD with three different dropping rules. Since all instances with 10 or 100 scenarios are solved in less than 0.1 seconds, we only consider the continuous relaxations of instances with 1000 scenarios, meaning that the evaluation is carried out on 150 instances. As dropping rules, we implemented the following:

• d0: no dropping rule is applied within SD;

• d1: the set V^k is updated according to condition (drop) defined in Section 2.2, i.e., we eliminate all vertices v ∈ V^k with α_v^k = 0 at every iteration of SD.

As mentioned before, for each |N| we built 30 instances, 10 for each factor β. In Table 1, we report the average running times (time) and the average numbers of iterations (#it) for each version of SD; note that the number n of variables in (P) is given by |E| here. We further report the performance profiles [13] related to the CPU time and the number of iterations in Fig. 4. We notice that dropping rule d2 gives SD a slightly better performance in terms of CPU time, despite not always being better than SD without dropping rule in terms of the number of iterations. Indeed, in the MST problem, the linear oracle calls are extremely fast, while most of the computational time is needed to solve the master problem (5). This explains why, although some more iterations are needed, dropping some vertices and hence reducing the size of the master problems can improve the overall performance of the algorithm. On the other hand, it is clear from the results that eliminating all inactive vertices as in Algorithm SD-DROP (rule d1) is not beneficial for SD, as both the number of iterations and the CPU time increase.

Warmstart benefits
In the following, we evaluate our branch-and-bound method BB-SD. In particular, we compare the performance of BB-SD considering both SD with no dropping rule (d0) and SD with dropping rule d2. The comparison is done on all 450 instances of r-MSTP. As mentioned before, for each combination of |N| and #sc, we built 30 instances, 10 for each value of the scalar factor β. In Table 2, we first compare the performance of BB-SD by considering SD with no dropping rule with and without warmstart (d0 no ws vs d0 with ws). In the table, we report the number of instances solved within the time limit of one hour (#sol), the average running times (time), the average number of nodes (#nodes), and the average number of iterations (#it) for each version of SD. All averages are taken over the instances solved within the time limit. It is clear from the results that the warmstart leads to a considerable decrease in the average number of iterations and consequently a significant decrease in CPU time. The same behavior can be noticed in Table 3, where we compare BB-SD using dropping rule d2 with and without warmstart (d2 no ws vs d2 with ws). In Fig. 5, we further report the performance profiles with respect to the CPU time of the four versions of BB-SD. The versions of BB-SD with no elimination (d0) and with dropping rule d2 show very similar performance. Our results also show that the instances become harder with a higher number of scenarios. We can further notice that BB-SD with d2 and warmstart performs slightly better on the hardest instances, namely those with 1000 scenarios.

Comparison between BB-SD and CPLEX
We now compare our branch-and-bound algorithm BB-SD with CPLEX on our r-MSTP instances.
As already mentioned, within the MILP solver of CPLEX, we apply a dynamic separation algorithm using a callback adding lazy constraints, adopting a simple implementation based on the Ford-Fulkerson algorithm. The comparison is made with BB-SD implementing the dropping rule d2 and allowing warmstart. In Table 4, we report, for each combination of |N| and #sc, the number of instances solved within the time limit of one hour (#sol), the average running times (time), and the average number of nodes (#nodes). We recall that the averages are taken over 30 instances each. Performance profiles are presented in Fig. 6. We can notice that BB-SD strongly outperforms CPLEX on instances with 10 and 100 scenarios. As already highlighted before, the higher the number of scenarios, the harder the instances become. Still, also when dealing with instances containing 1000 scenarios, BB-SD shows better performance than CPLEX. On instances with 10 scenarios, BB-SD is able to solve all instances within the time limit, while CPLEX fails in 46 cases. For instances with 100 scenarios, BB-SD and CPLEX show 50 and 75 failures, respectively, while for instances with 1000 scenarios, BB-SD and CPLEX show 88 and 95 failures.

Traveling Salesman Problem
Given an undirected, complete, and weighted graph G = (N, E), the Traveling Salesman problem consists in finding a tour that starts and ends at a given vertex v ∈ N, visits every other vertex exactly once, and minimizes the sum of the weights of its constituent edges. Our approach uses the standard formulation of the Traveling Salesman problem with subtour elimination constraints. The number of inequalities is again exponential, and for CPLEX we use essentially the same separation algorithm as for the Spanning Tree problem; see Section 4.1.
For our tests, we consider 10 instances from the TSPLIB library [2]. For each instance, we generate the scenarios by adding to the nominal costs a random unit vector multiplied by some coefficient. The vector has non-negative components, to avoid negative distances, and the coefficients are 1, 2, and 3, as in the r-MST case. We again consider three different numbers of scenarios, namely 10, 100, and 1000, thus producing a benchmark of 90 instances in total. As mentioned above, we realized the linear oracle using the solver Concorde; we used the default version and solved each linear problem exactly. In particular, each call to the linear oracle now solves an NP-hard problem, so that the time needed by LIN-O is much larger than the time needed by SIM-O, unlike in the MST case. Eliminating variables is therefore not effective: it would slightly reduce the time for the linear master problems, while increasing the number of iterations and hence the time spent in the NP-hard oracle calls. For this reason, in our tests we only consider SD with no dropping rule.
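The scenario generation described above can be sketched as follows; the function name and the use of NumPy are our own, the paper only specifies nominal costs plus a random non-negative unit vector times a coefficient β ∈ {1, 2, 3}.

```python
import numpy as np

def make_scenarios(nominal, num_scenarios, beta, rng):
    """Generate cost scenarios c_s = nominal + beta * u_s, where u_s is a
    random unit vector with non-negative components (keeping all
    perturbed distances non-negative)."""
    m = len(nominal)
    scenarios = []
    for _ in range(num_scenarios):
        u = rng.random(m)           # components in [0, 1): non-negative
        u /= np.linalg.norm(u)      # normalize to a unit vector
        scenarios.append(nominal + beta * u)
    return np.array(scenarios)
```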
In the following, we compare the performance of SD, applied to solve the continuous relaxation of the instances considered, with the performance of CPLEX at the root node. Note that CPLEX minimizes the non-linear objective function max_{c∈U} c⊤x over the subtour relaxation of the problem, while in our formulation we implicitly optimize the same function over the convex hull of feasible tours, thus obtaining a tighter lower bound. However, our approach needs to solve NP-hard problems to achieve this. It is thus not surprising that the time needed by SD to solve its relaxation is often larger than the time needed by CPLEX to solve the weaker one. However, when CPLEX is required to reach the same stronger bound, its computation time increases significantly. In Table 5, we show the results for the TSP instances.
For every instance and every number of scenarios, we report the average bound and computing time obtained by SD (SD root node) and by CPLEX (CPLEX root node) for solving the continuous relaxation. In the last column, we report the time needed by CPLEX to obtain the same bound as SD (CPLEX - SD bound). The table shows that, on average, the SD bound is much stronger than the subtour relaxation bound, but takes longer to compute. Moreover, the time needed by CPLEX to reach the SD bound is often much larger than the SD time; in one case, the time limit of one hour was reached.
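The quantity both methods bound is the worst-case objective f(x) = max_{c∈U} c⊤x; for a discrete scenario set it is evaluated by a single pass over the scenarios. A trivial sketch (names are ours, not the paper's):

```python
import numpy as np

def robust_objective(x, scenarios):
    """Worst-case cost f(x) = max_{c in U} c^T x for a discrete scenario
    set U. Both SD and CPLEX minimize this f, but over different feasible
    sets (convex hull of tours vs. the subtour relaxation), which is why
    the SD root-node bound is at least as strong."""
    values = scenarios @ x
    s = int(np.argmax(values))
    return float(values[s]), scenarios[s]  # value and a maximizing scenario
```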

Conclusion
We presented an algorithm for the exact solution of strictly robust counterparts of combinatorial optimization problems, based entirely on a linear optimization oracle for the underlying certain problem. Concentrating on the discrete scenario case, our experimental evaluation shows that the approach is competitive both when the underlying problem is very easy to solve, as in the MST case, and when it is hard, as in the TSP case. In particular, in the latter case, we have seen that solving the underlying problem to optimality can be beneficial even when it is NP-hard: in the same amount of time, our approach produces much better dual bounds than CPLEX applied to a linearized IP formulation of the problem, using the standard subtour formulation. We emphasize again that our approach is not restricted to the case of discrete uncertainty; however, the oracle SIM-O must be adapted when considering other classes of uncertainty sets. In the case of ellipsoidal uncertainty, SIM-O turns out to be a second-order cone program. As mentioned above, since f is a smooth function in this case, cycling is not possible even if the most aggressive dropping rule (drop) is used. In the case of polyhedral uncertainty, SIM-O can again be realized as a linear program, and the statement of Theorem 3 holds analogously.
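As an illustration of the ellipsoidal case, for the special uncertainty set U = {c0 + Ad : ‖d‖₂ ≤ 1} the inner maximization even has a closed form, max_{c∈U} c⊤x = c0⊤x + ‖A⊤x‖₂, by the Cauchy-Schwarz inequality. A sketch (notation ours, not the paper's):

```python
import numpy as np

def worst_case_ellipsoid(x, c0, A):
    """max over c = c0 + A d with ||d||_2 <= 1 of c^T x, which equals
    c0^T x + ||A^T x||_2 (Cauchy-Schwarz; tight at d = A^T x / ||A^T x||)."""
    return float(c0 @ x + np.linalg.norm(A.T @ x))
```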
The investigation of further generalizations of our approach is left as future work. In particular, it may be interesting to extend it to more general classes of uncertain objective functions, e.g., of the form c⊤g(x), where a convex function g : R^n → R^m is given and the coefficients c ∈ R^m_+ are uncertain. In this case, the function f describing the worst case over all scenarios is still convex, and it suffices to adapt the oracle SIM-O.

Algorithm 3: SD
given: oracles LIN-O and SIM-O
output: optimizer x* of (CR)
compute any x0 ∈ X by calling LIN-O with arbitrary objective set
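The listing above is only a fragment of the algorithm. As a rough illustration of the simplicial decomposition loop it refers to, the following self-contained sketch alternates a master LP (a stand-in for SIM-O, for the discrete scenario case) with a user-supplied linear oracle lin_o (LIN-O). All names, the SciPy-based master, and the simple duplicate-vertex stopping heuristic are our own; the paper's actual algorithm treats non-differentiability and dropping rules more carefully.

```python
import numpy as np
from scipy.optimize import linprog

def sim_o(vertices, scenarios):
    """Master problem: minimize max_s c_s^T (sum_i lambda_i v_i) over the
    simplex, written as an LP in (lambda, t) via the epigraph variable t."""
    V = np.array(vertices)                  # k x n matrix of vertices
    C = scenarios @ V.T                     # C[s, i] = c_s^T v_i
    k = V.shape[0]
    c = np.zeros(k + 1); c[-1] = 1.0        # minimize t
    A_ub = np.hstack([C, -np.ones((C.shape[0], 1))])   # C lambda - t <= 0
    b_ub = np.zeros(C.shape[0])
    A_eq = np.hstack([np.ones((1, k)), np.zeros((1, 1))])  # sum lambda = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * k + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    lam = res.x[:k]
    return lam @ V, res.x[-1]               # current iterate x and f(x)

def sd(lin_o, scenarios, x0, max_it=100, eps=1e-8):
    """Simplicial decomposition for min_x max_s c_s^T x over conv(X).
    lin_o(g) must return a vertex of X minimizing g^T x over X."""
    vertices = [np.asarray(x0, dtype=float)]
    for _ in range(max_it):
        x, fx = sim_o(vertices, scenarios)
        # an active scenario is a subgradient of the piecewise linear f at x
        g = scenarios[int(np.argmax(scenarios @ x))]
        v = np.asarray(lin_o(g), dtype=float)
        # stop on a non-negative linearized gap, or (heuristically) when the
        # oracle returns an already known vertex; due to non-differentiability
        # the gap test alone need not become tight
        if g @ (v - x) >= -eps or any(np.allclose(v, w) for w in vertices):
            return x, fx
        vertices.append(v)
    return x, fx
```

On a toy instance with X = {0, 1}² and scenarios (1, −1) and (−1, 1), the sketch converges to x = (0.5, 0.5) with worst-case value 0.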

Table 2: Using BB-SD with and without warmstart.

Figure 5: Comparison of BB-SD with and without warmstart (performance profiles on CPU time; panel shown: 1000 scenarios).

Table 3: Using BB-SD with and without warmstart, applying dropping rule d2.

Table 4: Comparison between BB-SD and CPLEX on r-MST instances.

Table 5: Average results for r-TSP, continuous relaxations.