Robust approach to restricted items selection problem

We consider a robust version of the items selection problem, in which the goal is to choose representatives from a family of sets, preserving constraints on the allowed combinations of items. We prove NP-hardness of the deterministic version, and establish polynomially solvable special cases. Next, we consider the robust version, in which we aim at minimizing the maximum regret of the solution under interval parameter uncertainty. We show that this problem is hard for the second level of the polynomial hierarchy. We develop exact solution algorithms for the robust problem, based on cut generation and mixed-integer programming, and present the results of computational experiments.


Introduction
In this paper we consider a robust variant of a combinatorial optimization problem of selecting p_i items from each of m sets of items, i = 1, ..., m, with the objective to minimize their total cost. We assume that some pairs of items are forbidden to be chosen together. We further assume that the costs of items are not given precisely; only the intervals of their possible values are known.
This problem models many practical situations in which we need to design a feasible configuration subject to uncertain costs of its elements. For example, consider designing a production line in a flexible manufacturing system [14, 19]. The production line consists of m sites arranged for performing a sequence of operations. The execution of each operation requires a specific machine tool, and thus for each operation the designer must choose a single eligible tool among a variety of alternatives (for example, belonging to different types or brands of tools suitable for the given operation). The costs of tools are subject to random fluctuations due to a number of factors, for example: the time difference between the design phase and the purchase phase, additional installation costs, additional customization costs, etc. Consequently, it is usually not possible to determine the exact cost of each tool at the design stage. Moreover, not all tools are mutually compatible, and it may not be possible to install an arbitrary selection of tools in a common production line. Thus the choice of possible tool configurations is restricted. The decision maker's goal is to select eligible tools for each operation's site so that the total cost is minimized.
As another example, consider workflow planning for an engineering project. Decision makers need to assign teams of people to work on developing components of the whole system. For each component they would list a group of candidates suitable for working on it. Then they would decide on the actual team members by selecting a required number of people from each group of candidates. Some people, being adequate candidates for more than one team, may be listed more than once. However, due to the assumed work schedule, they cannot be assigned to two teams that work simultaneously. The final cost of completing each component is uncertain, as it depends not only on the skills and experience of the selected people, but also on the components' complexity, the applied technologies, and possibly many other factors.

Related Work
The basic variant of this problem was first considered in [9] under the name Representatives Selection Problem, where we are allowed to select one item from each set of alternatives. In order to alleviate the effects of cost uncertainty on decision making, the min-max and min-max regret criteria [3, 16] have been proposed to assess solution quality. Problem formulations using these criteria belong to the class of robust optimization problems [21]. Such an approach appears to be more suitable for large scale design projects than the alternative stochastic optimization approach [15] when: 1) decision makers do not have sufficient historical data for estimating probability distributions; 2) there is a high factor of risk involved in one-shot decisions, and a precautionary approach is preferred. The robust approach to discrete optimization problems has been applied in many areas of industrial engineering, such as scheduling and sequencing [10, 11, 18], network optimization [5, 6], assignment [2, 20], and others [13].
Note that the deterministic version of the Representatives Selection Problem is easily solvable in polynomial time. For the interval uncertainty representation of the cost parameters the problem can still be solved in polynomial time, both in the case of minimizing the maximum regret and the maximum relative regret [9]. However, in the case of a discrete set of scenarios, the problem becomes NP-hard even for 2 scenarios, and strongly NP-hard when the number of scenarios K is a part of the input. In [8] the authors prove that strong NP-hardness holds even when the sets of eligible items have bounded size. In [17] an O(log K/ log log K)-approximation algorithm for this variant was given.
In this paper we consider a generalization of the Representatives Selection Problem, in which a specified number of items must be selected from each set, and constraints may be imposed on the selected items' configurations by specifying pairs of items that cannot be selected simultaneously. We assume the interval representation of cost uncertainty, since this case appears to be more interesting from both the practical and the theoretical points of view.

Problem Formulation
We start from defining the deterministic version of the considered problem, which we call the Restricted Items Selection Problem (abbreviated RIS). Given are m sets I_i of items, with an integer cost c_ij associated with each j ∈ I_i; we denote r_i = |I_i|. Given is a set T of tuples (i, k, j, l), i, j ∈ {1, ..., m}, k ∈ I_i, l ∈ I_j, indicating that items k and l cannot be selected simultaneously (i.e., if k is selected, then l cannot be selected, and vice versa); the set T contains all the forbidden pairs. Given are m positive integers p_i < |I_i|. The goal is to select for each i = 1, ..., m a subset S_i ⊂ I_i of exactly p_i items, so that the total cost of the selected items is minimized.
Let x_ij = 1 if the jth item from the ith set is selected, and x_ij = 0 otherwise. The problem can be stated as:

min Σ_{i=1}^{m} Σ_{j∈I_i} c_ij x_ij    (1)

subject to:

Σ_{j∈I_i} x_ij = p_i,   i = 1, ..., m,    (2)
x_ik + x_jl ≤ 1,   (i, k, j, l) ∈ T,    (3)
x_ij ∈ {0, 1},   i = 1, ..., m, j ∈ I_i.    (4)

Let us now define the version of the problem with uncertain data, which we call the Interval Min-Max Regret Restricted Items Selection Problem (abbreviated IRIS). For each item, given is an interval of its possible costs, c_ij ∈ [c⁻_ij, c⁺_ij]. The set of all cost scenarios is the Cartesian product of these intervals, denoted U = ×_{i=1}^{m} ×_{j∈I_i} [c⁻_ij, c⁺_ij]. Let x be the vector of variables x_ij (with the indexing (i, j) agreeing with the cost vector c). We define the maximum regret of a solution x as the difference between the solution value in its worst-case scenario and the best possible solution value in that scenario:

R(x) = max_{c∈U} ( Σ_{i,j} c_ij x_ij − min_{y∈F} Σ_{i,j} c_ij y_ij ),    (5)

where

F = { x ∈ {0, 1}^n : x satisfies (2) and (3) }    (6)

is the set of feasible solutions (choices of items from each set I_i). The objective of the IRIS problem is to find a choice x that minimizes the maximum regret:

min_{x∈F} R(x).    (7)
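To make the set of feasible solutions, the worst-case scenario, and the maximum regret concrete, here is a brute-force sketch on a tiny invented instance (two sets of three items, p_i = 1, one forbidden pair; all data and names are hypothetical, for illustration only — a real solver would use the MIP formulation):

```python
from itertools import combinations, product

# Hypothetical instance: interval bounds per (set, item), one forbidden pair.
lo = [[1, 2, 5], [2, 3, 4]]     # lower bounds c-_ij
hi = [[4, 3, 6], [7, 3, 5]]     # upper bounds c+_ij
p = [1, 1]                      # select one item per set
forbidden = {((0, 0), (1, 0))}  # items (set 0, item 0) and (set 1, item 0)

def feasible():
    """Enumerate all feasible selections as sets of (set, item) pairs."""
    for choice in product(*[combinations(range(len(s)), k) for s, k in zip(lo, p)]):
        sel = {(i, j) for i, c in enumerate(choice) for j in c}
        if all(not (a in sel and b in sel) for a, b in forbidden):
            yield sel

def regret(x):
    """Max regret of x: worst-case scenario puts chosen items at c+, others at c-."""
    cx = {(i, j): (hi[i][j] if (i, j) in x else lo[i][j])
          for i in range(len(lo)) for j in range(len(lo[i]))}
    cost_x = sum(cx[e] for e in x)
    best = min(sum(cx[e] for e in y) for y in feasible())
    return cost_x - best

opt = min(feasible(), key=regret)
```

On this instance, brute force suffices; the point is only that the inner minimum fixes the scenario induced by x before the adversary picks the best alternative solution.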

Related Problems
The considered problem generalizes two related fundamental combinatorial optimization problems, namely, the Selecting Items Problem [4,7], and the aforementioned Representatives Selection Problem [9], of which the robust variants have been investigated in the literature.

Selecting items problem (SI)
Given is a set I of items, with a positive integer c_i, denoting the item's cost, associated with each i ∈ I. Given is a positive integer p < |I|. The goal is to select a subset S ⊂ I of exactly p items, so that the total cost of S is minimized. Let x be the characteristic vector of S, i.e., x_i = 1 if i ∈ S, and x_i = 0 otherwise. The problem can be stated as:

min Σ_{i∈I} c_i x_i   subject to   Σ_{i∈I} x_i = p,   x_i ∈ {0, 1}, i ∈ I.
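For illustration, a minimal Python sketch of the sorting-based solution of deterministic SI (the function name is my own):

```python
def select_items(costs, p):
    """Deterministic SI: the p cheapest items form an optimal selection.
    Returns the indices of the selected items."""
    assert 0 < p < len(costs)
    return sorted(range(len(costs)), key=lambda i: costs[i])[:p]
```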

Representatives selection problem (RS)
Given are m sets I_i of items, with a positive integer cost c_ij associated with each j ∈ I_i, r_i = |I_i|. The goal is to select a single item from each set, J = (j_1, j_2, ..., j_m), j_i ∈ I_i, so that the total cost is minimized. Let x_ij = 1 if the jth item from the ith set is selected, and x_ij = 0 otherwise. The problem can be stated as:

min Σ_{i=1}^{m} Σ_{j∈I_i} c_ij x_ij   subject to   Σ_{j∈I_i} x_ij = 1, i = 1, ..., m,   x_ij ∈ {0, 1}.

If p_i = 1 for each i = 1, ..., m, and T = ∅, then problem RIS is equivalent to RS. If T = ∅, then problem RIS is equivalent to m independent instances of the SI problem.
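A corresponding sketch for deterministic RS, which decomposes over the sets: pick the cheapest item in each set independently (hypothetical function name):

```python
def solve_rs(sets):
    """Deterministic RS: the cheapest item of each set forms an optimal choice.
    sets[i][j] is the cost of item j in set i; returns one index per set."""
    return [min(range(len(s)), key=lambda j: s[j]) for s in sets]
```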

Problem Complexity
The deterministic problems SI and RS can be solved in polynomial time (trivially, by sorting). However, the forbiddance constraints make these problems much more difficult in general.
Theorem 1. The RIS problem is NP-hard.

Proof. We present a reduction from the independent set problem [12]. Let G = (V, E) be an undirected graph with |V| = n, and suppose we ask whether G has an independent set of size at least K. We construct an instance of RIS as follows. Enumerate the vertices in V by consecutive numbers 1, ..., n, denoting the ith vertex by v_i. For each vertex v_i we create an item set I_i containing an item of value 0. For each edge (v_i, v_j) ∈ E we add the tuple (i, 1, j, 1) to the set T, forbidding the simultaneous choice of the items corresponding to v_i and v_j. To each set I_i we also add a dummy item of value 1.
Note that a feasible solution to the RIS instance always exists (the dummy items are not involved in the constraints T). In such a solution we have a choice of items, one from each set I_i, such that the corresponding vertices v_i are not directly connected in G. The cost of an optimal solution of the RIS instance is less than or equal to n − K if and only if there is an independent set of size at least K in G.
Consequently, the uncertain-data problem IRIS is NP-hard as well. We can, however, identify the following special cases of RIS, obtained by imposing additional restrictions on the set of constraints. By exploiting their properties we are able to improve the solution algorithms for the uncertain IRIS problem in these special cases (see Section 4).
Theorem 2. Problem RIS can be solved in polynomial time in either of the following special cases: 1) each item appears in at most one constraint in T; 2) the "forbiddance" relation defined by T is transitive.
Proof. To see that RIS can be solved in polynomial time in case 1), we observe that if each item is involved in at most one constraint in T, then the constraint matrix of the integer linear program (1)-(4) is totally unimodular. This can be verified using the following characterization (see [22], chapter 19): a (0, ±1) matrix with at most two nonzeros in each column is totally unimodular if the set of rows can be partitioned into two classes, such that two nonzero entries in a column are in the same class of rows if they have different signs, and in different classes of rows if they have the same sign. These conditions are satisfied if we take the rows corresponding to constraints (2) as the first class, and the rows corresponding to constraints (3) as the second class.
To prove case 2), we construct an instance of the min-cost max-flow network problem that is equivalent to the given RIS instance. First, note that due to the transitivity of the "forbiddance" relation, items can be grouped into equivalence classes, and from each equivalence class at most one item can be selected. An item not involved in any constraint belongs to an equivalence class of cardinality 1. The network consists of m source nodes and two inner layers, followed by a single terminal node (see also Example 1 and Fig. 1). Each source node corresponds to the set I_i, i = 1, ..., m, and provides a supply of p_i units of flow. Nodes in the first inner layer correspond to the items in the sets I_i, and are connected by unit-capacity arcs to their corresponding source nodes; each such arc has cost c_ij. The second inner layer consists of one node per equivalence class of items. Nodes of the first inner layer that correspond to items in a common equivalence class are connected to the common node in the second layer. Arcs between the first and second inner layers have unit capacity and zero cost. Finally, there are unit-capacity, zero-cost arcs from each node in the second layer to the terminal node. The flow network can be constructed in polynomial time given an instance of RIS.
Example 1. Consider an instance of RIS with m = 3 item sets, each containing r_i = 3 items, with the requirement to select p_i = 2 items from each set. The set of forbidden pairs is T = {(1, 1, 2, 1), (2, 1, 3, 1), (1, 1, 3, 1), (1, 2, 2, 2)}. Note that the first three pairs form a transitive subset, so only one of the involved items can be selected. The corresponding flow network is presented in Fig. 1. The first inner layer nodes correspond to items, denoted i_ij, and the second inner layer nodes, denoted e_k, correspond to the subsets of items from which only a single item can be selected.
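Following this construction, the network for a transitive instance can be assembled, e.g., with the networkx package (a sketch; the node-naming scheme and the explicit list of equivalence classes, with singleton classes for unconstrained items, are my own conventions):

```python
import networkx as nx

def build_flow_network(costs, p, classes):
    """Min-cost flow network for the 'transitive' case of Theorem 2.
    costs[i][j]: cost of item j in set i; p[i]: items to select from set i;
    classes: partition of all (set, item) pairs into equivalence classes
    (unconstrained items form singleton classes)."""
    G = nx.DiGraph()
    G.add_node("t", demand=sum(p))              # terminal absorbs all flow
    for i, (ci, pi) in enumerate(zip(costs, p)):
        G.add_node(("s", i), demand=-pi)        # source i supplies p_i units
        for j, c in enumerate(ci):              # arcs: source -> item nodes
            G.add_edge(("s", i), ("v", i, j), capacity=1, weight=c)
    for k, cls in enumerate(classes):
        for (i, j) in cls:                      # item -> its equivalence class
            G.add_edge(("v", i, j), ("e", k), capacity=1, weight=0)
        G.add_edge(("e", k), "t", capacity=1, weight=0)  # one item per class
    return G
```

A min-cost flow (e.g., `nx.min_cost_flow`) then yields an optimal selection: item (i, j) is chosen when the arc from ("s", i) to ("v", i, j) carries a unit of flow.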
We conclude this section with a final statement regarding the complexity of the general robust IRIS problem. It turns out that this problem is likely to be much harder than its already NP-hard deterministic counterpart RIS. We show that it is hard for the complexity class Σ_2^p = NP^NP, the class of decision problems that can be solved in polynomial time by a nondeterministic Turing machine with access to an oracle for an NP-complete problem, i.e., problems that remain hard even if NP queries can be answered in O(1) time [23].
The class Σ_2^p can also be characterized as the set of all decision problems that can be stated in second-order logic using a pair of existential and universal quantifiers. Let us define the following prototypical problem in this class:

Problem: 2-Quantified 3-DNF Satisfiability (abbreviated ∃∀ 3SAT)
Instance: Two sets X = {x_1, ..., x_s}, Y = {y_1, ..., y_t} of Boolean variables. A Boolean formula φ(X, Y) over X ∪ Y in disjunctive normal form, where every clause consists of exactly 3 literals.
Question: Does there exist an assignment of truth values to the variables in X such that for every assignment of truth values to the variables in Y the formula φ(X, Y) evaluates to true?

Theorem 4. The IRIS problem is Σ_2^p-hard.

Proof. Given a 3-DNF formula φ with m clauses, we construct an instance of IRIS as follows. For each clause C_j, j = 1, ..., m, we create a set of items I_j. Each set contains three types of items: 1) an X-item, corresponding to the unique satisfying assignment of the x-variables in clause C_j; 2) a collection of Y-items, denoted Y_{j,k}, one for each assignment of the y-variables that makes the clause C_j evaluate to false; 3) a special item. We can assume that each clause contains at least one y-variable (otherwise the answer to the instance is always "yes"). If there are no x-variables in a given clause, then there is no X-item in the set I_j (only Y-items and a special item).
For each pair of item sets I_i, I_j, and for each pair of items Y_{i,k} ∈ I_i, Y_{j,l} ∈ I_j, we add a "forbidding" constraint (i, k, j, l) to T whenever: 1) both items correspond to an assignment of a common variable y, and 2) item Y_{i,k} corresponds to the assignment y = 0 while item Y_{j,l} corresponds to the assignment y = 1, or vice versa.
Similarly, we add "forbidding" constraints for pairs of X-items from two sets I_i and I_j whenever their satisfying x-assignments share a common Boolean variable and assign it conflicting values. Special items are not bound by any "forbidding" constraints.
For example, let C_1 = (x_1 ∧ y_1 ∧ ȳ_2) and C_2 = (x̄_1 ∧ y_1 ∧ y_3). Then the X-item from set I_1 cannot be taken simultaneously with the X-item from set I_2, since their satisfying assignments conflict on variable x_1: x_1 = 1 in C_1, and x_1 = 0 in C_2. Moreover, there are 3 Y-items in I_1, corresponding to the unsatisfying assignments of (y_1, y_2), that is 00, 01 and 11; and there are 3 Y-items in I_2, corresponding to the unsatisfying assignments of (y_1, y_3), that is 00, 01 and 10. Since both clauses share variable y_1, items that correspond to conflicting assignments form "forbidden pairs" (e.g., the Y-item 00 from the first set and the Y-item 10 from the second set, etc.).
Let B > m be a large constant. In each set I_j, the X-item (corresponding to the unique x-satisfying assignment) has cost interval [0, B]. Each Y-item (corresponding to an unsatisfying assignment of the y-variables) has cost interval [0, B²]. The special item has cost c⁻ = c⁺ = B + 1, i.e., the same in every scenario.
Note that in the worst-case scenario for a given solution, each item selected by the decision maker has its interval upper-bound cost c⁺, while all other items have their interval lower-bound costs c⁻.
Observe that any optimal solution of the considered IRIS problem corresponds to a valid assignment of truth values to all variables in X ∪ Y: the choice of items encoded by the characteristic vector x corresponds to the assignment of the x-variables, while the best alternative solution in the worst-case scenario, encoded by the characteristic vector y in (5), corresponds to the assignment of the y-variables. Since p_j = 1 for j = 1, ..., m, the decision maker selects a single item from each set I_j, deciding on the solution x. Subsequently, the worst-case scenario is fixed, and the adversary selects a single item from each set I_j, deciding on the alternative solution y.
A regret-minimizing decision maker always selects either the X-item or the special item from any set I_j. This follows from the fact that if the decision maker selects a Y-item, then in the worst-case scenario its cost is B², while the adversary is always able to choose an X-item with cost 0. The decision maker is therefore better off never selecting a Y-item, having a guaranteed cost of no more than B + 1 per set (the cost of the special item, selected only when it is not possible to select the X-item due to the "forbidding" constraints). On the other hand, the regret-maximizing adversary selects one of the Y-items whenever possible, as it has cost 0 in the worst-case scenario. Only when it is not possible to choose such an item (due to the "forbidding" constraints) does the adversary select another item, either the X-item or the special item, at cost B or B + 1. This happens when the adversary is forced to choose a y-satisfying assignment in the corresponding clause.
Since the items must be selected preserving all the "forbidding" constraints, their choice corresponds to an assignment of truth values to all Boolean variables without any conflicts.
We will show that there exists an assignment of truth values to the x-variables such that for every assignment of the y-variables at least one clause is satisfied, if and only if the optimal value of the constructed instance is at most Z = (m − 1)B + m − 1.
Suppose that the ∃∀ 3SAT instance is positive (i.e., there is an x-assignment such that for every y-assignment at least one clause is satisfied). Then there exists an assignment of the x-variables yielding an x-satisfying assignment in k clauses, 1 ≤ k ≤ m, such that for every y-assignment at least one x-satisfied clause is also y-satisfied. The decision maker selects the k corresponding X-items and (m − k) special items, at a total worst-case cost of kB + (m − k)(B + 1) = mB + (m − k). In at least one set I_j the adversary cannot select a Y-item, and is thus forced to select the X-item. But since that item is also selected by the decision maker, its cost cancels when evaluating the regret. Consequently, the total regret is no greater than mB + (m − k) − B ≤ (m − 1)B + m − 1 = Z.

Suppose now that the ∃∀ 3SAT instance is negative. Then for every assignment of the x-variables with k x-satisfied clauses, the adversary can select a y-assignment under which all these k clauses evaluate to false. Thus the decision maker's cost is mB + (m − k), while the adversary's cost is always zero: the adversary can always select either a Y-item with cost 0 or, from a set I_j corresponding to a clause unsatisfied by the x-variables, an X-item with cost 0 (since in that case the decision maker must have taken the special item from I_j). Consequently, the maximum regret is at least mB > (m − 1)B + (m − 1) = Z, which concludes the proof.
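For reference, the regret accounting in the two cases can be written compactly, with k the number of x-satisfied clauses (the decision maker pays kB for X-items and (m − k)(B + 1) for special items in the worst-case scenario):

```latex
\begin{align*}
\text{positive instance: } & R \le \big(mB + (m-k)\big) - B = (m-1)B + (m-k) \le (m-1)B + m - 1 = Z,\\
\text{negative instance: } & R \ge \big(mB + (m-k)\big) - 0 \ge mB > Z.
\end{align*}
```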
Example 2. The following formula φ over X = {x_1, x_2, x_3} and Y = {y_1, y_2} is a positive instance of the ∃∀ 3SAT problem. Consider the assignment of the x-variables (x_1, x_2, x_3) = (1, 1, 0). Then the first 3 out of its 4 clauses are x-satisfied, and it can easily be verified that each of the four possible y-assignments to (y_1, y_2) leaves at least one of the first 3 clauses also y-satisfied. This formula corresponds to the instance of the IRIS problem shown in Fig. 2.
Each item set contains an X-item that corresponds to the satisfying assignment of the x-variables, Y-items that correspond to all possible unsatisfying assignments of the y-variables, and a special item. Forbidden pairs between items from different sets are marked with dashed lines (for example, the Y-item from item set 1 cannot be taken simultaneously with the first two Y-items from item set 4, as the former corresponds to y_1 = 0 while the latter correspond to y_1 = 1, etc.). The solution in which X-items are selected from the first 3 sets, and the special item from set 4, has maximum regret equal to 3B + 1, since the adversary is forced to select an X-item from one of the first 3 sets.

Solution Algorithm
A standard technique for solving the general class of min-max regret optimization problems whose deterministic (nominal) version can be written as an integer linear program is based on solving a sequence of relaxed problems, with successively generated cuts that tighten the relaxation (see e.g., [3]). We develop a solution algorithm for IRIS based on this approach. Note that the worst-case cost scenario c(x) for a solution x is an extreme point of U, defined as: c_ij(x) = c⁺_ij if x_ij = 1, and c_ij(x) = c⁻_ij otherwise, which can be written as c_ij(x) = c⁻_ij + (c⁺_ij − c⁻_ij) x_ij. Using this fact we can rewrite problem (7) as:

min z    (8)

subject to:

z ≥ Σ_{i,j} c⁺_ij x_ij − Σ_{i,j} c_ij(x) y_ij,   y ∈ C,    (9)
x ∈ F,

where F is as defined in (6), and C = F is the set of all feasible solutions of the RIS problem. Since each y ∈ C is a fixed vector, constraint (9) is linear in x.
The above problem has exponentially many constraints (excluding trivial cases). Instead of solving it directly, we relax (9) by selecting a subset C ⊂ F of small size. After solving the relaxed problem we obtain a solution (x̂, ẑ). The vector x̂ is a feasible solution of (7), but not necessarily an optimal one. We can evaluate the maximum regret (5) for x̂, and use the value ẑ to check whether (x̂, ẑ) is also feasible for the non-relaxed problem, by testing if R(x̂) ≤ ẑ. If this is the case, then x̂ must be optimal for (7). Otherwise, computing the maximum regret R(x̂) produces an alternative solution ŷ, which can be added to the set C, tightening the relaxation. We re-solve the new relaxed problem, and repeat the above procedure until either an optimal solution is found or the stopping condition is met. An overview of this solution method is presented as Algorithm 1.
Algorithm 1 Solving decomposition by cut generation.
1: Initialize the set of cuts C.
2: Solve the relaxed problem (8)-(9), obtaining (x̂, ẑ).
3: Compute ŷ and R(x̂) by solving the RIS problem in scenario c(x̂).
4: if R(x̂) ≤ ẑ then
5:   Return x* = x̂ and terminate the algorithm.
6: else
7:   Add cut ŷ to set C, and go to step 2.
8: end if

Remarks.
1. Note that each time we solve the relaxed problem (8)-(9) we establish an increasingly better lower bound LB on (7) (by taking the optimal value of the objective function (8)). We may also discover successively better upper bound values UB, since each x̂ is feasible for (7). Thus the above method allows us to determine the relative gap g = (UB − LB)/UB to the optimal solution value. This is very useful in assessing the quality of suboptimal solutions.
2. For large problem instances, the number of generated cuts |C| may become very large in later iterations, which may significantly slow down the algorithm. However, many of these cuts would be inactive at an optimal solution. Consequently, we set a time limit (e.g., 1 minute) for solving the relaxed problem (8)-(9) in step 2, and if this limit is exceeded before an optimal solution to the relaxation is found, we remove from C a few constraints with the largest linear slacks in the previous solution and retry solving. This ensures that the size of the relaxed problem remains within a reasonable range. Note, however, that in this process some constraints that would become essential at later iterations might be removed; they would need to be generated again and added to C, which may consume additional computational time.
3. When solving the relaxed problem (8)-(9) using a branch and bound scheme, we may obtain a number of feasible solutions that can be used for warm-start initialization when re-solving with a new cut added.

4. For "transitive" instances (as in the second case of Theorem 2), solving the auxiliary RIS subproblem in step 3 can be performed using a dedicated polynomial time algorithm for min-cost max-flow [1]. Otherwise the resulting integer linear program can be solved via general branch and bound.
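The overall loop can be sketched in Python as follows; here a brute-force search over an explicit list of feasible solutions stands in for both the relaxed master MIP and the RIS oracle, and all names are my own:

```python
def min_regret(feasible, lo, hi):
    """Cut-generation sketch of Algorithm 1. `feasible` is a list of solutions,
    each a (frozen)set of (set, item) pairs; lo/hi map pairs to interval bounds."""
    def cost(c, y):
        return sum(c[e] for e in y)
    def worst_scenario(x):
        # chosen items at their upper bounds, all others at their lower bounds
        return {e: (hi[e] if e in x else lo[e]) for e in lo}
    C = [feasible[0]]                               # initial set of cuts
    while True:
        # relaxed master: minimize the regret measured against cuts in C only
        x, z = min(((s, max(cost(worst_scenario(s), s) - cost(worst_scenario(s), y)
                            for y in C)) for s in feasible), key=lambda t: t[1])
        cx = worst_scenario(x)
        y_best = min(feasible, key=lambda y: cost(cx, y))   # adversary's response
        R = cost(cx, x) - cost(cx, y_best)                  # true max regret of x
        if R <= z:          # (x, z) feasible for the full problem -> x optimal
            return x, R
        C.append(y_best)    # add the violated cut and re-solve
```

The loop terminates because each added cut ŷ is new: if ŷ were already in C, the master value z would already account for it and the test R ≤ z would hold.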

Initializing the set of cuts
The performance of the algorithm described above depends on the choice of the initial set of vectors C = {y^(k)} in step 1. The simplest way to initialize it is to select a number of extreme scenarios c^(k) (vertices of U) and obtain the vectors y^(k) by solving the deterministic RIS problem for these scenarios.
We initialize this set by randomly sampling 100 extreme scenarios and solving the RIS problem for each of them. We also add the solutions found by an evolutionary heuristic method, and use their best value as the initial upper bound UB in the main algorithm.
In the initialization stage, we first compute the mid-point scenario solution (it cannot be the worst-case solution, and often achieves a good solution value), and use it as an initial solution in an evolutionary heuristic. The latter method iteratively searches for improved solutions by applying mutation and crossover operations to a pool of current best solutions (the population). The mutation operation randomly changes the selection of items within each set (the number of changes, as well as the number of affected sets, is also selected randomly). The crossover operation takes a pair of solutions and replaces randomly selected sets from the first solution with the corresponding sets from the second solution. Each change is performed only if it results in a feasible solution. Both operations are repeated a fixed number of times, resulting in a new population of currently best solutions, which is passed to the next iteration. The final population is returned as the result.
In the experiments we used 20 iterations on a population of size 10, applying 100 crossovers and mutations in each iteration.
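The evolutionary stage might be sketched as follows; the solution representation and the feasibility-preserving mutate/crossover operators are left abstract (supplied by the caller), which is an assumption of this sketch, as are all names:

```python
import random

def evolve(initial, mutate, crossover, value, pop_size=10, iters=20, ops=100):
    """Sketch of the evolutionary initialization (parameters as in the text:
    20 iterations, population of 10, 100 mutation/crossover ops per iteration).
    mutate and crossover are assumed to always return feasible solutions."""
    pop = [initial]
    for _ in range(iters):
        candidates = list(pop)                  # retain current best solutions
        for _ in range(ops):
            candidates.append(mutate(random.choice(pop)))
            if len(pop) > 1:
                candidates.append(crossover(*random.sample(pop, 2)))
        candidates.sort(key=value)              # best (lowest value) first
        pop = candidates[:pop_size]             # next population
    return pop
```

Because the current population is always carried over into the candidate pool, the best value found never worsens between iterations.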

Experimental Results
In this section we present the results of computational experiments conducted on a set of randomly generated problem instances. Each experiment consists of running the relaxation-based algorithm (with the initial set of cuts generated by the heuristic algorithm described in the previous section) on 10 instances for each fixed set of parameters. The parameters of the IRIS problem instances are denoted: m, the number of item sets; r_i, the number of items in set i; p_i, the number of items required to be selected from set i; K, the number of constraints of type (3). The column labeled n contains the number of (binary) decision variables in the resulting MIP. The bounds of the cost intervals of all items were randomly generated integers between 1 and 100. All computations were performed using the CPLEX 12.8 optimization software and the Python programming language.
We distinguish two main types of instances: (a) normal instances and (b) transitive-constraints instances. For the former, the constraints (3) are generated by randomly sampling K pairs of items, each pair taken from two different sets. For the latter, additionally, a transitive closure of the "forbiddance" relation is computed, and the additional constraints are added so as to restrict the selection to at most one item per equivalence class. In the results, the average number of these constraints is reported in the column labeled "K".
Each row in Tables 1-2 contains the results of running the algorithm on 10 instances. First, the instance parameters are given, followed by the mean and standard deviation of the elapsed time until the algorithm terminated. The stopping condition was that either an optimal solution had been found, or the iteration limit of 500 had been reached. The next two columns contain the mean and standard deviation of the iteration count over the runs that resulted in an optimal solution. The number of instances (out of 10) solved to optimality is reported in the column "opt.", followed by the mean solution value. For the suboptimal runs, the last column contains the mean relative optimality gap, i.e., the value g = (UB − LB)/UB. We observe that the hardest instances often contain only a few constraints of type (3). This is due to the fact that the larger the number K, the smaller the set of feasible solutions F, making it easier to explore. On the other hand, instances are typically difficult when it is required to select about half of the items in each set.

Conclusions
We presented a generalization of the representatives selection problem with interval cost uncertainty, and considered it in the robust optimization framework with the maximum regret criterion. We characterized the computational complexity of the problem, and developed both exact and heuristic solution methods. The main solution algorithm consists of cut generation and solving a sequence of relaxed problems. An evolutionary heuristic stage was proposed to initialize the algorithm with a set of cuts. While the considered robust problem is difficult to solve to optimality, we have demonstrated that it is possible to solve moderately-sized instances exactly, and to approximate solutions using a combination of heuristic and exact methods.
The analysis of this problem can be extended in the future to different types of restrictions on item choices. For instance, some items may be required to be selected in bundles, being useless without simultaneously selecting specific items from other sets.

Figure 1: Example of the flow network illustrating case 2) of Theorem 2.

Figure 2: Illustration for Example 2, showing the construction used in the proof of Theorem 4. Each of the 4 sets contains: an X-item with cost interval [0, B], a collection of Y-items with cost intervals [0, B²], and a special item with cost interval [B + 1, B + 1]. In the worst-case scenario, the items selected by the decision maker have upper-bound costs, while all other items have lower-bound costs.

Table 1: Results for transitive-constraints instances for increasing average number of constraints.

Table 2: Results for normal instances.