Submodular maximization of concave utility functions composed with a set-union operator with applications to maximal covering location problems

We study a family of discrete optimization problems asking for the maximization of the expected value of a concave, strictly increasing, and differentiable function composed with a set-union operator. The expected value is computed with respect to a set of coefficients taking values from a discrete set of scenarios. The function models the utility function of the decision maker, while the set-union operator models a covering relationship between two ground sets, a set of items and a set of metaitems. This problem generalizes the problem introduced by Ahmed S, Atamtürk A (Mathematical Programming 128(1–2):149–169, 2011), and it can be modeled as a mixed integer nonlinear program involving binary decision variables associated with the items and metaitems. Its goal is to find a subset of metaitems that maximizes the total utility corresponding to the items it covers. It has applications to, among others, maximal covering location and influence maximization problems. In the paper, we propose a double-hypograph decomposition which allows for projecting out the variables associated with the items by separately exploiting the structural properties of the utility function and of the set-union operator. Thanks to it, the utility function is linearized via an exact outer-approximation technique, whereas the set-union operator is linearized in two ways: either (i) via a reformulation based on submodular cuts, or (ii) via a Benders decomposition. We analyze from a theoretical perspective the strength of the inequalities of the resulting reformulations, and embed them into two branch-and-cut algorithms. We also show how to extend our reformulations to the case where the utility function is not necessarily increasing. We then experimentally compare our algorithms inter se, to a standard reformulation based on submodular cuts, to a state-of-the-art global-optimization solver, and to the greedy algorithm for the maximization of a submodular function.
The results reveal that, on our testbed, the method based on combining an outer approximation with Benders cuts significantly outperforms the other ones.


Introduction
The problem of maximizing a concave, strictly increasing, and differentiable utility function f : R → R over a discrete set plays an important role in many applications. Examples can be found in, among others, investment problems with discrete choices such as infrastructure projects, venture capital, and private-equity deals (Ahmed and Atamtürk [2]), project selection (Klastorin [21], Mehrez and Sinuany-Stern [27], Weingartner [36]), competitive facility location (Berman and Krass [4], Aboolian et al. [1], Ljubić and Moreno [25]), and combinatorial auctions (Feige et al. [11], Lehmann et al. [23]). In such utility-maximization problems, the function f typically models the decision maker's risk-averse attitude. When some of the coefficients of f are subject to uncertainty, the tools of stochastic optimization are used to maximize it in expectation over a typically discrete set of scenarios.
Together with his coauthors, Shabbir Ahmed made substantial contributions to advancing the state of the art of mathematical-programming methods for solving this challenging set of problems (Ahmed and Atamtürk [2], Yu and Ahmed [38,39]). In this article, we study optimization problems in which the expected value of the function f composed with a set-union operator is maximized over a discrete feasible region. The expected value of the objective function is computed w.r.t. a set of coefficients taking values from a discrete set of scenarios. In this way, we generalize the family of problems addressed in the three aforementioned articles.

The problem introduced in Ahmed and Atamtürk [2]
Let N be a ground set of n items and M a finite set of m scenarios, each occurring with a strictly positive probability π_i, i ∈ M, and let a_ij ∈ R_+, i ∈ M, j ∈ N, and d_i ∈ R_+, i ∈ M, be nonnegative real numbers. Letting the binary variable x_j ∈ {0,1} be equal to 1 if and only if item j ∈ N is chosen, the problem tackled in Ahmed and Atamtürk [2] can be cast as the following Mixed Integer Nonlinear Program (MINLP):

max { Σ_{i∈M} π_i f( Σ_{j∈N} a_ij x_j + d_i ) : x ∈ X ∩ {0,1}^n }.   (1)

Here, X is a polyhedron encapsulating the constraints of the problem and X ∩ {0,1}^n is the discrete set of feasible solutions. From a stochastic-optimization perspective, Problem (1) asks for maximizing the expected utility of the total return of the chosen items over the m scenarios, i.e., the expected value of the set function

h_i(S) := f( Σ_{j∈S} a_ij + d_i ),  i ∈ M,

evaluated at the set S ⊆ N of chosen items. It is not difficult to see that the function h_i, i ∈ M, is submodular (see Ahmed and Atamtürk [2], Yu and Ahmed [39]), and so is the whole objective function of the problem. Thanks to this, Problem (1) can be reformulated as a Mixed Integer Linear Program (MILP) by exploiting a reformulation of Nemhauser et al. [29] featuring an exponential number of linear inequalities, one per subset S of the ground set N. Ahmed and Atamtürk [2] showed how to tighten such inequalities by a sequence-independent lifting procedure. In a follow-up article, Yu and Ahmed [39] further improved the strength of such inequalities for the case where the set of discrete choices X is subject to a single knapsack constraint. Further results on the nature of the exact lifting function adopted in Ahmed and Atamtürk [2] are reported in Shi et al. [33].
The specific function f considered in Ahmed and Atamtürk [2] and Yu and Ahmed [39] is the negative exponential function f(z) := 1 − e^(−z/λ), which is frequently used to model the behavior of risk-averse decision makers. Here, λ > 0 represents the risk-tolerance parameter: the larger the value of λ (and, thus, the closer f is to being linear), the less risk-averse the decision maker is. An illustration of such a function for different values of λ is provided in Fig. 1.
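The qualitative behavior just described can be checked numerically; the sketch below assumes the negative exponential utility f(z) = 1 − e^(−z/λ) (the scaling is an assumption for illustration):

```python
import math

def f(z, lam):
    """Negative-exponential utility: strictly increasing, concave, f(0) = 0."""
    return 1.0 - math.exp(-z / lam)

# f is strictly increasing and concave (midpoint value above the chord).
zs = [0.0, 0.5, 1.0, 2.0, 4.0]
for lam in (0.5, 1.0, 5.0):
    vals = [f(z, lam) for z in zs]
    assert all(a < b for a, b in zip(vals, vals[1:]))        # increasing
    assert f(1.5, lam) >= 0.5 * (f(1.0, lam) + f(2.0, lam))  # concavity at a midpoint

# Larger lambda -> closer to linear: the curvature gap over [0, 2] shrinks.
def curvature_gap(lam):
    return f(1.0, lam) - 0.5 * (f(0.0, lam) + f(2.0, lam))

assert curvature_gap(10.0) < curvature_gap(1.0)
```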

A generalization of Problem (1)
In this work, we study a generalization of Problem (1) in which, besides the set of items N, we are given an additional ground set N̂ of n̂ metaitems. We assume that the two ground sets N and N̂ are linked by a covering relationship modeled by a bipartite graph G = (N ∪ N̂, E) where, for each j ∈ N and ℓ ∈ N̂, {j, ℓ} ∈ E if and only if item j is covered by metaitem ℓ. For each item j ∈ N, we denote by N̂(j) ⊆ N̂ the set of metaitems of N̂ by which j is covered and, for each ℓ ∈ N̂, we denote by N(ℓ) ⊆ N the set of items that are covered by ℓ.
The key feature of this generalized problem is that, in order to access an item j ∈ N and benefit from its profit, the decision maker has to select at least one of the metaitems ℓ ∈ N̂(j) covering it. An illustration of an instance of the problem is reported in Fig. 2.
After introducing a binary variable y_ℓ ∈ {0,1} for each ℓ ∈ N̂, equal to 1 if and only if metaitem ℓ is chosen, the problem can be cast as the following MINLP:

max Σ_{i∈M} π_i f( Σ_{j∈N} a_ij x_j + d_i )   (2a)

s.t. x_j ≤ Σ_{ℓ∈N̂(j)} y_ℓ,  j ∈ N,   (2b)

x_j ≥ y_ℓ,  j ∈ N, ℓ ∈ N̂(j),   (2c)

x ∈ {0,1}^n, y ∈ Y ∩ {0,1}^n̂,

where Y is a polyhedron encapsulating the constraints of the problem and Y ∩ {0,1}^n̂ is the discrete set of feasible choices of metaitems. The parameters π, a, and d have the same meaning as in Problem (1), i.e., M contains the discrete-scenario realizations of the uncertain parameters a and d, and the objective function corresponds to the expected value of f over such realizations. Constraints (2b) and (2c) imply that an item j ∈ N is taken if and only if at least one metaitem ℓ ∈ N̂(j) is chosen. Since the objective function of the problem is strictly increasing, Constraints (2c) are automatically satisfied in any optimal solution and can be dropped. Besides being binary, no further constraints are assumed for x.
To see that Problem (2) generalizes Problem (1), it suffices to observe that the latter is obtained from the former by defining N̂ as a copy of N and defining the edge set E as a matching; this way, x = y is imposed and the constraints x ∈ X are translated into y ∈ Y.
We define the set-union operator expressing the covering relationship between N̂ and N as

N(Ŝ) := ∪_{ℓ∈Ŝ} N(ℓ),  Ŝ ⊆ N̂.   (3)

With it, we can rewrite the objective function of Problem (2) as a linear combination with weights π_i, i ∈ M, of m set functions ĥ_i : 2^N̂ → R_+, each of which is defined as the composition of f with a set function q̂_i : 2^N̂ → R_+ that maps the choice of metaitems Ŝ into the value of the items that Ŝ covers. For each i ∈ M, ĥ_i is defined as

ĥ_i(Ŝ) := f( q̂_i(Ŝ) )  with  q̂_i(Ŝ) := Σ_{j∈N(Ŝ)} a_ij + d_i.   (4)

The function q̂_i frequently occurs in problems featuring a set-covering component and it has been heavily studied in the combinatorial-optimization literature; see, e.g., Schrijver [32], p. 768. Due to d_i and a_ij being nonnegative for all i ∈ M, j ∈ N, q̂_i is submodular. Thanks to f being strictly increasing and concave, it is not difficult to see that its composition with q̂_i (i.e., ĥ_i) is submodular as well.
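A minimal sketch of q̂_i and its submodularity on a toy covering instance (all data below are illustrative, not taken from the paper):

```python
from itertools import combinations

# Toy covering instance: metaitems 1..3 cover subsets of items {"a","b","c","d"}.
N_of = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}  # N(l) for each metaitem l
a_i = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 0.5}        # item values in scenario i
d_i = 1.0

def q_hat(S):
    """q_i(S): total value of the items covered by the metaitems in S, plus d_i."""
    covered = set().union(*(N_of[l] for l in S)) if S else set()
    return sum(a_i[j] for j in covered) + d_i

def marginal(l, S):
    return q_hat(S | {l}) - q_hat(S)

def subsets(U):
    U = list(U)
    for r in range(len(U) + 1):
        for c in combinations(U, r):
            yield set(c)

# Submodularity: marginal contributions shrink as the set grows.
ground = set(N_of)
for S in subsets(ground):
    for T in subsets(ground):
        if S <= T:
            for l in ground - T:
                assert marginal(l, S) >= marginal(l, T)
```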
Problem (2) is NP-hard even in the simple setting where Y is defined by a single cardinality constraint and f is the trivial identity function f(z) := z. This is because, letting m = 1, a_1j = 1 for all j ∈ N, d_1 = 0, and Y = {Ŝ ⊆ N̂ : |Ŝ| ≤ k}, Problem (2) admits a solution of value n if and only if there is a feasible solution to the NP-complete problem of covering N with at most k subsets from {N(ℓ)}_{ℓ∈N̂}.

Applications of Problem (2)
Problem (2) arises in different applications. An important one is found in social-network analysis when looking for "key players" or "influencers" to quickly distribute a piece of information through the network (Güney et al. [16], Wu and Küçükyavuz [37], Kempe et al. [20], Karimi et al. [19]). Both of the widely adopted independent cascade (IC) and linear threshold (LT) propagation models (based on probabilistic networks and random sampling) described in Kempe et al. [20] fit into the framework of Problem (2). In IC, the influence propagates on the graph according to a specified edge-propagation probability. In LT, each node of the graph is influenced only if the fraction of the node's neighbors that have become active (weighted by the corresponding influence weights) exceeds a node-specific stochastic threshold. Given a graph G̃ = (Ṽ, Ẽ), m so-called live-edge graphs G̃_i = (Ṽ, Ẽ_i), i ∈ M := {1, ..., m}, are created by sampling the edges. For each i ∈ M, π_i is defined as the probability of sampling the live-edge graph G̃_i. The sampling process and, accordingly, the live-edge graphs one obtains, depend on the propagation mechanism in use. In this application, the set of items N is equal to Ṽ (therefore, N contains all the users of the social network) and the set of metaitems N̂ is equal to the set of influencers in Ṽ. For each user-influencer pair (j, ℓ), E_i contains the edge {j, ℓ} if and only if there is a path between ℓ and j in G̃_i. Thus, for each i ∈ M and ℓ ∈ N̂, N_i(ℓ) ⊆ N is the subset of users that can be reached by influencer ℓ in the live-edge graph G̃_i due to belonging to the same connected component as ℓ (see also Wu and Küçükyavuz [37] for further details). The goal is then to choose a subset of influencers so as to maximize the expected number of users that can be reached.
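A sketch of the live-edge sampling described above, under the independent cascade model with a uniform edge-propagation probability (graph, probability, and node names are illustrative assumptions):

```python
import random
from collections import deque

# Toy directed social graph and a uniform propagation probability (assumptions).
edges = [("u1", "u2"), ("u2", "u3"), ("u1", "u4"), ("u4", "u3")]
prob = 0.5

def sample_live_edge_graph(rng):
    """Keep each edge independently with its propagation probability."""
    return [e for e in edges if rng.random() < prob]

def reachable(live_edges, source):
    """Users reached by an influencer = nodes reachable in the live-edge graph."""
    adj = {}
    for u, v in live_edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

rng = random.Random(42)
graphs = [sample_live_edge_graph(rng) for _ in range(100)]  # m sampled scenarios
# Expected influence of influencer u1 ~ average reach over the sampled scenarios.
avg_reach = sum(len(reachable(g, "u1")) for g in graphs) / len(graphs)
assert 1.0 <= avg_reach <= 4.0  # u1 always reaches itself; at most all 4 users
```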
Modeling such a problem as an instance of Problem (2) requires a slight modification of the set-union operator and of the function q̂_i defined in (3)–(4) to make them scenario-dependent, i.e., N_i(Ŝ) := ∪_{ℓ∈Ŝ} N_i(ℓ) and q̂_i(Ŝ) := Σ_{j∈N_i(Ŝ)} a_ij + d_i, where, for each i ∈ M and j ∈ N, a_ij is set to 1. Without loss of generality, all the results presented in this paper can be applied to such a scenario-dependent case as well. To simplify the presentation, we will restrict ourselves to the original definitions given in (3)–(4). We remark that, while in the standard applications of influence maximization referenced above the function f is the identity function, the methods we propose in this paper allow for taking a more general risk-averse utility function into account to measure the total influence in the network.

A second application arises in marketing problems. Let M be the set of products that a marketer can advertise to a set N of potential customers, and let N̂ be a set of marketing campaigns, where each campaign ℓ ∈ N̂ can only reach a subset N(ℓ) of customers. In such a case, the value of a_ij, i ∈ M, j ∈ N, corresponds to customer j's demand for product i. Letting π_i be a weight measuring the relevance of a product i ∈ M, the utility function f expresses the decreasing marginal utility of serving an additional unit of customer demand.
A third application arises in a class of stochastic competitive facility location problems with uncertain customer demands. In the deterministic competitive facility location problem (see Ljubić and Moreno [25], Berman and Krass [4], Aboolian et al. [1], and further references therein), a subset of facilities has to be opened subject to a budget constraint in such a way that the satisfied customer demand is maximized. In this application, the set N̂ corresponds to a set of potential facility locations and the customers are represented by the set N. When adopting, e.g., the multinomial logit model, one can show that the market-share function satisfies the properties of function f. In the stochastic setting, the expected market share is then calculated over the set M of possible scenario realizations, each occurring with probability π_i, i ∈ M, where a_ij corresponds to the demand of customer j ∈ N under scenario i ∈ M.

Contribution and outline of the paper
Throughout the paper, we focus on the case with n ≫ n̂, i.e., with a number of items orders of magnitude larger than the number of metaitems, which holds in many applications, including those we introduced before. The main contribution of the paper is an exact method for solving Problem (2) based on a double-hypograph decomposition by which the many variables associated with the item set N are projected out. Such a decomposition exploits the structural properties of the utility function f and of the set-union operator. In it, f is linearized via an outer-approximation method, whereas the set-union operator is linearized in two ways: either (i) via a reformulation based on submodular cuts or (ii) via a Benders decomposition. In particular, we show that, assuming f can be computed in constant time, the inequalities arising from the outer approximation combined with the Benders decomposition can be separated in linear time, even for fractional points. We compare the strength of the two resulting mixed integer linear programming reformulations from a theoretical perspective, and also show how to extend them to the case where the utility function is not necessarily increasing. Lastly, we embed the reformulations in two branch-and-cut algorithms, the most efficient of which, according to our computational experiments, allows for solving to optimality instances with up to n = 20,000, n̂ = 60, and m = 100 in a short amount of computing time.
The paper is organized as follows. After recalling some background notions on submodularity in Sect. 2, we introduce our decomposition approach and our two reformulations in Sect. 3. The strength of the inequalities of these two reformulations is compared in Sect. 4. Section 5 presents an extension of our method to the case where the utility function is not necessarily increasing. In Sect. 6, we demonstrate via computational experiments the advantages offered by the algorithms we propose. Concluding remarks are reported in Sect. 7. In the Appendix, we also investigate the construction of worst-case instances for the case where Problem (2) is approximately solved via the classical greedy algorithm proposed in Nemhauser et al. [29] for maximizing an increasing submodular function subject to a k-cardinality constraint.
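The greedy algorithm of Nemhauser et al. [29] mentioned above, for maximizing an increasing submodular function under a k-cardinality constraint, can be sketched on a toy coverage function (data are illustrative):

```python
# Classical greedy for max h(S) s.t. |S| <= k with h increasing and submodular:
# repeatedly add the element with the largest marginal contribution.
def greedy(ground, h, k):
    S = set()
    for _ in range(k):
        best = max(ground - S, key=lambda l: h(S | {l}) - h(S), default=None)
        if best is None or h(S | {best}) - h(S) <= 0:
            break
        S.add(best)
    return S

# Toy coverage instance (hypothetical data): h counts the covered items.
N_of = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"d"}}
def h(S):
    return len(set().union(*(N_of[l] for l in S))) if S else 0

S = greedy(set(N_of), h, k=2)
assert h(S) == 3  # e.g. metaitem 1 (covers a,b) plus any second pick adds one item
```

For increasing submodular h, this greedy is guaranteed to achieve at least a (1 − 1/e) fraction of the optimum; the Appendix studies instances on which this bound is (nearly) attained.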

Preliminaries
In this section, we briefly summarize the basic terminology and notions used in optimization problems involving submodular functions as well as the key result of Nemhauser et al. [29] that leads to a direct MILP reformulation of Problem (2).
Let h : 2^N → R be a generic set function defined for some ground set N. The associated set function h_j(S) := h(S ∪ {j}) − h(S), j ∈ N, S ⊆ N, is called the marginal contribution of j with respect to S. We call the function h increasing if h_j(S) ≥ 0 for all j ∈ N, S ⊆ N (in the literature, such a function is sometimes called nondecreasing), and submodular if h_j(S) ≥ h_j(T) for all S, T ⊆ N with S ⊆ T and j ∈ N \ T. Throughout the paper, when the function to which a marginal contribution refers already carries a subscript, we report the element as a superscript (as in ĥ_i^ℓ, the marginal contribution of ℓ w.r.t. ĥ_i). Every submodular set function h enjoys the following properties:

Proposition 1 (Nemhauser et al. [29]) If h is submodular, then, for every S, T ⊆ N:

h(T) ≤ h(S) + Σ_{j∈T\S} h_j(S) − Σ_{j∈S\T} h_j(N\{j});

h(T) ≤ h(S) + Σ_{j∈T\S} h_j(∅) − Σ_{j∈S\T} h_j(S\{j}).

By applying Proposition 1 to each function ĥ_i, i ∈ M, featured in Problem (2), we straightforwardly obtain the following MILP reformulation, which we refer to as the Reformulation based on Submodular Cuts, or Ref(SC):

max Σ_{i∈M} π_i w_i

s.t. w_i ≤ ĥ_i(Ŝ) + Σ_{ℓ∈N̂\Ŝ} ĥ_i^ℓ(Ŝ) y_ℓ − Σ_{ℓ∈Ŝ} ĥ_i^ℓ(N̂\{ℓ}) (1 − y_ℓ),  i ∈ M, Ŝ ⊆ N̂,   (SCĥ1)

w_i ≤ ĥ_i(Ŝ) + Σ_{ℓ∈N̂\Ŝ} ĥ_i^ℓ(∅) y_ℓ − Σ_{ℓ∈Ŝ} ĥ_i^ℓ(Ŝ\{ℓ}) (1 − y_ℓ),  i ∈ M, Ŝ ⊆ N̂,   (SCĥ2)

y ∈ Y ∩ {0,1}^n̂, w ∈ R^m.

For the correctness of the reformulation, we refer the reader to Nemhauser et al. [29]. An alternative correct reformulation is obtained by replacing Constraints (SCĥ1) by

w_i ≤ ĥ_i(Ŝ) + Σ_{ℓ∈N̂\Ŝ} ĥ_i^ℓ(Ŝ) y_ℓ,  i ∈ M, Ŝ ⊆ N̂.   (SCĥ3)

Such constraints are dominated by Constraints (SCĥ1) if there is an item j ∈ N which is covered by a single metaitem ℓ ∈ N̂ (i.e., such that |N̂(j)| = 1) and a_ij > 0, but are as strong otherwise. This is because, by definition of ĥ_i, ĥ_i^ℓ(N̂\{ℓ}) = 0 unless there is an item j ∈ N such that N̂(j) = {ℓ} and a_ij > 0. Constraints (SCĥ3) will be useful in the analysis carried out in Sect. 4. The main advantage of Ref(SC) is that it is linear and it solely works in the y space. Similar reformulations have been extensively used in many works involving submodular functions. In spite of this, they are known to be rather weak (Ahmed and Atamtürk [2], Ljubić and Moreno [25], Nemhauser et al. [28]). This is confirmed by the computational experiments we report in Sect. 6. In the next section, we introduce a decomposition approach in which the utility function f is linearized via an outer approximation and the set-union operator is linearized based on either submodular cuts or Benders cuts. This results in two MILP reformulations that only work in the y space.
As we will see, both reformulations are solved much more efficiently by a branch-and-cut method than Ref(SC).
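The two classical submodular inequalities of Nemhauser et al. [29] invoked in Proposition 1 can be verified exhaustively on a small coverage function; a sketch with illustrative data (h_j denotes the marginal contribution):

```python
from itertools import combinations

# Toy coverage function (illustrative data): h(S) = number of covered items.
N_of = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
ground = set(N_of)

def h(S):
    return len(set().union(*(N_of[l] for l in S))) if S else 0

def rho(j, S):
    """Marginal contribution h_j(S) = h(S + j) - h(S)."""
    return h(S | {j}) - h(S)

def subsets(U):
    U = list(U)
    for r in range(len(U) + 1):
        for c in combinations(U, r):
            yield set(c)

# h(T) <= h(S) + sum_{j in T\S} h_j(S) - sum_{j in S\T} h_j(N\{j})   and
# h(T) <= h(S) + sum_{j in T\S} h_j(0) - sum_{j in S\T} h_j(S\{j})
for S in subsets(ground):
    for T in subsets(ground):
        rhs1 = (h(S) + sum(rho(j, S) for j in T - S)
                - sum(rho(j, ground - {j}) for j in S - T))
        rhs2 = (h(S) + sum(rho(j, set()) for j in T - S)
                - sum(rho(j, S - {j}) for j in S - T))
        assert h(T) <= rhs1 and h(T) <= rhs2
```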

Double-hypograph decomposition
The key idea of our decomposition approach is reformulating Problem (2) in such a way that the function ĥ_i, i ∈ M, is decomposed into its two parts, f and q̂_i.
Letting y be the characteristic vector of Ŝ ⊆ N̂, we rewrite the set function q̂_i(Ŝ), i ∈ M, as the following function Q̂_i : {0,1}^n̂ → R_+:

Q̂_i(y) := max { Σ_{j∈N} a_ij x_j + d_i : x_j ≤ Σ_{ℓ∈N̂(j)} y_ℓ, j ∈ N, x ∈ {0,1}^n }.   (VF)

Q̂_i(y) can be interpreted as the value function of the subproblem of computing q̂_i(Ŝ) for a subset of metaitems Ŝ ⊆ N̂ with incidence vector y ∈ Y ∩ {0,1}^n̂. Such a function is studied, e.g., by Cordeau et al. [6] in the context of maximal-covering location problems. The problem of computing Q̂_i(y) can be solved in linear time O(|E|) by letting x*_j := 1 for each j ∈ N with Σ_{ℓ∈N̂(j)} y_ℓ ≥ 1 and letting x*_j := 0 otherwise. Moreover, as the problem only features upper bounds on the x variables (equal to min{1, Σ_{ℓ∈N̂(j)} y_ℓ} for each j ∈ N), its Linear Programming (LP) relaxation is integral. This property will be useful later.
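The linear-time evaluation of Q̂_i(y) described above can be sketched as follows (toy covering data; an item is taken whenever at least one covering metaitem is selected):

```python
# Toy instance (illustrative data): N_hat_of[j] lists the metaitems covering item j.
N_hat_of = {"a": [1, 2], "b": [2], "c": [2, 3], "d": [3]}
a_i = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 0.5}
d_i = 1.0

def Q(y):
    """Q_i(y) for binary y, in O(|E|): x_j = 1 iff some l in N_hat_of[j] has y_l = 1."""
    return sum(a_i[j] for j, cov in N_hat_of.items()
               if any(y.get(l, 0) >= 1 for l in cov)) + d_i

assert Q({1: 0, 2: 0, 3: 0}) == d_i                      # nothing covered
assert Q({1: 1, 2: 0, 3: 0}) == a_i["a"] + d_i           # only "a" is covered
assert Q({1: 0, 2: 1, 3: 1}) == sum(a_i.values()) + d_i  # all items covered
```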
Let η_i, w_i, i ∈ M, be two families of auxiliary variables. We introduce the following double-hypograph decomposition, in which two hypograph reformulations are applied in sequence:

max Σ_{i∈M} π_i w_i

s.t. w_i ≤ f(η_i),  i ∈ M,   (HYPO1)

η_i ≤ Q̂_i(y),  i ∈ M,   (HYPO2)

y ∈ Y ∩ {0,1}^n̂.

This decomposition is correct as, in any optimal solution, w_i = f(η_i) holds due to π_i > 0, i ∈ M, and η_i = Q̂_i(y), i ∈ M, holds due to f being strictly increasing. Starting from this decomposition, in the following subsections we propose two MILP reformulations for Problem (2) that belong to the (w, y) space and feature O(n̂ + m) variables. Both reformulations are obtained by projecting out the n decision variables x associated with the items j ∈ N as well as the m auxiliary variables η_i, i ∈ M.

Projecting out the η variables via an outer approximation
The auxiliary variables η_i, i ∈ M, can be projected out via Fourier–Motzkin elimination after applying an outer-approximation technique to Constraints (HYPO1).

Proposition 2 Constraints (HYPO1)–(HYPO2) are equivalent to

w_i ≤ f(p) + f′(p) ( Q̂_i(y) − p ),  i ∈ M, p ∈ [0, Σ_{j∈N} a_ij + d_i].   (OA)
Proof Since f is concave, for each i ∈ M the set of pairs (w_i, η_i) ∈ R^2 that satisfy Constraint (HYPO1) forms a convex set. By means of an outer-approximation technique, we can therefore restate Constraints (HYPO1) as

w_i ≤ f(p) + f′(p) (η_i − p),  i ∈ M, p ∈ [0, Σ_{j∈N} a_ij + d_i],   (5)

where f′ is the first derivative of f and [0, Σ_{j∈N} a_ij + d_i] is a superset of the set of values that η_i (and, therefore, Q̂_i and q̂_i) can take. As f′ > 0 due to f being strictly increasing, we can combine Constraints (5) and (HYPO2) via Fourier–Motzkin elimination to project out the η_i variables. This results in Constraints (OA).
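The outer approximation replaces the concave f by the pointwise minimum of its tangent cuts, each over-estimating f and tight at its own breakpoint; a numerical sketch (using the negative exponential utility as an assumed example of f):

```python
import math

lam = 2.0
def f(z):  return 1.0 - math.exp(-z / lam)
def fp(z): return math.exp(-z / lam) / lam  # f'(z) > 0: f strictly increasing

def oa_upper_bound(z, breakpoints):
    """Pointwise minimum of the tangent cuts f(p) + f'(p)(z - p) at the breakpoints."""
    return min(f(p) + fp(p) * (z - p) for p in breakpoints)

breakpoints = [0.0, 1.0, 2.0, 4.0]
for z in [0.0, 0.3, 1.0, 1.7, 2.5, 4.0, 5.0]:
    assert oa_upper_bound(z, breakpoints) >= f(z) - 1e-12  # a valid over-estimate
for p in breakpoints:
    assert abs(oa_upper_bound(p, breakpoints) - f(p)) < 1e-12  # tight at breakpoints
```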
In the following two subsections, we discuss two ways of obtaining MILP reformulations of Problem (2) starting from Constraints (OA). The first one exploits the submodularity ofq i , i ∈ M. The second one relies on the integrality property of the LP relaxation of (VF), and it exploits LP duality in a Benders-cuts fashion. The idea is to combine Constraints (OA) with a finite collection of affine functions yielding a relaxation of the hypograph of, respectively,q i andQ i , i ∈ M, that is tight at every integer point.

Reformulation based on outer approximation plus submodular cuts: Ref(OA+SC)
As q̂_i is submodular, the following constraints are valid due to Proposition 1:

Q̂_i(y) ≤ q̂_i(Ŝ) + Σ_{ℓ∈N̂\Ŝ} q̂_i^ℓ(Ŝ) y_ℓ − Σ_{ℓ∈Ŝ} q̂_i^ℓ(N̂\{ℓ}) (1 − y_ℓ),  i ∈ M, Ŝ ⊆ N̂,   (SCq1)

Q̂_i(y) ≤ q̂_i(Ŝ) + Σ_{ℓ∈N̂\Ŝ} q̂_i^ℓ(∅) y_ℓ − Σ_{ℓ∈Ŝ} q̂_i^ℓ(Ŝ\{ℓ}) (1 − y_ℓ),  i ∈ M, Ŝ ⊆ N̂.   (SCq2)

Combining these constraints with Constraints (OA) of Proposition 2, we obtain the following constraints for each choice of Ŝ ⊆ N̂, i ∈ M, and p ∈ [0, Σ_{j∈N} a_ij + d_i]:

w_i ≤ f(p) + f′(p) ( q̂_i(Ŝ) + Σ_{ℓ∈N̂\Ŝ} q̂_i^ℓ(Ŝ) y_ℓ − Σ_{ℓ∈Ŝ} q̂_i^ℓ(N̂\{ℓ}) (1 − y_ℓ) − p ),

and analogously for (SCq2). By restricting ourselves to p = q̂_i(Ŝ) (which suffices to obtain a correct reformulation; see Proposition 3), we obtain the following MILP reformulation of Problem (2), to which we refer as the Reformulation based on Outer Approximation plus Submodular Cuts, or Ref(OA+SC):

max Σ_{i∈M} π_i w_i

s.t. w_i ≤ f(q̂_i(Ŝ)) + f′(q̂_i(Ŝ)) ( Σ_{ℓ∈N̂\Ŝ} q̂_i^ℓ(Ŝ) y_ℓ − Σ_{ℓ∈Ŝ} q̂_i^ℓ(N̂\{ℓ}) (1 − y_ℓ) ),  i ∈ M, Ŝ ⊆ N̂,   (OA+SC1)

w_i ≤ f(q̂_i(Ŝ)) + f′(q̂_i(Ŝ)) ( Σ_{ℓ∈N̂\Ŝ} q̂_i^ℓ(∅) y_ℓ − Σ_{ℓ∈Ŝ} q̂_i^ℓ(Ŝ\{ℓ}) (1 − y_ℓ) ),  i ∈ M, Ŝ ⊆ N̂,   (OA+SC2)

y ∈ Y ∩ {0,1}^n̂, w ∈ R^m.
When solving Ref(OA+SC) with a branch-and-cut method, to obtain a correct algorithm it suffices to check for the existence of violated constraints among (OA+SC1) and (OA+SC2) after a solution (w*, y*) with y* integer is found (thus carrying out a so-called lazy cut separation). Such solutions can be separated very efficiently:

Proposition 4
Assume that f and f′ can be computed in constant time. Given a pair (w*, y*) with w* ∈ R^m and y* ∈ {0,1}^n̂, the separation problem asking for a constraint among (OA+SC1)–(OA+SC2) that is violated by (w*, y*) can be solved in linear time O(|E|) for each i ∈ M.

Proof As observed in the proof of Proposition 3, Constraints (OA+SC1)–(OA+SC2) impose w_i ≤ f(q̂_i(Ŝ)) for every set Ŝ ⊆ N̂. Let Ŝ* := {ℓ ∈ N̂ : y*_ℓ = 1}. If (w*, y*) is infeasible, there is some i ∈ M such that the two constraints in (OA+SC1)–(OA+SC2) corresponding to i ∈ M and Ŝ = Ŝ* are violated. To solve the separation problem, we only need to compute the coefficients of such constraints. Since q̂_i can be computed in linear time by construction, and f and f′ can be computed in constant time by assumption, all these coefficients can be computed in linear time O(|E|). Moreover, the coefficients q̂_i^ℓ(N̂\{ℓ}), ℓ ∈ Ŝ, and q̂_i^ℓ(∅), ℓ ∈ N̂ \ Ŝ, can be precomputed, since they do not depend on Ŝ*.

Reformulation based on outer approximation plus Benders cuts: Ref(OA+BC)
Let us focus on the subproblem underlying the value function Q̂_i, i ∈ M, defined in (VF). Assuming y ∈ Y ∩ {0,1}^n̂, every optimal solution to the LP relaxation of the subproblem of computing Q̂_i in which x ∈ {0,1}^n is replaced by x ∈ [0,1]^n can be assumed to be binary without loss of generality. As the subproblem is bounded, we can consider its LP dual. Let δ and σ be the dual variables associated with, respectively, the constraints linking the x and the y variables and the unit upper-bound constraints on x. Let

P_i := { (δ, σ) ∈ R_+^n × R_+^n : δ_j + σ_j ≥ a_ij, j ∈ N }

be the polyhedron of the dual feasible solutions to (VF) (for simplicity, we drop the index i when referring to (δ, σ)). By LP duality, the following holds:

Q̂_i(y) = min_{(δ,σ)∈P_i} { Σ_{j∈N} ( Σ_{ℓ∈N̂(j)} y_ℓ ) δ_j + Σ_{j∈N} σ_j + d_i }.   (BS)

In the Benders decomposition literature, (BS) is called the Benders subproblem.
Letting P_i^e, i ∈ M, denote the set of extreme points of P_i, we derive the following Benders cuts:

Q̂_i(y) ≤ Σ_{j∈N} ( Σ_{ℓ∈N̂(j)} y_ℓ ) δ_j + Σ_{j∈N} σ_j + d_i,  (δ, σ) ∈ P_i^e.   (BC)

Thanks to Proposition 2 and by restricting ourselves to p = Q̂_i(ŷ) for the (not necessarily integer) ŷ ∈ Y (this suffices to obtain a correct reformulation; see Proposition 5), we obtain the following alternative MILP reformulation of Problem (2), which we refer to as the Reformulation based on Outer Approximation plus Benders Cuts, or Ref(OA+BC):

max Σ_{i∈M} π_i w_i

s.t. w_i ≤ f(Q̂_i(ŷ)) + f′(Q̂_i(ŷ)) ( Σ_{j∈N} ( Σ_{ℓ∈N̂(j)} y_ℓ ) δ_j + Σ_{j∈N} σ_j + d_i − Q̂_i(ŷ) ),  i ∈ M, ŷ ∈ Y ∩ {0,1}^n̂, (δ, σ) ∈ P_i^e,   (OA+BC)

y ∈ Y ∩ {0,1}^n̂, w ∈ R^m,

where Ŝ := {ℓ ∈ N̂ : ŷ_ℓ = 1}. Notice that, even if we restrict ourselves to ŷ ∈ Y ∩ {0,1}^n̂ (i.e., to Ŝ ⊆ N̂), we have a valid formulation. This is shown in the following proposition:

Proposition 5 Ref(OA+BC) is a correct reformulation of Problem (2).

Proof Let y* ∈ Y be an optimal integer solution to Problem (2) and pick an optimal pair (δ*, σ*) of dual multipliers for the Benders subproblem (BS) with y = y*. For each i ∈ M, it is easy to verify that the instance of Constraints (OA+BC) corresponding to ŷ = y* and (δ, σ) = (δ*, σ*) boils down to w_i ≤ f(Q̂_i(y*)). In particular, due to π_i > 0, i ∈ M, this guarantees that Σ_{i∈M} π_i w_i = Σ_{i∈M} π_i f(Q̂_i(y*)).
For a given (not necessarily integer) y* ∈ [0,1]^n̂, optimal solutions to (BS) can be computed in closed form according to the following lemma.

Lemma 1 A pair (δ̄, σ̄) ∈ R_+^n × R_+^n is an optimal solution to (BS) if and only if, for each j ∈ N, it satisfies the following conditions:

δ̄_j = a_ij and σ̄_j = 0 if Σ_{ℓ∈N̂(j)} y_ℓ < 1;

δ̄_j = 0 and σ̄_j = a_ij if Σ_{ℓ∈N̂(j)} y_ℓ > 1;

δ̄_j = γ_j a_ij and σ̄_j = (1 − γ_j) a_ij for some γ_j ∈ [0, 1] if Σ_{ℓ∈N̂(j)} y_ℓ = 1.
Proof Problem (BS) decomposes into n subproblems, one per item j ∈ N. Each subproblem asks for a pair of values δ_j and σ_j satisfying δ_j + σ_j ≥ a_ij at minimum cost ( Σ_{ℓ∈N̂(j)} y_ℓ ) δ_j + σ_j; we denote an optimal such pair by (δ̄_j, σ̄_j). If Σ_{ℓ∈N̂(j)} y_ℓ < 1, the coefficient of δ_j is strictly smaller than the coefficient of σ_j and, thus, we have δ̄_j = a_ij and σ̄_j = 0. If Σ_{ℓ∈N̂(j)} y_ℓ > 1, the coefficient of δ_j is strictly larger than the coefficient of σ_j and, thus, we have δ̄_j = 0 and σ̄_j = a_ij. If Σ_{ℓ∈N̂(j)} y_ℓ = 1, the coefficients of δ_j and σ_j are identical and (δ̄_j, σ̄_j) with δ̄_j = γ_j a_ij and σ̄_j = (1 − γ_j) a_ij is optimal for every γ_j ∈ [0, 1].
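The closed-form solution of Lemma 1 can be sketched and checked against strong duality (taking γ_j = 0 in the degenerate case; covering data are illustrative):

```python
# Toy instance (illustrative data): N_hat_of[j] lists the metaitems covering item j.
N_hat_of = {"a": [1, 2], "b": [2], "c": [2, 3], "d": [3]}
a_i = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 0.5}
d_i = 1.0

def dual_solution(y):
    """Closed-form duals per Lemma 1 (gamma_j = 0 when the coverage equals 1)."""
    delta, sigma = {}, {}
    for j, cov in N_hat_of.items():
        ycov = sum(y.get(l, 0.0) for l in cov)
        if ycov < 1.0:                       # coefficient of delta_j is smaller
            delta[j], sigma[j] = a_i[j], 0.0
        else:                                # ycov >= 1: put the weight on sigma_j
            delta[j], sigma[j] = 0.0, a_i[j]
    return delta, sigma

def dual_value(y, delta, sigma):
    return sum(sum(y.get(l, 0.0) for l in N_hat_of[j]) * delta[j] + sigma[j]
               for j in N_hat_of) + d_i

def primal_lp_value(y):
    """LP value of (VF): x_j = min(1, coverage of j)."""
    return sum(a_i[j] * min(1.0, sum(y.get(l, 0.0) for l in N_hat_of[j]))
               for j in N_hat_of) + d_i

for y in ({1: 1.0, 2: 0.0, 3: 1.0}, {1: 0.5, 2: 0.25, 3: 1.0}, {1: 0.0, 2: 0.0, 3: 0.0}):
    d, s = dual_solution(y)
    assert abs(dual_value(y, d, s) - primal_lp_value(y)) < 1e-12  # strong duality
```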
The third case in the lemma arises when the Benders subproblem is (dual) degenerate and admits multiple optimal solutions. Thanks to this, Lemma 1 generalizes the results given in Cordeau et al. [6, Propositions 1, 2, and 4].
Thanks to Lemma 1, the following holds:

Proposition 6
Assume that f and f′ can be computed in constant time. Given a pair (w*, y*) with w* ∈ R^m and y* ∈ [0,1]^n̂ (not necessarily integer), the separation problem asking for a constraint among (OA+BC) that is violated by (w*, y*) can be solved in linear time O(|E|) for each i ∈ M.

Proof (sketch) For each i ∈ M, the LP-relaxation value Q̂_i(y*) can be computed in linear time O(|E|). A violated constraint exists if and only if w*_i > f(Q̂_i(y*)), which implies that w*_i is strictly larger than the outer approximation of f at Q̂_i(y*). Thanks to Lemma 1, the values of the dual multipliers (δ*, σ*) satisfying the optimality conditions for (BS) can be computed in closed form, also in linear time O(|E|).

Moreover, if we assume that f and f′ can be computed in constant time, the separation problem for Constraints (OA+BC) can be solved in linear time even when y* is fractional, a property which is not enjoyed by Constraints (OA+SC1)–(OA+SC2). From a branch-and-cut perspective, this allows for the efficient separation of Constraints (OA+BC) not just when a solution (w*, y*) with an integer y* is found, but also at each node of the branch-and-bound tree to produce tighter bounds.
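Putting Lemma 1 and the outer approximation together, the linear-time separation described in Proposition 6 can be sketched as follows (toy data; the utility and its form are assumptions, and the cut is returned via its coefficients):

```python
import math

N_hat_of = {"a": [1, 2], "b": [2], "c": [2, 3], "d": [3]}
a_i = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 0.5}
d_i, lam = 1.0, 2.0
def f(z):  return 1.0 - math.exp(-z / lam)
def fp(z): return math.exp(-z / lam) / lam

def separate(w_star, y_star):
    """Return (f(Q), f'(Q), const, coef) describing a violated cut, or None; O(|E|)."""
    Q = sum(a_i[j] * min(1.0, sum(y_star.get(l, 0.0) for l in N_hat_of[j]))
            for j in N_hat_of) + d_i
    if w_star <= f(Q) + 1e-9:
        return None  # the tangent cut at p = Q is satisfied, hence all cuts are
    coef, const = {}, d_i
    for j, cov in N_hat_of.items():
        if sum(y_star.get(l, 0.0) for l in cov) < 1.0:  # delta_j = a_ij (Lemma 1)
            for l in cov:
                coef[l] = coef.get(l, 0.0) + a_i[j]
        else:                                           # sigma_j = a_ij
            const += a_i[j]
    # Cut: w_i <= f(Q) + f'(Q) * (const + sum_l coef[l] * y_l - Q)
    return f(Q), fp(Q), const, coef

y_star = {1: 0.5, 2: 0.25, 3: 0.0}
Qv = sum(a_i[j] * min(1.0, sum(y_star.get(l, 0.0) for l in N_hat_of[j]))
         for j in N_hat_of) + d_i
assert separate(f(Qv) - 0.01, y_star) is None   # not violated: no cut
cut = separate(f(Qv) + 0.01, y_star)            # violated: a cut is returned
fQ, fpQ, const, coef = cut
rhs = fQ + fpQ * (const + sum(coef[l] * y_star.get(l, 0.0) for l in coef) - Qv)
assert abs(rhs - fQ) < 1e-9  # the cut is tight at y*: its RHS equals f(Q_i(y*))
```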

On the strength of the new inequalities
In this section, we provide a theoretical comparison of the relative strength of the inequalities featured in Ref(SC), Ref(OA+SC), and Ref(OA+BC). Throughout it, we will compare pairs of constraints of different families corresponding to the same subset Ŝ ⊆ N̂ and to the same scenario i ∈ M. When comparing two constraints, we say that one dominates the other if the coefficients of the former are at least as tight as those of the latter, and that the domination is strict if at least one of the coefficients of the former is strictly tighter.
Let N_1 be the set of items in N that can be covered by a single metaitem ℓ ∈ N̂, namely, N_1 := {j ∈ N : |N̂(j)| = 1}, and let ℓ(j) be such a metaitem. Similarly, for every subset of metaitems Ŝ ⊆ N̂, let N_1(Ŝ) be the set of items in N which can be covered by a single metaitem ℓ ∈ Ŝ, namely, N_1(Ŝ) := {j ∈ N : |N̂(j) ∩ Ŝ| = 1}. These definitions will be useful in the following analysis.

Comparison between the inequalities in Ref(SC) and Ref(OA+SC)
In this subsection, we make a pairwise comparison between (SCĥ1) and (OA+SC1) and between (SCĥ2) and (OA+SC2). After that, we show that (OA+SC1) and (OA+SC2) do not dominate one another.

Comparison between (SCĥ1) and (OA+SC1)
Sufficient conditions under which there are instances where (SCĥ1) and (OA+SC1) do not dominate one another are reported in the following proposition, together with a sufficient condition under which (SCĥ1) dominates (OA+SC1).
The proof is reported in Appendix A. The appendix also reports a numerical illustration of the no domination result between the two constraints.

Comparison between (SCĥ2) and (OA+SC2)
Sufficient conditions under which there are instances where (SCĥ2) and (OA+SC2) do not dominate one another are reported in the following proposition. The proposition also provides sufficient conditions under which (SCĥ2) dominates (OA+SC2) and vice versa.

Proposition 8
For each i ∈ M and Ŝ ⊆ N̂, we have the following. The proof is reported in Appendix B, together with a numerical illustration of the no-domination result between the two constraints.

Comparison between (OA+SC1) and (OA+SC2)
One may wonder whether the two constraints featured in Ref(OA+SC) dominate one another. We answer the question in the negative:

Remark 1
There is at least one instance of Problem (2) where (OA+SC1) and (OA+SC2) do not dominate one another.
To see this, consider an instance for which, for some Ŝ ⊆ N̂ and i ∈ M, the two constraints read as follows. As the latter is tighter at the coefficient of 1 − y_2 (in boldface) and weaker at the coefficients of y_3 and y_5 (underlined), the two inequalities do not dominate one another. Notice that, as the OA part of the two constraints is identical, the same remark applies to (SCĥ1) and (SCĥ2).

Comparison between the inequalities in Ref(OA+SC) and Ref(OA+BC)
As the OA part of the constraints in Ref(OA+SC) and Ref(OA+BC) is identical for a given i ∈ M and Ŝ ⊆ N̂, comparing the two reformulations amounts to comparing the submodular cuts on q̂_i with the Benders cuts (BC).

Comparison between (OA+SC1) and (OA+BC)
To simplify the analysis, we first consider the following additional submodular cut:

Q̂_i(y) ≤ q̂_i(Ŝ) + Σ_{ℓ∈N̂\Ŝ} q̂_i^ℓ(Ŝ) y_ℓ,  i ∈ M, Ŝ ⊆ N̂.   (SCq3)

Similarly to what we observed in Sect. 2 when comparing (SCĥ3) to (SCĥ1), we have that, for every i ∈ M and Ŝ ⊆ N̂, (SCq3) and (SCq1) are equally strong if q̂_i^ℓ(N̂\{ℓ}) = 0 for all ℓ ∈ Ŝ, i.e., unless there is an item j ∈ N which is covered by a single metaitem ℓ ∈ Ŝ (i.e., such that N̂(j) = {ℓ}) with a_ij > 0; in such a case, (SCq1) strictly dominates (SCq3). We establish the following:

Proposition 9 For each i ∈ M and Ŝ ⊆ N̂, (SCq3) is equal to the (BC) obtained by setting, for each j ∈ N:

δ_j := a_ij, σ_j := 0 if j ∈ N \ N(Ŝ), and δ_j := 0, σ_j := a_ij if j ∈ N(Ŝ)   (6)

(i.e., by setting γ_j := 0 for all j ∈ N, according to Lemma 1).

Proof Due to Lemma 1, (δ̄, σ̄) as in (6) is an extreme point of P_i obtained by setting γ_j := 0 for all j ∈ N. By the choice of multipliers in (6), we have Σ_{j∈N} σ̄_j = Σ_{j∈N(Ŝ)} a_ij and Σ_{j∈N} ( Σ_{ℓ∈N̂(j)} y_ℓ ) δ̄_j = Σ_{ℓ∈N̂} q̂_i^ℓ(Ŝ) y_ℓ, with q̂_i^ℓ(Ŝ) = 0 for all ℓ ∈ Ŝ. The claim follows.
As observed, Constraints (SCq 3 ) are dominated by (SCq 1 ). We show that the latter are subsumed by Constraints (BC).

Proposition 10
For each i ∈ M and Ŝ ⊆ N̂, (SCq1) is equal to the (BC) obtained by setting, for each j ∈ N:

δ_j := a_ij, σ_j := 0 if j ∈ (N \ N(Ŝ)) ∪ (N_1 ∩ N(Ŝ)), and δ_j := 0, σ_j := a_ij otherwise

(i.e., by setting γ_j = 1 for all j ∈ N_1 and γ_j = 0 for all other j ∈ N, according to Lemma 1). Such a (BC) reads:

Q̂_i(y) ≤ q̂_i(Ŝ) + Σ_{ℓ∈N̂\Ŝ} q̂_i^ℓ(Ŝ) y_ℓ − Σ_{ℓ∈Ŝ} q̂_i^ℓ(N̂\{ℓ}) (1 − y_ℓ).

The proof is reported in Appendix C. Proposition 10 implies that, for each i ∈ M and Ŝ ⊆ N̂, Constraint (OA+SC1) coincides with the following (OA+BC) constraint:

w_i ≤ f(q̂_i(Ŝ)) + f′(q̂_i(Ŝ)) ( Σ_{ℓ∈N̂\Ŝ} q̂_i^ℓ(Ŝ) y_ℓ − Σ_{ℓ∈Ŝ} q̂_i^ℓ(N̂\{ℓ}) (1 − y_ℓ) ).   (OA+BC1)

Proposition 11
For each i ∈ M and Ŝ ⊆ N̂, (SCq2) is dominated by the (BC) obtained by setting, for each j ∈ N:

δ_j := a_ij, σ_j := 0 if |N̂(j) ∩ Ŝ| ≤ 1, and δ_j := 0, σ_j := a_ij otherwise

(i.e., by setting γ_j = 1 for all j ∈ N, according to Lemma 1), and strictly dominated if there is some ℓ ∈ N̂ \ Ŝ covering an item j ∈ N with |N̂(j) ∩ Ŝ| ≥ 2 and a_ij > 0. Such a (BC) reads:

Q̂_i(y) ≤ q̂_i(Ŝ) + Σ_{ℓ∈N̂\Ŝ} ( Σ_{j∈N(ℓ): |N̂(j)∩Ŝ| ≤ 1} a_ij ) y_ℓ − Σ_{ℓ∈Ŝ} q̂_i^ℓ(Ŝ\{ℓ}) (1 − y_ℓ).

The proof is reported in Appendix D. Proposition 11 implies that, for each i ∈ M and Ŝ ⊆ N̂, Constraint (OA+SC2) is dominated by the following (OA+BC) constraint:

w_i ≤ f(q̂_i(Ŝ)) + f′(q̂_i(Ŝ)) ( Σ_{ℓ∈N̂\Ŝ} ( Σ_{j∈N(ℓ): |N̂(j)∩Ŝ| ≤ 1} a_ij ) y_ℓ − Σ_{ℓ∈Ŝ} q̂_i^ℓ(Ŝ\{ℓ}) (1 − y_ℓ) ).   (OA+BC2)

Comparison between (OA+BC1) and (OA+BC2)
One may wonder whether (OA+BC1) and (OA+BC2) dominate one another. We answer the question in the negative:

Remark 2
There is at least one instance of Problem (2) where (OA+BC1) and (OA+BC2) do not dominate one another.
To see this, consider the instance used in Remark 1 and let Ŝ = {1, 2}, for which (OA+BC1) and (OA+BC2) read as follows. As the latter is tighter at the coefficient of 1 − y_2 (in boldface) and weaker at the coefficients of y_3 and y_5 (underlined), the two inequalities do not dominate one another.

Corollary 2
There are instances where (SCĥ 2 ) and (OA+BC 2 ) do not dominate one another.
This follows by comparing the inequalities obtained via Prop. 8 and Prop. 11. The latter inequality is tighter at the coefficients of y_3 and y_5 (in boldface) but weaker at those of 1 − y_2 and 1 − y_4 (underlined). Thus, the two inequalities do not dominate one another.

Summary of the theoretical results on the relative strength of the inequalities
A visual summary of the relative strength of the families of inequalities studied in this paper is provided in Fig. 3. An arrow pointing from A to B indicates that A is strictly dominated by B (subject to the indicated conditions). An arrow in both directions indicates that the two inequalities are identical. An undirected connection indicates that domination between the two inequalities may or may not hold, depending on the conditions we derived. The proposition corresponding to each relationship is reported next to its arrow. Solid lines refer to the general case where no assumptions on f are made. Dashed lines refer to the case where f is an affine function; such a case is analyzed in the following proposition.
To summarize, the theoretical results we obtained allow us to conclude that, for any given i ∈ M and Ŝ ⊆ N̂, the (OA+BC) inequalities are at least as strong as their (OA+SC) counterparts.

As sketched in Sect. 1, we first reduce Problem (1) to (2) by setting N̂ = N and defining the edge set E of the graph G as a matching. This induces a bijection between the sets N and N̂ thanks to which each ℓ ∈ N̂ is mapped into a single j = j(ℓ) ∈ N and each j ∈ N is mapped into a single ℓ = ℓ(j) ∈ N̂. Thus, we have x_j = y_{ℓ(j)} and x_{j(ℓ)} = y_ℓ for each j ∈ N and ℓ ∈ N̂, which implies Q̂_i(y) = Σ_{j∈N} a_ij x_j + d_i.

Given a (fractional) point x* ∈ X ∩ [0, 1]^n, the separation problem for Constraints (9b) is solved, for each i ∈ M, in linear time by simply setting p := Σ_{j∈N} a_ij x*_j + d_i and computing f(p) and f′(p) (this takes O(n) time, assuming that f and its derivative can be computed in constant time).
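This linear-time separation can be sketched as follows. The code below is an illustrative sketch (not the paper's implementation), assuming the exponential utility f(z) = 1 − exp(−z/λ) used in Sect. 6.3; all function and parameter names are ours:

```python
import math

def separate_oa_cut(a_i, d_i, x_star, lam=1.0, w_star=None, tol=1e-6):
    """Linear-time separation of an outer-approximation cut for
    w_i <= f(sum_j a_ij x_j + d_i), illustrated with f(z) = 1 - exp(-z/lam).

    Returns (const, coeffs) describing the valid cut
        w_i <= const + sum_j coeffs[j] * x_j,
    or None if (x*, w*) does not violate it by more than `tol`.
    """
    p = sum(a * x for a, x in zip(a_i, x_star)) + d_i   # O(n)
    f_p = 1.0 - math.exp(-p / lam)                      # f(p)
    fprime_p = math.exp(-p / lam) / lam                 # f'(p)
    if w_star is not None and w_star <= f_p + tol:
        return None  # the hypograph constraint holds at x*: no cut needed
    # First-order (gradient) cut: w_i <= f(p) + f'(p) * (q - p), with q = a_i x + d_i
    const = f_p + fprime_p * (d_i - p)
    coeffs = [fprime_p * a for a in a_i]
    return const, coeffs
```

Since the cut is tight at x*, its violation at (x*, w*) equals w* − f(p), which is why comparing w* with f(p) suffices.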
In Yu and Ahmed [39], the authors exploit the submodularity of the objective function of Problem (1) and propose a version of the inequalities obtained in Ahmed and Atamtürk [2] that is stronger when the problem is subject to a single knapsack constraint. In particular, they propose a branch-and-cut algorithm based on the separation of the uplifted version of the submodular constraint (SCĥ2) (see Inequality (9)). Since the latter is tighter at the coefficients of 1 − x_1 and 1 − x_2 (in boldface) and weaker at the coefficients of x_3, x_4, x_5, x_6 (underlined), the two inequalities do not dominate one another.
Since the latter is tighter than the previous one at the coefficient of 1 − x_4 (underlined) and weaker at the coefficients of x_5 and x_6 (in boldface), the two inequalities do not dominate one another. Computational experiments comparing the impact of these inequalities on solving instances of Problem (1) are reported in Sect. 6.

Generalization of Problem (2) to concave and differentiable utility functions
In this section, we discuss the extension of our methods to the case where the utility function is concave and differentiable, but not necessarily strictly increasing (as assumed up to this point). To distinguish this new utility function from the previous one, we refer to it as g. Such a case arises in applications where, e.g., the utility function g is composed of two terms: a revenue term which increases with the number of customers (having a positive, decreasing derivative due to economies of scale) and a negative component which accounts for the total amount of money spent on running the service (assuming a unit cost per customer which decreases with the number of customers). In such cases, once the system becomes overloaded and the optimal running capacity is exceeded, g switches from increasing to decreasing. Applications of this type can be found in, e.g., competitive facility location problems where the objective function is the difference between the market capture and the investment cost (see Küçükaydın et al. [22], Lin and Tian [24], and further references therein).
As an example of a utility function of this type, consider the following one where λ is the risk-tolerance parameter and α > 0 is an additional parameter establishing a trade-off between the increasing and the decreasing terms. An illustration of this function for different values of λ is provided in Fig. 4. As shown in the figure, larger values of λ lead to the function achieving its maximum for larger values of z.
For the case illustrated in the figure (where α = 0.5), such a maximum is reached for values of z larger than 1 whenever λ ≥ 2.
When combining a function that is not increasing, such as the one defined in (g), with the set function q̂_i, the resulting function ĥ_i (for the case of (g)) is the difference of two submodular functions. As such a function is not submodular in the general case, Ref(SC) and the submodular constraints in Proposition 1 are not valid for Problem (2). This implies that the inequalities introduced in Ahmed and Atamtürk [2] and Yu and Ahmed [39] are not valid either.
To account for this case, we extend the double-hypograph decomposition we proposed in Sect. 3.1 as follows: The new epigraph constraints (EPI) are added since g(η_i) can be decreasing for a sufficiently large η_i and, thus, imposing η_i ≤ Q̂_i(y) with (HYPO2) does not suffice to guarantee η_i = Q̂_i(y) in an optimal solution. Thanks to a result by Edmonds [10], Constraints (EPI) can be restated as the polymatroid constraints, where Pq̂_i := {ϑ ∈ R^n̂ : …}. These constraints can be separated by maximizing a linear objective function over Pq̂_i via Edmonds' (polynomial-time) greedy algorithm. Given a pair (η*_i, y*) with η*_i ∈ R_+ and y* ∈ Y ∩ [0, 1]^n̂, the algorithm sorts the metaitems in N̂ in nonincreasing order of the associated y* values, i.e., such that y*_{μ(1)} ≥ y*_{μ(2)} ≥ … ≥ y*_{μ(n̂)} holds, where μ is a suitable permutation of N̂. Let Ŝ_{μ(ℓ)} := {μ(1), μ(2), …, μ(ℓ)} be the set of the first ℓ metaitems according to the new order. By setting ϑ as in (10), where Ŝ_{μ(0)} := ∅, the pair (η*_i, y*) violates one of the polymatroid constraints if and only if Σ_{ℓ∈N̂} ϑ_{μ(ℓ)} y*_{μ(ℓ)} + d_i > η*_i. Since ϑ only depends on the permutation μ, the polymatroid constraints can be restated as (POLY), where ϑ is defined as in (10) and μ is any permutation of the elements of N̂. Due to q̂_i being submodular, Constraints (POLY) are facet defining for the epigraph of Q̂_i(y); see, e.g., Atamtürk and Narayanan [3].

As before, we apply an outer approximation to Constraints (HYPO1) and combine them with Constraints (HYPO2) via Fourier-Motzkin elimination. Following the same derivations used for the case with f strictly increasing, we obtain Constraints (OA+SC1)-(OA+SC2) and (OA+BC1)-(OA+BC2). Since g′ can be zero or negative, though, such constraints are valid only for a set Ŝ ⊆ N̂ satisfying g′(q̂_i(Ŝ)) ≥ 0 and for a vector ŷ satisfying g′(Q̂_i(ŷ)) ≥ 0, respectively (note that, at any point where g′ takes value 0, these constraints boil down to w_i ≤ UB, where UB is the maximum value of g).
For every vector ŷ ∈ Y satisfying g′(Q̂_i(ŷ)) < 0, the combination of the outer approximation of Constraints (HYPO1) with Constraints (EPI) via Fourier-Motzkin elimination leads to Constraints (OA+POLY). We call Ref(OA+SC+POLY) and Ref(OA+BC+POLY) the reformulations featuring (POLY) (imposed only for ŷ ∈ Y : g′(Q̂_i(ŷ)) < 0) together with either (OA+SC1)-(OA+SC2) or (OA+BC1)-(OA+BC2) (the former imposed only for Ŝ ⊆ N̂ : g′(q̂_i(Ŝ)) ≥ 0 and the latter only for ŷ ∈ Y : g′(Q̂_i(ŷ)) ≥ 0). Experiments carried out with these two reformulations and the utility function (g) are reported in Sect. 6.
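Edmonds' greedy separation of the polymatroid constraints described above can be sketched as follows. This is an illustrative Python sketch (names are ours) for the coverage case, in which q̂_i(Ŝ) sums the coefficients a_ij of the items covered by the metaitems in Ŝ:

```python
def separate_polymatroid_cut(cover, a_i, d_i, y_star, eta_star, tol=1e-6):
    """Greedy separation of a polymatroid (EPI/POLY) cut for the epigraph of
    the coverage function, sketched for q_i(S) = sum of a_ij over items j
    covered by some metaitem in S. `cover[l]` is the set of items covered
    by metaitem l. Returns (theta, order) if the cut
        eta_i >= sum_l theta[l] * y_l + d_i
    is violated at (eta*, y*), and None otherwise.
    """
    order = sorted(range(len(y_star)), key=lambda l: -y_star[l])  # nonincreasing y*
    covered = set()
    theta = [0.0] * len(y_star)
    for l in order:
        # marginal contribution of metaitem l w.r.t. the items covered so far
        new_items = cover[l] - covered
        theta[l] = sum(a_i[j] for j in new_items)
        covered |= cover[l]
    lhs = sum(theta[l] * y_star[l] for l in range(len(y_star))) + d_i
    return (theta, order) if lhs > eta_star + tol else None
```

Sorting dominates the running time, so each separation call takes O(n̂ log n̂) plus the cost of evaluating the marginal contributions.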
It is worth mentioning that, for the case where the utility function is not strictly increasing and one has access to a decomposition of ĥ_i (for each scenario i ∈ M) as a difference of two submodular functions, the recently proposed method of Atamtürk and Narayanan [3] can be an alternative to the approach proposed in this section. Even though such a decomposition always exists, finding it is not always trivial. Hence, one of the advantages of our method, compared to that of Atamtürk and Narayanan [3], is its potentially wider applicability, as it does not require finding such a decomposition.

Computational results
In this section, we assess the computational effectiveness of the MILP reformulations proposed in the paper. We carry out three sets of experiments, considering (i) instances of Problem (2) featuring a strictly increasing function f (Sect. 6.1), (ii) instances of Problem (2) featuring a not increasing function g (Sect. 6.2), and (iii) instances of Problem (1), studied in Ahmed and Atamtürk [2] (Sect. 6.3).
Our experiments are performed on a single node of the IRIDIS 5 cluster at the University of Southampton, equipped with 40 dual 2.0 GHz Intel Skylake processors and 192 GB of DDR4 memory. Our code is written in C/C++ and compiled with gcc 9.2.1, using the -O3 compiler optimization flag. Our branch-and-cut algorithms are implemented in CPLEX 12.9.0 using the CALLABLE LIBRARY framework, and run in single-threaded mode. We set a time limit of 600 seconds for each run.

Experiments for Problem (2) with a strictly increasing utility function
We experiment with four branch-and-cut (B&C) algorithms: SC, OA+SC, OA+BC, and OA+BC+f. SC is based on the standard reformulation Ref(SC), which relies on Constraints (SCĥ1) and (SCĥ2) (see Sect. 2). OA+SC is based on the reformulation Ref(OA+SC), which relies on Constraints (OA+SC1) and (OA+SC2) (see Sect. 3.3). OA+BC and OA+BC+f are based on the reformulation Ref(OA+BC), which relies on Constraints (OA+BC1) and (OA+BC2) (see Sect. 3.4); the separation is carried out only for an integer y* in the former, and for both an integer and a fractional y* in the latter. The separation is performed with a violation tolerance of 10⁻⁶.
We consider four different sets of constraints to define the feasible region Y of Problem (2): (i) a k-cardinality constraint Σ_{ℓ∈N̂} y_ℓ ≤ k, used in, e.g., Nemhauser et al. [29]; (ii) a knapsack constraint Σ_{ℓ∈N̂} β_ℓ y_ℓ ≤ B with weights β_ℓ ≥ 0, ℓ ∈ N̂, and budget B ≥ 0, used in, e.g., Sviridenko [34], Ahmed and Atamtürk [2], Yu and Ahmed [38]; (iii) a set of partition-matroid constraints Σ_{ℓ∈N̂_q} y_ℓ ≤ k_q, with 1 ≤ q ≤ p, according to which N̂ is partitioned into p groups N̂_1, …, N̂_p, each subject to a cardinality constraint with budget k_q; these constraints are used in, e.g., Friedrich et al. [14], Sviridenko [34]; (iv) a set of conflict constraints y_ℓ + y_ℓ′ ≤ 1, {ℓ, ℓ′} ∈ E, used to model pairwise incompatibilities between metaitems; these constraints are based on a conflict graph G = (N̂, E), where E is the set of edges representing the conflicts.
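For illustration, the four families of constraints above can be checked on a 0-1 vector y as follows (a minimal sketch; the helper name and data layout are ours, not the paper's):

```python
def feasible(y, constraint):
    """Check a 0-1 metaitem vector y against one of the four constraint
    families used to define Y. `constraint` is a (kind, data) pair."""
    kind, data = constraint
    if kind == "k_card":                       # sum_l y_l <= k
        return sum(y) <= data["k"]
    if kind == "knapsack":                     # sum_l beta_l y_l <= B
        return sum(b * v for b, v in zip(data["beta"], y)) <= data["B"]
    if kind == "partition_matroid":            # sum_{l in N_q} y_l <= k_q per group
        return all(sum(y[l] for l in group) <= kq
                   for group, kq in zip(data["groups"], data["k"]))
    if kind == "conflict":                     # y_l + y_l' <= 1 per conflict edge
        return all(y[u] + y[v] <= 1 for (u, v) in data["edges"])
    raise ValueError(kind)
```

A point is feasible for Y when it satisfies every constraint imposed on the instance (e.g., a cardinality constraint plus all conflict edges in the "& confl" classes).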
Motivated by the fact that Problem (2) subsumes the maximal covering location problem with uncertain customer demands and a concave utility function, we generate our instances by extending the procedure proposed by ReVelle et al. [30] and Cordeau et al. [6] for the deterministic version of such a problem. To model the covering relationship between the set of items N and the set of metaitems N̂, both sets are embedded in R² and the covering relationship is spatially induced by associating a covering radius R̂ with each metaitem (see Fig. 2). The coordinates of each item and metaitem in N ∪ N̂ are chosen uniformly at random from [0, 30].
For each item j ∈ N, the set N̂(j) contains all the metaitems ℓ ∈ N̂ whose Euclidean distance from j is no larger than the covering radius R̂ ∈ {2, 4, 5, 6} (smaller values of R̂ result in sparser instances). The items are sampled in such a way that each of them is covered by at least one metaitem. For each scenario i ∈ M and item j ∈ N, we set a_ij to zero with probability r ∈ {1/4, 1/2} and, with probability 1 − r, draw the value of u_ij uniformly at random from {1, 2, …, 10}. We then introduce the coefficient Δ := Σ_{i∈M} π_i Σ_{j∈N} u_ij, corresponding to the expected maximum return across all scenarios, and define the normalized value a_ij := u_ij/Δ. For each choice of n and n̂ and for each set of constraints, we generate 5 instances using a different seed for the random-number generator, 4 different values for the radius R̂, and 2 different values for the scenario-item probability r.
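The generation procedure described above can be sketched as follows (an illustrative sketch under the stated distributions, with π_i = 1/m as used for these instances; the explicit resampling of uncovered items is our reading of the text, and all names are ours):

```python
import random

def generate_instance(n, n_meta, m, R, r, seed=0):
    """Sketch of the spatial instance generator: uniform coordinates in
    [0, 30]^2, Euclidean covering radius R, items resampled until covered
    by at least one metaitem, and coefficients normalized by Delta."""
    rng = random.Random(seed)
    meta = [(rng.uniform(0, 30), rng.uniform(0, 30)) for _ in range(n_meta)]
    items, cover = [], []
    while len(items) < n:
        p = (rng.uniform(0, 30), rng.uniform(0, 30))
        N_j = [l for l, q in enumerate(meta)
               if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= R * R]
        if N_j:                      # keep only items covered by >= 1 metaitem
            items.append(p)
            cover.append(N_j)
    # u_ij = 0 with probability r, else uniform in {1, ..., 10}
    u = [[0 if rng.random() < r else rng.randint(1, 10)
          for _ in range(n)] for _ in range(m)]
    # normalize by the expected maximum return Delta (pi_i = 1/m)
    delta = sum(sum(row) for row in u) / m
    a = [[uij / delta for uij in row] for row in u]
    return items, meta, cover, a
```

By construction, the normalized coefficients satisfy Σ_{i∈M} (1/m) Σ_{j∈N} a_ij = 1, i.e., the expected maximum return equals one.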
We consider three main classes of instances: (i) the class k-card, containing 7200 instances subject to a k-cardinality constraint, obtained by choosing n ∈ {5000, 10,000, 20,000}, n̂ ∈ {40, 50, 60}, and k ∈ {10, 15}; (ii) the class kp, containing 3600 instances subject to a knapsack constraint, obtained by choosing n ∈ {5000, 10,000, 20,000} and n̂ ∈ {40, 50, 60}, where β and B are defined following the procedure described by Martello et al. [26] for generating strongly correlated instances, which are among the most difficult ones for the knapsack problem; (iii) the class p-matroid, containing 1200 instances subject to partition-matroid constraints, obtained by choosing n ∈ {5000, 10,000, 20,000}, n̂ = 60, p = 3, and k_q = 5 for q = 1, …, 3. In all these instances, we set π_i = 1/m and d_i = 0 for i ∈ M. For each of these three classes, we generate another class that is also subject to conflict constraints induced by an Erdős–Rényi conflict graph on N̂ with edge probability 0.01; in the following, we refer to these three classes as k-card & confl, kp & confl, and p-matroid & confl. Overall, the testbed contains six classes and a total of 24,000 instances.

Initialization of the B&C algorithms with a greedy solution
The greedy algorithm proposed in Nemhauser et al. [29] is a simple and effective way to obtain feasible solutions to the problem of maximizing a submodular function subject to a k-cardinality constraint. As shown in Nemhauser et al. [29], such solutions are (1 − 1/e)-approximate (where e is the base of the natural logarithm). For the basic variant of Problem (2) with a single k-cardinality constraint, the greedy algorithm starts with an empty set of metaitems (Ŝ = ∅) and, at each iteration, adds to Ŝ a metaitem ℓ ∈ arg max_{ℓ∉Ŝ} Σ_{i∈M} π_i ĥ_i(Ŝ ∪ {ℓ}) (thereby selecting the metaitem with the largest marginal contribution) until |Ŝ| = k. The algorithm can be extended to work with all the constraints we introduced by selecting the metaitem of maximum marginal contribution among those that can be added to Ŝ without violating any constraint. In the partition-matroid case, this algorithm achieves a 1/2 approximation ratio, see Cornuejols et al. [7]. In the knapsack case, an extension that combines it with a partial-enumeration scheme achieves a (1 − 1/e)-approximation, see Sviridenko [34]. As the problem of maximizing a linear function subject to a set of conflicts is inapproximable to within n̂^(1−ε) for any ε > 0 unless ZPP = NP (Håstad [17]), no constant-factor approximation guarantee is obtained when conflict constraints are imposed. In all of our experiments, we adopt the greedy algorithm to provide a first feasible solution to our branch-and-cut algorithms. In Appendix E, we illustrate how to construct a set of worst-case instances of Problem (2) subject to a k-cardinality constraint on which the greedy algorithm achieves the worst-case approximation factor of 1 − 1/e.
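For concreteness, the greedy algorithm for the k-cardinality case can be sketched as follows (an illustrative Python sketch, not the authors' code, assuming the exponential utility f(z) = 1 − exp(−z/λ) and the coverage function q̂_i; in line with Sect. 6.2, it also stops early if no metaitem has a positive marginal contribution):

```python
import math

def greedy(n_meta, k, cover, a, pi, lam=1.0, d=None):
    """Greedy of Nemhauser et al. for Problem (2) under a k-cardinality
    constraint: repeatedly add the metaitem with the largest marginal
    contribution to sum_i pi_i * f(q_i(S) + d_i), f(z) = 1 - exp(-z/lam)."""
    m = len(a)
    d = d if d is not None else [0.0] * m

    def value(S):
        covered = set().union(*(cover[l] for l in S)) if S else set()
        return sum(pi[i] * (1.0 - math.exp(-(sum(a[i][j] for j in covered) + d[i]) / lam))
                   for i in range(m))

    S = set()
    while len(S) < min(k, n_meta):
        best = max((l for l in range(n_meta) if l not in S),
                   key=lambda l: value(S | {l}))
        if value(S | {best}) <= value(S):   # no positive marginal contribution
            break
        S.add(best)
    return S
```

On a toy instance with three metaitems covering {0}, {1}, and {0, 1}, the sketch correctly selects the metaitem covering both items when k = 1.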

Comparison of the four B&C algorithms
In Table 1, we compare the performance of the four B&C algorithms (SC, OA+SC, OA+BC, and OA+BC+f) for solving Problem (2) with f strictly increasing. The table is vertically divided into six parts. The first part reports the instance class and the total number of instances it contains (# total). The next four parts report, for each of the four exact methods, the total number of instances solved to optimality within the time limit (denoted by # opt) and the average gap for the unsolved instances at the time limit. The gaps are calculated as (UB − LB)/UB · 100, where UB and LB are, respectively, the best upper bound and the incumbent solution value achieved by the corresponding method within the time limit. The last part reports the gap obtained with the greedy algorithm for the instances that are not solved to optimality by OA+BC+f.
When comparing OA+SC to SC, the number of instances solved to optimality by OA+SC is about 80% larger and the average gap is about 3 times smaller. For all classes of instances except for the kp one, OA+SC solves three to four times more instances than SC.
Further performance improvements are obtained with OA+BC, which solves about 1700 more instances than OA+SC and reduces the average gap of OA+SC by about 40%. This is in line with our theoretical results, showing that Constraints (OA+BC 2 ) strictly dominate Constraints (OA+SC 2 ) (see Proposition 11) and that Constraints (OA+BC 1 ) are equivalent to Constraints (OA+SC 1 ) (see Proposition 10).
OA+BC+f achieves a drastic improvement over OA+BC thanks to separating Constraints (OA+BC1)-(OA+BC2) also at fractional points. In particular, OA+BC+f is able to solve 94% of the instances across all classes, obtaining an average gap of only 0.3% for the remaining 1553 unsolved instances. Performance profiles that clearly demonstrate the superiority of OA+BC+f for each class of instances are reported in Appendix G. We note that our implementation of the greedy algorithm requires on average ≈ 1 s of computing time. Let us consider the gap of this algorithm for the instances that cannot be solved to optimality by OA+BC+f, as reported in the last column of Table 1 (this gap is obtained by replacing LB by the value of the greedy solution). By comparing the gaps obtained with OA+BC+f to those obtained with the greedy algorithm, we observe that the average gap over the instances that are not solved by OA+BC+f decreases from 1.5% to 0.3%. Overall, Table 1 shows that the quality of the incumbent solutions found by the B&C algorithm OA+BC+f is superior to that of the solutions found by the greedy algorithm, not only on the instances which OA+BC+f manages to solve to optimality, but also on those where it hits the time limit.
We now focus on the 7200 instances from the class k-card to identify the main instance features that affect the computing times of OA+BC+f. Figure 5 reports the computing times of OA+BC+f after aggregating the instances by the covering radius R̂, the value of λ, the number of items n, and the number of metaitems n̂. The four box plots in the figure show that, by doubling the number of items, the average computing time doubles. However, by increasing the number of metaitems by 50%, the average computing time grows by a factor of 10. Increasing the length of the covering radius (and, thus, the density of G) also results in slightly larger computing times. In spite of this, very sparse instances obtained for R̂ = 2 (the only instances with a curvature strictly smaller than one; see Appendix F) are extremely easy to solve for our B&C methods. For larger values of λ, the instances tend to become easier. This can be due to the fact that, the larger λ is, the closer the function f is to being linear. Further analysis (not illustrated in the tables for reasons of space) indicates that, by increasing the number of scenarios from 50 to 100, the average computing time doubles; differently, changing the value of k from 10 to 15 does not have a significant impact on the computing time of OA+BC+f.

In Figure 5, the computing times are reported in logarithmic scale through their quantiles; the lines extending vertically from the boxes indicate the variability outside the upper and lower quantiles, and the outliers are plotted as individual points.

Table 2 illustrates the most relevant features concerning the execution of OA+BC+f (the best-performing algorithm) on the 22,447 instances it solves to optimality: LP gap (computed as (Z_LP − Z)/Z_LP · 100, where Z_LP is the optimal solution value of the LP relaxation at the root node and Z is the optimal solution value of the problem), number of generated inequalities, number of branching nodes, and average computing time.

We also experimented with solving the problem directly using the formulation reported in (2) with BARON 20.4 (Sahinidis [31], Tawarmalani and Sahinidis [35]), one of the state-of-the-art solvers for global optimization. Extensive preliminary tests showed that BARON is computationally outperformed by all of the B&C algorithms: it fails to solve many instances with n = 5000 items, and it is unable to solve any instance with n = 10,000.

Fig. 5  Computing times (in logarithmic scale) of OA+BC+f for the k-card instances aggregated by: (i) the covering radius R̂, (ii) the risk-tolerance λ, (iii) the number of items n, and (iv) the number of metaitems n̂. The number of instances solved within the time limit of 600 secs is reported above each boxplot.

In particular, the gaps are never larger than ≈ 1.25% and their average values are smaller than ≈ 0.1%. The time taken to solve the LP relaxation is quite small and varies as a function of the instance class. As for the average number of Constraints (OA+BC1) and (OA+BC2) generated when separating fractional and integer points, the table shows that, as one may expect, many more cuts are generated when fractional solutions are separated. Such an increased number of inequalities is well compensated by the smaller LP gaps. The average number of nodes is small for most of the instances, further evidencing the superior performance of OA+BC+f. The only exceptions are the instances from the kp class, for which several thousands of nodes are explored, albeit the time taken to solve each of them is extremely small (smaller, on average, than for the other instances). The average computing time is in line with the percentage of instances solved for the different classes, showing that the hardest instances are the ones subject to partition-matroid constraints. Interestingly, while the introduction of conflict constraints leads to larger LP gaps, it does not affect the computing times too much, leading to an increase of only 14%, 22%, and 12% for, respectively, the k-card, kp, and p-matroid classes.

Experiments for the generalization of Problem (2) with a not increasing utility function
We report the results obtained on instances of Problem (2) where the utility function is not increasing and ĥ_i, i ∈ M, is not submodular. We consider the utility function (g) introduced in Sect. 5, and set λ ∈ {0.25, 0.5, 0.75, 1, 2, 3, 4}, α = 0.5, and m ∈ {50, 100}. We consider three B&C algorithms: OA+SC+POLY, OA+BC+POLY, and OA+BC+f+POLY. In each of them, Constraints (OA+POLY) are separated at every integer point and also at every fractional point, but (based on preliminary experiments) only at the root node. The other constraints are separated as before, but only if g′ is positive at the point being separated: in OA+SC+POLY, we separate Constraints (OA+SC1)-(OA+SC2) at integer points; in OA+BC+POLY, we separate Constraints (OA+BC1)-(OA+BC2) at integer points; in OA+BC+f+POLY, we separate Constraints (OA+BC1)-(OA+BC2) at both integer and fractional points.
As a testbed, we adopt a subset of 1120 instances of the subclass k-card with n̂ = 40, n ∈ {5000, 10,000}, and k = 10. In line with the previous experiments, each B&C algorithm is initialized with the greedy algorithm (see Sect. 6.1.2), halting it as soon as the marginal contribution becomes negative (note that, since ĥ_i, i ∈ M, is not submodular, the greedy algorithm does not come with an approximation guarantee).
The computational results are summarized in Table 3, aggregated by the risk-tolerance parameter λ (for a total (# total) of 160 instances per value of λ); the best results are highlighted in bold. For each of the three B&C algorithms, we report the number of instances solved to optimality within the time limit of 600 seconds (# opt) and the average gaps for the remaining instances (gap). The results are in line with those obtained for the strictly increasing case: the outer-approximation method combined with Benders cuts (OA+BC+POLY) clearly outperforms the one combined with submodular cuts (OA+SC+POLY). This phenomenon is even more pronounced if Benders cuts are also separated at fractional points (OA+BC+f+POLY). Indeed, OA+BC+f+POLY solves the largest number of instances (616 out of 1120) and, on average, provides the smallest gaps for the unsolved ones (0.2%).
We note that the problem is harder for small values of λ, where the decreasing term of g becomes dominant. For λ = 0.25, 0.5, only 16 instances are solved to optimality by OA+BC+f+POLY, and the largest average gaps (between 0.5% and 0.9%) are obtained. On such instances, the performance of the three algorithms is almost comparable. For larger values of λ, though, OA+BC+f+POLY becomes much better than the other two methods; in particular, it solves all the instances with λ = 2, 3, 4 to optimality. On average, around 20 thousand (POLY) constraints are generated at integer points (with a maximum of 50 thousand), and fewer than one thousand are generated at fractional points at the root node of the branching tree (with a maximum of three thousand). It is worth noticing that the number of generated (POLY) constraints tends to decrease as the value of λ increases. This is not surprising, since in this case the increasing part of the objective function becomes dominant.

The tests show that our decomposition method based on a double-hypograph reformulation (and, in particular, algorithm OA+BC+f+POLY) can be effectively applied to tackle Problem (2) even when the utility function is not increasing and, thus, the objective function is not submodular and the Ref(SC) reformulation relying on the submodular cuts of Nemhauser et al. [29] cannot be adopted.

Experiments when solving the problem of Ahmed and Atamtürk [2]
We now present the results obtained when applying our methods to Problem (1) with a single knapsack constraint, which is studied in Ahmed and Atamtürk [2] and Yu and Ahmed [39] and which Problem (2) generalizes. We adopt the reformulation Ref(OA) derived in Sect. 4.5. In the experiments, we separate its constraints at integer points only (the computational benefits of separating fractional points are negligible for these instances). We refer to the resulting B&C method as OA.
We compare OA to the state-of-the-art B&C method of Yu and Ahmed [39] (which outperforms the previous one proposed in Ahmed and Atamtürk [2]), to which we refer as BnC-LI (LI stands for lifted inequalities).
To test the performance of OA against BnC-LI, we consider the same set of instances tested in Yu and Ahmed [39], with n ∈ {100, 150, 200} (number of items), m ∈ {50, 100} (number of scenarios), and λ ∈ {0.8, 1, 2} (parameter of the utility function f(z) = 1 − exp(−z/λ)). For the way a and β are constructed, we refer the reader to Ahmed and Atamtürk [2]. For each value of these parameters, we consider 20 random instances as in Yu and Ahmed [39], for a total of 360 instances. Since the source code of BnC-LI is not available, its results reported in Table 4 are taken directly from the table on page 161 of Yu and Ahmed [39], for which the authors used a Python code based on Gurobi 5.6.3 running on a 2.3 GHz x86 Linux workstation with 7 GB of RAM. While those results were obtained with an optimality gap of 10⁻⁵, in our experiments we run CPLEX with a tighter optimality gap of 10⁻⁹, thus obtaining solutions that are numerically more accurate.
In Table 4, we compare the performance of OA, BnC-LI, as well as of BARON (the latter is used for solving the MILP formulation of Problem (1) given in Sect. 1). The results are aggregated by the value of n, λ and m. For the two B&C methods OA and BnC-LI, we report the average number of cuts, the average number of B&C nodes, and the average computing time. For BARON, we report the average computing time.
While all three methods are able to solve every instance to optimality, they exhibit substantial differences in performance. As far as the computing time is concerned, the best approach is OA, which solves all the instances in an average time always smaller than 0.2 seconds, whereas the average computing time of BnC-LI can exceed one hundred seconds. OA scales much better than BnC-LI as the number of items n increases, while BnC-LI tends to struggle for small values of λ. On average, the number of B&C nodes of OA is smaller (albeit only slightly) than that of BnC-LI (despite the fact that OA adopts a more stringent optimality tolerance). The difference is more substantial for the instances with a larger number of items n and a smaller value of λ. As for BARON, its computing times, while smaller than those of BnC-LI, are up to three orders of magnitude larger than those of OA.
When inspecting the maximum values (not reported in Table 4 for reasons of space) rather than the average ones, we observed that the maximum computing time of OA never exceeds 0.32 seconds, whereas for BnC-LI it is as high as 800 seconds and for BARON as high as 15 seconds; the maximum number of cuts generated by OA also remains small. Overall, our analysis shows that, in spite of the experiments being run on different architectures and with different solvers, the difference in computing times between OA and BnC-LI spans up to three orders of magnitude in favor of OA. We believe that such a difference is mainly due to the different complexity of the corresponding separation algorithms. Indeed, while separation in OA only takes linear time O(n), separation in BnC-LI requires more time, at least quadratic, as shown in Proposition 12 and Corollary 14 of Yu and Ahmed [39]. Moreover, the separation problem in BnC-LI can be solved in polynomial time only for integer solutions; for fractional solutions, a heuristic separation approach is adopted, unlike in our implementation of OA, where fractional points are not separated.

Conclusions
In this article, we have studied a generalization, obtained by composing the utility function with a set-union operator, of the problem of maximizing a concave, strictly increasing, and differentiable utility function over a discrete set studied by Ahmed and Atamtürk [2]. We have introduced a double-hypograph reformulation for the problem which has allowed us to decompose it into its two main components, one related to the utility function and the other related to the set-union operator. We have then linearized the two components: we have replaced the utility function by its first-order outer approximation (OA) and the set-union operator by either submodular cuts (SCs) or Benders cuts (BCs). For Problem (1), we have shown that there is no dominance between the constraints proposed in Ahmed and Atamtürk [2] and Yu and Ahmed [39] and those of our OA-based reformulation.
We have then extended our methods to the case where the utility function is not necessarily increasing and, thus, the objective function of the problem is not submodular, combining our double-hypograph reformulation with an epigraph reformulation of a submodular function obtained via polymatroid inequalities.
In our computational study, we have shown that a branch-and-cut approach based on Ref(OA+BC) drastically outperforms the alternative MILP reformulations as well as BARON, a state-of-the-art solver for mixed-integer nonlinear optimization, including the case where the utility function is not increasing. This is particularly true when tackling problems in which the number of items is many orders of magnitude larger than the number of metaitems. We remark that our approach can be straightforwardly generalized to the case in which a different utility function is considered in each scenario. The same is true if the utility function is piecewise linear; in particular, since, in such a case, only as many supporting functions as the number of pieces are needed, the overall approach is likely to be even more efficient.
Many open questions still remain to be addressed for this rich optimization problem. One way to linearize the problem is to work in the (x, y) space and proceed along the lines of Ahmed and Atamtürk [2] to "project f out onto the x variables", which is a way of deriving outer-approximation cuts for a hypograph reformulation of the problem. Once the linearization is performed in the (x, y) space, the x variables could be projected out in a Benders-like fashion. Finally, one could obtain an alternative MILP formulation in the y space by generating Generalized Benders Cuts (Geoffrion [15], Fischetti et al. [12]) derived from the associated convex continuous subproblem. The connection between these different decompositions and linearizations and the strength of the respective MILP reformulations (both in the (x, y) space and in the y space) remain an open topic for future research.
Another line of research concerns the investigation of problems where the function q_i is not induced by the set-union operator. In particular, our decomposition approach and the inequalities we derived based on the outer approximation combined with submodular cuts can be applied to any problem involving a function ĥ_i = f ∘ q̂_i, where q̂_i is an increasing submodular set function. A notable example is the one where q̂_i is the rank (or weighted rank) function of a matroid, e.g., the value function of a graphic matroid whose ground set is determined by the set variable Ŝ. In such a case, one could transform q̂_i into the value function Q̂_i of a suitable optimization problem and construct Benders cuts analogous to those we derived by leveraging the full description of the convex hull of the problem underlying Q̂_i.
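To make the matroid example concrete, the following sketch (ours, not the paper's) computes the rank function of a graphic matroid with a union-find structure: the rank of an edge subset equals the size of a spanning forest of the selected edges, a monotone submodular quantity that could play the role of q̂_i.

```python
# Illustrative sketch: rank function of a graphic matroid, computed as the
# number of edges in a spanning forest of the selected edge subset.
def graphic_matroid_rank(edges):
    parent = {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    rank = 0
    for u, v in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = find(u), find(v)
        if ru != rv:   # edge joins two components: independent in the matroid
            parent[ru] = rv
            rank += 1
    return rank

# A triangle has rank 2: the third edge closes a cycle and adds nothing.
print(graphic_matroid_rank([(1, 2), (2, 3), (1, 3)]))  # 2
```

Submodularity of this function is what would make the decomposition approach applicable in such a setting.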
1. As f is increasing, the coefficient of 1 − y_ℓ in (OA+SC1) is at least as tight as the one in (SCĥ1), and strictly tighter if, in addition, q̂_i(N̂ \ {ℓ}) > 0, due to f being strictly increasing. 2. As f is concave, strictly increasing, and differentiable and q̂_i is increasing: if q̂_i(N̂) = q̂_i(Ŝ), the right-hand side coincides with the coefficient of 1 − y_ℓ in (OA+SC1). Since the latter is tighter at the coefficient of 1 − y_4 (in boldface) but weaker at those of y_1, y_2, y_3, and y_5 (underlined), the two inequalities do not dominate one another.
While the latter is tighter than the former at the coefficient of 1 − y_4 (in boldface) and weaker at those of y_1, y_2, y_3, and y_5 (underlined), the difference in the coefficients of y_1, y_2, y_3, and y_5 is less pronounced.

B Proof of Proposition 8 and related numerical example
The proof relies on the following two lemmata.

Proof 1. As f is increasing, the coefficient of y_ℓ in (OA+SC2) is at least as tight as the one in (SCĥ2), and strictly tighter if, in addition, q̂_i(∅) > 0 holds, due to f being strictly increasing. 2. As f is concave, strictly increasing, and differentiable and q̂_i is increasing: as f is increasing, f(q̂_i(∅)) ≥ f(q̂_i(Ŝ)) if and only if q̂_i(∅) ≥ q̂_i(Ŝ), which holds if and only if q̂_i(Ŝ) = q̂_i(∅). In such a case, the coefficient of y_ℓ in (SCĥ2) is at least as tight as the coefficient of y_ℓ in (OA+SC2).

Proof 1. Since f is concave, strictly increasing, and differentiable and q̂_i is increasing, the coefficient of 1 − y_ℓ in (SCĥ2) is at least as tight as the one in (OA+SC2). The numerical example yields the two inequalities
w ≤ 0.148 + 0.059 y_1 + 0.105 y_2 + 0.049 y_3 + 0.105 y_4 − 0.148 (1 − y_5),
w ≤ 0.148 + 0.052 y_1 + 0.094 y_2 + 0.043 y_3 + 0.094 y_4.
The latter is tighter at the coefficients of y_1, y_2, y_3, and y_4 (highlighted in boldface), but weaker at the coefficient of 1 − y_5 (underlined). Thus, the two inequalities do not dominate one another.

C Proof of Proposition 10
Proof As a consequence of Lemma 1, the pair (δ, σ) in Equation (7) corresponds to the extreme solution to the dual of the Benders subproblem defined in (BS) that is obtained by setting γ_j := 1 if j ∈ N_1 and γ_j := 0 otherwise. Due to (7), we have: Using (11) and (12), (BC) can be rewritten as: The coefficient of y_ℓ in the third term is equal to q̂_i(Ŝ). As the summation in the second term only considers items j ∈ N(Ŝ) which belong to N_1, we have: Thus, we can reorder the terms of the last two summations in (13), obtaining (BC1). The proof is concluded by noting that

D Proof of Proposition 11
Proof With the choice of dual multipliers in (8), we rewrite (BC) as: Since a_ij ≥ 0, the right-hand side of (SCq2) can be rewritten as: (15) can be restated as: Such a quantity is larger than the right-hand side of (14). By rewriting the term ∑_{j ∈ N(Ŝ)\N_1(Ŝ)} a_ij as ∑_{j ∈ N(Ŝ)} a_ij − ∑_{ℓ ∈ Ŝ} ∑_{j ∈ N(ℓ) ∩ N_1(Ŝ)} a_ij and collecting y_ℓ for all ℓ ∈ Ŝ, (14) is shown to coincide with (BC2).

E Worst-case instances for the greedy algorithm
As mentioned in Sect. 6.1.2, the greedy algorithm proposed in Nemhauser et al. [29] computes good-quality feasible solutions to the problem of maximizing a submodular function subject to a k-cardinality constraint. In this section, we show how to construct worst-case instances for Problem (2) subject to a k-cardinality constraint, i.e., instances on which the greedy algorithm asymptotically achieves the worst-case approximation factor of 1 − 1/e.
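The greedy scheme of Nemhauser et al. [29] referred to above can be sketched in a few lines; the code below is an illustrative toy implementation (the function names and the small coverage example are ours, not the paper's):

```python
# Sketch of the greedy algorithm for maximizing a monotone submodular set
# function f under a k-cardinality constraint: repeatedly add the element
# with the largest marginal gain.
def greedy_submodular(ground_set, f, k):
    S = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in ground_set - S:
            gain = f(S | {e}) - f(S)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no element strictly improves the value
            break
        S.add(best)
    return S

# Toy coverage instance: metaitems cover items, f = number of covered items.
cover = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
f = lambda S: len(set().union(*[cover[e] for e in S]))
print(greedy_submodular({1, 2, 3}, f, 2))  # covers all three items
```

In practice, lazy evaluation of the marginal gains is commonly used to speed this loop up; the sketch keeps the plain quadratic scan for clarity.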
Letting m = 1 and f (z) := z, Problem (2) with a k-cardinality constraint corresponds to the Maximum Covering Problem (MCP), see, e.g., Cordeau et al. [6]. A procedure for creating worst-case instances for the MCP can be found in Hochbaum and Pathria [18]. Here, we show how to adapt such a procedure to create worst-case instances for Problem (2) subject to a k-cardinality constraint and with the function f (z) defined as in (f).
Let us introduce the matrix U = [u_st] with s = 0, 1, . . . , k and t = 1, 2, . . . , k, for some k ∈ N. This matrix has k + 1 rows and k columns, with entries u_st defined by adapting the procedure of Hochbaum and Pathria [18]. Each entry (s, t) of the matrix U is associated with an item j (there are k(k + 1) items in N in total) whose value in the single scenario i = 1 is u_st/Δ, where the constant Δ is equal to the sum of all the entries of the matrix U. In addition, we introduce 2k metaitems defined as follows. The first k metaitems, which we denote by R = {r_1, r_2, . . . , r_k}, correspond to the rows 1, 2, . . . , k of the matrix U; each metaitem r_s, s = 1, . . . , k, covers all the items of row s. The second k metaitems, which we denote by C = {c_1, c_2, . . . , c_k}, correspond to the columns of U; each metaitem c_t, t = 1, . . . , k, covers all the items of column t. Figure 6 illustrates the value of the approximation factor as a function of λ and k, also considering the case where f is the identity function. In the latter case, the optimal solution value is equal to 1 and the ratio Z_G/Z equals 1 − (1 − 1/k)^k (see Fig. 6). We observe that, with f(z) = 1 − exp(−z/λ), the ratio Z_G/Z already falls below 0.65 for λ = 10 and k > 30.
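For the identity-utility case, the ratio Z_G/Z = 1 − (1 − 1/k)^k can be tabulated directly to observe the convergence towards the asymptotic worst-case factor 1 − 1/e; a minimal sketch:

```python
import math

# Greedy-to-optimal ratio on the worst-case instances with the identity
# utility: Z_G / Z = 1 - (1 - 1/k)^k, decreasing towards 1 - 1/e as k grows.
def greedy_ratio(k: int) -> float:
    return 1.0 - (1.0 - 1.0 / k) ** k

for k in (1, 5, 50, 1000):
    print(k, greedy_ratio(k))

print(1 - 1 / math.e)  # asymptotic worst-case factor, approximately 0.6321
```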
We conclude by briefly commenting on a set of computational experiments carried out with OA+BC+f on the worst-case instances introduced above, with values of k up to 1000 (these results are not shown in a table for reasons of space). The experiments reveal that OA+BC+f achieves an excellent computational performance on these instances. Indeed, it is capable of solving all the instances to optimality at the root node of the branching tree within a fraction of a second, including the largest instances with k = 1000, which feature n = 1,001,000 items and n̂ = 2000 metaitems.

F On the relation with the curvature of submodular functions
The set N_1, corresponding to the set of items in N which can be covered by a single metaitem ℓ ∈ N̂, is related to the notion of curvature of a submodular function introduced by Conforti and Cornuéjols [5]. For a monotone increasing set function f : 2^N → R_+, we call curvature w.r.t. a set S ⊆ N the quantity
c_f(S) = max_{j ∈ S : f({j}) > f(∅)} [1 − (f(S) − f(S \ {j})) / (f({j}) − f(∅))],
and total curvature the quantity c_f := c_f(N) (where c_f ≤ 1 holds due to the monotonicity of f). The notion of total curvature plays a central role in establishing the approximation factor of the greedy algorithm used for solving the problem of maximizing a submodular function subject to a cardinality constraint. In particular, Conforti and Cornuéjols [5] show that the greedy algorithm for this problem achieves a (1/c_f)(1 − e^{−c_f}) approximation factor, which ranges from 1 when c_f = 0 (i.e., when f is linear and the greedy algorithm is exact) to 1 − 1/e when c_f = 1 (the tight worst-case factor already established by Nemhauser et al. [29]).
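As an illustration of the definition above, the total curvature can be computed directly from the marginal gains; the following sketch assumes the set function is given as a Python callable on frozensets (a representation of ours, purely for illustration):

```python
# Direct computation of the total curvature c_f of a monotone increasing set
# function f over ground set N, from the definition
#   c_f = max over j with f({j}) > f(empty) of
#         1 - (f(N) - f(N \ {j})) / (f({j}) - f(empty)).
def total_curvature(N, f):
    N = frozenset(N)
    empty = frozenset()
    c = 0.0
    for j in N:
        denom = f(frozenset({j})) - f(empty)  # marginal gain of j at the empty set
        if denom <= 0:
            continue                          # j contributes nothing anywhere
        num = f(N) - f(N - {j})               # marginal gain of j at N \ {j}
        c = max(c, 1.0 - num / denom)
    return c

# A modular (linear) function has curvature 0 ...
assert total_curvature({1, 2, 3}, lambda S: len(S)) == 0.0
# ... while a coverage function with a fully overlapped metaitem has curvature 1.
cover = {1: {"a", "b"}, 2: {"b"}}
f = lambda S: len(set().union(*[cover[e] for e in S]))
print(total_curvature({1, 2}, f))
```

In the second example, metaitem 2 covers no item of its own, matching the intuition behind Observation 1 below.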
Let, for each i ∈ M, c_{q̂_i} be the total curvature of q̂_i, and let c_{ĥ_i} be the total curvature of ĥ_i. The following holds:

Observation 1 For each i ∈ M, we have c_{q̂_i} < 1 (respectively, c_{ĥ_i} < 1) if and only if N_1 ≠ ∅ and, for all ℓ ∈ N̂, there is some j ∈ N(ℓ) with a_ij > 0 which is only covered by ℓ (i.e., with N̂(j) = {ℓ}).
Indeed, by definition of c_{q̂_i} (respectively, c_{ĥ_i}), we have c_{q̂_i} < 1 (respectively, c_{ĥ_i} < 1) if and only if q̂_i(N̂) − q̂_i(N̂ \ {ℓ}) > 0 (respectively, ĥ_i(N̂) − ĥ_i(N̂ \ {ℓ}) > 0) holds for all ℓ ∈ N̂, which is the case if and only if, for each ℓ ∈ N̂, there is some j ∈ N(ℓ) with a_ij > 0 which is only covered by ℓ (i.e., with N̂(j) = {ℓ}). As shown in the following, some of the results we derived for the comparison of the different inequalities considered in this paper can be related to the total curvatures c_{ĥ_i} and c_{q̂_i} of ĥ_i and q̂_i, respectively. Indeed, in Sect. 2, we noticed that Constraints (SCĥ3) are dominated by (SCĥ1) if there exists an item j ∈ N which is covered by a single metaitem ℓ ∈ N̂ (i.e., such that |N̂(j)| = 1) and a_ij > 0. If c_{ĥ_i} < 1, this condition is satisfied for all ℓ ∈ N̂.
When comparing the strength of Constraints (SCĥ1) and (OA+SC1), we showed in Proposition 7 that a sufficient condition for Constraint (SCĥ1) to dominate (OA+SC1) is given by q̂_i(Ŝ) = q̂_i(N̂). The latter condition implies c_{q̂_i} = 1.
Concerning the strength of Constraints (SCĥ2) and (OA+SC2), we showed in Proposition 8 that the two inequalities do not dominate one another.

G Performance profiles comparing SC, OA+SC, OA+BC, and OA+BC+f for the case of a strictly increasing utility function

We provide a graphical representation of the relative computational performance of the four branch-and-cut algorithms SC, OA+SC, OA+BC, and OA+BC+f by the performance profiles (see Dolan and Moré [8]) reported in Fig. 7 for classes k-card and k-card & confl, in Fig. 8 for classes kp and kp & confl, and in Fig. 9 for classes p-matroid and p-matroid & confl. For each instance and algorithm, let τ be the normalized time defined as the ratio of the computing time taken by the algorithm to solve the instance to the minimum computing time taken over all algorithms. For each algorithm and for each value of τ on the horizontal axis, the vertical axis reports the percentage of the instances for which the corresponding algorithm spent at most τ times the computing time of the fastest algorithm. Each curve starts from the percentage of instances on which the corresponding algorithm is the fastest. At the right end of the chart, one can read the percentage of instances solved by a specific algorithm. The best performance is achieved by the algorithm whose curve occupies the upper part of the figure. The values on the horizontal axis are reported in logarithmic scale.
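The construction of the performance-profile curves described above can be sketched as follows; the time table in the example contains made-up numbers purely for illustration, not the paper's results:

```python
# Sketch of Dolan-Moré performance profiles computed from a table of
# computing times (one dict per instance, keyed by algorithm name;
# None marks an unsolved instance). Assumes every instance is solved
# by at least one algorithm.
def performance_profile(times, taus):
    """For each algorithm, return the fraction of instances solved within a
    factor tau of the fastest algorithm, for each tau in taus."""
    n = len(times)
    algs = times[0].keys()
    profile = {a: [] for a in algs}
    for tau in taus:
        for a in algs:
            solved = 0
            for row in times:
                best = min(t for t in row.values() if t is not None)
                if row[a] is not None and row[a] <= tau * best:
                    solved += 1
            profile[a].append(solved / n)
    return profile

times = [{"SC": 10.0, "OA+BC": 2.0}, {"SC": None, "OA+BC": 5.0}]
print(performance_profile(times, [1.0, 5.0]))
```

Each curve's value at τ = 1 is the fraction of instances on which the algorithm is the fastest, and its value at the largest τ approaches the fraction of instances it solves, mirroring the reading of Figs. 7-9 described above.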
The figures clearly show that OA+BC+f is the best algorithm for these sets of instances, as it solves more instances and is the fastest in most of the cases. The second-best algorithm is OA+BC. The figures also show that separating Benders cuts not only at integer points but also at fractional ones is crucial for the efficiency of the method. The two branch-and-cut algorithms based on submodular cuts, i.e., OA+SC and SC, achieve the worst performance and are both outperformed by OA+BC+f (and by OA+BC).