1 Introduction

The bilevel programming framework corresponds to a hierarchical decision-making process involving two players: an upper-level player, or leader, and a lower-level one, or follower. Each controls some of the variables, has constraints to satisfy and an objective function to optimize. The follower's decisions depend on the choices made by the leader (and vice versa), resulting in an optimization problem with a nested structure. This hierarchical framework naturally applies to many real-life problems, where several stakeholders, with possibly conflicting interests, are commonly involved in the decision process. Bilevel aspects are also hidden in single-level programming: the Benders decomposition procedure (Benders 1962) derives from a bilevel interpretation of a single-level problem; some separation problems can be formulated as bilevel problems (Lodi et al. 2014; Mattia 2012, 2013); and the idea of worst-case realization in robust optimization (Ben-Tal et al. 2004; Bertsimas and Sim 2003) corresponds to a nested optimization problem, as in bilevel programming. The mixed integer linear bilevel programming problem LBP that we consider in this paper is reported below.

$$\begin{aligned} \text{ LBP } \quad&\min \textbf{c}^T \textbf{x}+ \textbf{d}^T {\varvec{\zeta }}\\&\textbf{A}\textbf{x}+ \textbf{L}{\varvec{\zeta }}\ge \textbf{b}\\&x_i \text{ integer } \quad i \in N \\&{\varvec{\zeta }}\in Y(\textbf{x}) \subseteq Z(\textbf{x}) = \arg \min \{\textbf{f}^T \textbf{y}: \textbf{H}\textbf{y}\ge \textbf{r}- \textbf{G}\textbf{x},\ y_i \text{ integer for } i \in F\} \end{aligned}$$

Vector \(\textbf{x}\in {\mathbb {R}}^{n_x}\) contains the leader variables and vector \(\textbf{y}\in {\mathbb {R}}^{n_y}\) the follower ones. Sets N and F include the indices of the leader and the follower variables, respectively, that are restricted to take integer values. Matrices \(\textbf{A}\in {\mathbb {R}}^{m_l \times n_x}\), \(\textbf{L}\in {\mathbb {R}}^{m_l \times n_y}\) and vector \(\textbf{b}\in {\mathbb {R}}^{m_l}\) define the leader constraints. Matrices \(\textbf{G}\in {\mathbb {R}}^{m_f \times n_x}\), \(\textbf{H}\in {\mathbb {R}}^{m_f \times n_y}\) and vector \(\textbf{r}\in {\mathbb {R}}^{m_f}\) correspond to the follower ones. Both the leader and the follower variables can appear in both the leader and the follower constraints. Vector \({\varvec{\zeta }}\in {\mathbb {R}}^{n_y}\) represents, for a given \(\textbf{x}\) chosen by the leader, the corresponding follower solution, if one exists. This solution must be optimal for the lower level problem for \(\textbf{x}\), or follower problem, below.

$$\begin{aligned} \text{ FOLL }(\textbf{x}) \quad&\min \textbf{f}^T \textbf{y}\\&\textbf{H}\textbf{y}\ge \textbf{r}- \textbf{G}\textbf{x}\\&y_i \text{ integer } \quad i \in F \end{aligned}$$

The right-hand sides of the follower problem depend on the current values of the leader variables, which are constants as far as the follower is concerned. If \(\text{ FOLL }(\textbf{x}) \) admits an optimal solution, \(opt(\textbf{x})\) is the corresponding optimal value and \(Z(\textbf{x})\) is the set of the optimal solutions, to which the selected \({\varvec{\zeta }}\) must belong. Otherwise, if \(\text{ FOLL }(\textbf{x}) \) is infeasible or unbounded, that is, \(Z(\textbf{x}) = \emptyset \), leader solution \(\textbf{x}\) cannot be completed by a suitable follower solution and must be discarded. If \(Z(\textbf{x})\) contains more than one solution, either the leader or the follower must make a selection, based on some criterion or policy, which defines set \(Y(\textbf{x})\). That is, the bilevel problem becomes, indeed, a three-level problem, with a leader level, a policy level and a follower level (Zeng 2020).

Two policies are considered for breaking ties when, for some \(\textbf{x}\), \(|Z(\textbf{x})| >1\). In an optimistic setting, it is assumed that ties are broken in favor of the leader, that is, \({\varvec{\zeta }}\) must be chosen among the solutions \(Y(\textbf{x}) = Y_o(\textbf{x})\) that minimize \(\textbf{d}^T \textbf{y}\) and have the property that \((\textbf{x},\textbf{y})\) satisfies the leader constraints. In a pessimistic setting, it is assumed that ties are broken against the leader, that is, \({\varvec{\zeta }}\) must have the property that \((\textbf{x},{\varvec{\zeta }})\) does not satisfy the leader constraints or, if no such \({\varvec{\zeta }}\) exists, it must be chosen among the solutions \(Y(\textbf{x}) = Y_p(\textbf{x})\) that maximize \(\textbf{d}^T \textbf{y}\) (Lozano and Smith 2017). Hence, the pairs \((\textbf{x},\textbf{y})\) with \(\textbf{y}\in Z(\textbf{x})\) that do not satisfy the leader constraints have policy-dependent consequences: in an optimistic setting, such a pair authorizes the leader to force the follower into discarding \(\textbf{y}\); in a pessimistic setting, it allows the follower to forbid leader solution \(\textbf{x}\). In general, when \(|Z(\textbf{x})| > 1\) for some \(\textbf{x}\), the optimistic and the pessimistic policy may lead to different solutions, even when \(\textbf{f}= \textbf{0}\) (see Sect. 2.4). However, we derive necessary and sufficient conditions for the optimistic and the pessimistic setting to be equivalent (see Sect. 2.3). These conditions are satisfied by a well-known class of bilevel programming problems, the interdiction ones (Mattia 2021; Smith and Song 2020; Tang et al. 2016), as well as by the bilevel models coming from the single-level problems in Ben-Tal et al. (2004); Benders (1962); Lodi et al. (2014); Mattia (2012, 2013) mentioned above.
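On a small instance, both policies can be evaluated by brute-force enumeration. The sketch below uses hypothetical data, invented for illustration only: a single fixed leader choice whose follower problem has two tied optima, both assumed to satisfy the leader constraints, and an invented leader cost vector \(\textbf{d}\).

```python
from itertools import product

# Hypothetical toy instance: follower y in {0,1}^2 with
#   min y1 + y2  s.t.  y1 + y2 >= 1,
# so the follower has two tied optima, (1,0) and (0,1), both assumed
# leader-feasible; the leader cost on y is d = (1, 4).
Yfeas = [y for y in product([0, 1], repeat=2) if y[0] + y[1] >= 1]
best = min(y[0] + y[1] for y in Yfeas)
Z = [y for y in Yfeas if y[0] + y[1] == best]   # Z(x): the tied optima

d = lambda y: 1 * y[0] + 4 * y[1]               # leader cost on y
optimistic = min(Z, key=d)    # ties broken in favor of the leader
pessimistic = max(Z, key=d)   # ties broken against the leader
print(optimistic, pessimistic)   # (1, 0) (0, 1)
```

The two policies pick different follower responses out of the same tie, which is exactly the situation the two settings formalize.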

Another aspect to be considered is who breaks the ties, that is, whether the policy level is managed by the leader or by the follower. One can assume that the follower has complete information on the leader objective function and constraints and that ties are broken directly by the follower (Lozano and Smith 2017). If the follower has no such information, ties must be broken by the leader (Bard and Moore 1990). In the optimistic setting, ties can be broken in two different ways: ex ante, using a feasibility argument, that is, eliminating the solutions not respecting the policy (\(Y(\textbf{x}) = Y_o(\textbf{x})\)); or ex post, using an optimality argument, that is, without eliminating in advance the solutions not respecting the policy (\(Y(\textbf{x}) = Z(\textbf{x})\)), because all solutions not belonging to \(Y_o(\textbf{x})\) are guaranteed to be either infeasible or suboptimal anyway (see Sect. 2.2). If optimistic LBP admits an optimal solution, the final outcome is independent of when ties are broken; otherwise, the problem may be infeasible or unbounded, depending on the choice made for breaking ties (see Sect. 5.1). It is not possible to break ties ex post if the pessimistic policy is considered.

Below, we review some literature (Sect. 1.1) and discuss the contribution of the present manuscript (Sect. 1.2).

1.1 Literature review

Bilevel programming problems are very hard, both theoretically and computationally. Even when the leader and the follower variables are continuous (\(N=F=\emptyset \)), LBP is hard to solve in both the optimistic (Jeroslow 1985) and the pessimistic setting (Wiesemann et al. 2013). If the problem has upper-level continuous variables and lower-level integer ones (\(N=\emptyset \), \(|F|=n_y\)), the optimum may not even be attained (Bard and Moore 1990; Köppe et al. 2010). In the optimistic case, if the lower level problem is a linear programming one (\(F=\emptyset \)), it is possible to use linear duality to reduce LBP to a compact single-level (quadratically constrained or mixed integer) problem (Wen and Yang 1990). This is not possible, in general, for follower problems with some integer variables (\(|F|>0\)), where linear duality cannot be used, and for pessimistic problems, which are significantly more complicated than their optimistic counterparts (Xu and Wang 2014). For general results on bilevel programming and additional references, we refer the reader to the surveys (Colson et al. 2007; Dempe 2003; Liu et al. 2018; Vicente and Calamai 1994) and books (Bard 1998; Dempe 2002). Due to the nature of LBP, it is difficult to devise a general solution algorithm and several different approaches have been proposed. The common idea behind these approaches is to use a branch-and-cut algorithm, which starts from a suitable single-level relaxation of the bilevel problem and eliminates integer solutions that are not feasible from a bilevel perspective by user-defined cutting planes or branching rules.

For single-level problems, the branch-and-cut approach (Padberg and Rinaldi 1991) is a well-known solution framework, which performs a branch-and-bound-type exploration of the solution space (Wolsey 1998) and possibly adds valid inequalities at each node of the branch-and-bound tree. The linear relaxation provides a valid bound on the solution of the integer problem. The purpose of the inequalities is to cut solutions of the linear relaxation of the current problem that do not belong to the integer hull. It may happen that a class of cuts is not able to separate a fractional vertex from the integer polyhedron, whereas another class is able to do that. There is a rich literature that investigates the relative strength of different valid inequalities and compares alternative formulations for the same integer problem.

In a bilevel context, most of these notions cannot be applied. For LBP, the problem obtained by relaxing the integrality restrictions on the variables does not provide a lower bound on the optimal value (Bard and Moore 1990). The only known single-level relaxation of LBP is the high point one (Bard and Moore 1990), which is obtained by removing the assumption that the follower solution must be optimal for \(\text{ FOLL }(\textbf{x}) \). When the high point relaxation is unbounded, the bilevel problem is not necessarily unbounded as well (see Sect. 5.1). Bilevel cuts are used only on integer solutions; their purpose is not to strengthen the current formulation by eliminating fractional vertices, but to remove integer solutions that are not feasible for the bilevel problem. If an integer solution is infeasible, any class of bilevel cuts is supposed to eliminate it. Algorithms for bilevel problems either branch on fractional solutions or rely on standard single-level cuts for fractional vertices not belonging to the integer hull of the high point relaxation (DeNegre and Ralphs 2009; Fischetti et al. 2017). No theoretical comparison of bilevel cuts and of the corresponding formulations has been given, so far, in the literature.

For the optimistic setting, the first approach for integer variables is outlined in Bard and Moore (1990). In DeNegre and Ralphs (2009), a branch-and-cut algorithm using no-good cuts to enforce bilevel feasibility in purely integer problems is proposed. In Xu and Wang (2014), the authors solve problems where \(\textbf{G}\) is integer and develop an algorithm where bilevel infeasible integer solutions are eliminated by suitable branching rules. An enhanced branching rule is discussed in Liu et al. (2021). In Lozano and Smith (2017), a branch-and-cut approach based on the value function is presented; it requires indicator variables and the integrality of \(\textbf{G}\textbf{x}\) to enforce the bilevel feasibility of the produced solutions. In Wang and Xu (2017), the authors introduce an algorithm for problems with integer parameters and variables; it is based on the notion of scoop and uses a dedicated branching approach to eliminate bilevel infeasible points. In Fischetti et al. (2017), the authors define bilevel free sets, obtained by solving scoop problems, and use them within an algorithm based on intersection cuts, where some results require the integrality of \(\textbf{G}\textbf{x}+ \textbf{H}\textbf{y}- \textbf{r}\). Different cut categories are presented in Tahernejad et al. (2020). See (Kleinert et al. 2021) for a survey.

By contrast, very few algorithms have been implemented for the pessimistic setting, which has received less attention than the optimistic policy. Moreover, most papers focus on one setting or the other, while the contributions that consider both policies and investigate the relationship between the two settings are very limited. No conditions for the two policies to be equivalent when \(|Z(\textbf{x})| > 1\) for some \(\textbf{x}\) have been proposed so far in the literature. For theoretical studies on the pessimistic problem, see Dempe et al. (2014) and other papers by the same authors. While there exists a natural way to impose that an optimistic solution of a bilevel problem is selected (see Sect. 2.2), the same result does not apply to the pessimistic setting. This makes the pessimistic version of the problem harder to solve than the optimistic one (Xu and Wang 2014). Apart from Lozano and Smith (2017), none of the algorithms mentioned above considers the pessimistic case. An alternative approach for pessimistic LBP is proposed in Zeng (2020).

1.2 The contribution

The aim of the paper is to study bilevel mixed integer programming problems, in both the optimistic and the pessimistic version. We consider problems where integer variables can appear in both the leader and the follower problem; hence, the problem cannot be reformulated as a compact single-level one by using linear duality. The relationship between the optimistic and the pessimistic setting is investigated; necessary and sufficient conditions to identify cases where the two policies are equivalent, for problems with \(|Z(\textbf{x})| > 1\) for some \(\textbf{x}\), are derived (see Sect. 2.3). We also show that, although possibly counterintuitive, having \(\textbf{f}=\textbf{0}\) does not ensure that the policy can be neglected or that the bilevel problem can be transformed into a single-level problem for both policies (see Sect. 2.4). As far as we know, this is the first such analysis in the literature.

For both policies, a non-compact single-level reformulation of LBP based on a new family of inequalities, the follower optimality cuts, is presented (see Sects. 3.2 and 4.2). These inequalities are big-M constraints obtained by binarizing the general integer leader variables (Roy 2007), if any (see Sect. 3.1). Binarization has known disadvantages, due to the increase in the number of variables. However, it has been shown that it may lead to stronger theoretical properties and to significantly smaller search trees with respect to integer approaches (Bonami and Margot 2015; Dash et al. 2018). We generalize to the pessimistic case the approach based on the no-good inequalities, which were used, so far, only for optimistic problems (see Sect. 4.3), and present an alternative version of these inequalities that uses binarization (see Sect. 3.3), instead of requiring a slack condition to be satisfied, as in their original definition (DeNegre and Ralphs 2009). This leads to some theoretical (see Sect. 3.3) and computational advantages (see Sect. 6.4).

Both the no-good inequalities and the follower optimality cuts define single-level reformulations of the bilevel problem in the original space, if the variables are binary. A theoretical comparison of the corresponding formulations is presented, thus investigating the relative strength of the cuts. We prove that, in general, neither the follower optimality cuts dominate the no-good cuts nor the opposite, but the no-good cuts are dominated by the cover inequalities of the follower optimality cuts (see Sect. 3.4). To our knowledge, this is the first theoretical comparison of bilevel cuts and formulations for integer problems in the bilevel literature. The cuts used in other approaches do not define a single-level reformulation of a bilevel problem in the original space. The inequalities in Lozano and Smith (2017) require indicator variables, while a known issue of the intersection cuts in Fischetti et al. (2017) is that they are not able to eliminate integer infeasible points on the frontier of the bilevel free set, unless some additional requirements are satisfied.

Finally, we describe a branch-and-cut algorithm to solve the problem (see Sect. 5). Since the cuts we use define single-level reformulations of the bilevel problem, we can adopt a standard branch-and-cut scheme that requires neither dedicated branching rules (Wang and Xu 2017; Xu and Wang 2014), nor branching on integer solutions (Bard and Moore 1990; Fischetti et al. 2017), nor accessing the current basis (DeNegre and Ralphs 2009; Fischetti et al. 2017), nor any assumptions on the integrality of the coefficients (DeNegre and Ralphs 2009; Lozano and Smith 2017; Fischetti et al. 2017). We also show that, under some assumptions, the follower optimality cuts can be used to separate some fractional solutions as well, thus providing the first example of bilevel cuts able to do that (see Sect. 5.2). A computational study is conducted to evaluate the proposed algorithm, using instances from the literature. The results demonstrate that the algorithm based on the follower optimality cuts outperforms the other approaches on most of the tested sets of instances (see Sects. 6.4 and 6.5).

Unlike the no-good cuts, the follower optimality cuts can also be used when the follower variables are continuous. In fact, similar inequalities are well known in the context of stochastic programming for problems where \(\textbf{x}\) is binary and \(\textbf{y}\) continuous (Laporte and Louveaux 1993). They have also been used for optimistic bilevel problems with binary leader variables and continuous follower ones arising from electricity market applications (see Kleinert and Schmidt 2019 and other papers by the same authors). The results in Kleinert and Schmidt (2019) show that, even for problems with purely continuous follower variables, where one could choose the compact reformulation via linear duality (Wen and Yang 1990), a branch-and-cut approach based on inequalities of this type is to be preferred.

In Sect. 2, we introduce some notation, make the necessary assumptions and investigate the relationship between the optimistic and the pessimistic setting. In Sects. 3 and 4, we define single-level reformulations for the optimistic and the pessimistic problem and theoretically compare the follower optimality cuts and the no-good ones. In Sect. 5, we outline the algorithm to solve LBP. In Sect. 6, we discuss the computational experience. Conclusions are given in Sect. 7.

2 The optimistic versus the pessimistic policy

Table 1 Summary of the notation

The relevant notation that is used throughout the paper is summarized in Table 1. For a matrix \(\textbf{T}\), \(\textbf{T}_j\) denotes its j-th column and \(\textbf{T}^i\) its i-th row. Let

$$\begin{aligned} P =&\{(\textbf{x},\textbf{y}): \textbf{A}\textbf{x}+ \textbf{L}\textbf{y}\ge \textbf{b}, \textbf{G}\textbf{x}+ \textbf{H}\textbf{y}\ge \textbf{r}, \\&x_i \text{ integer } \text{ for } i \in N, y_i \text{ integer } \text{ for } i \in F\} \end{aligned}$$

be the set of the leader/follower pairs satisfying both the leader and the follower constraints. The single-level problem obtained optimizing the leader objective function over P is known as the high point relaxation (HPR) of LBP (Bard and Moore 1990).

$$\begin{aligned} \text{ HPR } \quad \min _{(\textbf{x},\textbf{y}) \in P} \textbf{c}^T \textbf{x}+ \textbf{f}^T \textbf{y}\end{aligned}$$

The relationship between HPR and LBP is discussed in Sect. 5.1. Let \(P^{\textbf{x}}= \{\textbf{x}: \exists \textbf{y}\) such that \((\textbf{x},\textbf{y}) \in P\}\) be the projection of P in the space of the \(\textbf{x}\) variables. Denote by

$$\begin{aligned} P(\textbf{x})=\{\textbf{y}: \textbf{H}\textbf{y}\ge \textbf{r}- \textbf{G}\textbf{x}, y_i \text{ integer } \text{ for } i \in F \} \end{aligned}$$

the feasible region of the follower problem for a given leader solution \(\textbf{x}\).

2.1 Assumptions

We assume that all the variables have finite bounds or, equivalently, that P and \(P(\textbf{x})\) for any \(\textbf{x}\) are bounded sets. This ensures that the follower problem is not unbounded and that \(Z(\textbf{x})\) is a bounded set for any \(\textbf{x}\) (see Sect. 2.2); it rules out the unboundedness of HPR, which is used as the initial formulation of our branch-and-cut approach; and it guarantees that we can binarize the general integer variables (see Sect. 3.1). What happens when this assumption is relaxed is discussed in Sect. 5.1.

We also suppose that all the leader variables are integer, that is, \(|N| = n_x\). This eliminates the possibility of an optimal value that is not attained, as in the examples in Bard and Moore (1990); Köppe et al. (2010). The proposed approach can handle the presence of some continuous leader variables, under Assumption 1 in Fischetti et al. (2017), which states that none of the continuous leader variables appears in the follower constraints. For the pessimistic case, we also require that no leader constraint contains both follower variables and continuous leader variables. Indeed, under these assumptions the continuous leader variables are of no relevance in the leader/follower relation, which is only affected by the integer ones. Formulations and algorithms based on the follower optimality cuts do not require any specific assumption on the follower variables, which can be either integer (\(|F| = n_y\)) or mixed integer (\(0< |F| < n_y\)) or continuous (\(F = \emptyset \)); however, we do not consider the latter in the experiments. By contrast, the no-good cuts require the integrality of all the follower variables, that is, \(|F| = n_y\).

2.2 Bilevel feasible solutions

Let B be the set of the pairs \((\textbf{x},\textbf{y}) \in P\) such that \(\textbf{y}\) is lower level optimal for \(\textbf{x}\).

$$\begin{aligned} B = \{(\textbf{x},\textbf{y}) \in P: \textbf{y}\in Z(\textbf{x})\} \end{aligned}$$

Belonging to B is, in general, a necessary but not sufficient condition for a pair \((\textbf{x},\textbf{y})\) to be a feasible solution of LBP. Indeed, as we discuss below, extra conditions are needed to ensure that the policy (optimistic or pessimistic) is enforced, if \(|Z(\textbf{x})|>1\) for some \(\textbf{x}\). Denote by

$$\begin{aligned} ZF(\textbf{x}) = \{\textbf{y}\in Z(\textbf{x}): \textbf{A}\textbf{x}+ \textbf{L}\textbf{y}\ge \textbf{b}\} \end{aligned}$$

the set of the optimal solutions \(\textbf{y}\) of \(\text{ FOLL }(\textbf{x}) \) such that \((\textbf{x},\textbf{y})\) is feasible for the leader constraints. Let

$$\begin{aligned} ZI(\textbf{x}) = Z(\textbf{x}) \setminus ZF(\textbf{x}) = \{\textbf{y}\in Z(\textbf{x}): \textbf{A}\textbf{x}+ \textbf{L}\textbf{y}\not \ge \textbf{b}\} \end{aligned}$$

be the solutions \(\textbf{y}\in Z(\textbf{x})\) with the property that \((\textbf{x},\textbf{y})\) does not satisfy \(\textbf{A}\textbf{x}+ \textbf{L}\textbf{y}\ge \textbf{b}\). Sets \(ZI(\textbf{x})\) and \(ZF(\textbf{x})\) form a partition of \(Z(\textbf{x})\). Since \(B\subseteq P\), any pair in B must satisfy both the leader and the follower constraints. Then, pairs \((\textbf{x},\textbf{y})\) with \(\textbf{y}\in ZF(\textbf{x})\) belong to B, whereas those with \(\textbf{y}\in ZI(\textbf{x})\) do not. Define \(Y_o(\textbf{x})\) and \(Y_p(\textbf{x})\) as the following sets.

$$\begin{aligned}&Y_o(\textbf{x}) = \{\textbf{y}^* \in ZF(\textbf{x}): \textbf{d}^T \textbf{y}^* \le \textbf{d}^T \textbf{y}, \forall \textbf{y}\in ZF(\textbf{x})\}\\&Y_p(\textbf{x})=\{\textbf{y}^* \in ZF(\textbf{x}): \textbf{d}^T \textbf{y}^* \ge \textbf{d}^T \textbf{y}, \forall \textbf{y}\in ZF(\textbf{x})\} \end{aligned}$$
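These sets can be traced by brute force on a small example. The data below are hypothetical, invented for illustration: a single fixed leader choice, follower objective \(\textbf{f}= \textbf{0}\) (so every feasible follower point is optimal), one leader constraint row involving \(\textbf{y}\), and a leader cost vector \(\textbf{d}= (3,5)\).

```python
from itertools import product

# Hypothetical data: one fixed leader choice x, follower y in {0,1}^2,
# follower objective f = 0 and no binding follower constraint, so
# Z(x) is the whole grid (four tied optima).
Z = list(product([0, 1], repeat=2))

# Invented leader constraint row: y1 + y2 >= 1.
ZF = [y for y in Z if y[0] + y[1] >= 1]   # optima feasible for the leader
ZI = [y for y in Z if y[0] + y[1] < 1]    # optima violating the row

dcost = lambda y: 3 * y[0] + 5 * y[1]     # invented leader cost d = (3, 5)
Yo = [y for y in ZF if dcost(y) == min(map(dcost, ZF))]  # optimistic picks
Yp = [y for y in ZF if dcost(y) == max(map(dcost, ZF))]  # pessimistic picks

print(ZF, ZI)   # ZF and ZI partition Z(x)
print(Yo, Yp)   # [(1, 0)] [(1, 1)]
```

Here \(ZI(\textbf{x})\) is nonempty, so this \(\textbf{x}\) would be forbidden under the pessimistic policy, while \(Y_o(\textbf{x})\) and \(Y_p(\textbf{x})\) single out different points of \(ZF(\textbf{x})\).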

The set of the feasible solutions of the optimistic problem is defined as follows:

Definition 1

Let \(\varOmega \) be the set of the feasible solutions of optimistic LBP. Any \((\textbf{x},\textbf{y}) \in \varOmega \) must satisfy: \((\textbf{x},\textbf{y}) \in B\); \(\textbf{y}\in Y_o(\textbf{x})\).

The one for the pessimistic problem can be defined as below.

Definition 2

Let \(\varPi \) be the set of the feasible solutions of pessimistic LBP. Any \((\textbf{x},\textbf{y}) \in \varPi \) must satisfy: \((\textbf{x},\textbf{y}) \in B\); \(ZI(\textbf{x}) = \emptyset \); \(\textbf{y}\in Y_p(\textbf{x})\).

For the optimistic case, a non-empty \(ZF(\textbf{x})\) ensures that there is at least one feasible pair \((\textbf{x},\textbf{y}) \in \varOmega \) for the considered leader vector \(\textbf{x}\). This is not necessarily the case for the pessimistic setting. In fact, even when \(ZF(\textbf{x}) \ne \emptyset \) and, hence, \(Y_p(\textbf{x}) \ne \emptyset \), no pair \((\textbf{x},\textbf{y})\) for the considered \(\textbf{x}\) can belong to \(\varPi \), if \(ZI(\textbf{x})\) is non-empty. The presence of an optimal solution \(\textbf{y}\) of FOLL(\(\textbf{x}\)) with the property that \((\textbf{x},\textbf{y})\) does not satisfy the leader constraints, completely forbids the choice of that \(\textbf{x}\) vector (Lozano and Smith 2017).

In the optimistic setting, we can find a solution in \(\varOmega \) by optimizing the leader objective function over B, without the need to consider \(Y_o(\textbf{x})\), as we prove below.

Theorem 1

Optimistic LBP can be rephrased as

$$\begin{aligned} \min _{(\textbf{x},\textbf{y}) \in B} \textbf{c}^T \textbf{x}+ \textbf{d}^T \textbf{y}\end{aligned}$$

Proof

Although B may include integer pairs \((\textbf{x},\textbf{y})\) not belonging to \(\varOmega \), those solutions are never optimal. Suppose that this is not true, that is, problem \(\min _{(\textbf{x},\textbf{y}) \in B}\) \(\textbf{c}^T \textbf{x}+\) \(\textbf{d}^T \textbf{y}\) admits an optimal solution not respecting the optimistic policy. Let \((\textbf{x},\textbf{y}) \in B\) be an optimal solution of the problem such that \(\textbf{y}\in Z(\textbf{x}) \setminus Y_o(\textbf{x})\) and let \(\textbf{w}\) be a follower solution in \(Y_o(\textbf{x})\). Then, \(\textbf{d}^T \textbf{w}< \textbf{d}^T \textbf{y}\) and, hence, \((\textbf{x},\textbf{y})\) cannot be optimal, as \((\textbf{x},\textbf{w})\) has a smaller objective value. \(\square \)

The approaches for the optimistic setting in the literature optimize over B and not over \(\varOmega \), and all bilevel feasibility cuts for the optimistic problem only eliminate pairs not belonging to B, whereas they are not able to cut away solutions in \(B \setminus \varOmega \). This is true for the follower optimality cuts and for the algorithm for the optimistic setting presented in this paper as well. Moreover, separating over B is difficult, as it amounts to solving the follower problem.

Note that if optimistic LBP does not admit an optimal solution, optimizing over B or over \(\varOmega \) may lead to different outcomes. This may be the case if we relax the assumption that the \(\textbf{y}\) variables have finite bounds (see Sect. 5.1). If \(Y_o(\textbf{x}) = \emptyset \) for any \(\textbf{x}\), because the problem of minimizing \(\textbf{d}^T \textbf{y}\) over \(ZF(\textbf{x})\) is unbounded, then \(\varOmega \) is empty and LBP is infeasible; instead, the problem in Theorem 1 is unbounded. Optimizing over B corresponds to a situation where ties are broken ex post (see Sect. 1). In most of the literature on optimistic LBP, B is regarded as the set of the bilevel feasible solutions (DeNegre and Ralphs 2009; Fischetti et al. 2017; Xu and Wang 2014). It is easy to see that Theorem 1 does not apply to the pessimistic case, where ties cannot be broken ex post.
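The ex-post argument of Theorem 1 can be checked by enumeration on a small hypothetical instance, with all data invented for illustration (here \(\textbf{G}= \textbf{0}\), so the follower problem does not depend on \(\textbf{x}\), and no leader constraint involves \(\textbf{y}\)):

```python
from itertools import product

# Brute-force check of Theorem 1 on invented data:
# x in {0,1} with G = 0 (the follower problem ignores x), follower
#   min y1 + y2  over {0,1}^2  s.t.  y1 + y2 >= 1,
# leader objective 2x + y1 + 4y2.
Yfeas = [y for y in product([0, 1], repeat=2) if y[0] + y[1] >= 1]
best = min(y[0] + y[1] for y in Yfeas)
Z = [y for y in Yfeas if y[0] + y[1] == best]    # tied optima (0,1), (1,0)

B = [(x, y) for x in [0, 1] for y in Z]          # pairs with y in Z(x)
obj = lambda p: 2 * p[0] + p[1][0] + 4 * p[1][1]
x_star, y_star = min(B, key=obj)                 # optimize over B (ex post)

# B contains pairs violating the optimistic policy, e.g. (0, (0, 1)),
# but the minimizer automatically respects it: y_star lies in Y_o(x_star).
Yo = [y for y in Z if y[0] + 4 * y[1] == min(w[0] + 4 * w[1] for w in Z)]
assert y_star in Yo
print((x_star, y_star))   # (0, (1, 0))
```

As the theorem states, the pairs in \(B {\setminus } \varOmega \) are present but never selected by the minimization.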

2.3 When the policy does not matter

In general, the sets of the optimal solutions for the optimistic and the pessimistic problem (\(\varOmega \) and \(\varPi \)) may be different when \(|Z(\textbf{x})| > 1\) for some \(\textbf{x}\). Now, we provide necessary and sufficient conditions to have \(\varOmega = \varPi \), that is, we identify cases where the policy does not matter, although the follower problem may have multiple optimal solutions.

Theorem 2

\(\varOmega = \varPi \) if and only if \(\varPi = B\).

Proof

Suppose that \(\varPi \ne B\) and let \((\textbf{x},\textbf{y})\) be a pair in \(B {\setminus } \varPi \). Since \((\textbf{x},\textbf{y}) \in B\), then \(\textbf{y}\in ZF(\textbf{x})\) and, since \((\textbf{x},\textbf{y}) \notin \varPi \), either \(ZI(\textbf{x}) \ne \emptyset \) or there exists \(\textbf{w}\in ZF(\textbf{x})\) such that \(\textbf{d}^T \textbf{w}> \textbf{d}^T \textbf{y}\). The former implies that there is no pair in \(\varPi \) for the given \(\textbf{x}\), whereas there exists at least one pair for \(\textbf{x}\) in \(\varOmega \), as \(ZF(\textbf{x}) \ne \emptyset \). The latter means that any pair \((\textbf{x},\textbf{y}^p) \in \varPi \) has the property that \(\textbf{d}^T \textbf{y}^p \ge \textbf{d}^T \textbf{w}\). On the contrary, any pair \((\textbf{x},\textbf{y}^o) \in \varOmega \) satisfies \(\textbf{d}^T \textbf{y}^o \le \textbf{d}^T \textbf{y}\). Then, \(\textbf{d}^T \textbf{y}^o \le \textbf{d}^T \textbf{y}< \textbf{d}^T \textbf{w}\le \textbf{d}^T \textbf{y}^p\) and \(\varOmega \ne \varPi \). Suppose now that \(\varOmega \ne \varPi \). Then, by definition, there exists \((\textbf{x},\textbf{y}) \in B\) such that it either belongs only to \(\varOmega \) or only to \(\varPi \). Therefore, either \(\varPi \ne B\), and the proof is complete, or \(\varOmega \ne B\). Assume that \(\varOmega \ne B\) and let \((\textbf{x},\textbf{y})\) be a pair in \(B{\setminus } \varOmega \). Since \((\textbf{x},\textbf{y}) \in B\), then \(ZF(\textbf{x})\ne \emptyset \) and, since \((\textbf{x},\textbf{y}) \notin \varOmega \), there exists at least another solution \(\textbf{w}\in ZF(\textbf{x})\) such that \(\textbf{d}^T \textbf{w}< \textbf{d}^T \textbf{y}\). As \(\textbf{y}\) is worse than \(\textbf{w}\) from the leader perspective, in the pessimistic setting the follower will prefer \(\textbf{y}\) over \(\textbf{w}\). Hence, there exists at least one pair, \((\textbf{x},\textbf{w})\), that belongs to B but not to \(\varPi \). It follows that \(\varPi \ne B\). \(\square \)

Now, we derive conditions to guarantee that \(\varPi = B\).

Definition 3

LBP is insensitive to the leader objective function if, for any \(\textbf{x}\in P^{\textbf{x}}\) with \(|ZF(\textbf{x})| > 1\), all \(\textbf{y}\in ZF(\textbf{x})\) have the same leader objective value.

A sufficient, but not necessary, condition for being insensitive to the leader objective function is that the leader and the follower objective coefficients can be obtained from one another by applying some scaling factor.

Theorem 3

\(\varOmega = B\) if and only if LBP is insensitive to the leader objective function.

Proof

If LBP is insensitive to the leader objective function, all the solutions in \(ZF(\textbf{x})\) minimize \(\textbf{d}^T \textbf{y}\). It follows that for any \((\textbf{x},\textbf{y}) \in B\), \(\textbf{y}\in Y_o(\textbf{x})\) and, hence, \(B = \varOmega \). Suppose now that LBP is not insensitive to the leader objective function, that is, there exist \(\textbf{x},\textbf{y},\textbf{w}\) such that \(\textbf{y},\textbf{w}\in ZF(\textbf{x})\) and \(\textbf{d}^T \textbf{y}> \textbf{d}^T \textbf{w}\). Therefore, \((\textbf{x},\textbf{y}) \in B \setminus \varOmega \). \(\square \)

Unfortunately, LBP being insensitive to the leader objective function does not ensure that \(\varPi = B\). In fact, even if the condition is satisfied, there may exist \(\textbf{y}\in ZI(\textbf{x})\) that makes \(\textbf{x}\) infeasible and, as a result, \(\varPi \ne B\).

Definition 4

LBP is independent of the leader constraints if, for any \(\textbf{x}\in P^{\textbf{x}}\), either \(ZF(\textbf{x}) = \emptyset \) or \(ZI(\textbf{x}) = \emptyset \).

This implies that \(ZI(\textbf{x}) = \emptyset \) for all \(\textbf{x}\) such that there exists \(\textbf{y}\) with the property that \((\textbf{x},\textbf{y}) \in B\). A sufficient, but not necessary, condition for LBP to be independent of the leader constraints is that \(\textbf{L}= \textbf{0}\), that is, the leader constraints do not depend on the follower variables.

Theorem 4

\(\varPi =\varOmega \) if and only if LBP is independent of the leader constraints and insensitive to the leader objective function.

Proof

Suppose that the problem is not independent of the leader constraints. Then, there exists \(\textbf{x}\) such that both \(ZI(\textbf{x})\) and \(ZF(\textbf{x})\) are non-empty. It follows that no pair including \(\textbf{x}\) belongs to \(\varPi \), whereas B contains at least one pair for \(\textbf{x}\), because \(ZF(\textbf{x}) \ne \emptyset \). Hence, \(\varPi \ne B\) and, by Theorem 2, \(\varPi \ne \varOmega \). Suppose now that LBP is independent of the leader constraints, but not insensitive to the leader objective function. Then, there exist \(\textbf{x},\textbf{y},\textbf{w}\) such that \(\textbf{y},\textbf{w}\in ZF(\textbf{x})\) and \(\textbf{d}^T \textbf{y}< \textbf{d}^T \textbf{w}\). It follows that \((\textbf{x},\textbf{y}) \in B {\setminus } \varPi \) and, hence, by Theorem 2, \(\varPi \ne \varOmega \). Assume now that LBP is independent of the leader constraints and insensitive to the leader objective function. Then, for every leader solution \(\textbf{x}\) such that there exists \((\textbf{x},\textbf{y}) \in B\), we have \(ZI(\textbf{x}) = \emptyset \) and \(ZF(\textbf{x}) = Y_o(\textbf{x}) = Y_p(\textbf{x})\). Hence, \(\varPi = B\) and, by Theorem 2, \(\varOmega = \varPi \). \(\square \)

2.4 When the policy does matter

Contrary to what one may think, \(\textbf{f}=\textbf{0}\) does not imply that the policy can be ignored. If \(\textbf{f}=\textbf{0}\), then \(Z(\textbf{x}) = P(\textbf{x})\) and optimistic LBP reduces to optimizing the leader objective function over P, that is, to solving HPR. In fact, for a given \(\textbf{x}\), any solution \(\textbf{y}\in P(\textbf{x})\) is equally good for the follower and ties are broken according to the leader preferences. Then, the leader can choose both \(\textbf{x}\) and the corresponding \(\textbf{y}\in P(\textbf{x})\) at the same time, while the follower plays no role. In contrast, in the pessimistic case, we still have to solve a bilevel problem. Indeed, the follower solutions in \(P(\textbf{x})\) are equivalent for the follower, but not for the leader, who cannot pick the preferred one. However, if LBP is independent of the leader constraints, then pessimistic LBP can be solved by solving an optimistic bilevel problem.

Theorem 5

If \(\textbf{f}= \textbf{0}\) and LBP is independent of the leader constraints, then pessimistic LBP is equivalent to the optimistic LBP where \(\textbf{f}\) is replaced by \(-\textbf{d}\).

Proof

Assumption \(\textbf{f}=\textbf{0}\) implies that \(Z(\textbf{x}) = P(\textbf{x})\). The independence of the leader constraints guarantees that, for any \(\textbf{x}\in P^{\textbf{x}}\), \(ZI(\textbf{x}) = \emptyset \). In fact, if \(\textbf{x}\in P^{\textbf{x}}\), there exists \(\textbf{y}\in P(\textbf{x})\) with the property that \((\textbf{x},\textbf{y}) \in P\), that is, \((\textbf{x},\textbf{y})\) satisfies the leader constraints. Since \(P(\textbf{x})=Z(\textbf{x})\), then \(\textbf{y}\in ZF(\textbf{x}) \ne \emptyset \) and, as a consequence, \(ZI(\textbf{x}) = \emptyset \). The pessimistic setting requires computing a follower solution in \(Y_p(\textbf{x})\), that is, a solution maximizing \(\textbf{d}^T \textbf{y}\) over \(P(\textbf{x})\). Consider now the optimistic bilevel problem where \(\textbf{f}=\textbf{0}\) is replaced by \(\textbf{f}=-\textbf{d}\). Its follower problem computes the solutions minimizing \(\textbf{f}^T \textbf{y}\), that is, maximizing \(\textbf{d}^T \textbf{y}\), over \(P(\textbf{x})\). Since all these solutions have the same leader objective value, the optimistic tie-breaking is immaterial and the two problems are equivalent. \(\square \)

Hence, any optimistic problem with \(\textbf{f}=-\textbf{d}\) that is also independent of the leader constraints can be seen as an equivalent optimistic reformulation of a corresponding pessimistic problem with \(\textbf{f}=\textbf{0}\).

3 Single-level reformulations for optimistic LBP

Here, we present valid inequalities that describe B, thus leading to a (possibly non-compact) single-level reformulation of optimistic LBP according to Theorem 1. A description of B can be obtained by adding to P some valid inequalities that eliminate the integer pairs \((\textbf{x},\textbf{y}) \in P \setminus B\). We consider two alternative sets of inequalities that serve the same purpose: the optimistic follower optimality cuts (see Sect. 3.2) and our version of the no-good cuts introduced in DeNegre and Ralphs (2009) (see Sect. 3.3). Then, we theoretically compare the obtained formulations, to evaluate the relative strength of the inequalities from a polyhedral perspective (see Sect. 3.4).

3.1 Binarization

To avoid additional assumptions on the problem parameters, we binarize some of the general integer variables of the problem, if any. Let \({\varvec{\ell }}^{\textbf{x}}\) and \(\textbf{u}^{\textbf{x}}\) be lower and upper bounds on the leader variables. Both for the formulation that uses the follower optimality cuts and for the one that uses our version of the no-good cuts, for each \(i \in N\), auxiliary variables \(x_i^1, \ldots , x_i^{u_i^{\textbf{x}} - \ell _i^{\textbf{x}}}\) are added to the problem, together with the following constraints.

$$\begin{aligned}&x_i = \ell _i^{\textbf{x}} + x_i^1 + x_i^2 + \ldots + x_i^{u_i^{\textbf{x}} - \ell _i^{\textbf{x}}} \\&x_i^k \le x_i^{k-1} \quad k = 2, \ldots , u_i^{\textbf{x}} - \ell _i^{\textbf{x}} \\&x_i^k \in \{0,1\} \quad k \in \{1, \ldots , u_i^{\textbf{x}} - \ell _i^{\textbf{x}}\} \end{aligned}$$

That is, variable \(x^k_i\) is the k-th binary variable in the binarization scheme for original integer variable \(x_i\). Since \(x_i\) may take values in \(\{\ell ^{\textbf{x}}_i, \ldots , u^{\textbf{x}}_i\}\) and the integer value is computed from the binarized variables using the first equation, then \(u^{\textbf{x}}_i - \ell ^{\textbf{x}}_i\) binary variables are needed to represent all the possible values that \(x_i\) can take, that is, \(k \in \{1, \ldots , u_i^{\textbf{x}} - \ell _i^{\textbf{x}}\}\). In this way, \(\textbf{x}\) is expressed as the sum of its lower bound and a set of auxiliary binary variables, while the precedence constraints are used to reduce symmetry. This particular binarization scheme (Roy 2007) has been proved to have better theoretical properties than other schemes (Bonami and Margot 2015; Dash et al. 2018). In the following, when we say that \(\textbf{x}\) takes integer value \(\textbf{s}\), we mean that \(s^h_i\) is binary for each \(h \in \{1, \ldots , u_i^{\textbf{x}} - \ell _i^{\textbf{x}}\}\) and \(i \in N\). If two integer leader solutions \({\varvec{\mu }}\) and \(\textbf{s}\) are different, it means that there exist \(i \in N\) and \(h \in \{1, \ldots , u_i^{\textbf{x}} - \ell _i^{\textbf{x}}\}\) such that \(\mu _i^h \ne s_i^h\). If \({\varvec{\mu }}=\textbf{s}\), then \(s_i^h = \mu _i^h\) for all \(i \in N\) and \(h \in \{1, \ldots , u_i^{\textbf{x}} - \ell _i^{\textbf{x}}\}\). If \(x_i={\bar{x}}_i\), since the binary variables are set to one in lexicographic order by constraints \(x_i^k \le x_i^{k-1}\), we have that \(x_i^k = 1\) for \(k \le {\bar{x}}_i - \ell _i^{\textbf{x}}\), whereas \(x_i^k = 0\) for \(k \ge {\bar{x}}_i - \ell _i^{\textbf{x}}+1\). We denote by \(l1_{{\bar{x}}_i}\) value \({\bar{x}}_i - \ell _i^{\textbf{x}}\), which represents the index of the last binary variable \(x_i^k\) taking value 1 when \(x_i = {\bar{x}}_i\).
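As an illustration, the binarization scheme above can be sketched in a few lines of Python (a minimal sketch; the function names are ours, not from the paper):

```python
def binarize(value, lower, upper):
    """Encode an integer in [lower, upper] as the auxiliary binaries
    x^1, ..., x^(upper - lower): the first (value - lower) take value 1,
    the rest 0, so the precedence constraints x^k <= x^(k-1) hold."""
    assert lower <= value <= upper
    n = upper - lower
    bits = [1 if k < value - lower else 0 for k in range(n)]
    # Precedence constraints hold by construction (lexicographic ones).
    assert all(bits[k] <= bits[k - 1] for k in range(1, n))
    return bits

def debinarize(bits, lower):
    """Recover x_i = lower bound + x^1 + ... + x^(u-l)."""
    return lower + sum(bits)

def last_one(value, lower):
    """The index l1 of the last auxiliary binary taking value 1."""
    return value - lower
```

For instance, with \(\ell_i^{\textbf{x}}=2\) and \(u_i^{\textbf{x}}=7\), the value 5 is encoded as \((1,1,1,0,0)\) and \(l1 = 3\).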

The follower optimality cuts are valid inequalities including the original \(\textbf{y}\) variables and the artificial variables \(x^h_i\), \(h \in \{1, \ldots , u_i^{\textbf{x}} - \ell _i^{\textbf{x}}\}\), \(i \in N\). Our version of the no-good cuts requires that the general integer \(\textbf{y}\) variables be binarized as well: these cuts are inequalities including the artificial variables \(x^h_i\), \(h \in \{1, \ldots , u_i^{\textbf{x}} - \ell _i^{\textbf{x}}\}\), \(i \in N\), and the artificial variables \(y^h_i\), \(h \in \{1, \ldots , u_i^{\textbf{y}} - \ell _i^{\textbf{y}}\}\), \(i \in F\), introduced to binarize the follower variables \(\textbf{y}\), where \({\varvec{\ell }}^{\textbf{y}}\) and \(\textbf{u}^{\textbf{y}}\) are lower and upper bounds on the follower variables.

3.2 Optimistic follower optimality (OFO) cuts

Consider an integer solution \((\textbf{s},{\varvec{\eta }}) \in P\), where \({\varvec{\eta }}\) is not optimal for \(\text{ FOLL }(\textbf{s}) \). Such a pair, which belongs to P but not to B, violates the optimistic follower optimality cut (1) for \(\textbf{s}\).

Definition 5

Let \(\textbf{s}\in P^{\textbf{x}}\); the corresponding optimistic follower optimality (OFO) cut is the inequality below.

$$\begin{aligned} \textbf{f}^T \textbf{y}\le opt(\textbf{s}) + M^O \left( \sum _{i \in N: s_i > \ell _i^{\textbf{x}}} (1-x_i^{l1_{s_i}}) + \sum _{i \in N: s_i < u_i^{\textbf{x}}} x_i^{l1_{s_i}+1} \right) \end{aligned}$$
(1)

\(M^O\) is a sufficiently large value, whose purpose is to guarantee that the constraint is automatically satisfied for integer \(\textbf{x}\ne \textbf{s}\) (see Sect. 5.5). Note that \(\text{ FOLL }(\textbf{s}) \) is feasible (and bounded) for any \(\textbf{s}\in P^{\textbf{x}}\), as this implies that there exists \(\textbf{y}\) such that \((\textbf{s},\textbf{y}) \in P\), that is, \(P(\textbf{s}) \ne \emptyset \).

Consider now the model that optimizes the leader objective function and whose feasible region oSLF is obtained by adding to P an OFO cut for every \(\textbf{s}\in P^{\textbf{x}}\). The description of oSLF is non-compact, as it may include an exponential number of OFO inequalities. Let \(B^{\textbf{s}} \subseteq B\) be the set including the pairs \((\textbf{x},\textbf{y})\) with \(\textbf{x}=\textbf{s}\), that is, \(B^{\textbf{s}}= \{(\textbf{s},\textbf{y}) \in B\}\).

Theorem 6

The OFO inequalities are valid for \(B \supseteq \varOmega \) and, according to Theorem 1, optimistic LBP can be solved by the (possibly non-compact) single-level problem below.

$$\begin{aligned} \min _{(\textbf{x},\textbf{y}) \in oSLF} \textbf{c}^T \textbf{x}+ \textbf{d}^T \textbf{y}\end{aligned}$$

Proof

If \(\textbf{x}=\textbf{s}\), the OFO inequality reduces to \(\textbf{f}^T \textbf{y}\le opt(\textbf{s})\), which is valid for \(B^{\textbf{s}}\), as it forces the choice of a follower solution \(\textbf{y}\) whose follower objective cost is optimal for the given leader choice \(\textbf{s}\). For integer \(\textbf{x}\ne \textbf{s}\), the bigM part makes the inequality automatically satisfied and, hence, valid for B. In the binarized setting, to ensure that \(x_i \ne s_i\) for some i, we must ensure that either \(x_i\) is decreased by at least one (at least \(x_i^{l1_{s_i}}\) becomes 0), or \(x_i\) is increased by at least one (at least \(x_i^{l1_{s_i} + 1}\) becomes 1). Since oSLF includes an OFO inequality for any leader solution in \(P^{\textbf{x}}\), according to the reformulation of optimistic LBP in Theorem 1, we can solve optimistic LBP by optimizing the leader objective function over oSLF. \(\square \)

3.3 Binarized no-good cuts

Suppose now that the follower variables are integer. An integer pair \((\textbf{s},{\varvec{\eta }}) \in P\) with \({\varvec{\eta }}\) not optimal for \(\text{ FOLL }(\textbf{s}) \) can be eliminated by the corresponding no-good cut (2).

Definition 6

For an integer pair \((\textbf{s}, {\varvec{\eta }}) \notin B\), a no-good cut is the inequality below.

$$\begin{aligned} \sum _{i \in N: s_i > \ell _i^{\textbf{x}}} (1-x_i^{l1_{s_i}}) + \sum _{i \in N: s_i < u_i^{\textbf{x}}} x_i^{l1_{s_i}+1} + \sum _{i \in F: \eta _i > \ell _i^{\textbf{y}}} (1-y_i^{l1_{\eta _i}}) + \sum _{i \in F: \eta _i < u_i^{\textbf{y}}} y_i^{l1_{\eta _i}+1} \ge 1 \end{aligned}$$
(2)

The original definition of the no-good cuts for general integer variables requires \(\textbf{x}\) and \(\textbf{y}\) to satisfy a slack condition (see step 2 of Algorithm 1 in DeNegre and Ralphs 2009), which makes them valid only for problems with purely integer parameters (constraint coefficients and right-hand sides). The version with the binarized variables (2) that we propose extends their application to problems where some parameters are not integer. The original inequalities can be simplified when the variables are binary and, in that case, the definition in DeNegre and Ralphs (2009) coincides with the one we give here. Let oSLG be the set obtained by adding to P a no-good inequality for any \((\textbf{s},{\varvec{\eta }}) \in P \setminus B\). These inequalities can be, in general, exponentially many. Using the same argument as in Theorem 6, it is easy to see that the binarized no-good cuts are valid for \(B \supseteq \varOmega \) and that, according to the reformulation of optimistic LBP in Theorem 1, optimistic LBP can be solved by the (possibly non-compact) single-level problem below.

$$\begin{aligned} \min _{(\textbf{x},\textbf{y}) \in oSLG} \textbf{c}^T \textbf{x}+ \textbf{d}^T \textbf{y}\end{aligned}$$

3.4 A comparison between oSLF and oSLG

Trivially, the no-good cuts eliminate a single infeasible solution \((\textbf{x},\textbf{y})\) at a time, whereas the follower optimality cuts remove all the solutions \((\textbf{x},\textbf{y})\) such that \(\textbf{y}\notin Z(\textbf{x})\) at the same time. Hence, oSLG has, in general, more constraints than oSLF. We now compare oSLF and oSLG from a polyhedral perspective, thus investigating the relative strength of the follower optimality and of the no-good inequalities. The results we derive are independent of binarization, as all the theorems refer to binary problems, where no binarization is needed. We assume that the reader is familiar with the definition of formulation for an integer problem and with the criteria to investigate different formulations for the same problem (Wolsey 1998). In what follows, we use symbols PoSLF and PoSLG to denote the polyhedra corresponding to the sets oSLF and oSLG defined above, when the integrality restrictions on the variables are removed.

Theorem 7

PoSLF is not stronger than PoSLG, nor is PoSLG stronger than PoSLF.

Proof

Consider the bilevel programming problem below.

$$\begin{aligned}&\min x + \zeta _3 \\&x \le 0 \\&x \in \{0,1\} \\&{\varvec{\zeta }}\in \arg \min \quad 3y_1 + y_2 + y_3 + y_4 \\&\qquad \qquad \qquad 3y_1 + y_2 + y_3 + y_4 \ge 3 + 3x \\&\qquad \qquad \qquad \textbf{y}\in \{0,1\}^4 \end{aligned}$$

Value \(x=1\) is not feasible for the leader problem. The OFO cut corresponding to \(x=0\) is \(3y_1 + y_2 + y_3 + y_4 \le 3 + M^O x\), while the no-good cuts associated with solutions not optimal for the follower problem when \(x=0\) are reported below.

$$\begin{aligned}&(0,(1,0,0,1))&\quad&1-y_1 + y_2 + y_3 + 1-y_4 + x \ge 1\\&(0,(1,0,1,0))&\quad&1-y_1 + y_2 + 1-y_3 + y_4 + x \ge 1 \\&(0,(1,0,1,1))&\quad&1-y_1 + y_2 + 1-y_3 + 1-y_4 + x \ge 1 \\&(0,(1,1,0,0))&\quad&1-y_1 + 1-y_2 + y_3 + y_4 + x \ge 1\\&(0,(1,1,0,1))&\quad&1-y_1 + 1-y_2 + y_3 + 1-y_4 + x \ge 1\\&(0,(1,1,1,0))&\quad&1-y_1 + 1-y_2 + 1-y_3 + y_4 + x \ge 1\\&(0,(1,1,1,1))&\quad&1-y_1 + 1-y_2 + 1-y_3 + 1-y_4 + x \ge 1 \end{aligned}$$

Fractional solution \((0,{\varvec{\nu }})=(0,(1/3,1,1,0))\) satisfies the OFO inequality, but it violates the no-good cut for (0, (1, 1, 1, 0)). Fractional solution \((0,{\varvec{\eta }}) = (0,(1,1/2,1/2,1))\) satisfies the no-good cuts, but not the OFO inequality. The problem above thus provides a counterexample both to PoSLF being stronger than PoSLG (PoSLF \(\subseteq \) PoSLG) and to PoSLG being stronger than PoSLF (PoSLG \(\subseteq \) PoSLF). \(\square \)
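The computations in this counterexample are easy to verify numerically. The following sketch (the helper names are ours) checks the two fractional solutions against the OFO cut for \(x=0\) and the no-good cuts listed above:

```python
f = (3, 1, 1, 1)   # follower objective coefficients of the example
opt0 = 3           # opt(s) for the only leader choice x = 0

def ofo_lhs(y):
    """Left-hand side of the OFO cut at x = 0: f^T y <= opt(0)."""
    return sum(fi * yi for fi, yi in zip(f, y))

def no_good_lhs(y, eta, x=0.0):
    """LHS of the no-good cut built from the integer pair (0, eta)."""
    return x + sum(yi if ei == 0 else 1 - yi for yi, ei in zip(y, eta))

# Integer follower solutions feasible but suboptimal for x = 0
suboptimal = [(1,0,0,1), (1,0,1,0), (1,0,1,1), (1,1,0,0),
              (1,1,0,1), (1,1,1,0), (1,1,1,1)]

nu  = (1/3, 1, 1, 0)    # satisfies OFO, violates a no-good cut
eta = (1, 1/2, 1/2, 1)  # satisfies all no-good cuts, violates OFO

assert ofo_lhs(nu) <= opt0                      # OFO satisfied
assert no_good_lhs(nu, (1, 1, 1, 0)) < 1        # no-good violated
assert all(no_good_lhs(eta, s) >= 1 for s in suboptimal)
assert ofo_lhs(eta) > opt0                      # OFO violated
```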

If the variables are binary, any OFO inequality is a binary knapsack. A binary knapsack is an inequality \(\textbf{q}^T \textbf{v}\le t\), where \(\textbf{v}\) are binary variables. Sets \(C^+ \subseteq \{i: q_i > 0\}\) and \(C^- \subseteq \{i: q_i < 0\}\) such that \(\sum _{i \in C^+} q_i > t - \sum _{i: q_i <0, i \notin C^-} q_i\) define a cover. The corresponding cover inequality \(\sum _{i \in C^+} v_i - \sum _{i \in C^-}v_i \le |C^+|-1\) is valid for the convex hull of the integer feasible solutions of the knapsack polyhedron (Balas 1975). For an OFO constraint, a cover inequality has the form

$$\begin{aligned} \sum _{i \in X^+} x_i + \sum _{i \in Y^+} y_i - \sum _{i \in Y^-} y_i \le |Y^+|+|X^+|-1 \end{aligned}$$

where \(X^+ = \{i: s_i = 1\}\), \(X^- = \emptyset \), \(Y^+ \subseteq \{i: f_i > 0\}\), \(Y^- \subseteq \{i: f_i<0\}\) and

$$\begin{aligned} \sum _{i \in Y^+} f_i > opt(\textbf{s}) - \sum _{i: f_i <0, i \notin Y^-} f_i \end{aligned}$$

In fact, as soon as a variable in \(\{i: s_i = 1\}\) is not selected or a variable in \(\{i: s_i = 0\}\) is selected, then the knapsack is automatically satisfied by the bigM part and no cover can exist. Hence, any cover must have \(X^+ = \{i: s_i = 1\}\) and \(X^- = \emptyset \).

Consider set oSLFcov, which includes, for each OFO inequality, also its cover inequalities, and denote by PoSLFcov the corresponding polyhedron, when the integrality restrictions on the variables are removed.

Theorem 8

PoSLFcov is stronger than both PoSLF and PoSLG.

Proof

That PoSLFcov is stronger than PoSLF follows from the known results on the knapsack problem. We focus here on PoSLFcov being stronger than PoSLG. We first prove that there is no fractional solution satisfying the inequalities in PoSLFcov but not the no-good cuts in PoSLG, that is, PoSLFcov is at least as good as PoSLG. Then we provide an example where PoSLFcov is strictly better than PoSLG, completing the proof. Suppose that PoSLFcov is not at least as good as PoSLG, that is, assume that there exists a fractional solution \((\textbf{w},\textbf{z})\) that satisfies the inequalities in PoSLFcov, but violates the no-good cut for a pair \((\textbf{s},{\varvec{\eta }})\). This means that

$$\begin{aligned} \sum _{i: s_i=0} w_i + \sum _{i: s_i = 1} (1-w_i) + \sum _{i: \eta _i=0} z_i + \sum _{i: \eta _i = 1} (1-z_i) < 1 \end{aligned}$$

Now, consider sets \(Y^+ = \{i: f_i > 0\} \cap \{i: \eta _i = 1\}\), \(Y^- = \{i: f_i < 0\} \cap \{i: \eta _i = 0\}\), \(X^+ = \{i: s_i = 1\}\) and \(X^-=\emptyset \). Then, the terms of the left-hand side of the inequality above can be rewritten as follows: (i) \(\sum _{i: s_i=1} (1-w_i) = \sum _{i \in X^+} (1-w_i) = |X^+| - \sum _{i \in X^+} w_i\); (ii) \(\sum _{i: \eta _i=0} z_i = \sum _{i \in Y^-} z_i + \sum _{i: f_i > 0, \eta _i=0} z_i\); (iii) \(\sum _{i: \eta _i = 1} (1-z_i) = \sum _{i \in Y^+} (1-z_i) + \sum _{i: f_i < 0, \eta _i = 1} (1-z_i) = |Y^+| + |\{i: f_i < 0, \eta _i = 1\}| - \sum _{i \in Y^+}z_i - \sum _{i: f_i <0, \eta _i = 1}z_i\). Hence, we obtain the inequality below.

$$\begin{aligned}&\sum _{i \in X^+} w_i + \sum _{i \in Y^+} z_i + \sum _{i: f_i<0,\eta _i=1} z_i - \sum _{i: s_i = 0} w_i - \sum _{i \in Y^-} z_i\\&\quad - \sum _{i: f_i>0, \eta _i=0} z_i > |X^+| + |Y^+| + |\{i: f_i < 0, \eta _i = 1\}|- 1 \end{aligned}$$

Note that since \(\textbf{0}\le \textbf{w}\le \textbf{1}\) and \(\textbf{0}\le \textbf{z}\le \textbf{1}\), then \(-\sum _{i: f_i>0, \eta _i=0} z_i\) \(\le \) 0, \(- \sum _{i: s_i = 0} w_i\) \(\le \) 0 and \(\sum _{i: f_i <0, \eta _i=1} z_i\) \(\le \) \(|\{i: f_i < 0, \eta _i = 1\}|\). Then, it follows that

$$\begin{aligned} \sum _{i \in X^+} w_i + \sum _{i \in Y^+} z_i - \sum _{i \in Y^-} z_i > |X^+| + |Y^+| - 1 \end{aligned}$$

Therefore, if \(X^+,X^-,Y^-,Y^+\) is a cover of the OFO cut for \(\textbf{s}\), the corresponding cover inequality is violated by \((\textbf{w},\textbf{z})\), contradicting the assumption we made and confirming that PoSLFcov is at least as good as PoSLG. If the no-good inequality is valid, then \((\textbf{s},{\varvec{\eta }}) \notin B\), which means that \(\textbf{f}^T {\varvec{\eta }}> opt(\textbf{s})\). Since \(\sum _{i: \eta _i = 0} f_i \eta _i = 0\), then it must hold that \(\sum _{i: f_i> 0, \eta _i = 1} f_i + \sum _{i: f_i < 0, \eta _i = 1} f_i > opt(\textbf{s})\), that is,

$$\begin{aligned} \sum _{i \in Y^+} f_i > opt(\textbf{s}) - \sum _{i: f_i < 0, i \notin Y^-} f_i \end{aligned}$$

Hence, \(X^+,X^-,Y^+,Y^-\) is a cover whose inequality is violated. Therefore, there exists no solution that satisfies the cover inequalities but not the no-good cuts. Conversely, there may exist solutions satisfying the no-good cuts but not the cover inequalities, as the following counterexample shows. Consider again the problem in the proof of Theorem 7. Solution \((0,{\varvec{\nu }}) = (0,(1,1/2,1/2,1))\) satisfies the no-good cuts, but it violates the cover inequality \(y_1 + y_2 \le 1\) for the OFO cut for \(x = 0\) corresponding to sets \(X^-=Y^-=X^+=\emptyset , Y^+=\{1,2\}\). \(\square \)
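With the same data, the cover condition and the violation claimed in this counterexample can be checked directly (a brute-force sketch; the variable names are ours):

```python
f = (3, 1, 1, 1)    # follower objective of the Theorem 7 example
opt0 = 3            # opt(s) for x = 0
Y_plus = [0, 1]     # 0-based indices of y_1, y_2; X+, X-, Y- are empty

# Cover condition: sum_{i in Y+} f_i > opt(s) - sum_{f_i<0, i not in Y-} f_i
# (here there are no negative f_i, so the right-hand side is just opt(s))
assert sum(f[i] for i in Y_plus) > opt0

nu = (1, 1/2, 1/2, 1)
# Cover inequality y_1 + y_2 <= |Y+| + |X+| - 1 = 1 is violated by nu:
assert nu[0] + nu[1] > len(Y_plus) + 0 - 1
```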

3.5 Strengthening the optimistic cuts

OFO and no-good cuts can be strengthened by eliminating some variables from them. To do that, we must prove that the considered inequality remains valid for B, independently of the value of the eliminated variables. In Theorem 9, we provide conditions guaranteeing that some constraints can be strengthened. All the conditions are independent of binarization.

Theorem 9

The following results hold:

  1. 1.

    let \(\textbf{x},{\varvec{\mu }}\in P^{\textbf{x}}\): if \(\exists \textbf{z}\in Z(\textbf{x}) \cap P({\varvec{\mu }})\), then \(\textbf{f}^T \textbf{y}\le opt(\textbf{x})\) is valid for \(B^{{\varvec{\mu }}}\);

  2. 2.

    let \((\textbf{x},\textbf{y}),(\textbf{x},{\varvec{\nu }}) \in P\): if \(\textbf{y}\in P(\textbf{x}){\setminus } Z(\textbf{x})\) and \(\textbf{f}^T {\varvec{\nu }}\ge \textbf{f}^T \textbf{y}\), then \((\textbf{x},{\varvec{\nu }}) \notin B\);

  3. 3.

    let \((\textbf{x},\textbf{y}), ({\varvec{\mu }},\textbf{y}) \in P\): if \(\textbf{y}\in P(\textbf{x}) {\setminus } Z(\textbf{x})\) and \(\exists \textbf{z}\in Z(\textbf{x}) \cap P({\varvec{\mu }})\), then \(({\varvec{\mu }},\textbf{y}) \notin B\).

Proof

If \(\exists \textbf{z}\in Z(\textbf{x}) \cap P({\varvec{\mu }})\), then \(opt({\varvec{\mu }}) \le opt(\textbf{x})\). It follows that any pair \(({\varvec{\mu }},\textbf{y}) \in B^{{\varvec{\mu }}}\) satisfies \(\textbf{f}^T \textbf{y}= opt({\varvec{\mu }}) \le opt(\textbf{x})\), so that \(\textbf{f}^T \textbf{y}\le opt(\textbf{x})\) is valid for \(B^{{\varvec{\mu }}}\). If \(\textbf{y}\in P(\textbf{x}){\setminus } Z(\textbf{x})\), then \(\textbf{f}^T \textbf{y}> opt(\textbf{x})\). Therefore, if \(\textbf{f}^T {\varvec{\nu }}\ge \textbf{f}^T \textbf{y}\), then \(\textbf{f}^T {\varvec{\nu }}\ge \textbf{f}^T \textbf{y}> opt(\textbf{x})\) and, hence, \({\varvec{\nu }}\notin Z(\textbf{x})\). Finally, if \(\exists \textbf{z}\in Z(\textbf{x}) \cap P({\varvec{\mu }})\), then \(opt({\varvec{\mu }}) \le opt(\textbf{x}) < \textbf{f}^T \textbf{y}\). It follows that \(\textbf{y}\notin Z({\varvec{\mu }})\). \(\square \)

A simple case where \(\exists \textbf{z}\in Z(\textbf{x}) \cap P({\varvec{\mu }})\) is when \(P(\textbf{x}) \subseteq P({\varvec{\mu }})\).

Definition 7

A leader variable \(x_i\) is positively redundant if \(\textbf{G}_i \ge \textbf{0}\), whereas \(x_i\) is negatively redundant if \(\textbf{G}_i \le \textbf{0}\). A variable that is either positively or negatively redundant is said to be coherent.

When we increase a positively redundant variable or decrease a negatively redundant one, the feasible region of the follower problem enlarges. Hence, all solutions \({\varvec{\mu }}\) obtained from \(\textbf{x}\) increasing a positively redundant variable or decreasing a negatively redundant one have the property that \(P(\textbf{x}) \subseteq P({\varvec{\mu }})\).

Theorem 10

The following strengthening is correct for the OFO and the no-good cuts: if \(x_i\) is positively redundant, then remove \(x_i^{l1_{s_i}+1}\); if \(x_i\) is negatively redundant, then remove \(x_i^{l1_{s_i}}\).

Proof

We must prove that \(\textbf{f}^T \textbf{y}\le opt(\textbf{s})\) remains valid for \(B^{{\varvec{\mu }}}\) for any \({\varvec{\mu }}\) obtained from \(\textbf{s}\) by arbitrarily setting the values of the removed variables. The removal allows the positively redundant variables to be arbitrarily increased and the negatively redundant ones to be arbitrarily decreased. Since the increase (decrease) in a positively (negatively) redundant leader variable enlarges the follower feasible region, that is, \(P(\textbf{s}) \subseteq P({\varvec{\mu }})\), Theorem 9 guarantees that both the OFO and the no-good cuts remain valid, and the strengthening is correct. \(\square \)

The above strengthening assumes that \(P(\textbf{s}) \subseteq P({\varvec{\mu }})\) for any considered \({\varvec{\mu }}\) and, hence, that \(Z(\textbf{s}) \subseteq P({\varvec{\mu }})\). However, we only need that one optimal solution \(\textbf{y}\in Z(\textbf{s})\) belongs to \(P({\varvec{\mu }})\), to obtain a correct strengthening. In fact, if \(\textbf{y}\in P({\varvec{\mu }})\) for any \({\varvec{\mu }}\), then \(opt({\varvec{\mu }}) \le \textbf{f}^T\textbf{y}=opt(\textbf{s})\).

Definition 8

A leader variable \(x_i\) is singular if it does not share follower constraints with other leader variables.

Given an integer vector \(\textbf{z}\), let \(\textbf{z}^{i+}\) be the vector having \(z^{i+}_j = z_j\) for \(j \ne i\) and \(z^{i+}_i = z_i+1\). In the same way, \(\textbf{z}^{i-}\) is the vector with \(z^{i-}_j = z_j\) for \(j \ne i\) and \(z^{i-}_i = z_i-1\). Let \(\textbf{s}\) be a leader solution and let \(\textbf{y}\) be an optimal solution of \(\text{ FOLL }(\textbf{s}) \).

Theorem 11

The following strengthening is correct for the OFO cuts: if \(x_i\) is singular, \(s_i=u_i^{\textbf{x}}-1\) and \(\textbf{y}\in P(\textbf{s}^{i+})\), then remove \(x_i^{l1_{s_i}+1}\); if \(x_i\) is singular, \(s_i=\ell _i^{\textbf{x}}+1\) and \(\textbf{y}\in P(\textbf{s}^{i-})\), then remove \(x_i^{l1_{s_i}}\).

Proof

We must prove that for any leader solution \({\varvec{\mu }}\) obtained from \(\textbf{s}\) arbitrarily increasing (decreasing) \(x_i\) for which \(x_i^{l1_{s_i}+1}\) (\(x_i^{l1_{s_i}}\)) has been removed, it holds that \(\textbf{y}\in P({\varvec{\mu }})\) and, hence, \(opt({\varvec{\mu }}) \le opt(\textbf{s})\). Let k be a follower constraint with a nonzero coefficient for singular variable \(x_i\). If k is satisfied by \((\textbf{s}^{i+},\textbf{y})\) and \(x_i^{l1_{s_i}+1}\) was removed (by \((\textbf{s}^{i-},\textbf{y})\) and \(x_i^{l1_{s_i}}\) was removed), then it remains satisfied for any modifications of the other leader variables, which do not appear in the constraint. Hence, if \(\textbf{y}\in P(\textbf{s}^{i+})\) (\(\textbf{y}\in P(\textbf{s}^{i-})\)), it also belongs to \(P({\varvec{\mu }})\). \(\square \)

For the no-good cuts, we can also remove some \(\textbf{y}\) variables.

Theorem 12

The following strengthening is correct for the no-good cuts: if \(f_i \ge 0\), then remove \(y_i^{l1_{\eta _i}+1}\); if \(f_i \le 0\), then remove \(y_i^{l1_{\eta _i}}\).

Proof

If \(f_i \ge 0\), for any solution \({\varvec{\nu }}\) obtained by increasing \(y_i\) with respect to \(\eta _i\), we have that \(\textbf{f}^T {\varvec{\nu }}\ge \textbf{f}^T {\varvec{\eta }}\). The same happens when \(f_i \le 0\) and \(y_i\) is decreased. Therefore, if \({\varvec{\eta }}\) is not optimal for \(\text{ FOLL }(\textbf{s}) \), \({\varvec{\nu }}\) is not optimal either. \(\square \)

4 Single-level reformulations of pessimistic LBP

We now define a single-level non-compact reformulation of the pessimistic problem by adding to P some valid inequalities that eliminate the pairs \((\textbf{x},\textbf{y}) \in P \setminus \varPi \). Since the property outlined in Theorem 1 does not hold anymore, we must exclude from P: (i) the pairs \((\textbf{x},\textbf{y})\) where \(\textbf{y}\notin Z(\textbf{x})\); (ii) the pairs \((\textbf{x},\textbf{y})\) with \(\textbf{y}\in Z(\textbf{x})\), but \(ZI(\textbf{x}) \ne \emptyset \); (iii) the pairs \((\textbf{x},\textbf{y})\) such that \(\textbf{y}\in Z(\textbf{x})\), \(ZI(\textbf{x}) =\emptyset \), but \(\textbf{y}\notin Y_{p}(\textbf{x})\).

Pairs \((\textbf{x},\textbf{y})\) where \(\textbf{y}\notin Z(\textbf{x})\) can be eliminated using the same cuts applied in the optimistic framework (optimistic follower optimality or no-good cuts). If a leader solution \(\textbf{x}\) has the property that \(ZI(\textbf{x}) \ne \emptyset \), then no \((\textbf{x},\textbf{y})\) is feasible for the pessimistic problem. Those solutions are eliminated by the restricted no-good inequalities (see Sect. 4.1). Pairs \((\textbf{x},\textbf{y})\), where \(ZI(\textbf{x}) = \emptyset \) but \(\textbf{y}\in ZF(\textbf{x}) {\setminus } Y_p(\textbf{x})\), are treated using either the pessimistic follower optimality cuts, introduced here for the first time (see Sect. 4.2), or the no-good cuts, used here for the first time in a pessimistic setting (see Sect. 4.3). We assume that the binarization technique described in Sect. 3.1 is applied to the \(\textbf{x}\) variables and, when the no-good cuts are used, also to the \(\textbf{y}\) variables that must be restricted to be integer.

4.1 Restricted no-good inequalities

Let \((\textbf{s},{\varvec{\eta }}) \in B\) be an integer pair such that \({\varvec{\eta }}\in Z(\textbf{s})\), but \(ZI(\textbf{s}) \ne \emptyset \). Then, leader solution \(\textbf{s}\) can be eliminated by inequality (3).

Definition 9

Let \(\textbf{s}\in P^{\textbf{x}}\) be an integer leader solution having \(ZI(\textbf{s}) \ne \emptyset \), then the corresponding restricted no-good cut is the inequality below.

$$\begin{aligned} \sum _{i \in N: s_i > \ell _i^{\textbf{x}}} (1-x_i^{l1_{s_i}}) + \sum _{i \in N: s_i < u_i^{\textbf{x}}}x_i^{l1_{s_i}+1} \ge 1 \end{aligned}$$
(3)

Restricted no-good cuts are trivially valid for \(\varPi \), as they impose that any feasible integer vector \(\textbf{x}\) differs from the infeasible solution \(\textbf{s}\) in at least one entry. To verify if \(ZI(\textbf{s}) \ne \emptyset \), one can solve the problem below, where \(\varphi _i\) is a binary variable that takes value 1 when leader constraint i can be violated and 0 otherwise, while \(\delta _i\) is the corresponding amount of violation.

$$\begin{aligned} \text{ FEAS }(\textbf{s}) \quad&\max \sum _{i \in \varPhi } \delta _i \nonumber \\&\textbf{H}\textbf{y}\ge \textbf{r}- \textbf{G}\textbf{s}\end{aligned}$$
(4)
$$\begin{aligned}&\textbf{f}^T \textbf{y}\le opt(\textbf{s}) \end{aligned}$$
(5)
$$\begin{aligned}&\textbf{L}^{iT} \textbf{y}+ \delta _i \le b_i - \textbf{A}^{iT}\textbf{s}+ M^F_i(1-\varphi _i) \quad i \in \varPhi \end{aligned}$$
(6)
$$\begin{aligned}&\delta _i \le \varphi _i \quad i \in \varPhi \nonumber \\&{\varvec{\varphi }}\in \{0,1\}^{|\varPhi |}, \; {\varvec{\delta }}\ge \textbf{0}\nonumber \\&y_i \text{ integer } \quad i \in F \end{aligned}$$
(7)

\(M^F_i\) is a sufficiently large value, whose purpose is to guarantee that the corresponding constraint is automatically satisfied when \(\varphi _i = 0\) (see Sect. 5.5). Set \(\varPhi \) includes the indices of the leader constraints and \({\varvec{\delta }}\in {\mathbb {R}}^{|\varPhi |}\). Constraints (4) and (5) guarantee that \(\textbf{y}\in Z(\textbf{s})\). When \(\varphi _i = 1\), the bigM part of constraint (6) disappears and, if \(\delta _i > 0\), it follows that \(\textbf{L}^{iT} \textbf{y}< b_i - \textbf{A}^{iT} \textbf{s}\). Hence, leader constraint \(\textbf{A}^{iT} \textbf{x}+ \textbf{L}^{iT} \textbf{y}\ge b_i\) is violated by \((\textbf{s},\textbf{y})\) and \(ZI(\textbf{s}) \ne \emptyset \). If \(\varphi _i = 0\), then \(\delta _i = 0\) by constraints (7), while constraint (6) is deactivated by the bigM part. If \(\text{ FOLL }(\textbf{s}) \) is feasible, then \(\text{ FEAS }(\textbf{s}) \) is feasible as well, as any solution \(\textbf{y}\in Z(\textbf{s})\) can be completed by \({\varvec{\delta }}= {\varvec{\varphi }}= \textbf{0}\). Moreover, \(\text{ FEAS }(\textbf{s}) \) is not unbounded, as all the variables in the objective function are bounded from above.
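On small purely binary instances, the check that \(\text{ FEAS }(\textbf{s}) \) performs can be emulated by enumeration. The sketch below reuses the follower of the Theorem 7 example and adds a hypothetical leader constraint \(-y_2 \ge 0\) (our own choice, not from the paper) so that \(ZI(\textbf{s}) \ne \emptyset \) for \(\textbf{s}=0\):

```python
from itertools import product

f = (3, 1, 1, 1)                       # follower objective of the example

def follower_feasible(y):
    """Follower constraint of the Theorem 7 example, for x = 0."""
    return 3*y[0] + y[1] + y[2] + y[3] >= 3

# opt(s) and the follower-optimal set Z(0) by enumeration
opt0 = min(sum(fi*yi for fi, yi in zip(f, y))
           for y in product((0, 1), repeat=4) if follower_feasible(y))
Z0 = [y for y in product((0, 1), repeat=4)
      if follower_feasible(y) and sum(fi*yi for fi, yi in zip(f, y)) <= opt0]

# Hypothetical leader constraint -y_2 >= 0, i.e. L = (0,-1,0,0), b = 0
def leader_ok(y):
    return -y[1] >= 0

# ZI(0): follower-optimal solutions violating some leader constraint
ZI0 = [y for y in Z0 if not leader_ok(y)]
assert ZI0 == [(0, 1, 1, 1)]   # nonempty: x = 0 must be cut off by (3)
```

Here the follower-optimal solution \((0,1,1,1)\) violates the leader constraint, so the leader choice \(x=0\) is pessimistically infeasible and a restricted no-good cut for it would be generated.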

4.2 Pessimistic follower optimality (PFO) cuts

Let \((\textbf{s},{\varvec{\eta }})\) be an integer pair in B, such that \(ZI(\textbf{s}) = \emptyset \), but \({\varvec{\eta }}\notin Y_p(\textbf{s})\). Checking whether \({\varvec{\eta }}\in Y_p(\textbf{s})\) amounts to solving the worst-case problem below.

$$\begin{aligned} \text{ WORST }(\textbf{s}) \quad&\max \textbf{d}^T \textbf{y}{} & {} \nonumber \\&\textbf{H}\textbf{y}\ge \textbf{r}- \textbf{G}\textbf{s}{} & {} \nonumber \\&\textbf{f}^T \textbf{y}\le opt(\textbf{s}){} & {} \nonumber \\&y_i \text{ integer }{} & {} i \in F \nonumber \end{aligned}$$

Since \(P(\textbf{s})\) is bounded, problem \(\text{ WORST }(\textbf{s}) \) cannot be unbounded. Since \({\varvec{\eta }}\in Z(\textbf{s})\), \(\text{ WORST }(\textbf{s}) \) cannot be infeasible. Moreover, since \(ZI(\textbf{s}) = \emptyset \), \(\textbf{y}\in ZF(\textbf{s})\) for any feasible \(\textbf{y}\). Denote by \(optw(\textbf{s})\) the optimal value of \(\text{ WORST }(\textbf{s}) \). If \(\textbf{d}^T{\varvec{\eta }}< optw(\textbf{s})\), then \({\varvec{\eta }}\notin Y_p(\textbf{s})\) and solution \((\textbf{s},{\varvec{\eta }})\) can be eliminated by inequality (8).
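The membership test \({\varvec{\eta }}\in Y_p(\textbf{s})\) can likewise be replicated by enumeration on a toy all-integer instance: compute \(opt(\textbf{s})\), restrict to \(Z(\textbf{s})\), and take the worst leader value. A minimal sketch, with our own function name and toy data (not the paper's implementation):

```python
import itertools

def worst_value(s, d, f, H, r, G, y_bounds):
    """optw(s) = max d^T y over integer y in Z(s), by enumeration;
    assumes the follower problem for s is feasible."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    rhs = [ri - dot(Gi, s) for ri, Gi in zip(r, G)]
    feas = [y for y in itertools.product(*(range(lo, hi + 1) for lo, hi in y_bounds))
            if all(dot(Hi, y) >= v for Hi, v in zip(H, rhs))]
    opt = min(dot(f, y) for y in feas)            # opt(s), as computed by FOLL(s)
    return max(dot(d, y) for y in feas if dot(f, y) <= opt)
```

A candidate \({\varvec{\eta }}\) with \(\textbf{d}^T{\varvec{\eta }}\) strictly below the returned value is not pessimistic-optimal and would trigger a PFO cut.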

Definition 10

Let \(\textbf{s}\in P^{\textbf{x}}\) be an integer leader solution such that \(ZI(\textbf{s}) = \emptyset \), then the pessimistic follower optimality (PFO) cut corresponding to \(\textbf{s}\) is the inequality below.

$$\begin{aligned}{} & {} \textbf{d}^T \textbf{y}\ge optw(\textbf{s}) - M^P \left( \sum _{i \in N: s_i > \ell _i^{\textbf{x}}} (1-x_i^{l1_{s_i}}) \right. \nonumber \\{} & {} \quad \left. + \sum _{i \in N: s_i < u_i^{\textbf{x}}}x_i^{l1_{s_i}+1} \right) \end{aligned}$$
(8)

\(M^P\) is a sufficiently large value, whose purpose is to guarantee that the constraint is automatically satisfied for integer solutions \(\textbf{x}\ne \textbf{s}\) (see Sect. 5.5).

Consider the problem obtained by optimizing the leader objective function over the set pSLF, built by adding to P: an OFO cut for every integer \(\textbf{x}\in P^{\textbf{x}}\); a restricted no-good cut for every integer \(\textbf{x}\in P^{\textbf{x}}\) having \(ZI(\textbf{x}) \ne \emptyset \); and a PFO cut for every integer \(\textbf{x}\in P^{\textbf{x}}\) with \(ZI(\textbf{x}) = \emptyset \).

Theorem 13

The PFO inequalities are valid for \(\varPi \) and pessimistic LBP can be solved by the (possibly non-compact) single-level problem below.

$$\begin{aligned} \min _{(\textbf{x},\textbf{y}) \in pSLF} \textbf{c}^T \textbf{x}+ \textbf{d}^T \textbf{y}\end{aligned}$$

Proof

When \(\textbf{x}= \textbf{s}\), the PFO inequality reduces to \(\textbf{d}^T \textbf{y}\ge optw(\textbf{s})\), which forces the choice of a solution \(\textbf{y}\in Y_p(\textbf{s})\). When there exists \(i \in N\) such that \(x_i \ne s_i\), then the PFO cut is deactivated by the bigM part. The OFO inequalities are valid for \(B \supseteq \varPi \) by Theorem 6 and the restricted no-good cuts are also trivially valid. Consider an integer pair \((\textbf{x},\textbf{y}) \in P\). If \(\textbf{y}\notin Z(\textbf{x})\), then \((\textbf{x},\textbf{y})\) is eliminated by an OFO cut and it does not belong to pSLF. If \(\textbf{y}\in Z(\textbf{x})\) but \(ZI(\textbf{x}) \ne \emptyset \), then \((\textbf{x},\textbf{y})\) is excluded by a restricted no-good cut. If \(\textbf{y}\in Z(\textbf{x})\), \(ZI(\textbf{x}) = \emptyset \) and \(\textbf{y}\notin Y_p(\textbf{x})\), then \((\textbf{x},\textbf{y})\) is cut by a PFO inequality. Hence, any remaining pair in P that has not been cut by an OFO, a PFO or a restricted no-good inequality belongs to \(\varPi \). It follows that pSLF corresponds to a (possibly non-compact) single-level reformulation of pessimistic LBP. \(\square \)

4.3 Pessimistic no-good cuts

Although the no-good inequalities have been used, so far, only in the optimistic setting, they are general purpose cuts that eliminate a single solution, independently of the reason for the infeasibility. Therefore, they can be used in the pessimistic framework as well. If \((\textbf{s},{\varvec{\eta }})\) is an integer pair in B with \(ZI(\textbf{s}) = \emptyset \) and \({\varvec{\eta }}\notin Y_p(\textbf{s})\), it can be cut away by inequality (9).

Definition 11

Let \((\textbf{s},{\varvec{\eta }}) \in B\) be an integer pair with \(ZI(\textbf{s}) = \emptyset \) and \({\varvec{\eta }}\notin Y_p(\textbf{s})\), then the corresponding no-good cut is the inequality below.

$$\begin{aligned}{} & {} \sum _{i \in N: s_i> \ell _i^{\textbf{x}}} (1-x_i^{l1_{s_i}}) + \sum _{i \in N: s_i< u_i^{\textbf{x}}}x_i^{l1_{s_i}+1} \nonumber \\{} & {} \quad + \sum _{i \in F: \eta _i > \ell _i^{\textbf{y}}} (1-y_i^{l1_{\eta _i}}) + \sum _{i \in F: \eta _i < u_i^{\textbf{y}}}y_i^{l1_{\eta _i}+1} \ge 1 \end{aligned}$$
(9)

We call optimistic no-good cuts the inequalities (2) generated to eliminate \((\textbf{x},\textbf{y}) \notin B\) and pessimistic no-good cuts the inequalities (9) that are added to forbid \((\textbf{x},\textbf{y})\) with \(\textbf{y}\notin Y_p(\textbf{x})\). Using the same argument as in the proof of Theorem 13, it is not difficult to see that the pessimistic no-good cuts are valid for \(\varPi \). If optimistic, pessimistic and restricted no-good inequalities are added to P for any \((\textbf{x},\textbf{y}) \notin \varPi \), we obtain set pSLG, which has the property that pessimistic LBP can be solved by solving the (possibly non-compact) single-level problem below.

$$\begin{aligned} \min _{(\textbf{x},\textbf{y}) \in pSLG} \textbf{c}^T \textbf{x}+ \textbf{d}^T \textbf{y}\end{aligned}$$

It is easy to see that the relationship between the PFO cuts and the pessimistic no-good inequalities is the same as the one described in the optimistic setting for the OFO and the optimistic no-good cuts (see Sect. 3.4).

4.4 Strengthening the pessimistic cuts

Strengthening the pessimistic cuts is much harder than strengthening the optimistic inequalities. Below, we illustrate some conditions that may allow strengthening the pessimistic cuts.

Theorem 14

The following results hold:

  1. 1.

Let \(\textbf{x},{\varvec{\mu }}\in P^{\textbf{x}}\) be such that \(ZI(\textbf{x}) = ZI({\varvec{\mu }}) = \emptyset \) and \(\exists \textbf{y}\in Y_p(\textbf{x}) \cap Z({\varvec{\mu }})\), then \(\textbf{d}^T \textbf{y}\ge optw(\textbf{x})\) is valid for \(Y_p({\varvec{\mu }})\);

  2. 2.

Let \(\textbf{x},{\varvec{\mu }}\in P^{\textbf{x}}\) be such that \(ZI(\textbf{x}) = ZI({\varvec{\mu }}) = \emptyset \), \(\exists \textbf{y}\in Y_p(\textbf{x}) \cap Z({\varvec{\mu }})\) and \(\textbf{w}\in Z(\textbf{x}) {\setminus } Y_p(\textbf{x})\), then \(\textbf{w}\notin Y_p({\varvec{\mu }})\);

  3. 3.

Let \((\textbf{x},\textbf{w}),(\textbf{x},{\varvec{\nu }}) \in P\) be such that \(ZI(\textbf{x}) = \emptyset \); if \(\textbf{w}\in ZF(\textbf{x}){\setminus } Y_p(\textbf{x})\) and \(\textbf{d}^T {\varvec{\nu }}\le \textbf{d}^T \textbf{w}\), then \({\varvec{\nu }}\notin Y_p(\textbf{x})\).

Proof

If \(\exists \textbf{y}\in Y_p(\textbf{x}) \cap Z({\varvec{\mu }})\), then \(optw({\varvec{\mu }}) \ge optw(\textbf{x})\). Therefore, \(\textbf{d}^T \textbf{y}\ge optw({\varvec{\mu }}) \ge optw(\textbf{x})\) is valid for \(Y_p({\varvec{\mu }})\). If \(\textbf{w}\in ZF(\textbf{x}){\setminus } Y_p(\textbf{x})\), it follows that \(\textbf{d}^T \textbf{w}< optw(\textbf{x})\) and, hence, \(\textbf{d}^T \textbf{w}< optw({\varvec{\mu }})\). It follows that \(\textbf{w}\notin Y_p({\varvec{\mu }})\). If \(\textbf{w}\in ZF(\textbf{x}) {\setminus } Y_p(\textbf{x})\), then \(\textbf{d}^T \textbf{w}< optw(\textbf{x})\). Since \(\textbf{d}^T {\varvec{\nu }}\le \textbf{d}^T \textbf{w}< optw(\textbf{x})\), then \({\varvec{\nu }}\notin Y_p(\textbf{x})\). \(\square \)

The conditions above do not automatically translate into strengthening procedures, as there is no easy way to detect whether there exists \(\textbf{y}\in Y_p(\textbf{x}) \cap Z({\varvec{\mu }})\). However, when \(\textbf{f}= \textbf{0}\), it follows that \(P(\textbf{x}) = Z(\textbf{x})\) for any \(\textbf{x}\in P^{\textbf{x}}\). Therefore, a sufficient condition is \(P(\textbf{x}) \subseteq P({\varvec{\mu }})\). Under this assumption and using Definitions 7 and 8 in Sect. 3.5, we could derive results similar to the ones for the optimistic case. We omit a detailed description, because their practical application is limited by the assumption \(\textbf{f}=\textbf{0}\) (see also Sect. 2.4).

5 The branch-and-cut algorithm

We can now define a branch-and-cut algorithm to solve optimistic and pessimistic LBP. Depending on the cuts that are used (follower optimality or no-good) and on the setting (optimistic or pessimistic), we can define four versions of the algorithm:

Fo::

optimistic setting with OFO cuts;

Go::

optimistic setting with no-good cuts;

Fp::

pessimistic setting with restricted no-good, OFO and PFO cuts;

Gp::

pessimistic setting with restricted, optimistic and pessimistic no-good cuts.

We assume that the reader is familiar with the branch-and-bound algorithm, a popular technique for solving integer programming problems (Wolsey 1998). Branch-and-cut (Padberg and Rinaldi 1991) is an improved branch-and-bound exploration of the solution space, which starts from a suitable initial formulation of the problem and where each node of the tree is processed according to a user-defined policy that adds user-defined cuts to the problem to eliminate some unwanted solutions. The initial formulation we use is the HPR problem (see Sect. 2); in Sect. 5.1 we illustrate the relation between HPR and LBP and what happens when the assumption that the variables have finite bounds is removed. Our policy to process the nodes and generate cuts (or separation) is described in Sects. 5.2 and 5.3 and summarized in Algorithms 1–3 for the optimistic and the pessimistic problem, respectively.

Since the cuts we use define a single-level reformulation of LBP, we do not need dedicated branching rules, branching on integer solutions or accessing the current basis, as in Bard and Moore (1990); DeNegre and Ralphs (2009); Fischetti et al. (2017); Wang and Xu (2017); Xu and Wang (2014). This leads to a simpler algorithm that can easily be integrated with the standard behavior of a solver. In the following, branch simply means applying any general purpose branching rule. The binarization procedure described in Sect. 3.1 is applied to the \(\textbf{x}\) variables, for Fo and Fp, and also to the \(\textbf{y}\) variables, for Go and Gp. Also note that binarization is not required in the problems that are solved in the separation phase.

5.1 The high point relaxation

The HPR problem consists of minimizing the leader objective function over P (see Sect. 2). Hence, it can be modeled as follows.

$$\begin{aligned} \text{ HPR } \quad&\min \textbf{c}^T \textbf{x}+ \textbf{d}^T \textbf{y}{} & {} \nonumber \\&\textbf{A}\textbf{x}+\textbf{L}\textbf{y}\ge \textbf{b}\nonumber \\&\textbf{G}\textbf{x}+ \textbf{H}\textbf{y}\ge \textbf{r}{} & {} \nonumber \\&x_i \text{ integer }{} & {} i \in N \nonumber \\&y_i \text{ integer }{} & {} i \in F \nonumber \end{aligned}$$

If \(\textbf{G}= \textbf{0}\), optimistic LBP is not a bilevel problem anymore, as it can be solved by first solving the follower problem (whose feasible region is independent of \(\textbf{x}\)) to compute the corresponding optimal value k and, then, solving HPR with the additional constraint \(\textbf{f}^T \textbf{y}\le k\). This is not true, in general, for the pessimistic case. Indeed, if \(\textbf{L}\ne \textbf{0}\) and the problem is reformulated in a single-level fashion as described above, it is not guaranteed that the produced optimal \(\textbf{x}\) corresponds to an empty \(ZI(\textbf{x})\), as required for a pessimistic solution to be feasible. However, a similar approach can be adopted for the pessimistic case as well, if \(\textbf{G}= \textbf{0}\) and \(\textbf{L}= \textbf{0}\), that is, if not only is the feasible region of the follower problem independent of the leader variables, but the feasible region of the leader problem is also independent of the follower variables.

Since \(\varPi \subseteq B \subseteq P\) and \(\varOmega \subseteq B \subseteq P\), if HPR is infeasible, then LBP is trivially infeasible as well. Neither HPR nor LBP can be unbounded, if all the variables have finite bounds. However, LBP can be infeasible when HPR is not. This happens when, for any \(\textbf{x}\) such that \(P(\textbf{x}) \ne \emptyset \), \(Z(\textbf{x}) = ZI(\textbf{x})\) and, in the pessimistic setting, also when \(ZI(\textbf{x}) \ne \emptyset \) for any \(\textbf{x}\) such that \(P(\textbf{x}) \ne \emptyset \), even if \(ZF(\textbf{x})\) is nonempty.

If we relax the assumption that the variables have finite bounds, the situation becomes much more complicated. Suppose we remove the bounds on the \(\textbf{y}\) variables, but keep them on the \(\textbf{x}\) ones. As a consequence, HPR, LBP, \(\text{ FOLL }(\textbf{x}) \) and \(\text{ WORST }(\textbf{x}) \) may become unbounded and we have the possibilities illustrated in Theorem 15, which generalizes the results in Xu and Wang (2014). Note that the pessimistic problem can never be unbounded. In fact, as we prove below, if the problem of maximizing \(\textbf{d}^T \textbf{y}\) over \(ZF(\textbf{x})\) is unbounded, then \(Y_p(\textbf{x}) = \emptyset \) for any \(\textbf{x}\) and LBP is infeasible. The same happens to optimistic LBP, if ties are broken ex ante (see Sect. 1) and the problem of minimizing \(\textbf{d}^T \textbf{y}\) over \(ZF(\textbf{x})\) is unbounded; \(Y_o(\textbf{x})\) is empty for any \(\textbf{x}\), \(\varOmega =\emptyset \) and LBP is infeasible. On the contrary, if ties are broken ex post, that is, optimistic LBP is solved optimizing over B, then the problem can be unbounded (see Sect. 2.2). Indeed, if the problem of minimizing \(\textbf{d}^T \textbf{y}\) over \(ZF(\textbf{x})\) is unbounded, then \(\varOmega \) is empty, but B is not.

Theorem 15

Assume that HPR is not infeasible, that, if the follower problem is not unbounded, \(ZF(\textbf{x}) \ne \emptyset \) for at least one \(\textbf{x}\) and, if the pessimistic policy is used, \(ZI(\textbf{x}) =\emptyset \) for at least one such \(\textbf{x}\). Let R be the set of the extreme rays of the cone \(\{{\varvec{\gamma }}\in {\mathbb {R}}^{n_y}:\textbf{H}{\varvec{\gamma }}\ge \textbf{0}\}\). The following results hold.

  1. 1.

    HPR is unbounded if and only if \(\exists {\varvec{\gamma }}\in R\) such that \(\textbf{L}{\varvec{\gamma }}\ge \textbf{0}\) and \(\textbf{d}^T {\varvec{\gamma }}<0\);

  2. 2.

    If ties are broken ex ante, optimistic LBP is infeasible if and only if either there exists \({\varvec{\gamma }}\in R\) such that \(\textbf{f}^T {\varvec{\gamma }}<0\) or there exists \({\varvec{\gamma }}\in R\) such that \(\textbf{f}^T {\varvec{\gamma }}= 0\), \(\textbf{L}{\varvec{\gamma }}\ge \textbf{0}\) and \(\textbf{d}^T {\varvec{\gamma }}<0\);

  3. 3.

    If ties are broken ex post, optimistic LBP is infeasible if and only if there exists \({\varvec{\gamma }}\in R\) such that \(\textbf{f}^T {\varvec{\gamma }}<0\), while it is unbounded if and only if \(\textbf{f}^T {\varvec{\gamma }}\ge 0\) for all \({\varvec{\gamma }}\in R\) and there exists \({\varvec{\gamma }}\in R\) such that \(\textbf{f}^T {\varvec{\gamma }}= 0\), \(\textbf{L}{\varvec{\gamma }}\ge \textbf{0}\) and \(\textbf{d}^T {\varvec{\gamma }}<0\);

  4. 4.

    Pessimistic LBP is infeasible if and only if either there exists \({\varvec{\gamma }}\in R\) such that \(\textbf{f}^T {\varvec{\gamma }}<0\) or there exists \({\varvec{\gamma }}\in R\) such that \(\textbf{f}^T {\varvec{\gamma }}= 0\), \(\textbf{L}{\varvec{\gamma }}\ge \textbf{0}\) and \(\textbf{d}^T {\varvec{\gamma }}>0\);

Proof

The initial assumptions exclude cases where LBP is infeasible, independently of the removal of the bounds on the \(\textbf{y}\) variables.

  1. 1.

Suppose that \(\textbf{L}{\varvec{\gamma }}\ge \textbf{0}\) and \(\textbf{d}^T {\varvec{\gamma }}<0\) for some \({\varvec{\gamma }}\in R\). For any \(\textbf{x}\), \(\textbf{y}\), \(\lambda \) with \((\textbf{x},\textbf{y}) \in P\) and \(\lambda > 0\), vector \(\textbf{w}= \textbf{y}+ \lambda {\varvec{\gamma }}\) is such that \((\textbf{x},\textbf{w})\) belongs to P and value \(\textbf{d}^T \textbf{w}\) can become arbitrarily negative. Hence, HPR is unbounded. On the contrary, if \(\textbf{d}^T {\varvec{\gamma }}\ge 0\) whenever \(\textbf{L}{\varvec{\gamma }}\ge \textbf{0}\) and \(\textbf{H}{\varvec{\gamma }}\ge \textbf{0}\), this cannot happen.

  2. 2.

Suppose that \(\textbf{f}^T {\varvec{\gamma }}< 0\) for some \({\varvec{\gamma }}\in R\). For any \(\textbf{x}\), \(\textbf{y}\), \(\lambda \) with \(\textbf{y}\in P(\textbf{x})\) and \(\lambda > 0\), vector \(\textbf{w}= \textbf{y}+ \lambda {\varvec{\gamma }}\) belongs to \(P(\textbf{x})\) as well. If so, \(\textbf{f}^T \textbf{w}\) can become arbitrarily negative and \(\text{ FOLL }(\textbf{x}) \) is unbounded for any \(\textbf{x}\) with \(P(\textbf{x}) \ne \emptyset \). As a consequence, \(Z(\textbf{x}) = \emptyset \) and LBP is infeasible in both the optimistic and the pessimistic case. If ties are broken ex ante, we are optimizing over \(\varOmega \). Assume that \(\exists {\varvec{\gamma }}\in R\) such that \(\textbf{f}^T {\varvec{\gamma }}= 0\), \(\textbf{L}{\varvec{\gamma }}\ge \textbf{0}\) and \(\textbf{d}^T {\varvec{\gamma }}<0\). Hence, for any \(\textbf{x},\textbf{y},\lambda \) such that \(\textbf{y}\in ZF(\textbf{x})\) and \(\lambda > 0\), vector \(\textbf{w}= \textbf{y}+ \lambda {\varvec{\gamma }}\) has the property that \(\textbf{w}\in ZF(\textbf{x})\) and \(\textbf{d}^T \textbf{w}\) can become arbitrarily negative. It follows that the problem of minimizing \(\textbf{d}^T \textbf{y}\) over \(Z(\textbf{x})\) is unbounded, \(Y_o(\textbf{x}) = \emptyset \) for any \(\textbf{x}\), \(\varOmega = \emptyset \) and optimistic LBP is infeasible. If neither of the two possibilities occurs, then it is easy to see that the problem admits a finite optimum. In fact, if \(\textbf{f}^T {\varvec{\gamma }}> 0\) for all \({\varvec{\gamma }}\in R\), then the follower problem cannot be unbounded and \(Z(\textbf{x})\) is always a bounded set. Since LBP cannot be unbounded and we assumed that there exists at least one \(\textbf{x}\) with \(ZF(\textbf{x}) \ne \emptyset \), then LBP admits a finite optimum.

  3. 3.

For the proof that optimistic LBP is infeasible when \(\exists {\varvec{\gamma }}\in R\) with \(\textbf{f}^T {\varvec{\gamma }}< 0\), see point 2. If ties are broken ex post, we are optimizing over B. If \(\textbf{f}^T {\varvec{\gamma }}> 0\) for all \({\varvec{\gamma }}\in R\), the follower problem cannot be unbounded. Since we assumed that \(ZF(\textbf{x}) \ne \emptyset \) for at least one \(\textbf{x}\), then optimistic LBP cannot be infeasible. Assume that \(\textbf{f}^T {\varvec{\gamma }}\ge 0\) for all \({\varvec{\gamma }}\in R\) and that \(\exists {\varvec{\gamma }}\in R\) such that \(\textbf{f}^T {\varvec{\gamma }}= 0\), \(\textbf{L}{\varvec{\gamma }}\ge \textbf{0}\) and \(\textbf{d}^T {\varvec{\gamma }}<0\). Hence, for any \(\textbf{x},\textbf{y},\lambda \) such that \(\textbf{y}\in ZF(\textbf{x})\) and \(\lambda > 0\), vector \(\textbf{w}= \textbf{y}+ \lambda {\varvec{\gamma }}\) has the property that \(\textbf{w}\in ZF(\textbf{x})\), \((\textbf{x},\textbf{y}) \in B\) and \(\textbf{d}^T \textbf{w}\) can become arbitrarily negative. It follows that optimistic LBP is unbounded. If \(\textbf{f}^T {\varvec{\gamma }}\ge 0\) for all \({\varvec{\gamma }}\in R\) and \(\textbf{f}^T {\varvec{\gamma }}\ne 0\) for all \({\varvec{\gamma }}\in R\) such that \(\textbf{L}{\varvec{\gamma }}\ge \textbf{0}\) and \(\textbf{d}^T {\varvec{\gamma }}<0\), then \(\textbf{w}\) does not belong to \(ZF(\textbf{x})\) for any \(\lambda \) and the problem cannot be unbounded.

  4. 4.

If \(\textbf{f}^T {\varvec{\gamma }}< 0\) for some \({\varvec{\gamma }}\in R\), then pessimistic LBP is infeasible (see point 2). Suppose that \(\exists {\varvec{\gamma }}\in R\) such that \(\textbf{f}^T {\varvec{\gamma }}= 0\), \(\textbf{L}{\varvec{\gamma }}\ge \textbf{0}\) and \(\textbf{d}^T {\varvec{\gamma }}>0\). Hence, for any \(\textbf{x},\textbf{y},\lambda \) such that \(ZI(\textbf{x}) = \emptyset \), \(\textbf{y}\in ZF(\textbf{x})\) and \(\lambda > 0\), vector \(\textbf{w}= \textbf{y}+ \lambda {\varvec{\gamma }}\) has the property that \(\textbf{w}\in ZF(\textbf{x})\) and \(\textbf{d}^T \textbf{w}\) can become arbitrarily large. It follows that \(Y_p(\textbf{x})=\emptyset \) for any \(\textbf{x}\), \(\varPi = \emptyset \) and, hence, pessimistic LBP is infeasible. If neither of the two possibilities occurs, then it is easy to see that the problem admits a finite optimum.

\(\square \)

It follows from Theorem 15 that, when HPR is unbounded, LBP can be infeasible or it can admit a finite optimum in both the optimistic and the pessimistic case. The same happens when HPR admits a finite optimum. If ties are broken ex post, optimistic LBP can also be unbounded, when HPR is unbounded. In what follows, we come back to the original assumption that all the variables have finite bounds (see Sect. 2.1).
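Once the extreme rays of \(\{{\varvec{\gamma }}:\textbf{H}{\varvec{\gamma }}\ge \textbf{0}\}\) are listed explicitly, the conditions of Theorem 15 reduce to sign checks. The sketch below assumes R is given as an explicit list (computing extreme rays is outside its scope); the function names are ours, introduced for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hpr_unbounded(R, L, d):
    """Point 1: some extreme ray g satisfies L g >= 0 and d^T g < 0."""
    return any(all(dot(Li, g) >= 0 for Li in L) and dot(d, g) < 0 for g in R)

def optimistic_infeasible_ex_ante(R, L, d, f):
    """Point 2: some ray with f^T g < 0, or some ray with
    f^T g = 0, L g >= 0 and d^T g < 0."""
    return any(dot(f, g) < 0 for g in R) or any(
        dot(f, g) == 0 and all(dot(Li, g) >= 0 for Li in L) and dot(d, g) < 0
        for g in R)

def pessimistic_infeasible(R, L, d, f):
    """Point 4: as point 2, but with d^T g > 0 in the second condition."""
    return any(dot(f, g) < 0 for g in R) or any(
        dot(f, g) == 0 and all(dot(Li, g) >= 0 for Li in L) and dot(d, g) > 0
        for g in R)
```

For instance, a ray g with \(\textbf{f}^T g = 0\), \(\textbf{L}g \ge \textbf{0}\) and \(\textbf{d}^T g < 0\) makes HPR unbounded and ex-ante optimistic LBP infeasible, while pessimistic LBP is unaffected by it.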

[Algorithm 1]

5.2 The optimistic case

The procedure for processing a node of the branch-and-bound tree in the optimistic case is the following. The linear relaxation of the current problem is solved. Let \((\textbf{x},\textbf{y})\) be the computed solution and assume that the node has not been pruned, that is, \((\textbf{x},\textbf{y})\) has a better value than the best upper bound UB known for the problem so far. If \((\textbf{x},\textbf{y})\) is fractional, we branch. Otherwise, the solution is processed by the following separation phase. We solve \(\text{ FOLL }(\textbf{x}) \) to test if \((\textbf{x},\textbf{y})\) belongs to B. The follower problem can never be infeasible at this stage, as \(P(\textbf{x}) \ne \emptyset \) for all \((\textbf{x},\textbf{y}) \in P\). If \(\textbf{f}^T \textbf{y}> opt(\textbf{x})\), then \((\textbf{x},\textbf{y})\) is eliminated by adding to the current formulation the corresponding OFO (or no-good) inequality. After that, the new optimal solution of the linear relaxation is computed and processed. If there is no violated inequality, \((\textbf{x},\textbf{y})\) belongs to B and, hence, it is feasible for the optimistic problem, according to Theorem 1. This means that the current node can be closed and that the current solution can be used to update the best upper bound, if appropriate. This procedure is summarized in Algorithm 1. Before they are added to the current formulation, the generated cuts are strengthened according to the criteria defined in Sect. 3.5. Note that every time the separation phase is used, a feasible solution is also generated. In fact, either \((\textbf{x},\textbf{y}) \in B\) or \((\textbf{x},\textbf{y}^*) \in B\), where \(\textbf{y}^*\) is the computed optimal solution of \(\text{ FOLL }(\textbf{x}) \).
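The accept/reject logic behind this node processing can be mimicked, on tiny all-integer instances with finite bounds, by plain enumeration: a pair \((\textbf{x},\textbf{y})\) is accepted iff it satisfies the leader constraints, is follower-feasible and attains \(\textbf{f}^T \textbf{y}= opt(\textbf{x})\). The sketch below is our brute-force stand-in for Algorithm 1, not the paper's branch-and-cut (which relies on a MILP solver and the cuts of Sects. 3–4); all names are ours.

```python
import itertools

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def optimistic_by_enumeration(c, d, f, A, L, b, G, H, r, x_bounds, y_bounds):
    """Return (value, x, y) minimizing c^T x + d^T y over pairs with
    A x + L y >= b, H y >= r - G x and f^T y = opt(x), or None."""
    def grid(bounds):
        return itertools.product(*(range(lo, hi + 1) for lo, hi in bounds))
    best = None
    for x in grid(x_bounds):
        rhs = [ri - dot(Gi, x) for ri, Gi in zip(r, G)]
        resp = [y for y in grid(y_bounds)
                if all(dot(Hi, y) >= v for Hi, v in zip(H, rhs))]
        if not resp:
            continue                                # FOLL(x) infeasible
        opt = min(dot(f, y) for y in resp)          # opt(x)
        for y in resp:
            if dot(f, y) <= opt and all(
                    dot(Ai, x) + dot(Li, y) >= bi
                    for Ai, Li, bi in zip(A, L, b)):
                val = dot(c, x) + dot(d, y)
                if best is None or val < best[0]:
                    best = (val, x, y)
    return best
```

On a toy instance where the leader would like y large but the follower always answers y = 0, the bilevel optimum differs from the HPR optimum, which is exactly what the OFO cuts enforce in the real algorithm.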

Sometimes it is useful to process the fractional solutions as well, instead of just branching on them. When a fractional solution does not belong to the integer hull of the current formulation, it can be eliminated by a general purpose cut for mixed integer problems (e.g., a Chvátal–Gomory cut). The follower optimality cuts can also be used on some fractional vertices. Indeed, they only require the integrality of the current \(\textbf{x}\) to define \(\text{ FOLL }(\textbf{x}) \) and the violated inequality, if any, while the \(\textbf{y}\) variables can be fractional, as only the value \(\textbf{f}^T \textbf{y}\) matters. Hence, the follower optimality cuts can be used whenever the current \(\textbf{x}\) is integer, even if \(\textbf{y}\) is not. Assume now that \(\textbf{x}\) is fractional. Under some assumptions, we can define an equivalent integer leader solution \({\varvec{\mu }}\) with the property that if \(\textbf{f}^T \textbf{y}> opt({\varvec{\mu }})\), then the OFO inequality for \({\varvec{\mu }}\) also eliminates fractional vertex \((\textbf{x},\textbf{y})\). Suppose that any fractional component of \(\textbf{x}\) corresponds to a binary coherent variable (see Definition 7 in Sect. 3.5).

Theorem 16

Let \({\varvec{\mu }}\) be the integer leader solution obtained by rounding \(x_i\) to 1, if \(\textbf{G}_i \le \textbf{0}\), and to 0, if \(\textbf{G}_i \ge \textbf{0}\). If violated by \(({\varvec{\mu }},\textbf{y})\), the OFO inequality for \({\varvec{\mu }}\) is violated by \((\textbf{x},\textbf{y})\) as well.

Proof

When we strengthen the OFO inequality for \({\varvec{\mu }}\) according to Theorem 10, it does not contain any leader variable \(x_i\) that was fractional in solution \(\textbf{x}\). Then, if the inequality is violated by \(({\varvec{\mu }},\textbf{y})\), it is violated by \((\textbf{x},\textbf{y})\). \(\square \)

The improved separation algorithm that uses the OFO inequalities also on some fractional solutions, as described above, is summarized in Algorithm 2.

[Algorithm 2]
[Algorithm 3]

5.3 The pessimistic case

The policy for processing a node of the branch-and-bound tree in the pessimistic case is the following. The linear relaxation of the problem corresponding to the current node is solved. Let \((\textbf{x},\textbf{y})\) be the computed solution and assume that the node has not been pruned and that the solution is integer, otherwise we branch. Then, \((\textbf{x},\textbf{y})\) is processed by the following separation phase. We solve \(\text{ FOLL }(\textbf{x}) \) to verify if \((\textbf{x},\textbf{y}) \in B\) and, if this is not the case, we add to the problem an OFO cut or an optimistic no-good inequality. If \((\textbf{x},\textbf{y}) \in B\), we must check if \(ZI(\textbf{x}) \ne \emptyset \), by solving \(\text{ FEAS }(\textbf{x}) \). If there exists a violated restricted no-good inequality, it is added to the formulation. If no restricted no-good cut is generated, we solve \(\text{ WORST }(\textbf{x}) \) to test if \(\textbf{y}\in Y_p(\textbf{x})\). If \(\textbf{d}^T \textbf{y}= optw(\textbf{x})\), then \(\textbf{y}\in Y_p(\textbf{x})\), that is, the current solution is feasible for the pessimistic problem. The node can be closed and, if appropriate, \((\textbf{x},\textbf{y})\) can be used to update UB. Otherwise, a PFO or a pessimistic no-good inequality is added to the formulation. The procedure described above is summarized in Algorithm 3. Each time separation is used, either a feasible solution is produced (\((\textbf{x}, \textbf{y})\) or \((\textbf{x},\textbf{w})\), with \(\textbf{w}\) being the optimal solution of \(\text{ WORST }(\textbf{x}) \)) or it is proved that \(\textbf{x}\) cannot be completed by any suitable follower solution and must be discarded (a restricted no-good inequality is added).
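The pessimistic accept/reject logic can also be mimicked by enumeration on tiny all-integer instances: a leader choice x is accepted iff no optimal follower response violates a leader constraint (\(ZI(\textbf{x}) = \emptyset \)), and its cost is then \(\textbf{c}^T \textbf{x}+ optw(\textbf{x})\). The sketch below is our brute-force stand-in for Algorithm 3, not the paper's implementation; all names are ours.

```python
import itertools

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pessimistic_by_enumeration(c, d, f, A, L, b, G, H, r, x_bounds, y_bounds):
    """Return (value, x) minimizing c^T x + optw(x) over integer x with
    ZI(x) empty, or None if no such x exists."""
    def grid(bounds):
        return itertools.product(*(range(lo, hi + 1) for lo, hi in bounds))
    best = None
    for x in grid(x_bounds):
        rhs = [ri - dot(Gi, x) for ri, Gi in zip(r, G)]
        resp = [y for y in grid(y_bounds)
                if all(dot(Hi, y) >= v for Hi, v in zip(H, rhs))]
        if not resp:
            continue                                # FOLL(x) infeasible
        opt = min(dot(f, y) for y in resp)
        Z = [y for y in resp if dot(f, y) <= opt]   # Z(x)
        if any(dot(Ai, x) + dot(Li, y) < bi
               for y in Z for Ai, Li, bi in zip(A, L, b)):
            continue                                # ZI(x) nonempty: discard x
        val = dot(c, x) + max(dot(d, y) for y in Z)  # c^T x + optw(x)
        if best is None or val < best[0]:
            best = (val, x)
    return best
```

On a toy instance with an indifferent follower (f = 0), the leader is charged the worst response in Z(x), which is precisely what the PFO cuts encode in the real algorithm.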

The result in Theorem 16 holds for the restricted no-good inequalities and the PFO cuts as well. However, it requires some stricter assumptions for the pessimistic policy. For this reason, this improved separation scheme is not used in the experiments for the pessimistic setting.

5.4 Preprocessing

Before solving the problem, some variables can be fixed according to the result below, which generalizes the one in Fischetti et al. (2017).

Theorem 17

For every follower variable \(y_i\), the fixing below is correct: if \(f_i<0\) and \(\textbf{H}_i \ge \textbf{0}\), then set \(y_i\) to the upper bound, independently of the setting; if \(f_i>0\) and \(\textbf{H}_i \le \textbf{0}\), then set \(y_i\) to the lower bound, independently of the setting; if \(f_i=0\), \(\textbf{H}_i \ge \textbf{0}\), \(\textbf{L}_i = \textbf{0}\) and \(d_i \ge 0\) (\(d_i \le 0\)), then set \(y_i\) to the upper bound in the pessimistic (optimistic) setting; if \(f_i=0\), \(\textbf{H}_i \le \textbf{0}\), \(\textbf{L}_i = \textbf{0}\) and \(d_i \le 0\) (\(d_i \ge 0\)), then set \(y_i\) to the lower bound in the pessimistic (optimistic) setting.

Proof

The proof for \(f_i \ne 0\) can be found in Fischetti et al. (2017). If \(\textbf{L}_i = \textbf{0}\), then \(y_i\) does not appear in the leader constraints. If \(f_i=0\) and \(\textbf{H}_i \ge \textbf{0}\) (\(\textbf{H}_i \le \textbf{0}\)), we can increase (decrease) the value of \(y_i\), without affecting the optimality or the feasibility of the follower problem. In a pessimistic perspective, the follower is supposed to choose the worst possible solution from the leader point of view, among the optimal ones. Then, if \(d_i \ge 0\) (\(d_i \le 0\)), the follower is assumed to choose the largest (smallest) possible value for \(y_i\). Hence, we can fix \(y_i\) to its upper (lower) bound. In the optimistic problem, the follower is assumed to choose the best solution for the leader, then we can set \(y_i\) to the upper (lower) bound if \(d_i \le 0\) (\(d_i \ge 0\)). \(\square \)
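The sign conditions of Theorem 17 translate directly into a preprocessing routine. A minimal sketch, with our own function name and row-major matrix convention (column i of H and L holds the coefficients of \(y_i\)):

```python
def fix_follower_variable(i, f, d, H, L, setting):
    """Return 'upper', 'lower', or None for y_i according to Theorem 17;
    setting is 'optimistic' or 'pessimistic'."""
    Hi = [row[i] for row in H]   # coefficients of y_i in follower constraints
    Li = [row[i] for row in L]   # coefficients of y_i in leader constraints
    if f[i] < 0 and all(h >= 0 for h in Hi):
        return 'upper'           # valid independently of the setting
    if f[i] > 0 and all(h <= 0 for h in Hi):
        return 'lower'           # valid independently of the setting
    if f[i] == 0 and all(l == 0 for l in Li):
        if all(h >= 0 for h in Hi) and (
                (setting == 'pessimistic' and d[i] >= 0) or
                (setting == 'optimistic' and d[i] <= 0)):
            return 'upper'
        if all(h <= 0 for h in Hi) and (
                (setting == 'pessimistic' and d[i] <= 0) or
                (setting == 'optimistic' and d[i] >= 0)):
            return 'lower'
    return None
```

Note that a variable with \(f_i = 0\), \(\textbf{L}_i = \textbf{0}\) and \(d_i > 0\) is fixed to opposite bounds in the two settings, reflecting the follower's opposite tie-breaking behavior.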

To reduce the size of the search tree, it is useful to derive a globally valid upper bound K on the optimal follower value. A simple way to compute such an upper bound is to maximize \(\textbf{f}^T \textbf{y}\) over \(\{(\textbf{x},\textbf{y}):\) \(\textbf{G}\textbf{x}+ \textbf{H}\textbf{y}\ge \textbf{r}:\) \(x_i\) integer for \(i \in N\), \(y_i\) integer for \(i \in F\}\). Suppose now that all the leader variables are coherent and let \(\textbf{x}\) be the leader solution where \(x_i = \ell _i^{\textbf{x}}\) if \(\textbf{G}_i \ge \textbf{0}\) and \(x_i = u_i^{\textbf{x}}\) if \(\textbf{G}_i \le \textbf{0}\). If \(\text{ FOLL }(\textbf{x}) \) is feasible, then \(opt(\textbf{x})\) is a valid upper bound. Moreover, if \(\textbf{L}= \textbf{0}\) and \(\textbf{x}\) is not feasible for the leader constraints, then the bound can be further improved as \(\max \{opt(\textbf{w}^i), i \in N\}\), where \(\textbf{w}^i\) is the solution obtained from \(\textbf{x}\) by decreasing (increasing) by one the value of variable \(x_i\), if it is negatively (positively) redundant.

5.5 Computing bigM values

Suitable bigM values \(M^O\) and \(M^P\) for the OFO and the PFO cuts are the maximum of \(\textbf{f}^T \textbf{y}\) and the minimum of \(\textbf{d}^T \textbf{y}\) over P. Values \(M^F_i\) for \(\text{ FEAS }(\textbf{x}) \) can be computed by maximizing \(\textbf{L}^{iT} \textbf{y}\) over \(\{(\textbf{x},\textbf{y}):\) \(\textbf{G}\textbf{x}+ \textbf{H}\textbf{y}\ge \textbf{r}\), \(x_i\) integer for \(i\in N\), \(y_i\) integer for \(i \in F \}\). If the variables are coherent, instead of using the same bigM value \(M^O\) for all the OFO inequalities and all the variables, we can compute a different (and possibly smaller) value for each inequality and variable as follows. This improvement comes at the expense of solving the follower problem multiple times. Suppose that the \(\textbf{x}\) variables take values \(\textbf{s}\) in the current solution. Let \(\varDelta \) be the set of variables appearing in the OFO cut for \(\textbf{s}\) and let \(\varGamma \) be the set of variables not appearing in the cut. Due to Theorem 10, for each \(j \in N\), either \(x^{l1_{s_j}}_j\) is in the cut (\(\textbf{G}_j \ge \textbf{0}\)) or \(x^{l1_{s_j}+1}_j\) is in the cut (\(\textbf{G}_j \le \textbf{0}\)), but not both. That is, at most one auxiliary variable for each \(j \in N\) belongs to \(\varDelta \). Let us denote by \(M^{O}_{\textbf{s}j}\) the bigM value to be used for variable j in the OFO cut for \(\textbf{s}\). Order the variables in \(\varDelta \) arbitrarily and sequentially compute bigM values \(M^{O}_{\textbf{s}j}\) according to the order. For any leader variable \(j \in \varDelta \), let \(\varDelta ^- = \{i \in \varDelta : i \le j\}\) and \(\varDelta ^+ = \varDelta \setminus \varDelta ^-\). Then, we can set \(M^{O}_{\textbf{s}j} = opt_j\), where \(opt_j\) is the optimal value of problem \(rec_{\textbf{s}j}\) below, if one exists.

$$\begin{aligned} rec_{\textbf{s}j}\;&\min \textbf{f}^T \textbf{y}{} & {} \nonumber \\&\textbf{H}\textbf{y}\ge \textbf{r}- \textbf{G}\textbf{x}{} & {} \nonumber \\&y_i \text{ integer }{} & {} i \in F \nonumber \\&x_i = u_i^{\textbf{x}}{} & {} i \in \varGamma \cup \varDelta ^-: \textbf{G}_i \le \textbf{0}\\&x_i = \ell _i^{\textbf{x}}{} & {} i \in \varGamma \cup \varDelta ^-: \textbf{G}_i \ge \textbf{0}\\&x_i = s_i{} & {} i \in \varDelta ^+ \end{aligned}$$

Theorem 18

Values \(M^{O}_{\textbf{s}j}\) are correct for the OFO cut, if the variables are coherent.

Proof

When \(opt_j\) is computed, the variables in \(\varDelta ^- \cup \varGamma \) take the worst possible value, while the others are unchanged. Hence, for any solution in which the variables in \(\varDelta ^- \cup \varGamma \) take arbitrary values while the others are unchanged, the optimal value of the follower problem is no larger than \(opt_j\), and the OFO cut is valid if this value is used as bigM. If some variable \(i > j\) changes, the cut is deactivated by \(M^{O}_{\textbf{s}i}\). \(\square \)
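The sequential computation of the values \(M^{O}_{\textbf{s}j}\) can be sketched as follows. This is a hedged illustration, not the paper's C/C++ implementation: it assumes a tiny, purely integer follower problem solved by brute-force enumeration instead of a MIP solver, and the names (`follower_opt`, `sequential_bigMs`) are illustrative, not taken from the paper.

```python
# Illustrative sketch (assumed names, brute-force in place of a MIP solver):
# sequentially compute per-variable bigM values M^O_{s,j} for the OFO cut
# by solving the restricted problems rec_{s,j} on a tiny integer instance.
import itertools
import numpy as np

def follower_opt(f, H, r, G, x, y_bounds):
    """min f^T y s.t. H y >= r - G x, y integer in the given box; None if infeasible."""
    best = None
    rhs = r - G @ x
    for y in itertools.product(*(range(lo, hi + 1) for lo, hi in y_bounds)):
        y = np.array(y)
        if np.all(H @ y >= rhs):
            val = f @ y
            if best is None or val < best:
                best = val
    return best

def sequential_bigMs(f, H, r, G, s, x_bounds, y_bounds, delta):
    """bigM value M^O_{s,j} for each j in delta (ordered leader indices in the cut).

    Variables outside the cut (Gamma) and the already processed ones, j included
    (Delta^-), are fixed at their worst bound: the upper bound if column G_i <= 0,
    the lower bound otherwise; the remaining variables (Delta^+) keep the value s_i."""
    n_x = len(s)
    gamma = [i for i in range(n_x) if i not in delta]
    bigM = {}
    for pos, j in enumerate(delta):
        worst = set(gamma) | set(delta[:pos + 1])      # Gamma U Delta^-
        x = np.array([(x_bounds[i][1] if np.all(G[:, i] <= 0) else x_bounds[i][0])
                      if i in worst else s[i] for i in range(n_x)])
        bigM[j] = follower_opt(f, H, r, G, x, y_bounds)
    return bigM
```

For instance, with the single-variable follower problem \(\min y\) s.t. \(y \ge 2 - x\), \(y \in \{0,\dots ,5\}\), \(x \in \{0,1,2\}\) and current value \(\textbf{s}= (1)\), the only cut variable gets \(M^{O} = 2\), the follower optimum at the worst leader bound \(x = 0\).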

6 The experiments

We implemented and tested, on a set of benchmark instances, the algorithms defined in Sect. 5.

6.1 The instances

The optimistic test-bed includes 390 instances, belonging to the following sets of known instances from the literature:

S1o: Fifty instances with integer leader and follower variables, originally introduced in DeNegre (2011) and downloaded from Fischetti et al. (2019);

S2o: One hundred problems with integer leader variables and mixed integer follower variables, originally introduced in Xu and Wang (2014) and provided by the authors of Lozano and Smith (2017);

S3o: Sixty large instances with integer leader variables and mixed integer follower variables, originally introduced in Fischetti et al. (2017) and downloaded from Fischetti et al. (2019);

S4o: One hundred instances with integer leader and follower variables, originally derived in Wang and Xu (2017) from the ones of S2o;

S5o: Eighty clique interdiction instances, originally introduced in Tang et al. (2016) and downloaded from Tang et al. (2019).

A more detailed description of these instances can be found in the above-mentioned papers. Most of them cannot be used for the pessimistic problem, because the optimistic and the pessimistic framework produce the same solution: either the instances have \(\textbf{L}=\textbf{0}\) and \(\textbf{f}=-\textbf{d}\), which are sufficient conditions for Theorem 4 to hold (S5o), or the follower problem admits a unique solution in most of the cases (Lozano and Smith 2017) (S1o, S2o, S3o, S4o). Therefore, the pessimistic test-bed includes the 263 instances below:

S1p: The (three) instances of S1o where the optimistic and the pessimistic framework have different solutions;

S2p: One hundred instances with integer leader variables and mixed integer follower ones, originally introduced and provided by the authors of Lozano and Smith (2017);

S3p: One hundred instances with integer leader variables and mixed integer follower ones, originally introduced and provided by the authors of Lozano and Smith (2017);

S4p: Sixty large instances with integer leader variables and mixed integer follower variables, obtained from S3o by randomly setting \(f_i\) to 1 or 0 with probability \(\alpha =0.15\) and \(1-\alpha \), respectively.

For the instances in S2o, S3o, S2p, S3p and S4p, the number of follower variables is equal to the number of leader variables, and 50% of the follower variables are required to be integer. The number of leader and of follower constraints is 0.4 times the number of leader variables. The instances in S4o follow the same scheme, but all the follower variables are required to be integer. For the instances in S2o, S4o, S2p and S3p, the number of leader variables ranges from 10 to 460, while for the problems in S3o and S4p it goes from 500 to 1000. Since S2o and S3o contain continuous lower-level variables, they cannot be solved by Go, which requires all the follower variables to be integer. For the same reason, Gp can be used only on S1p.
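The sizing scheme just described can be written down directly. The sketch below is a hedged illustration: the function names are not taken from the original generators, and only the dimensions and the S4p-style follower objective are reproduced (the coefficient distributions of the original instances are not specified here).

```python
# Hedged sketch of the instance dimensions described above; only the sizes and
# the S4p-style follower objective are generated, with illustrative names.
import random

def instance_shape(n_leader):
    """Dimensions for the S2o/S3o/S2p/S3p/S4p scheme."""
    n_follower = n_leader                          # as many follower as leader variables
    n_int_follower = n_follower // 2               # 50% of follower variables are integer
    m_leader = m_follower = int(0.4 * n_leader)    # constraints at each level
    return n_follower, n_int_follower, m_leader, m_follower

def s4p_follower_objective(n_follower, alpha=0.15, seed=0):
    """S4p: each f_i is set to 1 with probability alpha and to 0 otherwise."""
    rng = random.Random(seed)
    return [1 if rng.random() < alpha else 0 for _ in range(n_follower)]
```

For example, an S4p instance with 500 leader variables has 500 follower variables (250 of them integer) and 200 constraints at each level.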

6.2 The implementation details

The experiments were run on a 2-core Intel i7-6600U @ 2.6 GHz with 20 GB RAM. The algorithms are implemented in C/C++ under Linux, using CPLEX 12.6 with the following settings: a time limit of 3600 s; probing, cuts and heuristics disabled. For the instances of S1o, whose bounds on the variables are quite poor, we recomputed in preprocessing the bounds of every variable to be binarized. The improved version of the optimistic method illustrated in Algorithm 2 is used only on S5o.

We compare our results with the ones reported in the literature for the algorithms in Fischetti et al. (2017), Lozano and Smith (2017), Wang and Xu (2017) and Xu and Wang (2014), denoted by A1, A2, A3, and A4, respectively. A1 was tested on a cluster of Intel XEON E52670v2 nodes @ 2.5 GHz with 12 GB RAM, using CPLEX 12.8 and multiple threads (4 threads); the results for A2 and A4 were obtained on a 2-core Intel i7-3537U @ 2 GHz with 8 GB RAM, using CPLEX 12.6 within an algorithm coded in Java under Windows; A3 is implemented in MATLAB using TOMLAB/CPLEX on a desktop computer with 3 GB RAM and a 2.4 GHz CPU. For a comparison of the machines, see Passmark Software (http://www.cpubenchmark.net), a website that produces scores to evaluate CPU performance.

Fig. 1: Solution and separation times for Fo on S4p

Fig. 2: Solution and separation times for Fp on S4p

Fig. 3: Branch-and-bound nodes for Fo and Fp on S4p

6.3 General statistics

General statistics on the computational behavior of Fo and Fp can be found in Figs. 1, 2 and 3. These figures report how the running time, the separation time and the number of branch-and-bound nodes generated by Fo and Fp change when the size of the instances changes. Data are obtained by running Fo and Fp on the 60 instances of S4p. The \(\textbf{x}\) axis reports the number of leader variables in the considered instances. The number of follower variables and of constraints can be derived as illustrated in Sect. 6.1. The point corresponding to each \(\textbf{x}\) value is the average over the 10 instances of that size. In Figs. 1 and 2, the \(\textbf{y}\)-axis reports times in seconds. In Fig. 3, the \(\textbf{y}\)-axis corresponds to the number of generated branch-and-bound nodes.

The solution time doubles with every increase of 200 in the number of leader (and follower) variables, both in the optimistic and in the pessimistic setting. The separation time is quite limited with respect to the solution time in the optimistic setting, whereas the pessimistic setting requires, in general, larger separation times (see also Sect. 6.5). The number of branch-and-bound nodes depends less on the size of the instances, with a similar trend in the optimistic and the pessimistic setting. The pessimistic setting requires, on average, about six times the computational effort needed to solve the optimistic version of the problem on the same instances.
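As a back-of-the-envelope illustration of the doubling behavior observed above (a hedged extrapolation, not a model fitted by the authors), a time \(t_0\) measured at \(n_0\) leader variables scales as follows:

```python
def projected_time(t0, n0, n):
    """Extrapolated solution time at n leader variables, assuming the
    observed doubling of the solution time every additional 200 variables."""
    return t0 * 2 ** ((n - n0) / 200)
```

Under this rule of thumb, going from 500 to 900 leader variables multiplies the solution time by four.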

6.4 Comparison with other approaches on the optimistic test-bed

Table 2 Results on S1o
Table 3 Results on S2o and S4o
Table 4 Results on S3o and S5o

We discuss the results of Fo, Go, A1, A2, A3, and A4 on the instances of the optimistic test-bed defined in Sect. 6.1, which are reported in Tables 2, 3 and 4. For each algorithm and each instance of S1o, we report the solution time (in seconds), if the problem has been solved to optimality, or a symbol, if the problem has not been solved to optimality by the considered approach. For S2o–S5o, each line of the corresponding tables represents a class containing ten instances of the same size, and average solution times (in seconds) are reported. The best results are in bold. The times reported in the tables for Fo and Go are overall times and include input, preprocessing and solution time. The times for the other approaches are the ones given in the corresponding papers. The results for A4 are taken from Lozano and Smith (2017), where the algorithm was reimplemented. No total times for A3 are given in Wang and Xu (2017), so we had to derive them from the available information, as discussed in the comment to Table 3.

Table 2 presents a comparison between Fo, Go and A1 on S1o. Both Fo and A1 are able to solve all the instances, whereas Go could solve only 48 of the 50 instances within the given time limit. Row avg times or solved instances indicates the average time spent to solve the instances in the set, if all the instances have been solved to optimality, or the number of solved instances, when this is not the case. Row best approach reports the total number of instances on which each approach turns out to be the best one. Fo is the best approach in most cases, in particular on instances 20-20-50-0110-10-10, 20-20-50-0110-15-5 and 20-20-50-0110-15-6, which turned out to be the hardest ones in this set, requiring more time to be solved than any other instance in S1o by all the considered approaches. Most of the instances in this set are easy for all the tested algorithms.

In Table 3, we present the results on S2o obtained by Fo, A1, A2, and A4, as well as the ones on S4o obtained by A3, Go and Fo. Column #lead reports the number of leader variables of the considered group of instances. Times for A3 are obtained by summing the times to find and to certify the optimal solution (the last two values of each line of Table 5 in Wang and Xu (2017)) and by considering, for each class, only the best option among the six versions of the algorithm. For different groups, the best option may correspond to different approaches. For S2o, Fo is the best algorithm on almost all the considered 10 classes, largely outperforming A2 and A4 and being four times faster than A1. For S4o, both Go and Fo outperform A3. Fo is the best algorithm, being about nine times faster than Go, which in turn is about eight times faster than A3. Note that in Wang and Xu (2017), the authors report a comparison between A3 and the algorithms in Bard and Moore (1990), DeNegre and Ralphs (2009) and Xu and Wang (2014), showing that A3 outperforms all those algorithms. It follows that Fo outperforms the algorithms in Bard and Moore (1990) and Xu and Wang (2014) as well, and that Go with the binarized no-good inequalities outperforms the original approach in DeNegre and Ralphs (2009), where the cuts are defined using a slack condition.

Table 5 Results on S1p, S2p and S3p

In Table 4, we present the results on S3o and S5o obtained by Fo and A1. For S3o, Fo is the best approach, being able to solve all the instances in shorter computing times. For the clique interdiction instances in S5o, v is the number of nodes in the graph and d is the density of the edges (the percentage with respect to the total number). See Mattia (2021), Tang et al. (2016) and references therein for more details on the clique interdiction problem. We do not report the results of Go on this set, as it can solve none of the instances. S5o is the worst set for an algorithm whose cuts are entirely based on the optimal value of the follower problem, like Fo. In fact, in these problems, the leader objective function (\(\textbf{c}=\textbf{0}\) and \(\textbf{d}=-\textbf{f}\)) drives the \(\textbf{y}\) solutions exactly in the opposite direction with respect to the follower optimum, leading to the generation of a large number of pairs not belonging to B. On the contrary, approaches whose cuts and/or branching rules are derived using different strategies are expected to be more effective on such instances. The experiments confirm this intuition: A1 performs better than Fo, being twice as fast, although Fo outperforms the dedicated approach in Tang et al. (2016), where these instances were originally proposed.

6.5 Comparison with other approaches on the pessimistic test-bed

We discuss the results of Fp, Gp and A2 on the instances of the pessimistic test-bed defined in Sect. 6.1. The results are reported in Table 5. For each algorithm and each instance of S1p, we report the solution time (in seconds), if the problem has been solved to optimality, or a symbol, if the problem has not been solved to optimality by the considered approach. Rows avg times or solved instances and best approach have the same meaning as the corresponding rows in Table 2. For S2p and S3p, each line of the table represents a class containing ten instances of the same size, and average solution times (in seconds) are reported. The best results are in bold. The times reported in the table for Fp and Gp are overall times and include input, preprocessing and solution time. The times for A2 are taken from Lozano and Smith (2017).

For S1p, Fp performs better than Gp, which is not able to solve all the instances. It also outperforms A2 on S2p and S3p, being about seven times faster. Figure 2 shows that Fp can also solve the much larger instances in S4p in reasonable computing times. Confirming what is discussed in Lozano and Smith (2017) and Xu and Wang (2014), Figs. 1, 2 and 3 show that the pessimistic problem is, in general, harder than the optimistic one: it requires larger solution and separation times, and the corresponding search tree enumerates more nodes than in the optimistic case on the same instances. In fact, the reformulation of the optimistic problem via Theorem 1 allows a given solution to be checked only once before declaring it feasible or cutting it off, while the pessimistic version of the problem cannot benefit from this result. For this reason, each solution that is generated during the algorithm may need to be tested three times before a decision on it is made. In addition, the strengthening techniques that can be used in the optimistic case (see Sect. 3.5) further reduce the computing times for the optimistic problem.

7 Conclusions

We studied bilevel programming problems where both the leader and the follower variables can be integer. Necessary and sufficient conditions for the optimistic and the pessimistic policy to be equivalent were provided. We introduced a new family of inequalities, the follower optimality cuts, which allowed us to reformulate the bilevel problem as a non-compact single-level problem. The strength of these cuts with respect to the no-good inequalities was discussed. Finally, we devised a branch-and-cut algorithm and presented computational results showing that the proposed approach outperforms the other approaches in the literature on most of the well-known sets of benchmark instances considered in the experiments.