Abstract
We propose an exact global solution method for bilevel mixed-integer optimization problems with lower-level integer variables and nonlinear terms such as products of upper-level and lower-level variables. Problems of this type are extremely challenging because a single-level reformulation suitable for off-the-shelf solvers is not available in general. In order to solve these problems to global optimality, we enhance an approximative projection-based algorithm for mixed-integer linear bilevel programming problems from the literature to become exact under one additional assumption. This assumption still allows for discrete and continuous leader and follower variables on both levels, but forbids continuous upper-level variables from appearing in lower-level constraints and thus ensures that a bilevel optimum is attained. In addition, we extend our exact algorithm to make it applicable to a wider problem class. This setting allows nonlinear constraints and objective functions on both levels under certain assumptions, but still requires that the lower-level problem is convex in its continuous variables. We also discuss computational experiments on modified library instances.
1 Introduction
Hierarchical decision making by different agents occurs in a number of real-life problems including, e.g., energy policy and markets [62], pricing schemes [37], supply chain management [60], infrastructure interdiction [10, 58], taxation [36], and transportation network design [6]. The authors in particular were motivated by using bilevel modelling to determine optimal graph aggregations for network design problems as in [3], and introducing controllable energy generation facilities with corresponding discrete decisions on the lower level of energy supply tariff design problems described in [25]. In this paper we consider the setting with two agents, the leader on the upper level and the follower on the lower level, known in game theory as a Stackelberg game (see [48]). Furthermore, discrete decisions are allowed on both levels.
Thus, we want an exact solution method for problems from the following class:
Here we denote the leader’s variables, objective function and constraints by \(x^u\in \mathbb {R}^{m_R}_+\) and \(y^u\in \mathbb {Z}^{m_Z}_+\), \( F\), and \( G(x^u, y^u, x^l, y^l) \le 0\), respectively. The follower’s optimization problem, parameterized by the upper-level variables \(\left( x^u, y^u\right) \), consists in maximizing the objective function \( f\) over the variables \(x^l\in \mathbb {R}^{n_R}_+\) and \(y^l\in \mathbb {Z}^{n_Z}_+\) subject to the constraints \( g(y^u,x^l,y^l)\le 0\).
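Spelled out with this notation, the problem class (1.1) can be sketched as follows; the rendering below is schematic (in particular, the leader's sense is written as maximization, consistent with the upper-bound role of the relaxations used later):

```latex
\begin{align*}
\max_{x^u,\,y^u,\,x^l,\,y^l}\ & F(x^u, y^u, x^l, y^l) \\
\text{s.t.}\ & G(x^u, y^u, x^l, y^l) \le 0, \quad
  x^u \in \mathbb{R}^{m_R}_+, \ y^u \in \mathbb{Z}^{m_Z}_+, \\
& (x^l, y^l) \in \operatorname*{arg\,max}_{\tilde{x}^l,\,\tilde{y}^l}
  \bigl\{\, f(x^u, y^u, \tilde{x}^l, \tilde{y}^l) \,:\,
  g(y^u, \tilde{x}^l, \tilde{y}^l) \le 0,\
  \tilde{x}^l \in \mathbb{R}^{n_R}_+,\ \tilde{y}^l \in \mathbb{Z}^{n_Z}_+ \,\bigr\}.
\end{align*}
```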
A detailed set of additional assumptions that we consider in this paper will be introduced later, see Assumptions 1–4 at the beginning of Sect. 2. It should be stressed in particular that a number of the applications mentioned above, e.g., pricing schemes, taxation, interdiction and energy policy problems, fall into the setting given by our set of assumptions. Note that the majority of these applications are special cases that typically include either nonlinearities or lower-level integer variables, but not both.
1.1 Contribution
We show on a very small example that the algorithm proposed in [60], due to its approximative nature, may lead to incorrect results just because some constraint is scaled in an unfortunate way.
We then propose an exact algorithm version under the additional assumption that continuous leader variables do not appear in follower constraints. Without this assumption, a compact bilevel feasible set cannot be guaranteed from a theoretical point of view, i.e., a bilevel optimum may be unattainable and therefore finite termination with the exact optimum cannot be expected. More restrictive assumptions on this point are present in [23], where continuous leader variables are not allowed to appear in the follower’s problem at all. The assumption from [23] is also present in [49] and the corresponding software MibS [45].
Furthermore, we dispense with the assumption of feasibility of the bilevel problem required in [60], i.e., our algorithm can detect whether the bilevel problem is infeasible.
Moreover, we extend the algorithm to a nonlinear setting in which any kind of MINLP that an off-the-shelf solver can handle is allowed as the leader problem, while the follower problem, considered in the lower-level continuous variables only, has to be convex, bounded, and, in case of feasibility, satisfy Slater’s condition. Thus we also provide a constructive proof that a finite optimum is always attained within this class of bilevel problems.
In two recent surveys on bilevel optimization [12, 29], no exact global solution method for this problem class is mentioned, nor is any other existing method—apart from complete enumeration of the follower’s integer variables—known to the authors of this paper. Neither are the authors aware of any other proof that an optimum is always attained for the class of bilevel problems considered.
Although there are \(\varepsilon \)-optimal global optimization methods for bilevel problems requiring partially different assumptions than ours and, e.g., allowing more general nonconvex follower problems (see Sect. 1.3 on existing literature below), the algorithm proposed in this paper is able to find exact globally optimal solutions or prove infeasibility for more complex bilevel problem classes than the methods with the same property suggested in the literature, e.g., [23, 49].
Finally, we demonstrate the performance of the first implementation on newly constructed bilevel test instances. Note that the algorithm proposed in this paper builds upon a MINLP solver of the user’s choice and iteratively derives upper and lower bounds on the bilevel optimum by solving corresponding single-level optimization problems. Thus our method incorporates all merits of an established solver implementation (stability, efficiency, multithreading support, etc.) and automatically profits from any further development of MINLP solvers.
1.2 Paper structure
Section 1.3 contains a short overview of selected methods and existing literature for bilevel optimization problems, with a focus on problems with integer follower variables. Our setting including the necessary assumptions is presented in Sect. 2. In Sect. 2.1 we briefly state how the key features of Algorithm 1 described in Sect. 2 are realized in [60], and then describe and illustrate its approximative property in Sect. 2.2. In Sects. 3.1–3.2 we propose an exact realization which guarantees that a bilevel optimum of a feasible bilevel problem is always found. We initially restrict ourselves to the case in which the follower problem is linear. At the end of Sect. 3 we give a proof that our algorithm either finds a globally optimal solution of a bilevel problem satisfying Assumptions 1–4 or proves infeasibility in finitely many iterations. Numerical results are presented in Sect. 4. In Sect. 5 we provide details on how to extend our algorithm to nonlinear follower problems. Finally, we conclude with Sect. 6.
1.3 Existing literature
Note that whenever the follower’s optimal solution is not unique, the bilevel optimal solution as well as its objective value may be ambiguous, see, e.g., [2]. As this issue is hard to deal with in practice, the majority of bilevel problem formulations rely on a so-called optimistic assumption, i.e., whenever multiple optimal lower-level solutions exist, the one allowing the best outcome for the leader is chosen. The opposite approach is to consider the follower’s optimal solution that is least advantageous to the leader, cf., e.g., [53, 59] and the survey [12] for more references. In the remainder of the paper we consider only bilevel problem formulations with the optimistic assumption.
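To make the optimistic/pessimistic distinction concrete, the following self-contained sketch computes both solutions by enumeration on a hypothetical toy instance (the instance data and function names are illustrative assumptions, not taken from this paper): the follower is indifferent between two responses, and the tie-breaking rule changes the leader's achievable value.

```python
# Hypothetical toy bilevel instance: leader minimizes F over y_u,
# follower maximizes f over y_l; the follower's argmax is not unique.
Y_U = [0, 1]                  # leader's (integer) decisions
Y_L = [0, 1, 2, 3]            # follower's (integer) decisions

def f(y_l):                   # follower objective: indifferent between 2 and 3
    return 1 if y_l >= 2 else 0

def F(y_u, y_l):              # leader objective (to be minimized)
    return y_u + y_l

def bilevel_value(tie_break):
    best = None
    for y_u in Y_U:
        # follower's rational reaction set: all maximizers of f
        opt_f = max(f(y_l) for y_l in Y_L)
        reaction = [y_l for y_l in Y_L if f(y_l) == opt_f]
        # optimistic: follower picks the y_l best for the leader (min);
        # pessimistic: the y_l worst for the leader (max)
        y_l = tie_break(reaction, key=lambda v: F(y_u, v))
        if best is None or F(y_u, y_l) < best:
            best = F(y_u, y_l)
    return best

print(bilevel_value(min))     # optimistic value
print(bilevel_value(max))     # pessimistic value
```

Here the reaction set is \(\{2, 3\}\) for every leader decision, so the optimistic and pessimistic optimal values differ by one.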
A common way of solving bilevel problems is reformulating them into a single-level optimization problem, which is obtained by adding optimality conditions for the lower-level problem as constraints to the so-called High Point Relaxation (HPR) of the original bilevel problem, see, e.g., [2, 11]. However, this approach requires that general optimality conditions for the lower-level problem can be stated in such a way that some general global optimization solver is able to handle the reformulated single-level optimization problem. Consequently, this method is suitable only for special classes of bilevel problems and in particular does not allow for lower-level integer variables.
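For instance, when the lower-level problem is convex and purely continuous, the optimality conditions in question are the KKT conditions. A schematic sketch (our illustration, suppressing integer lower-level variables) of the conditions appended to the HPR for a follower maximizing \(f\) subject to \(g \le 0\) with multipliers \(\lambda\) reads:

```latex
\begin{align*}
& \nabla_{x^l} f(x^u, y^u, x^l) - \sum_{i=1}^{s} \lambda_i \, \nabla_{x^l} g_i(y^u, x^l) = 0, \\
& \lambda_i \, g_i(y^u, x^l) = 0, \quad g_i(y^u, x^l) \le 0, \quad \lambda_i \ge 0,
  \qquad i = 1, \ldots, s.
\end{align*}
```

The complementarity constraints make the reformulated problem nonconvex even for linear data, and no analogous finite system is available once lower-level variables are integer, which is the obstacle described above.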
A number of methods have been proposed for handling bilevel problems with lower-level integer variables, most of them concentrating on various linear problem classes. Branch-and-cut approaches for approximating the optimal-value function have been used in [15, 16, 49], while [21,22,23] extend and apply intersection cuts to bilevel problems. An implementation of the approach published in [16] is publicly available in the solver MibS [45]. In addition, the algorithm presented in [21] is accessible as the software [20].
Global optimization approaches exploiting sensitivity analysis in the lower-level problem for the solution of various classes of bilevel programming problems are suggested in [18, 19]. The idea is to solve the lower-level problem as a multiparametric programming problem, with the parameters being the variables of the upper-level problem. Then, by inserting the obtained rational reaction sets into the upper-level problem, the overall problem is transformed into a set of independent quadratic, linear or mixed-integer linear programming problems, which can be solved to global optimality. Another parametric approach for linear bilevel problems is presented in [35], while [50] solves bilevel problems without integer variables by a min-max reformulation.
There exist some algorithms for nonlinear bilevel problems that are proven to yield \(\varepsilon \)-optimal solutions, such as the methods proposed in [39, 40] and the Branch-and-Sandwich algorithm from [31, 32]. They have been extended to the mixed-integer case in [38] and [33], respectively. Recent advances for these approaches allow, e.g., to lift restrictions on coupling equality constraints [17], or improve algorithm performance [42]. Computational results for a bilevel solver based on the implementation of the Branch-and-Sandwich algorithm are presented in [43]. These algorithms have different sets of assumptions, but some allow continuous nonconvexities on the lower level and are thus more general than ours in this respect. However, in contrast to the \(\varepsilon \)-optimal methods mentioned above, our aim is an exact globally optimal solution algorithm for nonlinear mixed-integer bilevel problems with a set of assumptions that we prove sufficient to guarantee attainment of the bilevel optimum. Note that the \(\varepsilon \) in the definition of bilevel \(\varepsilon \)-optimality stands for a bilevel feasibility as well as a bilevel optimality tolerance, since a bilevel optimum may not always be attained in the general case. For a more comprehensive list of currently available solution methods for bilevel optimization problems, the reader is referred to [12].
In [61] the authors adapt the above-mentioned classical approach of solving bilevel problems via formulating necessary and sufficient optimality conditions of the lower level to a setting where both levels are mixed-integer problems. They show that if a mixed-integer linear bilevel optimization problem has the so-called relatively complete response property, iteratively solving its HPR with successively added follower optimality conditions for some fixed values of lower-level integer variables produces an exact solution of the original bilevel problem. Problem (1.1) has the relatively complete response property if every combination of HPR-feasible \(\left( x^u, y^u\right) \) and HPR-feasible \(y^l\) can be extended to an HPR-feasible solution \(\left( x^u, y^u, x^l, y^l\right) \) by a suitable HPR-feasible \(x^l\). Henceforth we shall call the HPR with optimality conditions of the lower level for some \(y^{l,k}\) with iteration index \(k\) the master problem (MP). The relatively complete response property guarantees that the master problem constructed in this way is a relaxation of the original bilevel problem. Successively adding follower optimality conditions for different \(y^{l,k}\) to (MP) produces a more precise approximation of the optimal-value function of the original bilevel problem, which becomes exact once all HPR-feasible discrete decisions for the follower have been enumerated. Note that complete enumeration of all possible follower integer configurations is not always necessary for solving the bilevel problem via the procedure described above.
A similar approach is also used for computing binary quasi-equilibria in [28], where the authors propose a transformation of a mixed-integer equilibrium problem into a mixed-integer bilevel problem, which they then solve by enumerating all possible integer configurations and formulating the corresponding sets of follower optimality conditions. In this case the feasibility of the follower problem is not affected by the leader’s decisions, and the relatively complete response property holds.
In [60] the authors propose an extension of an algorithm from [61], which aims at handling lower-level integer variables in linear bilevel problems without the relatively complete response property. They consider a set of upper-level decisions which allow certain lower-level integer configurations, and add the corresponding lower-level optimality conditions to the HPR only for this upper-level set.
Note that the idea of iteratively adding a bound for the lower-level objective function value to the HPR and making this bound valid only for a certain set of upper-level configurations was previously proposed in [38]. However, the particular formulation of this bound as well as the way of defining and finding the corresponding upper-level set and imposing the implied bound differs from the one suggested in [61].
2 General setting and algorithm structure
In this section we first state our assumptions on the class of bilevel problems the approach proposed in this paper can solve. Afterwards we introduce some necessary bilevel terminology in order to describe the general algorithm idea as proposed in [60]. Then, in Sect. 2.1 we state how the key features of the general projection algorithm are realized in [60]. Finally, in Sect. 2.2 we give an example of how this realization makes the algorithm prone to failure.
The algorithm proposed in the current paper requires the following assumptions:
Assumption 1
All upper-level variables have finite bounds in the HPR. All lower-level variables have finite bounds in the follower problem. The objective and constraint functions \( F, G, f\) and \( g\) are continuous on their respective closed boxes.
Assumption 1 is a usual assumption which ensures that all terms occurring in the formulations are bounded. We will make use of it when reformulating various indicator constraints as well as for showing that the proposed algorithm terminates after finitely many iterations.
Assumption 2
For any fixed upper-level continuous and integer decisions \(\bar{x}^{u}\) and \(\bar{y}^{u}\), respectively, and lower-level integer decisions \(\bar{y}^{l}\), the follower problem is convex and in case of feasibility satisfies Slater’s condition, i.e., \( f(\bar{x}^{u}, \bar{y}^{u}, \cdot , \bar{y}^{l})\) is concave, \( g(\bar{y}^{u}, \cdot , \bar{y}^{l})\) are convex, and there exists a strictly feasible point (satisfying all nonlinear constraints with strict inequality) for all \(\bar{x}^{u}, \bar{y}^{u}\) and \(\bar{y}^{l}\) if \( g(\bar{y}^{u}, x^l, \bar{y}^{l}) \le 0\) has a solution in \(x^l\).
Moreover, functions \( f\) and \( g\) are continuously differentiable within lowerlevel variable bounds.
The main consequence of Assumption 2 is that, considering all upper-level variables as parameters, we can formulate necessary and sufficient optimality conditions for the follower problem with fixed lower-level integer variables. Note that although there are optimal-value function formulations [54, 55] as well as optimality conditions based on extended duality [1] for ILPs/MILPs, they do not allow a single-level reformulation of the formulation (BLP) below that can be directly handed over to an off-the-shelf solver.
Assumption 3
The high point relaxation (HPR) of the original bilevel problem together with the necessary and sufficient optimality conditions for the lower level as assumed in Assumption 2 is of a problem class that can be handled by some off-the-shelf optimization solver.
The algorithm presented in this paper builds upon solvers for problems of the type given in Assumption 3; hence we need this general assumption. In practice (and in our computational experiments), we mainly take this problem class to be mixed-integer nonlinear problems (MINLPs) with polynomial nonlinearities. Note that Assumptions 1–3 are also present in [60].
Assumption 4
The lower-level constraints do not contain any continuous upper-level variables.
Without Assumption 4 we might not be able to attain the bilevel optimum despite Assumption 1, see [51]. Indeed, this assumption justifies using ‘\(\max \)’ instead of ‘\(\sup \)’ in the problem formulation. Assumptions with this effect are widely used in the literature at different levels of restrictiveness. For instance, the requirement of an all-integer leader and/or follower problem is frequently encountered. The main point of Assumption 4 is that continuous leader variables do not influence the follower’s feasible set. This is very similar to, yet slightly less restrictive than, Assumption 2 from [23] and Assumption 1 from [49], where continuous leader variables are also banned from the follower objective function, in addition to being banned from the follower constraints. This assumption has already been mentioned in earlier work by the same authors (e.g., see [22]) as being very important for their bilevel solver based on intersection cuts. Indeed, for the linear case, in which no continuous leader variables are present in the follower problem at all, [51] shows that a bilevel optimum is attained. To the best of the authors’ knowledge, no such statement exists yet for nonlinear cases satisfying Assumption 4, but possibly with continuous leader variables appearing in the follower objective function. This paper offers a constructive proof that a bilevel optimum is attained in this setting too. See Corollary 3.17 in Sect. 3 for more details.
Using the optimistic assumption described at the beginning of Sect. 1.3, we can employ the following reformulation of the original bilevel problem (1.1):
where
is an optimal-value function, cf. Section 1.2 in [14].
The High Point Relaxation (HPR) mentioned earlier is obtained from (BLP) by dropping the (optimal-value-function constraint).
Next, we provide some definitions from [60] that are needed to state their general algorithm, including a master problem and two subproblems. By k we denote the iteration index in the following.
Definition 2.1
For any lower-level feasible \(y^{l,k}\), let \(P \left( y^{l,k} \right) \) be defined as
Definition 2.2
For any lower-level feasible \(y^{l,k}\), by \(Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \) we denote
We consider \(y^{l,k}\) to be lower-level feasible if it can be extended to a feasible solution \((x^l, y^{l,k})\) of the follower’s optimization problem by a suitable \(x^l\) for some \(y^u\), and, similarly, \(y^{l,k}\) to be HPR-feasible if it can be extended to an HPR-feasible solution \((x^u, y^u, x^l, y^{l,k})\) by suitable \(x^u, y^u, x^l\). We slightly abuse the notion of lower-level and HPR feasibility in the same way at various places throughout the paper.
The HPR of the original bilevel problem together with optimality conditions of the lower level for temporarily fixed lower-level feasible \(y^{l,k}\) is our master problem, denoted by (MP):
where
is the optimal objective function value of the lower-level problem where all but the continuous lower-level variables \(x^l\) are fixed, and \(Y^L\) is a set of some lower-level feasible integer configurations, whose formation will be shown in Algorithm 1. We deviate from notational purity for the sake of more intuitive understanding and use the same symbol \(\theta \) as for the optimal-value function defined in (2.2). Particular realizations of the projection test, implication and optimality package will be discussed later in this paper.
The following subproblem provides the optimal objective function value of the lower level for some fixed upper-level decision \(\left( \bar{x}^{u}, \bar{y}^{u}\right) \), i.e., the value of \(\theta \) at the point \(\left( \bar{x}^{u}, \bar{y}^{u}\right) \). We call it the follower optimality (FO) problem:
The last required subproblem checks whether for certain fixed \(\left( \bar{x}^{u}, \bar{y}^{u}\right) \) there exists an optimal solution of the lower-level problem that together with \(\left( \bar{x}^{u}, \bar{y}^{u}\right) \) satisfies the upper-level constraints, i.e., whether \(\left( \bar{x}^{u}, \bar{y}^{u}\right) \) can be extended to a bilevel feasible solution. In addition, if such a lower-level solution exists, the one that produces the best result for the upper level is chosen, thus realizing the optimistic assumption. We call this subproblem the bilevel feasibility (BF) problem:
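Schematically, the two subproblems can be sketched as follows (our rendering in the notation of (BLP), omitting the variable domains):

```latex
\begin{align*}
\text{(FO)}\qquad \theta\bigl(\bar{x}^{u}, \bar{y}^{u}\bigr)
  = \max_{x^l,\,y^l}\ & f(\bar{x}^{u}, \bar{y}^{u}, x^l, y^l)
  \quad \text{s.t.}\quad g(\bar{y}^{u}, x^l, y^l) \le 0, \\[1ex]
\text{(BF)}\qquad \max_{x^l,\,y^l}\ & F(\bar{x}^{u}, \bar{y}^{u}, x^l, y^l) \\
  \text{s.t.}\quad & G(\bar{x}^{u}, \bar{y}^{u}, x^l, y^l) \le 0, \quad
  g(\bar{y}^{u}, x^l, y^l) \le 0, \\
  & f(\bar{x}^{u}, \bar{y}^{u}, x^l, y^l) \ge \theta\bigl(\bar{x}^{u}, \bar{y}^{u}\bigr).
\end{align*}
```

The last constraint in (BF) restricts the lower-level variables to optimal follower responses, and maximizing \(F\) among them realizes the optimistic assumption.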
Now we are ready to state a general form of the algorithm as proposed in [60] for feasible bilevel problems:
All of our assumptions apart from Assumption 4 stem directly from this general algorithm form and are also assumed in [60]. They guarantee that we are able to solve the master problem (MP) or determine that it is infeasible (Assumption 3), formulate optimality packages for fixed \(y^{l,k}\) (Assumption 2) and, if the bilevel problem is feasible, arrive at a bilevel optimal solution after finitely many iterations (Assumption 1). Assumption 4, which has already been motivated from a theoretical viewpoint, will be shown to be essential in practice too in Sect. 3.1.
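The overall bounding scheme can be illustrated on a small all-integer toy instance. The code below is a hypothetical sketch in which brute-force enumeration stands in for the actual (MP), (FO) and (BF) solver calls, and the instance data (objectives, the constraint \(y^l \le 6 - 2y^u\), variable ranges) are illustrative assumptions only:

```python
# Hypothetical all-integer toy bilevel instance (illustrative only):
# leader maximizes F over y_u, follower maximizes f over y_l.
Y_U = range(3)                       # y^u in {0, 1, 2}
Y_L_RANGE = range(5)                 # y^l in {0, ..., 4}

def g_ok(yu, yl): return yl <= 6 - 2 * yu   # lower-level constraint
def F(yu, yl):    return yu - yl            # leader objective (maximize)
def f(yl):        return yl                 # follower objective (maximize)

def solve_MP(Y_L):
    """HPR plus optimality packages for the configurations in Y_L."""
    best = None
    for yu in Y_U:
        for yl in Y_L_RANGE:
            if not g_ok(yu, yl):
                continue
            # optimality package for y^{l,k}: active only if y^{l,k} is
            # lower-level feasible for this y^u (the projection test)
            if any(g_ok(yu, ylk) and f(yl) < f(ylk) for ylk in Y_L):
                continue
            if best is None or F(yu, yl) > F(*best):
                best = (yu, yl)
    return best                       # None signals (MP) infeasible

def solve_FO(yu):
    """Follower's optimal response for fixed y^u."""
    return max((yl for yl in Y_L_RANGE if g_ok(yu, yl)), key=f)

def solve_BF(yu, theta):
    """Best bilevel-feasible completion of y^u, if any (optimistic)."""
    cand = [yl for yl in Y_L_RANGE if g_ok(yu, yl) and f(yl) >= theta]
    return max(cand, key=lambda yl: F(yu, yl)) if cand else None

Y_L, LB, incumbent = set(), float("-inf"), None
while True:
    mp = solve_MP(Y_L)
    if mp is None:                    # relaxation infeasible => BLP infeasible
        break
    yu = mp[0]
    UB = F(*mp)                       # (MP) is a relaxation => upper bound
    yl_hat = solve_FO(yu)             # follower's rational reaction
    bf = solve_BF(yu, f(yl_hat))
    if bf is not None and F(yu, bf) > LB:
        LB, incumbent = F(yu, bf), (yu, bf)
    if UB <= LB:                      # bounds coincide: incumbent optimal
        break
    Y_L.add(yl_hat)                   # enlarge Y^L and iterate
```

In the actual algorithm, (MP) is a MINLP handed to an off-the-shelf solver and the projection test is formulated via duality rather than by enumeration; the toy merely shows the bounding mechanics and finite termination.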
Remark 2.3
For all HPRfeasible \(\left( \bar{x}^{u}, \bar{y}^{u}\right) \) we have
The following statements—Lemmas 2.4, 2.6, and Theorem 2.7—were already shown in or follow directly from [60]. We list them here in our notation for completeness. Corollary 2.5 is not explicitly stated in [60] (due to their assumption that a feasible solution exists), but is a direct consequence of Lemma 2.4.
Lemma 2.4
For any set \(Y^L\) of lower-level feasible integer variable configurations the master problem (MP) is a relaxation of the original bilevel problem (BLP).
If \(Y^L\) comprises a complete set of lower-level feasible integer variable configurations, the master problem (MP) is equivalent to the original bilevel problem (BLP).
Corollary 2.5
If the master problem (MP) is infeasible, then the original bilevel problem (BLP) is infeasible. If the original bilevel problem (BLP) is infeasible, the master problem (MP) for a complete set of HPR-feasible lower-level integer variable configurations \(Y^L\) is infeasible.
Lemma 2.6
Algorithm 1 generates some \(y^{l,k}\) to be added to \(Y^L_k\) at the end of each iteration \(k\). If the \(y^{l,k}\) generated in this way is already contained in \(Y^L_k\), i.e., it has been generated in some previous iteration, Algorithm 1 terminates with a bilevel optimal solution at the latest in iteration \(k+1\).
Theorem 2.7
Given a feasible bilevel problem (BLP) satisfying Assumptions 1–4, Algorithm 1 finds a bilevel optimal solution in finitely many iterations.
The proofs of statements analogous to Lemma 2.6 and Theorem 2.7 that are needed for the exact projection test as described in Sect. 3.1 can be found in Sect. 3.3.
Note that in the worst case, Algorithm 1 can result in a complete enumeration of all lower-level feasible \(y^l\). See Example A.1 in the appendix for an unfavorable instance in this regard.
For practical purposes, in the remainder of this paper we focus on the case in which the follower problem is linear, i.e., \( f\) and \( g\) are linear functions for fixed upper-level decisions \(\bar{y}^{u}\) and \(\bar{x}^{u}\). In this case we can write
(a lower-level objective function term that is constant for fixed \(y^u\) and \(x^u\) would be meaningless) and
for vector-valued functions \( f_R, f_Z\) and matrix-valued functions \( g_R, g_Z, g_c\) of the upper-level decisions. Extensions to the general case characterized by Assumptions 1–4 will be discussed in Sect. 5.
2.1 Projection test, implications and optimality packages as realized in the literature
In this section we describe the setting and the way the projection test, the implication of the optimality package and the optimality package for \(y^{l,k}\) required by Algorithm 1 are realized in [60].
Their set of assumptions includes Assumptions 1–3 and an additional assumption that the inducible region is nonempty, i.e., that the bilevel problem is feasible. However, Assumption 4 is not present, and the setting treats only bilevel problems where the corresponding HPR is linear.
The projection test (PT) determining whether \(y^u\in Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \) is realized with the following subproblem, formed at the end of each iteration for the resulting \(y^{l,k}\):
where \(\mathbb{1}\) denotes the \(s\)-dimensional all-ones vector \(\left( 1, \ldots , 1 \right) ^\top \), \(s\) is the number of constraints in the follower’s problem, and \(x^{l,k}\) are copies of the lower-level continuous variables \(x^l\), introduced specifically for the projection test in iteration \(k\).
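In formulas, for the linear lower level this subproblem can be sketched as the following slack-minimization problem (schematic; \(\mathbb{1}\) denotes the \(s\)-dimensional all-ones vector, and the lower-level constraints are written generically as \(g(y^u, x^{l,k}, y^{l,k}) \le 0\)):

```latex
\min_{t,\, x^{l,k}} \ \mathbb{1}^\top t
\quad \text{s.t.} \quad
g\bigl(y^u, x^{l,k}, y^{l,k}\bigr) \le t, \qquad
t \in \mathbb{R}^{s}_{+}, \quad x^{l,k} \in \mathbb{R}^{n_R}_{+}.
```

Its optimal value is 0 exactly if the fixed \(y^{l,k}\) can be completed to a lower-level feasible point for the given \(y^u\), which is the statement of Remark 2.8 below.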
Remark 2.8
Let \(y^{l,k}\) be generated by Algorithm 1 and let \(y^u\) be HPR-feasible. If the optimal value of (PT\(_0\)) is 0, then \(y^u\in Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \) applies, and otherwise \(y^u\notin Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \).
Then (PT\(_0\)), its dual feasibility conditions as well as both types of complementarity constraints are added to (MP) in order to replace
by
The authors of [60] claim that constraint (2.7a) cannot be handled by off-the-shelf solvers directly and propose an approximative reformulation dependent on some \(\varepsilon \ge 0\):
In Sect. 2.2 we elaborate on the theoretical meaning of this approximation and illustrate how it can lead to algorithm failure in practice.
The algorithm from [60] then utilizes a feature for specifying indicator constraints provided by some mixedinteger solvers such as, e.g., CPLEX, in order to realize implication (2.8a).
In total, the approximative projection test, implication and optimality package add one new binary variable as well as \((2n_R + 2s)\) new continuous variables, and \((6n_R + 4s+ 2)\) new constraints to (MP) in each iteration.
All \((3n_R + 2s)\) complementarity constraints from both the projection test and the optimality package are then linearized using a big-M technique, as the master problem has to remain linear in [60] in order to be able to employ the indicator constraints as mentioned above. Note that, unless upper bounds for dual variables of the lower level are known, the big-M linearization technique may lead to suboptimal solutions in the bilevel context, see, e.g., [44] for more information.
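For reference, the standard big-M linearization of a single complementarity constraint \(\lambda_i g_i = 0\) (with \(g_i \le 0\) and \(\lambda_i \ge 0\)) introduces a binary variable \(z_i\) and can be sketched as:

```latex
\lambda_i \le M z_i, \qquad -g_i \le M \left( 1 - z_i \right), \qquad z_i \in \{0, 1\},
```

which is exact only if \(M\) is a valid upper bound on both \(\lambda_i\) and \(-g_i\); an overly small \(M\) silently cuts off optimal solutions, which is precisely the pitfall discussed in [44].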
2.2 Consequences of the approximative projection test
In this section we first explain some theoretical background regarding constraint (2.7a). Then we show how an inappropriate choice of \(\varepsilon \) used for the approximation (2.8a)–(2.8b) can lead to algorithm failure even on a very simple bilevel problem satisfying Assumption 4. In particular, we show that a value for \(\varepsilon \) ensuring correctness of Algorithm 1 with projection test as described in [60] depends also on the constraint representation of the bilevel problem.
First of all, note that constraint (2.7a), which is to be approximated in the projection test, implies optimizing over non-closed sets, as will be explained later in more detail in Sect. 3, see (3.2) and (3.3) therein. Hence, its realization poses problems not just from a practical viewpoint of, e.g., unavailable solver features. Recall that, as stated in Sect. 2, sometimes no bilevel optimum can be attained precisely because the bilevel feasible region is not compact, cf. Example 6.2.1(ii) from [2]. Assumption 4 eliminates this theoretical problem, i.e., when upper-level variables do not influence the follower’s feasible region, the bilevel optimum under the optimistic assumption can always be attained.
However, the approximation (2.8a)–(2.8b) can still cause problems even for bilevel problems satisfying Assumptions 1–4. An \(\varepsilon \) that is on the one hand small enough to ensure correctness of Algorithm 1 with projection test as described in [60], and on the other hand big enough so as not to lead to numerical intractability, may be hard to find in practice. The following example shows how all bilevel feasible points can be cut off and a problem instance is erroneously classified as infeasible due to constraint scaling.
Example 2.9
Let \(\nu \ge 0\) be some scaling parameter. We solve the following bilevel problem parameterized by \(\nu \) employing the algorithm from [60] and show that the outcome is incorrect for \(\nu < \varepsilon \):
We apply Algorithm 1 to this example problem and start with \(k=0\), \(\textit{UB} = \infty \), \(\textit{LB} = -\infty \):
Problem (MP)
yields \(\left( y_{0}^{\text {u},*}, y_{0}^{\text {l},*} \right) = (1,0)\) together with \(\textit{UB} = 1\). Problem (FO) for \(y_{0}^{\text {u},*} = 1\)
yields \(\hat{y}^{l}_0 = 4\) and \(\theta _0(y_{0}^{\text {u},*}=1) = 4\). Finally, problem (BF)
is infeasible as the last two constraints imply \(4 \le y^l\le \frac{11}{4}\). So we add \(y^{l,0} = \hat{y}^{l}_0 = 4\) to \(Y^L_1\). Thus, the approximative projection test (PT\(_0\)) reads:
For any optimal solution \(\left( t^{0,*}_1, t^{0,*}_2 \right) \) of (PT\(_0\)) we have
For \(0 \le t^{0}_1 + t^{0}_2 < \varepsilon \), constraint (2.8b)
implies \(\psi ^0 = 1\).
Assuming \( t^{0}_1 = t^{0,*}_1\) and \( t^{0}_2 = t^{0,*}_2\) are optimal solutions of (PT\(_0\)), constraint (2.8a) for \(y^{l,0} = 4\), given by
is added to problem (MP). Observe that there are only three HPR-feasible \(y^u\) in this example problem, namely \(\{0,1,2\}\). For \(y^u=0\) and \(y^u=1\) the optimal objective function value of (PT\(_0\)) is 0 irrespective of the value of \(\nu \). In this case the above constraint correctly activates the corresponding optimality package, i.e., \(\left[ y^l\ge 4\right] \) for \(y^u\in \{0,1\}\). In particular, it means that there are no bilevel feasible solutions for \(y^u\in \{0,1\}\). The correct rational response of the follower to the only remaining HPR-feasible point \(y^u=2\) is \(y^l= 2\). However, for \(\nu < \varepsilon \) the above constraint will impose the implication \(\left[ y^u= 2\right] \rightarrow \left[ y^l\ge 4\right] \), which cuts off the only bilevel feasible point (2, 2) and leads to the instance being classified as infeasible.
So, for any \(\varepsilon > 0\) to be chosen for the approximative projection test there exists a halfspace representation of the lowerlevel problem from Example 2.9 such that Algorithm 1 will erroneously classify the corresponding bilevel problem as infeasible.
For numerical reasons, optimization solvers usually employ some kind of scaling procedure on the problem before the actual solving process starts, see, e.g., [8]. This obviously changes the original representation of the problem and thus has, as shown in Example 2.9, an influence on the suitability of the choice of \(\varepsilon \). Unfortunately, in most cases the exact scaling process of a solver when using the most performant parameter settings is neither known nor accessible to the user.
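The effect can be reproduced numerically. The following sketch uses a hypothetical lower-level constraint \(y^l \le 6 - 2y^u\) in the spirit of Example 2.9 (the concrete numbers are illustrative assumptions, not the instance above): scaling the constraint by \(\nu\) scales the optimal slack of the projection test by \(\nu\) as well, so for \(\nu\) small enough the \(\varepsilon\)-threshold misfires.

```python
def pt_slack(nu, y_u, y_l_fixed):
    # Optimal value of the slack-minimization projection test for the single
    # scaled constraint nu*(y_l - 6 + 2*y_u) <= 0 with y_l fixed: the minimal
    # slack is the (scaled) constraint violation, clipped at zero.
    return max(0.0, nu * (y_l_fixed - 6 + 2 * y_u))

eps = 1e-4                      # threshold of the approximative test
y_l_fixed = 4                   # follower configuration under test

# unscaled (nu = 1): y_u = 2 is correctly excluded from the projection
assert pt_slack(1.0, 2, y_l_fixed) >= eps
# badly scaled (nu < eps): the same point now passes the epsilon test,
# so the optimality package [y_l >= 4] is wrongly imposed for y_u = 2
assert pt_slack(1e-6, 2, y_l_fixed) < eps
# points truly in the projection always have slack 0, at any scaling
assert pt_slack(1e-6, 0, y_l_fixed) == 0.0
```

The exact realization in Sect. 3 avoids such a threshold altogether, which is why it is insensitive to the constraint representation.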
3 Exact algorithm realization suitable for nonlinearities
In this section we describe our realization of projection test, implication and optimality package for each \(y^{l,k}\in Y^L_k\) in the master problem (MP), i.e.,
which makes Algorithm 1 exact and allows its extension to a nonlinear setting. Analogously to the implementation described in Sect. 2.1, we employ a binary variable \(\psi ^k\) for every \(y^{l,k}\in Y^L_k\) to separate the above routines of the algorithm:
Section 3.1 deals with (3.1a), and Sect. 3.2 handles (3.1b). At the end of this section we present our algorithm version with the proof of its correctness and finite termination.
3.1 Exact projection test
As has been seen in Example 2.9, approximative projection test applies optimality package for \(y^{l,k}\) also to some \(y^u\notin Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \). Here we aim at precise handling by adding the implication for \(y^u\in Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \), but abstaining from adding it for any \(y^u\notin Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \). Note that \(Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \) is a discrete set, so the required implication (3.1a) could be imposed by some kind of enumeration procedure. However, this would mean that Algorithm 1 would enumerate not only \(y^l\), but combinations of \(y^l\) and \(y^u\). Therefore we want to avoid enumeration of \(y^u\) whenever possible and use dual formulations for projection test, similar to the technique described in Sect. 2.1.
For this we utilize the linear relaxation of \(P \left( y^{l,k} \right) \), denoted by \(P_{\mathrm{lin}} \left( y^{l,k} \right) \), i.e.,
(cf. Definition 2.1). Correspondingly, we denote the projection of \(P_{\mathrm{lin}} \left( y^{l,k} \right) \) to the space of upper-level discrete variables by
Note that while \(Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \) and \(Proj _{\left( y^u \right) } P_{\mathrm{lin}} \left( y^{l,k} \right) \) contain exactly the same integer points, \(Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \) is a discrete set whereas \(Proj _{\left( y^u \right) } P_{\mathrm{lin}} \left( y^{l,k} \right) \) is a continuous closed set.
The implication
which is equivalent to the disjunction
however, cannot be modeled directly, as the complement of \(Proj _{\left( y^u \right) } P_{\mathrm{lin}} \left( y^{l,k} \right) \) is an open set. Consequently, for linearly relaxed \(y^u\in \mathbb {R}^{m_Z}_{+}\) the set on which (3.2) is true is not a closed set.
One option to handle this obstacle is to use an approximation, as is done in [60] and described in Sect. 2.1. However, as discussed in Sect. 2.2, we want to avoid this and obtain an exact algorithm. So our idea is the following:

(a)
Find an open subset \(U \subseteq Proj _{\left( y^u \right) }P_{\mathrm{lin}} \left( y^{l,k} \right) \) for which the implication \(\left[ y^u\in U \right] \implies \left[ \psi ^k= 1 \right] \) can be modeled using standard duality theory.

(b)
Model the remaining requirement \(\left[ y^u\in \left( Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) \right) {\setminus } U \right) \right] \implies \left[ \psi ^k= 1 \right] \) via disjunction over \(y^u\in \left( Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) \right) {\setminus } U \right) \).
Since (b) will involve some kind of enumeration, U should be chosen as large as possible such that (a) can still be modeled conveniently. We choose \(U = Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) ^\circ \right) \), i.e., the interior of \(P_{\mathrm{lin}} \left( y^{l,k} \right) \) projected to the space of discrete upper-level variables. Note that the projection is an open map, so U is indeed an open set in the target space.
Remark 3.1
Since we would like U to be as large as possible, \(U=Proj _{\left( y^u \right) }^\circ \left( P_{\mathrm{lin}} \left( y^{l,k} \right) \right) \) seems to be the canonical candidate as it is by definition the largest open subset of \(Proj _{\left( y^u \right) }P_{\mathrm{lin}} \left( y^{l,k} \right) \). However, this set is more difficult to describe. In particular, note that \(Proj _{\left( y^u \right) }^\circ \left( P_{\mathrm{lin}} \left( y^{l,k} \right) \right) \ne Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) ^\circ \right) \) in general as can be seen from the following example:
Suppose \( g :\mathbb {R}^3 \to \mathbb {R}\) is a scalar function with \( g(y^u, x^l, y^{l,k}) = x^l- \left( y^u- 1 \right) ^2\). In this case, for any \(y^{l,k}\) we have
and hence its interior is equal to \(\mathbb {R}_{>0}\). However, for \(y^u= 1\) there is no \(x^l\in \mathbb {R}_+\) such that \(x^l< \left( y^u- 1 \right) ^2\) and therefore
One may object that \(U = \emptyset \) if \(P_{\mathrm{lin}} \left( y^{l,k} \right) \) is not full-dimensional, which happens regularly in practical instances due to equations in the follower problem. Nonetheless, an adjustment of U for this case is possible and will be presented in Sect. 3.1.1.
W.l.o.g. we assume in Sects. 3.1 and 3.2 that there are no lower-level constraints which depend on \(y^l\) only, i.e., in which neither \(x^l\) nor \(y^u\) appears. Indeed, as such constraints are always satisfied for a \(y^{l,k}\) that resulted from solving (FO) or (BF) and have no influence on other variables, they can be safely disregarded once the discrete lower-level variables are fixed to \(y^{l,k}\).
We introduce the following auxiliary optimization problem for projection test:
where \(\left( t^{k}\right) _s= \left( t^{k}, \ldots , t^{k}\right) ^\top \in \mathbb {R}^s\) is the vector having \( t^{k}\) in every component.
It is easy to see that \(y^u\in Proj _{\left( y^u \right) } P_{\mathrm{lin}} \left( y^{l,k} \right) \) if and only if (PT) has a feasible solution with \( t^{k} \ge 0\). Furthermore, we have the following:
Lemma 3.2
For any fixed \(y^{l,k}\) and \(\bar{y}^{u}\) the problem (PT) has a nonempty feasible region. Moreover, the optimal solution part \( t^{k,*}\) is unique, and the following equivalences hold:

Case 1:
\( t^{k,*}> 0 \Longleftrightarrow \bar{y}^{u}\in Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) ^\circ \right) = U\),

Case 2:
\( t^{k,*}= 0 \Longleftrightarrow \bar{y}^{u}\in \left( Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) \right) {\setminus } U \right) \),

Case 3:
\( t^{k,*}< 0 \Longleftrightarrow \bar{y}^{u}\notin Proj _{\left( y^u \right) } P_{\mathrm{lin}} \left( y^{l,k} \right) \).
Proof
Recall that due to Assumption 1 all variables of both levels have finite bounds. Since \( t^{k}\) is a free slack variable present in every constraint of (PT), with \(x^{l,k}= 0\) it can be chosen such that all constraints are satisfied. Since the objective function of (PT) depends only on the one decision variable \( t^{k}\), the optimal solution part \( t^{k,*}\) is unique. For the optimal solution \( t^{k,*}\) at least one of the constraints is satisfied with equality. Thus \( t^{k,*}\) is by construction the maximal slack applicable to all lower-level constraints with lower-level variables fixed to \(y^{l,k}\). Since \(y^{l,k}\) is a constant, Case 1 is equivalent to the existence of some \(\bar{x}^{l,k}\) such that
if \( g\) does not contain any tautological equations for the particular \(y^{l,k}\). Recall that w.l.o.g. we have excluded the possibility of lower-level equations containing no other variables apart from \(y^l\), so this translates to \(\bar{y}^{u}\in Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) ^\circ \right) \). Consequently, Case 2 corresponds to
This completes the proof. \(\square \)
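The trichotomy of Lemma 3.2 can be sketched for a toy lower level without continuous variables: then (PT) reduces to \(t^{k,*} = \min _i \left( -g_i(\bar{y}^{u})\right) \), the maximal slack applicable to all rows simultaneously. The constraint rows below are illustrative and not taken from the paper.

```python
# Sketch of the three cases of Lemma 3.2 for a toy lower level with
# no continuous variables: t* is the maximal common slack of all rows.

def t_star(y_u, constraints):
    # constraints: callables g_i, feasibility meaning g_i(y_u) <= 0
    return min(-g(y_u) for g in constraints)

def classify(y_u, constraints):
    t = t_star(y_u, constraints)
    if t > 0:
        return 1  # Case 1: y_u in the projected interior U
    if t == 0:
        return 2  # Case 2: y_u in Proj(P_lin) but not in U
    return 3      # Case 3: y_u outside Proj(P_lin)

# Toy rows: g_1 = y_u - 3 <= 0 and g_2 = -y_u <= 0, i.e., 0 <= y_u <= 3.
rows = [lambda y: y - 3, lambda y: -y]
assert classify(1, rows) == 1  # strictly inside
assert classify(3, rows) == 2  # on the boundary
assert classify(5, rows) == 3  # infeasible for the lower level
```

With continuous lower-level variables present, the same trichotomy holds, but computing \(t^{k,*}\) requires solving the LP (PT) instead of a simple minimum.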
Let us now consider task (a), corresponding to the first case from Lemma 3.2. For some sufficiently large number \(\text {M}_{\mu }\) (details of which we discuss in Sect. 3.1.1), we introduce the following inequality:
It ensures that \( \left[ \bar{y}^{u}\in Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) ^\circ \right) \right] \implies \left[ \psi ^k= 1 \right] \) due to Case 1 of Lemma 3.2.
We can formulate the dual problem to (PT):
Here, \(\mathbf{1}\) denotes the \(s\)-dimensional all-ones vector.
We will show that only the dual feasibility constraints are needed to reformulate (3.4) as
Lemma 3.3
Let \(\psi ^k\in \{0,1\}\) and \(\text {M}_{\mu }\) be some sufficiently large number. Let strong duality hold for (PT) and (PT dual). Adding (PT dual) and (3.4a) to the master problem (MP) implies
Proof
The weak duality theorem implies
for every feasible solution of (PT) and (PT dual). Lemma 3.2 states that \( t^{k,*}> 0\) if and only if \(y^u\in Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) ^\circ \right) \) and, therefore,
for every \(\mu \) satisfying (PT dual). Then \(\psi ^k\) is set to 1 due to (3.4a).
Note that in the master problem (MP) the variables \(\mu \) appear only in constraints from (PT dual) and in (3.4a), which do not involve any other variables apart from \(\psi ^k\). Thus \(\psi ^k\) is the only variable that links these constraints to the rest of (MP). Consequently, if \(\mu \) can be chosen such that no restriction is imposed on the value of \(\psi ^k\), i.e., such that the left-hand side of (3.4a) is nonpositive, these constraints exert no influence on (MP). This is indeed possible due to strong duality and Cases 2 and 3 of Lemma 3.2. \(\square \)
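The mechanics of the activation constraint (3.4a) can be sketched as follows. Under strong duality the dual objective value equals \(t^{k,*}\), and the constraint reads (dual objective) \(\le \psi ^k\, \text {M}_{\mu }\); the magnitude of `M_MU` below is illustrative only.

```python
# Sketch of the big-M activation constraint (3.4a): for a dual objective
# value d > 0 (Case 1 of Lemma 3.2) the constraint d <= psi * M_mu
# forces psi = 1; for d <= 0 (Cases 2 and 3) both psi values remain
# feasible, so no implication is imposed on the master problem.

M_MU = 1e4  # illustrative magnitude only; see Sect. 3.1.1 for its choice

def feasible_psi(d):
    """Values of the binary psi compatible with d <= psi * M_MU."""
    return [psi for psi in (0, 1) if d <= psi * M_MU]

assert feasible_psi(2.5) == [1]      # Case 1: optimality package activated
assert feasible_psi(0.0) == [0, 1]   # Case 2: psi unrestricted
assert feasible_psi(-1.0) == [0, 1]  # Case 3: psi unrestricted
```

Case 2 is precisely the gap that the no-good cuts of task (b) must close, since the dual value 0 does not force \(\psi ^k= 1\).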
Remark 3.4
Note that unless strong duality holds between (PT) and (PT dual), the above approach may erroneously classify some upper-level variable values as belonging to a certain projection when in fact they do not. However, the required strong duality is satisfied due to Assumption 2.
Thus, adding (PT dual) and (3.4a) to (MP) ensures correct handling for the Cases 1 and 3 from Lemma 3.2 and completes task (a) from p. 14.
Next we consider task (b), which handles Case 2. We are unable to add the implication
directly to the formulation for the same reason we could not do so for implication (3.2).
Instead, we may add a weaker implication to (MP) that considers only the optimal solution values \(y_{k}^{\text {u},*}\) of the upper-level discrete variables in the current algorithm iteration:
An obvious way to enforce implication (3.6) is to employ a so-called no-good cut [9, 27], which cuts off one specific solution. A similar approach was pursued in [52] for a Benders’ decomposition algorithm with a mixed-integer linear subproblem. There, each assignment of discrete variables corresponds to a nonconvex set, which is the union of the complementary set and the boundary of a polyhedron. The no-good cut is used to exclude cases in further iterations that lead to points on faces of the polyhedron.
A way to realize a no-good cut for implication (3.6) is to use a quadratic constraint:
In case all upper-level integer variables are binary, instead of (3.7) one may use the linear constraint
Remark 3.5
Note that (3.5) can be imposed on the master problem gradually by adding suitable no-good constraints realizing the implication (3.6) for some \(y^{u,k}\in \left( Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) \right) {\setminus } U \right) \cap \mathbb {Z}_+^{m_Z}\), where \(y^{u,k}\) is the upper-level discrete part of the optimal solution of (MP) in iteration \(k\).
The required implication (3.1a) can be imposed on the master problem by adding (PT dual), (3.4a) and a finite number of no-good constraints realizing (3.5), which at the same time imposes no implications on any \(y^u\notin Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \). The number of required no-good constraints is finite because the number of HPR-feasible \(y^u\) is finite. Note that Assumption 4 is essential for this to work as it ensures that only discrete upper-level variables influence the set of feasible follower responses.
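The two no-good realizations can be sketched in a few lines. The implication \(\left[ y^u= y^{u,*}\right] \implies \left[ \psi ^k= 1\right] \) is enforced by requiring a distance of at least \(1 - \psi ^k\) to the cut-off point; the coefficient form below is one standard realization and may differ in detail from (3.7) and (3.8).

```python
# Sketch of the two no-good realizations for [y_u = y_star] => [psi = 1].
# Linear variant (binary case, cf. (3.8)): Hamming distance >= 1 - psi.
# Quadratic variant (general integers, cf. (3.7)): squared distance.
# Exact coefficients in the paper may differ; this is illustrative.

def linear_nogood_ok(y, y_star, psi):
    # Hamming distance of the binary vector y to the cut-off point y_star
    dist = sum(yi if ysi == 0 else 1 - yi for yi, ysi in zip(y, y_star))
    return dist >= 1 - psi

def quadratic_nogood_ok(y, y_star, psi):
    dist = sum((yi - ysi) ** 2 for yi, ysi in zip(y, y_star))
    return dist >= 1 - psi

y_star = (1, 0, 1)
# The cut-off point itself is feasible only together with psi = 1:
assert not linear_nogood_ok((1, 0, 1), y_star, psi=0)
assert linear_nogood_ok((1, 0, 1), y_star, psi=1)
# Any other binary point is feasible regardless of psi:
assert linear_nogood_ok((1, 1, 1), y_star, psi=0)
# The quadratic variant behaves the same on integer points:
assert not quadratic_nogood_ok((1, 0, 1), y_star, psi=0)
assert quadratic_nogood_ok((2, 0, 1), y_star, psi=0)
```

Both constraints are vacuous for every integer point other than \(y^{u,*}\), which is why they impose no implication on any \(y^u\notin Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \).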
3.1.1 Projection test with equalities
If the description of \(P_{\mathrm{lin}} \left( y^{l,k} \right) \) contains equations, \(P_{\mathrm{lin}} \left( y^{l,k} \right) ^\circ \) is empty, i.e., Case 1 from Lemma 3.2 never occurs. Thus, the primarily desired implication (3.2) would be imposed by (3.6) only, which would lead to adding many no-good cuts and, consequently, a large number of algorithm iterations. In order to avoid this, we propose a special handling for the case of equality constraints being present in the lower-level problem formulation.
First, we distinguish between equality and inequality constraint functions, denoted by \( h\) and \( g\), respectively. Matrix-valued functions \( h_R\), \( h_c\), \( h_Z\) and \( g_R\), \( g_c\), \( g_Z\) are used accordingly. We denote the number of inequality constraints of the lower level by \(s^\text {I} \le s\).
Our aim is to substitute
where \(P^I_{\text {lin}} \left( y^{l,k}\right) \!=\! \left\{ \left( y^u, x^l\right) : g(y^u, x^l, y^{l,k}) \!\le \! 0, y^u\in \mathbb {R}^{^{m_Z} }_+, x^l\in \mathbb {R}^{^{n_R} }_+ \right\} \) and \( g(y^u,x^l,y^l)\le 0\) are the inequality constraints of the lower level. Hence,
To achieve this we modify the primal projection test problem (PT) by abstaining from adding slack variables to equality constraints of the lower level:
where \(\left( t^{k}\right) _{s^\text {I}} = \left( t^{k}, \ldots , t^{k}\right) ^\top \in \mathbb {R}^{s^\text {I}}\) is a vector having \( t^{k}\) in every component.
Note that in contrast to (PT), (PTEq) is not necessarily feasible. However, we will show that its dual is always feasible.
With the dual variables \(\lambda \) and \(\mu \) corresponding to equality and inequality constraints of (PTEq), respectively, we formulate the dual problem to (PTEq):
Again, \(\mathbf{1}\) denotes the \(s^\text {I}\)-dimensional all-ones vector.
Lemma 3.6
The optimization problem (PTEq dual) always has a nonempty feasible region.
Proof
Recall that according to Lemma 3.2, (PT) always has a nonempty feasible region. Thus, although (PTEq) is not necessarily feasible, its infeasibility can arise only from violated equality constraints. Therefore (PTEq) without equality constraints is always feasible with a finite optimal value, and consequently so is its dual
For any feasible solution \(\bar{\mu }\) of (PTwithoutEq dual), \(\left( \bar{\mu }, \lambda = 0 \right) \) constitutes a feasible solution of (PTEq dual). \(\square \)
Corollary 3.7
If (PTEq) is infeasible, (PTEq dual) is unbounded.
Proof
The dual of an infeasible linear optimization problem is either infeasible or unbounded by LP duality theory, see Corollary 2.5 in [57]. Lemma 3.6 shows that the first case cannot occur for the considered primal-dual pair. Therefore the second case holds, and infeasibility of (PTEq) implies unboundedness of (PTEq dual). \(\square \)
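The mechanism behind Corollary 3.7 can be sketched with a one-variable toy system (not taken from the paper): the infeasible primal \(x = -1\), \(x \ge 0\) admits a Farkas certificate, and scaling that certificate drives the dual objective to \(-\infty \), i.e., the always-feasible dual is unbounded.

```python
# Sketch for Corollary 3.7 on a toy system A x = b, x >= 0 with
# A = [[1]], b = [-1], which is clearly infeasible. A Farkas ray lam
# with lam^T A >= 0 and lam^T b < 0 certifies infeasibility, and
# scaling it makes the dual objective lam^T b arbitrarily negative.

A = [[1.0]]
b = [-1.0]
lam = [1.0]  # candidate Farkas ray

row = sum(lam[i] * A[i][0] for i in range(len(A)))  # lam^T A (here scalar)
obj = sum(lam[i] * b[i] for i in range(len(b)))     # lam^T b
assert row >= 0 and obj < 0  # valid infeasibility certificate

# Scaling the ray by any c > 0 stays dual feasible while the dual
# objective c * (lam^T b) decreases without bound:
for c in (1.0, 10.0, 1000.0):
    assert c * obj == -c
```

In the setting of the corollary, Lemma 3.6 rules out dual infeasibility, so such a ray must exist whenever (PTEq) is infeasible.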
Analogously to (3.4a), we introduce the following constraint to complete the projection test (PTEq dual) with equalities:
Now we are ready to provide the analogue of Lemma 3.3 for the case of equality constraints being present on the lower level:
Lemma 3.8
Let \(\psi ^k\in \{0,1\}\) and \(\text {M}_{\mu }\) be some sufficiently large number. If strong duality holds for (PTEq) and (PTEq dual) whenever (PTEq) is feasible, adding (PTEq dual) and (3.4b) to the master problem (MP) implies:
Proof
If \(\bar{y}^{u}\in U^\text {I} = Proj _{\left( y^u \right) } \left( P^I_{\text {lin}} \left( y^{l,k}\right) ^\circ \cap P_{\mathrm{lin}} \left( y^{l,k} \right) \right) \), all constraints of (PTEq) can be satisfied and the maximal slack \( t^{k,*}\) for inequality constraints is strictly positive, analogously to the first case of Lemma 3.2. Thus the dual (PTEq dual) is also feasible and, due to weak duality, (3.4b) enforces \(\psi ^k= 1\):
Thus we obtain the first implication of the lemma and proceed with the second one.
If \(\bar{y}^{u}\notin U^\text {I}\), then either (PTEq) is infeasible, or has \( t^{k,*}\le 0\). If (PTEq) is infeasible, then the minimization problem (PTEq dual) is unbounded according to Corollary 3.7. For \( t^{k,*}\le 0\), strong duality holds between (PTEq) and (PTEq dual):
Thus, for \(\bar{y}^{u}\notin U^\text {I}\), a feasible solution \(\left( \mu , \lambda \right) \) of (PTEq dual) can be chosen such that its objective function, which is also the left-hand side of (3.4b), is less than or equal to 0 regardless of the feasibility of (PTEq). Therefore (3.4b) imposes no restrictions on the value of \(\psi ^k\) if \(\bar{y}^{u}\notin U^\text {I}\), and this completes the proof analogously to the proof of Lemma 3.3. \(\square \)
Now we address the issue of computing \(\text {M}_{\mu }\). Note that it is sufficient to choose \(\text {M}_{\mu }\) large enough that for every HPR-feasible \(y^u\) there is a feasible solution of (PTEq dual) that satisfies (3.4b). Then an optimal solution of (PTEq dual) also satisfies (3.4b) for every HPR-feasible \(y^u\), and, consequently, adding (3.4b) to the master problem (MP) produces the effect desired by Lemma 3.8. In the proof of Lemma 3.6 we established that for any \(y^u\) there is always a feasible solution of (PTEq dual) with \(\lambda = 0\). Therefore, a suitable \(\text {M}_{\mu }\) is, e.g., the optimal objective function value of (PTwithoutEq dual) combined with the HPR constraint set and variables of (BLP). The resulting auxiliary optimization problem is always feasible, as (PTwithoutEq dual) is feasible for any HPR-feasible \(y^u\), and has a finite objective function value as all variables therein are bounded.
Remark 3.9
The projection test proposed above, including no-good cuts, allows an exact realization of Algorithm 1 and also entails some further changes compared to the approximative projection test described in Sect. 2.1:

We need only dual feasibility constraints together with (3.4a) or (3.4b), respectively, instead of all necessary optimality conditions for (PT) or (PTEq), respectively.

Consequently, neither \(x^{l,k}\) as ‘copies’ of continuous lower-level variables nor \( t^{k}\) for each \(y^{l,k}\) is needed.

We introduce one additional implication, i.e., one no-good constraint, per iteration \(k\) of the algorithm. Altogether, we introduce one new binary and \(s\) continuous variables as well as \(n_R + 3\) constraints for the exact projection test in each iteration.

We may have to enumerate many or, in the worst case, all integer points of some sets \(Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) \right) {\setminus } U^\text {I}\).
Example 3.10
We revisit Example 2.9 using the improved algorithm proposed above. For \(k=0\), the steps solving (MP), (FO) and (BF) are identical, so we start from (PT). As the lower level does not contain equations, we employ projection test with U:
(PT) ‘Checking which \(y^u\) allow \(y_{}^{l,0} = 4\)’
Let us consider all three possible cases regarding \(Proj _{\left( y^u \right) } P_{\mathrm{lin}} \left( y_{}^{l,0} \right) \):

Case 1:
\( t^{0,*} > 0 \text { for } y^u= 0 \Longleftrightarrow \left( y^u= 0 \right) \in Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y_{}^{l,0} \right) ^\circ \right) ,\)

Case 2:
\( t^{0,*} = 0 \text { for } y^u= 1 \Longleftrightarrow \left( y^u= 1\right) \in \left( Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y_{}^{l,0} \right) \right) {\setminus } U \right) ,\)

Case 3:
\( t^{0,*} < 0 \text { for } y^u\ge 2 \Longleftrightarrow \left( y^u\ge 2 \right) \notin Proj _{\left( y^u \right) } P_{\mathrm{lin}} \left( y_{}^{l,0} \right) \).
The dual problem (PT dual) reads:
Taking dual feasibility constraints together with the proposed inequality (3.4a)
we obtain that \(\psi ^0 = 1\) is implied exactly for \(y^u= 0\) corresponding to Case 1, and \(\psi ^0 \in \{0,1\}\) for larger values of \(y^u\). Indeed, consider all three possible cases regarding \(Proj _{\left( y^u \right) } P_{\mathrm{lin}} \left( y_{}^{l,0} \right) \):

Case 1:
for \(y^u= 0\) we have \(\nu \mu ^{0}_1 + 6 \mu ^{0}_2 \le \psi ^0 \text {M}_{\mu }\), which implies \(\psi ^0 = 1\) since \( \mu ^{0}_1+ \mu ^{0}_2 = 1\) and \( \mu ^{0}_1, \mu ^{0}_2 \ge 0\).

Case 2:
for \(y^u= 1\) we have \(4 \mu ^{0}_2 \le \psi ^0 \text {M}_{\mu }\). Hence, \(\psi ^0 = 0\) is possible as \(( \mu ^{0}_1, \mu ^{0}_2) = (1,0)\) is a feasible solution for (PT dual).

Case 3:
for \(y^u\ge 2\) the objective coefficient of \( \mu ^{0}_1\) is negative such that the feasible solution \(( \mu ^{0}_1, \mu ^{0}_2) = (1,0)\) even gives a negative objective value for (PT dual), again allowing \(\psi ^0 = 0\).
By adding the no-good cut corresponding to \(y_{0}^{\text {u},*}\), i.e., the implication \(\left[ y_{}^{\text {u}} = y_{0}^{\text {u},*} = 1 \right] \implies \left[ \psi ^0 = 1 \right] \), Case 2 is covered as well. Altogether, projection test, implication and optimality package for \(y_{}^{l,0}\) to be added to the master problem (MP) comprise
Taken together with the leader constraint \(y^u+ 4y^l\le 12\), this implies \(y^u\ge 2\), which leaves just one integer feasible point in (MP) for \(k=1\), namely \(\left( y_{1}^{\text {u},*}, y_{1}^{\text {l},*} \right) = (2,2)\) with UB \(= 0\). As the point (2, 2) is bilevel feasible, the LB resulting from (FO) and (BF) is also 0, and the algorithm terminates with an optimal solution.
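The three cases of the example can be checked numerically, assuming the dual objective coefficients of (PT dual) for Example 2.9 read \(\nu (1 - y^u)\) and \(6 - 2y^u\), as the case analysis above suggests, with the single dual constraint \(\mu ^0_1 + \mu ^0_2 = 1\), \(\mu \ge 0\). The dual optimum is then simply the smaller coefficient.

```python
# Numeric check of the three cases in Example 3.10, assuming the dual
# objective coefficients read nu*(1 - y_u) and 6 - 2*y_u (consistent
# with the values displayed above for y_u = 0, 1, 2).

def dual_opt(y_u, nu=1.0):
    c1 = nu * (1 - y_u)   # coefficient of mu_1
    c2 = 6 - 2 * y_u      # coefficient of mu_2
    # Minimizing c1*mu_1 + c2*mu_2 over the simplex mu_1 + mu_2 = 1,
    # mu >= 0 puts all weight on the smaller coefficient:
    return min(c1, c2)

assert dual_opt(0) > 0   # Case 1: psi^0 = 1 is forced, package active
assert dual_opt(1) == 0  # Case 2: handled by the no-good cut for y_u = 1
assert dual_opt(2) < 0   # Case 3: psi^0 = 0 remains possible
```

This reproduces exactly the sign pattern of \(t^{0,*}\) listed in Cases 1–3 above.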
3.2 Implications and optimality packages
Recall that optimality package for \(y^{l,k}\) in the master problem (MP) consists of the following constraint:
where \(\theta \left( x^u, y^u, y^{l,k}\right) \) is the optimal objective function value of the follower problem with all variables except the continuous lower-level variables \(x^l\) fixed. Assumption 2 ensures that \(\theta \left( x^u, y^u, y^{l,k}\right) \) can be formulated by using necessary and sufficient optimality conditions of the lower level with fixed \(y^l\), such as, e.g., the KKT conditions. Note that we are only interested in globally optimal solutions of the original bilevel problem and thus do not require reformulations to be equivalent also in terms of locally optimal solutions. For the latter, the situation is actually more complicated and equivalence may not hold [13].
Let us denote the lower-level dual variables in iteration \(k\) by \(\pi ^{k}\). For a linear follower problem we can choose optimality package comprising primal and dual feasibility constraints for the lower level as well as the strong duality equality with fixed integer variables \(y^{l,k}\):
For our approach we need to implement logical implications of the following form:
Implications such as (3.10) are often realized by so-called indicator constraints [5, 7]. An indicator constraint is a way to express logical relationships among variables by designating a binary variable to control whether a specified constraint is active or not. Some solvers, e.g., CPLEX, provide facilities for using indicator constraints; these are utilized in [60]. It is also common to implement indicator constraints by using SOS1 conditions [4], which is the approach taken by Gurobi, for example. So far, solvers do not handle arbitrary nonlinearities together with indicator constraints, and since indicator constraints usually lead to weaker relaxations, we follow a big-M approach.
If we use big-M formulations for (3.10), we have to deduce a suitable big-M coefficient for arbitrary lower-level problems. A similar problem arises in a popular solution method for linear bilevel optimization problems in which the lower level is reformulated using KKT optimality conditions, which are then linearized using big-M formulations [60]. Finding a suitable big-M coefficient in this case is already challenging, as is indicated in [30, 44]. One possibility is to derive a correct big-M coefficient based on bound propagation. The dual variables of the lower level, however, do not have finite bounds in general. Nevertheless, in the following we show that we do not require bounds for the dual variables \(\pi ^{k}\) of the lower level in iteration \(k\) in order to realize the implication (3.10).
Consider the following reformulation of the implication (3.10):
with
where \( e_{i}\) are the corresponding standard basis vectors. Similarly to the calculation of \(\text {M}_{\mu }\) described in Sect. 3.1.1, the calculation of \(\text {M}_{\pi }\) can be done by solving three auxiliary optimization problems with the objectives as given in (3.12) and constraints and variables from the HPR of the original bilevel problem. Note that in these auxiliary problems the \(y^{l,k}\) are variables from the set of all lower-level feasible integer configurations.
Terms appearing in the \(\text {M}_{\pi }\) formula (3.12) that do not include any variables can be multiplied by \(\psi ^k\) in (3.11). Then these terms can be disregarded in the \(\text {M}_{\pi }\) calculation.
Lemma 3.11
For \(\psi ^k\in \left\{ 0,1\right\} \) the inequality system (3.11) is equivalent to (3.10).
Proof
The first part of the implication (3.10), i.e., the one for \(\psi ^k= 1\), is fulfilled trivially.
In case of \(\psi ^k= 0\), optimality package corresponding to \(y^{l,k}\) is inactive, i.e., constraints (3.11) have to be satisfied for any HPR-feasible \(\left( x^u, y^u, x^l, y^l\right) \). This is true with, e.g., \(x^{l,k}= 0\) and \(\pi ^{k}= 0\). As both \(x^{l,k}\) and \(\pi ^{k}\) appear only in optimality package for \(y^{l,k}\) and nowhere else in the master problem, in an inactive optimality package their values can be chosen freely if the first constraint of (3.11) is then valid for all HPR-feasible \(\left( x^u, y^u, x^l, y^l\right) \). \(\square \)
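The on/off behavior behind (3.11) can be sketched with a generic big-M indicator: the implication \(\left[ \psi ^k= 1\right] \implies \left[ f \ge \theta \right] \) is relaxed to a single inequality. All numbers below are illustrative; a valid \(\text {M}_{\pi }\) must dominate the worst-case gap over the HPR, as discussed above.

```python
# Sketch of a big-M indicator reformulation of [psi = 1] => [f >= theta]:
# the single inequality f >= theta - M_pi * (1 - psi). M_PI is an
# assumed valid bound for this toy data, not a value from the paper.

M_PI = 100.0

def package_ok(f_val, theta_val, psi):
    return f_val >= theta_val - M_PI * (1 - psi)

# Active package (psi = 1) enforces the optimal-value bound:
assert not package_ok(f_val=3.0, theta_val=5.0, psi=1)
assert package_ok(f_val=5.0, theta_val=5.0, psi=1)
# Inactive package (psi = 0) is satisfied by any HPR-feasible point,
# provided M_PI was chosen large enough:
assert package_ok(f_val=3.0, theta_val=5.0, psi=0)
```

If `M_PI` is too small, the inactive constraint would wrongly cut off HPR-feasible points, which is why the auxiliary problems for \(\text {M}_{\pi }\) range over all lower-level feasible integer configurations.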
The optimality package formulation (3.11) can be simplified even further:
with
Lemma 3.12
Assume that strong duality holds for the follower problem in continuous lowerlevel variables. Then, for \(\psi ^k\in \left\{ 0,1\right\} \) the inequality system (3.13) is equivalent to (3.10).
Proof
The case of \(\psi ^k= 0\) can be treated analogously to Lemma 3.11. All inequalities of (3.13) are satisfied for \(\pi ^{k}= 0\). As \(\pi ^{k}\) appears only in the optimality package corresponding to \(y^{l,k}\), which is inactive for \(\psi ^k= 0\), the values of \(\pi ^{k}\) can be chosen freely without influencing the result of the overall optimization problem.
Then consider the case \(\psi ^k= 1\), where the idea of the proof is similar to Lemma 3.2. Due to weak duality the relation
is always true and is even satisfied with equality for each optimal solution pair of the primal and dual follower problem. For \(\psi ^k= 1\), the first inequality of (3.11) as well as of (3.13) already ensures lower-level optimality for fixed \(y^{l,k}\) by imposing a lower bound on the lower-level objective function value. Consequently, in the optimal solution of the master problem, \(\pi ^{k}\) must constitute an optimal solution of the dual lower-level problem with fixed \(y^{l,k}\). Indeed, for \(\psi ^k= 1\), the master problem is feasible only if (3.14) is satisfied with equality, which is possible due to strong duality. Thus only the dual feasibility conditions are required to correctly impose the lower bound on the lower-level objective function for fixed \(y^{l,k}\). \(\square \)
Remark 3.13
The optimality package formulation (3.13) dispenses with the copies of lower-level continuous variables \(x^{l,k}\) and the corresponding primal feasibility constraints together with the explicit strong duality constraint. Thus, only \(s\) continuous variables and \(n_R + 1\) constraints are added to the master problem in each iteration to realize optimality package.
Note that complete elimination of \(x^{l,k}\) from the problem formulations is possible only in combination with the specific form of the projection test described in Sect. 3.1. The absence of the primal lower-level feasibility constraints makes the calculation of \(\text {M}_{\pi }\) easier and the number itself potentially smaller.
3.3 Algorithm form, correctness and finite termination
In the following we present Algorithm 2, a modification of Algorithm 1 with our exact projection test, which can also decide feasibility of (BLP). In order to handle Case 2 from Lemma 3.2, we need to keep track of all encountered upper-level integer solutions \(y^{u,k}\) corresponding to every generated lower-level integer configuration \(y^{l,k}\), denoted by \(Y^U\left( y^{l,k}\right) \). For each of these \(y^{u,k}\) a no-good constraint of the form (3.7) or, if all leader integer variables are binary, (3.8), is added to the master problem (MP) as part of exact projection test. The rest of projection test is composed of (PTEq dual) and (3.4b), or (PT dual) and (3.4a), depending on whether a special equality constraint treatment is needed or not. Optimality package comprises (3.13).
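The control flow just described can be sketched as a loop with the subproblem solvers injected as callables; their implementations (and the cut generation itself) are outside the scope of this sketch, and all names are illustrative.

```python
# Structural sketch of the Algorithm 2 loop: solve_mp returns None when
# (MP) is infeasible; generate_pair returns the pair (y_u_k, y_l_k)
# derived from (FO)/(BF); add_cuts stands for installing projection
# test, no-good cut and optimality package for the new pair.

def algorithm2(solve_mp, generate_pair, add_cuts):
    seen = {}  # y_l_k -> set of y_u_k, i.e., the sets Y^U(y^{l,k})
    while True:
        sol = solve_mp()
        if sol is None:
            return "infeasible"  # (BLP) infeasible (cf. Corollary 2.5)
        y_u_k, y_l_k = generate_pair(sol)
        if y_u_k in seen.get(y_l_k, set()):
            return sol           # repeated pair: sol is bilevel optimal
        seen.setdefault(y_l_k, set()).add(y_u_k)
        add_cuts(y_l_k, y_u_k)

# Tiny stub run: the master returns the same point twice, so the loop
# stops in the second iteration with that point.
sols = iter([((2,), (2,)), ((2,), (2,))])
result = algorithm2(lambda: next(sols), lambda s: s, lambda yl, yu: None)
assert result == ((2,), (2,))
```

The termination test mirrors Lemma 3.15: a repeated pair \(\left( y^{u,k}, y^{l,k}\right) \) certifies bilevel optimality of the current master solution.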
Analogously to Lemma 2.4, we formulate the following statements:
Lemma 3.14
For any set \(Y^L\) of lower-level feasible integer variable configurations and corresponding sets \(Y^U(y^{l,k})\), the master problem (MP) is a relaxation of the original bilevel problem (BLP).
If \(Y^L\) comprises a complete set of lower-level feasible integer variable configurations while for the corresponding sets \(Y^U(y^{l,k})\) the inclusion \(\left( Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) \right) {\setminus } U \right) \cap \mathbb {Z}_+^{m_Z} \subseteq Y^U(y^{l,k})\) holds, the master problem (MP) is equivalent to the original bilevel problem (BLP).
Proof
Lemma 2.4 shows the first statement for the master problem which incorporates
As Lemma 3.8 and the first part of Remark 3.5 indicate, exact projection test imposes \(\left[ f(x^u,y^u,x^l,y^l)\ge \theta \left( x^u, y^u, y^{l,k}\right) \right] \) for a subset of \(Proj _{\left( y^u \right) } P \left( y^{l,k} \right) \). Therefore, for any set \(Y^L\) of lower-level feasible integer variable configurations and corresponding sets \(Y^U(y^{l,k})\), our master problem (MP) with exact projection test is a relaxation of the master problem from Lemma 2.4 and as such a relaxation of (BLP).
The second statement of this lemma is inferred from the second part of Lemma 2.4 and the second part of Remark 3.5. Note that not necessarily all \(\bar{y}^{u}\in \left( Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) \right) {\setminus } U \right) \cap \mathbb {Z}_+^{m_Z}\) for a complete set of lower-level feasible integer variable configurations \(y^{l,k}\) have to be enumerated in order to construct a single-level reformulation of (BLP). \(\square \)
Even fewer no-good constraints may be needed to find a bilevel optimal solution of (BLP), as Algorithm 2 needs to add no-good constraints only for \(\bar{y}^{u}\) which form part of an optimal solution of (MP) in some iteration.
To show correctness and finite termination of Algorithm 2 we need the following pendant to Lemma 2.6:
Lemma 3.15
As long as the master problem (MP) remains feasible, Algorithm 2 generates some \(y^{l,k}\) and \(y^{u,k}\) to be added to \(Y^L_k\) and \(Y^U\left( y^{l,k}\right) \), respectively, at the end of each iteration \(k\) of the while-loop. If the thus-generated \(y^{l,k}\) is already contained in \(Y^L_k\) and \(y^{u,k}\) is already contained in \(Y^U\left( y^{l,k}\right) \), i.e., the pair \(\left( y^{u,k}, y^{l,k}\right) \) has been generated in some previous iteration, Algorithm 2 terminates with a bilevel optimal solution not later than in iteration \(k+1\).
Proof
Let \(\left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*}, x_{k}^{\text {l},*}, y_{k}^{\text {l},*} \right) \) be the solution of (MP) in iteration \(k\). As the follower optimality subproblem (FO) is always feasible given HPR-feasible \(\left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*} \right) \), a \(y^{u,k}= y_{k}^{\text {u},*}\) and \(y^{l,k}\) are always generated in each iteration as long as the master problem (MP) remains feasible.
Now we prove the second part of the Lemma. If the bilevel feasibility subproblem (BF) is infeasible in iteration \(k\), then \(y^{l,k}= \hat{y}^{\,l}_k\) is part of the optimal solution of (FO) computed before, and consequently \( \theta \left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*}\right) = \theta \left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*}, y^{l,k}\right) \) holds. If (BF) is feasible in iteration \(k\), then \(y^{l,k}= \tilde{y}^{\,l}_k\) is part of its optimal solution and as such also satisfies \(\theta \left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*}\right) =\theta \left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*}, y^{l,k}\right) \).
If the thus-generated \(y^{l,k}\) is already contained in \(Y^L_k\) and \(y^{u,k}\) is already contained in \(Y^U\left( y^{l,k}\right) \), then the corresponding projection test together with implication and optimality package for \(y^{l,k}\) are also already present in the master problem (MP) in iteration \(k\). In particular, the no-good constraint for \(\left( y^{u,k}, y^{l,k}\right) \) as a part of projection test implies optimality package for \(y^{l,k}\), i.e., the constraint \( f(x_{k}^{\text {u},*}, y_{k}^{\text {u},*}, x_{k}^{\text {l},*}, y_{k}^{\text {l},*}) \ge \theta \left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*}, y^{l,k}\right) \) is active in (MP) in iteration \(k\). As \(\theta \left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*}\right) = \theta \left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*}, y^{l,k}\right) \), for the upper-level decision variables \(\left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*} \right) \) this optimality package corresponding to \(y^{l,k}\) is exactly the optimal-value-function constraint. Therefore \(\left( x_{k}^{\text {u},*}, y_{k}^{\text {u},*}, x_{k}^{\text {l},*}, y_{k}^{\text {l},*} \right) \) is a bilevel feasible solution. Since, according to the first part of Lemma 3.14, it is also an optimal solution of a relaxation of the original bilevel problem (BLP), it is a bilevel optimal solution of (BLP). \(\square \)
Theorem 3.16
Algorithm 2 with the projection test as described in Sect. 3.1 and the implications and optimality packages as described in Sect. 3.2 either finds a bilevel optimal solution or shows infeasibility of the original problem (BLP) in finitely many iterations.
Proof
From Lemma 3.15 we can see that Algorithm 2 has three possible outcomes in each iteration of the while-loop:

The master problem (MP) is infeasible, which by Corollary 2.5 implies infeasibility of the original bilevel problem (BLP).

Some \(y^{l,k}\) and \(y^{u,k}\) generated in iteration \(k\) are already in \(Y^L_k\) and \(Y^U\left( y^{l,k}\right) \), respectively, which by the last statement of Lemma 3.15 leads to termination of the algorithm with a bilevel optimal solution.

A \(y^{l,k}\) generated in iteration \(k\) is not yet in \(Y^L_k\), or \(y^{u,k}\notin Y^U\left( y^{l,k}\right) \).
This means that after each iteration the algorithm either terminates or generates a pair \(\left( y^{u,k}, y^{l,k}\right) \) that has not been encountered before. Unless the algorithm terminates earlier according to the first two cases listed above, it enumerates all lower-level feasible integer configurations \(y^{l,k}\) and, according to Remark 3.5, some of the upper-level feasible integer configurations \(y^{u,k}\), thus constructing a master problem (MP) which is equivalent to the original bilevel problem (BLP) due to Part 2 of Lemma 3.14. As the number of lower-level feasible integer configurations \(y^{l,k}\) as well as the number of HPR-feasible integer configurations \(y^{u,k}\) is finite by Assumption 1, Algorithm 2 terminates after finitely many iterations. \(\square \)
Corollary 3.17
An optimum is attained for a feasible bilevel problem satisfying Assumptions 1–4.
Suitable realizations of the projection test, implication and optimality package that are necessary to confirm this result for nonlinear follower problems will be given in Sect. 5.
Altogether, the projection test, implication and optimality package as described in this section add to the master problem 1 binary and \(2s\) continuous variables as well as \(2n_R + 4\) constraints per iteration. Note that this calculation is based on bilevel problems with only inequality constraints on the lower level. In case the lower level has some equality constraints, the growth of the master problem reduces further with application of the projection test as described in Sect. 3.1.1. Note that via elimination of \(x^{l,k}\) as suggested in this paper, the number of additionally introduced nonlinearities is reduced compared to the realization from [60]. Also, in the current paper \(2n_R\) fewer continuous variables and \(4n_R + 4s - 2\) fewer constraints are added to the master problem per iteration of the algorithm.
However, the number of iterations can be higher due to possibly enumerating many or, in the worst case, all integer points of some sets \(Proj _{\left( y^u \right) } \left( P_{\mathrm{lin}} \left( y^{l,k} \right) \right) {\setminus } U^\text {I}\). If such enumeration occurs, only one constraint is added per additional iteration of the algorithm, namely a no-good constraint for the encountered upper-level integer configuration. Thus, unless their number is very high, the impact of these additional iterations on the size and complexity of the master problem is moderate compared to the projection test, implication and optimality package routines for each newly encountered lower-level integer configuration.
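For intuition, the effect of a no-good constraint in the binary case can be sketched as follows (a generic sketch only; the function names are ours, and the algorithm's actual cuts also handle general bounded integer variables):

```python
def no_good_cut(y_bar):
    """Return coefficients (a, b) of the linear cut a^T y >= b that excludes
    exactly the binary point y_bar:
        sum_{i: y_bar_i = 1} (1 - y_i) + sum_{i: y_bar_i = 0} y_i >= 1."""
    a = [-1 if v == 1 else 1 for v in y_bar]
    b = 1 - sum(y_bar)
    return a, b

def is_cut_off(y, y_bar):
    """True iff the binary point y violates the no-good cut for y_bar,
    i.e., iff y coincides with y_bar."""
    a, b = no_good_cut(y_bar)
    return sum(ai * yi for ai, yi in zip(a, y)) < b
```

Flipping any single coordinate of \(\bar{y}\) increases the left-hand side by one, so the cut removes only \(\bar{y}\) itself from the feasible set, which matches its role of excluding one already-enumerated integer configuration per iteration.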
4 Computational results
Given a suitable MINLP solver to be used for the master problem, our implementation is able to handle an MINLP on the upper level and a follower problem which is linear in the continuous lower-level variables. To the best of the authors' knowledge, no library of bilevel instances exists where discrete variables and nonlinearities such as products of upper- and lower-level variables are present on both levels. Therefore, instances of the desired class were created based on the first 10 MIBLP instances from [46].
The original instances miblp_20_15_50_0110_10_1 to miblp_20_15_50_0110_10_10 are all-integer with 5 upper-level variables and 10 lower-level variables each. There are 20 lower-level and no upper-level constraints in each instance; all constraints as well as both objective functions are linear. Notice that, as the main difficulty in solving bilevel problems comes from the structure of the lower level, i.e., the partial construction of its optimal-value function, the absence of upper-level constraints does not impede the representativeness of the computations. However, as the ability of the proposed algorithm to detect infeasible instances has to be tested as well, some instances with upper-level constraints were constructed too.
We also included a nonlinear toy example (Example A.2) in the appendix with constraints and integer variables on both levels, where the behavior of our implementation can be clearly retraced and the solution found can easily be verified.
Computational results given in this section comprise altogether 120 instances derived from [46]. In order to obtain mixed-integer nonlinear bilevel problems, each of the original 10 instances was modified as follows:

adding \(m_R \in \{5,10,20\}\) continuous variables to the upper level and redeclaring every second lower-level integer variable as a continuous variable,

adding \(m_R \in \{5,10,20\}\) continuous variables to the upper level and redeclaring every fourth lower-level integer variable as a continuous variable,

redeclaring every second or, respectively, fourth upper-level integer variable as a continuous variable and redeclaring every second lower-level integer variable as a continuous variable,
as well as adding a bilinear term to both the lower- and the upper-level objective function. Thus, 80 new instances were created, where every modification produced a distinct combination of numbers of upper- and lower-level integer and continuous variables. In order to stay as close to the original bilevel library instances from [46] as possible, no nonlinearities apart from the above-mentioned terms in the objective functions of both levels were added. If an existing integer variable was redeclared as continuous, it retains all its bounds as well as its coefficients in all constraints and objective functions. An exception is made if an upper-level integer variable is redeclared as a continuous one, in which case its coefficients in lower-level constraints are set to 0 in order to comply with Assumption 4.
The lower bound for added continuous upper-level variables is set to 0, and their upper bound is set to the maximum of the upper bounds of the upper-level integer variables. No constraint coefficients need to be produced while adding continuous upper-level variables, since there are no upper-level constraints in the original instances, and lower-level constraints must not contain upper-level continuous variables due to Assumption 4.
Coefficients for the added continuous upper-level variables in the linear part of the objective functions on both levels are generated by rearranging the corresponding objective function coefficients of the existing discrete upper-level variables. We describe only the construction of the continuous part of the upper-level objective function, as the procedure for the continuous part of the lower-level objective function is exactly the same:
Note that the number of added upper-level continuous variables is always a positive integer multiple of \(m_Z = 5\), the number of upper-level integer variables.
Regardless of the way the original instances have been made mixed-integer, the following bilinear term is added to the upper-level objective function of each instance:
with \(B\) a matrix comprised of an identity matrix \(I_{\min \{m_R,n_R\}}\) extended with zero entries to fit the required dimensions, and \(ub(x^u)\) the largest upper bound of all \(x^u\) variables. The lower-level objective function of each instance receives the above bilinear term with a minus sign.
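The padding of the identity matrix described above can be sketched as follows (our sketch; we assume \(B\) couples the \(m_R\) upper-level with the \(n_R\) lower-level continuous variables, so that \(B\) has shape \(m_R \times n_R\)):

```python
def build_B(m_R: int, n_R: int):
    """Identity matrix of order min(m_R, n_R), extended with zero entries
    to an (m_R x n_R) matrix, as used in the added bilinear term."""
    k = min(m_R, n_R)
    return [[1.0 if i == j and i < k else 0.0 for j in range(n_R)]
            for i in range(m_R)]
```

For example, `build_B(2, 3)` yields a \(2 \times 3\) matrix whose left \(2 \times 2\) block is the identity.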
As none of the instances described so far proved to be infeasible, bilevel optimization problems with upper-level constraints were constructed, again based on the instances miblp_20_15_50_0110_10_1 to miblp_20_15_50_0110_10_10 from [46]. For each of the original 10 instances, three instances were produced by shifting all but every second, fifth or tenth lower-level constraint to the upper level. No other modifications were made to obtain these 30 new instances, which therefore remain all-integer and linear on both levels.
We used Gurobi 9.0 [26] as MINLP solver, Pyomo 5.6.9 for modeling and CPython 3.7.6 for the implementation of Algorithm 2. Notice that Gurobi allows only products of variables as nonlinearities. To extend Algorithm 2 to more general nonconvex MINLPs, other global nonlinear solvers such as, e.g., SCIP [24] or BARON [47] can be used.
Computations were performed on Xeon E3-1240 v6 CPUs (4 cores, HT disabled, 3.7 GHz base frequency) with 32 GB RAM. Runtimes are stated excluding instance loading from MPS and AUX files as well as their modification, but including big-M calculations. The time limit for each instance is 2 h.
The duality gap measures the maximum relative deviation from optimality of the best feasible solution found. As proposed in [34], we have calculated the gap for a given lower bound LB and upper bound UB by the following formula:
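The gap computation can be sketched as follows (our sketch of a commonly used convention; the exact formula from [34] that the paper applies is not reproduced here and may treat corner cases differently):

```python
def duality_gap(LB: float, UB: float) -> float:
    """Relative optimality gap in percent for a minimization problem,
    given a lower bound LB and an upper bound UB (best feasible value).
    Guards against division by zero; exact convention may differ from [34]."""
    if UB == LB:
        return 0.0
    if UB == 0.0:
        return float('inf')
    return 100.0 * (UB - LB) / abs(UB)
```

A gap of 0% certifies optimality of the best feasible solution found, while, e.g., `duality_gap(99.0, 100.0)` reports a 1% maximum deviation from optimality.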
The original 10 instances from [46] were solved to optimality, and key characteristics of the runs are listed in Table 1 separately for each instance.
From the 80 MINLP instances, all but 5 were solved to optimality, while the remaining 5 instances had a relative optimality gap under 0.35%. Consolidated statistics of the runs on all 80 MINLP instances are given in Table 2 with the corresponding minimum, maximum, arithmetic mean and standard deviation of the runtime. Detailed data for each individual instance is listed separately in Table 6 in Appendix A.3.
From the 30 instances with upper-level constraints, 14 were solved to optimality and 12 were proven infeasible. For 3 of the 4 remaining instances no decision on feasibility could be made, while the last instance was proven feasible but not solved to optimality within the time limit. Consolidated statistics of the runs on these 30 instances are given in Table 3, and the full data for each instance can be found in Table 7 in Appendix A.3.
Each of the altogether 120 instances is uniquely identified by the columns instance number, \(m_Z\), \(m_R\), \(n_Z\), \(n_R\), \(r\) and \(s\) of the detailed computational result Tables 1, 6 and 7. The instance number refers to the number of the original instances miblp_20_15_50_0110_10_1 to miblp_20_15_50_0110_10_10 from [46], while \(m_Z\), \(m_R\), \(n_Z\), \(n_R\), \(r\) and \(s\) denote the number of upper-level discrete, upper-level continuous, lower-level discrete and lower-level continuous variables as well as upper- and lower-level constraints, respectively.
From the altogether 120 instances (10 original library ILP instances, 80 MINLP instances and 30 ILP instances with upper-level constraints), only 9 were neither solved to optimality nor proven infeasible within the time limit. In the case of these 9 instances, we can observe two possible causes of the algorithm not terminating within the time limit.
First, the number of no-good cuts can be too large, which raises the iteration count to several thousand. This behavior is present in particular in all 5 MINLP instances that were not solved to optimality and in the penultimate ILP instance with upper-level constraints. The exact number of no-good cuts for each instance can be inferred by subtracting the number of optimality packages from the total number of iterations listed in Appendix A.3. However, for all 5 instances the algorithm found a solution with a gap of at most 0.35%, which can be considered an acceptable result for such a challenging problem type.
Second, the number of optimality packages can cause the master problem (MP) to become too hard and thus demand too much time to solve. This is the case with the remaining 3 instances, which are all linear on both levels and have upper-level constraints. Two of these instances accumulated more than 30 optimality packages, and in one case the last completed master problem solve took over an hour. See Appendix A.3 for details on each instance.
The mean run time of the algorithm over all 120 instances is 10 min 20 s; the median run time is less than 10 s. More than 90% of the test instances were either solved to optimality or proven infeasible by Algorithm 2 within 2 h. Over 95% of the instances were either solved up to a relative optimality gap under 1% or proven infeasible before hitting the time limit of 2 h.
5 Algorithm extension for nonlinear follower
For the presentation of the results so far we have assumed the follower problem to be linear for all \(x^u\), \(y^u\). In this section, we describe how to extend Algorithm 2 to the more general setting described by Assumptions 1–4. The pseudocode description of Algorithm 2 on p. 24 stays exactly the same, but we have to generalize the projection test, implication and optimality package for the nonlinear setting. In order to do this, we will use the Wolfe dual in place of linear programming duality, which also provides us with the required strong duality statements.
5.1 Exact projection test for nonlinear follower
The general concept behind our exact projection test as described in Sect. 3.1 remains the same, and the adaptation of (PT) to the nonlinear setting is straightforward:
Note, however, that this is now a convex continuous problem—by Assumption 2—instead of a linear program. Its Wolfe dual is given as follows:
where the partial derivative \(\nabla _{x^l} g\) exists by Assumption 2. Note that the Lagrange multipliers for the nonnegativity conditions for \(x^{l,k}\) have already been eliminated from the formulation, as has the slack variable \( t^{k}\).
Eliminating all primal variables from the Wolfe dual is unfortunately not possible in general, but only for special cases, e.g., problems with linear constraints and strictly convex quadratic objective [41, Example 12.12]. It requires regularity of the partial derivative of the Lagrangian w.r.t. the primal variables, thus enabling usage of the implicit function theorem in order to consider \(x^{l,k}\) as a function of \(\mu \). Thus, in contrast to (PT dual) for the case of a linear follower problem discussed in Sect. 3.1, primal variable copies are in general needed for (PT Wdual).
The statement of Lemma 3.2 holds also for the nonlinear version of the projection test (PT) since the arguments in its proof do not depend on the linearity of the lower-level constraints. In particular, (PT) is feasible for any fixed \(y^{l,k}\), \(y^u\), and \(x^{l,k}\). Additionally, Assumption 1 guarantees that its optimum is finite and attained by some \((x^{l,k}_0, t^{k}_0)\). Furthermore, (PT) satisfies Slater's condition in case of feasibility due to Assumption 2. Therefore, there exists \(\mu _0 \in \mathbb {R}_+^s\) such that \((x_0^{l,k},\mu _0)\) is optimal for (PT Wdual), i.e., we have strong duality; cf. [56]. Thus we obtain Lemma 3.3 for an accordingly adapted version of (3.4a). Hence, as the no-good cuts handling Case 2 from Lemma 3.2 are independent of the lower-level problem class, Remark 3.5 holds for the nonlinear follower problem when (PT Wdual) and the corresponding analogue of (3.4a) are employed.
Note that using the Wolfe dual for solving bilevel problems with a nonlinear lower level is not limited to our particular algorithmic framework. The Wolfe dual can be employed to obtain single-level reformulations of bilevel optimization problems if, e.g., all variables have finite bounds on their respective levels, and the lower level is convex and satisfies Slater's condition. So far, the majority of the single-level reformulations of bilevel problems in the literature rely on optimality conditions of the lower level expressed either with strong duality in the linear case or the full KKT system in the nonlinear case. In contrast, it seems that using the Wolfe dual in bilevel optimization has been explored very little.
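For reference, the Wolfe dual used throughout this section has the following generic shape (standard textbook form, stated here in our own generic notation rather than the paper's): for a convex problem \(\min_x \{ f(x) : g(x) \le 0 \}\) with differentiable \(f\) and \(g\), the Wolfe dual reads

```latex
\max_{x,\,\mu}\; f(x) + \mu^\top g(x)
\qquad \text{s.t.} \qquad
\nabla_x f(x) + \mu^\top \nabla_x g(x) = 0, \quad \mu \ge 0,
```

where the gradient \(\nabla_x f\) is taken as a row vector and \(\nabla_x g\) is the Jacobian with gradients as its rows, consistent with the convention of Sect. 5.3. For a linear program, these conditions collapse to the familiar LP dual, which is why (PT Wdual) generalizes (PT dual).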
5.2 Projection test with equalities for nonlinear follower
We can implement special handling of equations as discussed in Sect. 3.1.1 also for the more general case. However, the situation is slightly more complicated.
Let \( g\) and \( h\) denote the functions defining the lowerlevel inequalities and equations, respectively. Note that by convexity due to Assumption 2, any equations must be linear in the continuous follower variables, so we can write \( h(y^u,x^l,y^l)\) as \( h(y^u,x^l,y^l)= h_R(y^u, y^l)x^l+ h_c(y^u, y^l)\) for coefficient functions \( h_R\) and \( h_c\).
We again modify the primal problem (PT) by applying slack variables only to the inequality constraints:
The Wolfe dual of this problem is given by
We again face the problem that (PTEq) could—in contrast to (PT)—be infeasible. We have to show that also in this case, (PTEq Wdual) is feasible and admits a solution with objective value \(\le 0\). This would ensure that the strong duality constraint of the form of (3.4b) does not impose any restriction to the master problem if (PTEq) is infeasible. In fact, we will be able to show that (PTEq Wdual) is unbounded in that case, matching Corollary 3.7.
Lemma 5.1
If (PTEq) is infeasible, (PTEq Wdual) is unbounded.
Proof
We can show that (PTEq Wdual) is feasible in a way that is completely analogous to the proof of Lemma 3.6, obtaining a feasible solution of the form \(\left( \bar{x}^{l,k}, \bar{\mu }, \lambda = 0 \right) \). However, this does not directly imply the statement, since the Wolfe dual can in general have a finite optimum even if the primal is infeasible [56].
To complete the proof, we will find an unbounded ray of (PTEq Wdual). Consider a version of the primal problem without the inequalities \( g(y^u, x^{l,k}, y^{l,k}) + \left( t^{k}\right) _s\le 0\). It is a linear problem and still infeasible if (PTEq) is. Due to the results in Sect. 3.1.1, its dual
is unbounded. Let \(\bar{\lambda }\) be an unbounded ray, i.e., \( ({\bar{\lambda }})^\top h_c(y^u, y^l)< 0\) and \( h_R(y^u, y^l)^\top \bar{\lambda }\ge 0\). Then \((x^{l,k}= 0, \mu = 0,\bar{\lambda })\) is an unbounded ray for (PTEq Wdual). In combination with the feasible solution \(\left( \bar{x}^{l,k}, \bar{\mu }, 0 \right) \) for (PTEq Wdual), we can construct feasible points \((\bar{x} ^{l,k},\bar{\mu },\alpha \bar{\lambda })\) with arbitrarily small objective value as \(\alpha \rightarrow \infty \). \(\square \)
Therefore, we also have the result of Lemma 3.8 for the nonlinear case. Note that we may assume \(\bar{x}^{l,k}=0\) in the above proof, which will allow us to reuse the primal variable copies \(x^{l,k}\) in the optimality package for \(y^{l,k}\).
Choosing \(\text {M}_{\mu }\) works similarly to the situation in Sect. 3. For example, we can choose \(\text {M}_{\mu }\) to be equal to the optimal objective function value of the dual of (PTEq) with inequalities only, combined with the HPR constraint set and variables of (BLP). It still holds that this problem always has a finite optimum and admits a feasible solution for (PTEq dual) with \(\lambda = 0\).
5.3 Implications and optimality packages for nonlinear follower
Assumption 2 ensures that we can still express optimality of the follower's continuous decisions via strong duality. As before, we denote the lower-level dual variables in iteration \(k\) by \(\pi ^{k}\). The optimality package consists of primal and dual feasibility constraints for the lower level as well as the strong duality equation for fixed integer variables \(y^{l,k}\):
Note that we consider the gradient \(\nabla _{x^l} f\) to be a row vector. This is for reasons of consistency with the Jacobian matrix \(\nabla _{x^l} g\), which has gradients as its rows. Recall that our formulation needs to implement the following implications:
In order to do this, we again use a big-M formulation, adding (or, respectively, subtracting) a term \((1 - \psi ^k) \text {M}_{\pi }\) to (from) each inequality in (5.1).
Just as described in Sect. 3.2, a sufficiently large \(\text {M}_{\pi }\) can then be found by solving auxiliary problems for the maximal constraint violations under the given bounds for \(x^u, y^u, x^l, y^l\), while the package-specific variable copies \(x^{l,k}\) and \(\pi ^{k}\) are fixed to 0. This way, bounds for the dual variables of the lower level (which in general are not available) are not required for realizing the desired implications.
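The big-M mechanism behind these implications is the standard one (sketched here generically; \(a\) is a placeholder constraint function, not the paper's notation): a constraint \(a(v) \le 0\) that should be enforced only when the binary indicator \(\psi^k = 1\) is relaxed to

```latex
a(v) \;\le\; (1 - \psi^k)\,\mathrm{M}_{\pi}.
```

For \(\psi^k = 1\) this is exactly \(a(v) \le 0\), while for \(\psi^k = 0\) it is redundant, provided \(\mathrm{M}_{\pi}\) is at least the maximal violation of \(a(v) \le 0\) over the given variable bounds, which is precisely what the auxiliary problems above compute.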
With similar arguments as in Lemma 3.12, the optimality package can be simplified to
explicitly enforcing only dual feasibility and exploiting strong duality. Recall that we were able to eliminate all primal variables from (3.11), which, unfortunately, is not possible in general for (5.3). This, however, does not impede the reduction step to (5.3). One should just be aware that \(x^{l,k}\) are not necessarily optimal solutions of the primal follower problem with fixed \(y^l= y^{l,k}\). Note that we can use the same primal variable copies \(x^{l,k}\) as in (PT Wdual). This is because setting \(x^{l,k}\) to any optimal follower response given fixed \(y^{l,k}\) and \(y^u\) will work for both (PT Wdual) and (5.3) at the same time if \(y^u\in Proj _{\left( y^u \right) } P_{\mathrm{lin}} \left( y^{l,k} \right) \), i.e., if the optimality package for \(y^{l,k}\) is supposed to be active. Otherwise, \(x^{l,k}=0\) is always possible in both problems without imposing any relevant implications.
Remark 5.2
Note that the KKT-based tightening that was proposed in [38] and also used in [61] can be incorporated into the algorithm presented in the current paper too. Indeed, any suitable necessary optimality conditions for the lower-level problem can be added to the master problem (MP) in order to obtain a tighter relaxation of the original bilevel problem.
6 Conclusions
In this work, we proposed an exact algorithm for solving problems from the challenging class of mixed-integer nonlinear bilevel optimization problems in which integer variables may be present on both the leader's and the follower's level. Our method is based on recent work of Yue, Gao, Zeng and You [60], following the same projection-based scheme described in Algorithm 1. We turned it into an exact method under an additional Assumption 4, which bans continuous upper-level variables from lower-level constraints. In conjunction with the other assumptions it therefore guarantees that a bilevel optimum is attained if the problem is feasible in the first place—a fact for which our algorithm also contributes a constructive proof. Assumption 4 is relatively mild compared to other assumptions with the above-mentioned effect that are commonly made in the literature. The key enhancement of our algorithm is to separately realize implication (3.2) for an open subset U of the relevant projections, chosen to be as large as possible, and the remaining boundary cases—as outlined on p. 14.
Furthermore, we extend the algorithm from [60] from a purely linear setting to a more general bilevel problem class allowing nonlinearities. The limiting requirements are given by Assumptions 2–3, which essentially ensure that optimality of the continuous follower decisions can be expressed via strong duality, and that the HPR with these optimality conditions can be handled by an off-the-shelf solver. The nonlinear version of our method as described in Sect. 5 may be particularly attractive if primal variable copies can be eliminated from the Wolfe dual, though this is not a strict requirement for using it.
Proof-of-concept computational results have been presented for the case in which the lower level is linear in the follower variables, but products of lower- and upper-level variables may be present in the objective functions of both levels. Therefore our implementation covers a problem class for which currently no solver exists, to the best of the authors' knowledge. Our method is able to solve many bilevel library instances that have been modified to also include continuous variables and nonlinear terms on both levels (unfortunately, no established library exists for this problem class yet). Still, there are clear limitations in terms of instance size, which is not surprising given the extremely challenging problem class. We think that the experiments are quite encouraging, especially considering the number of optimality packages that still allow the master problem to be solved. It is important to note that our framework relies on established MINLP solvers, so any performance improvements in the underlying MINLP solver will automatically further benefit our approach. However, the master problem growing due to additional optimality packages is not the only limiting factor. We also observed instances for which optimal solutions of the master problem regularly ended up being boundary cases, i.e., in the relevant projection but not in U. Hence, they required a large number of no-good cuts, showing that our modification for exactness in general comes at a price. Moreover, this observation highlights the importance of avoiding such situations and consequently of our equation-handling modifications from Sects. 3.1.1 and 5.2 for solving problems with equations on the lower level.
In future work, further performance gains might be achieved using cutting planes specifically designed for the structure of the master problem as it evolves during the solution process and/or by warm-starting. Perturbation of the follower problem could increase the chance of the master problem solution being in U. However, problem-specific knowledge will be necessary in order not to run into the very same problem that is illustrated in Example 2.9. In preliminary computations, we have observed that having good bounds for the dual variables can help the solver immensely. While such bounds cannot be given in general, dual variables often have a nice interpretation in applications, which might allow for a suitable estimation.
Data availability
The instances used for the computational study in this article were obtained by modifying publicly available library instances from [46]. All modifications are deterministic and have been described in detail in the beginning of Sect. 4.
References
Baes, M., Oertel, T., Weismantel, R.: Duality for mixed-integer convex minimization. Math. Program. 158(1–2), 547–564 (2016). https://doi.org/10.1007/s101070150917y
Bard, J.F.: Practical bilevel optimization. In: Nonconvex Optimization and its Applications, vol. 30. Kluwer Academic Publishers, Dordrecht (1998). https://doi.org/10.1007/9781475728361. Algorithms and applications
Bärmann, A., Liers, F., Martin, A., Merkert, M., Thurner, C., Weninger, D.: Solving network design problems via iterative aggregation. Math. Program. Comput. 7(2), 189–217 (2015). https://doi.org/10.1007/s1253201500791
Beale, E.M.L., Tomlin, J.A.: Special facilities in a general mathematical programming system for nonconvex problems using ordered sets of variables. OR 69(447–454), 99 (1970)
Belotti, P., Bonami, P., Fischetti, M., Lodi, A., Monaci, M., Nogales-Gómez, A., Salvagnin, D.: On handling indicator constraints in mixed integer programming. Comput. Optim. Appl. 65(3), 545–566 (2016). https://doi.org/10.1007/s1058901698478
Ben-Ayed, O., Blair, C.E., Boyce, D.E., LeBlanc, L.J.: Construction of a real-world bilevel linear programming model of the highway network design problem. Ann. Oper. Res. 34(1), 219–254 (1992)
Bonami, P., Lodi, A., Tramontani, A., Wiese, S.: On mathematical programming with indicator constraints. Math. Program. 151(1), 191–223 (2015). https://doi.org/10.1007/s1010701508914
Chvátal, V.: Linear Programming. W.H. Freeman, San Francisco (1983)
D’Ambrosio, C., Frangioni, A., Liberti, L., Lodi, A.: On interval-subgradient and no-good cuts. Oper. Res. Lett. 38(5), 341–345 (2010). https://doi.org/10.1016/j.orl.2010.05.010
Della Croce, F., Scatamacchia, R.: An exact approach for the bilevel knapsack problem with interdiction constraints and extensions. Math. Program. 183(1), 249–281 (2020). https://doi.org/10.1007/s10107020014825
Dempe, S.: Foundations of Bilevel Programming. Springer, Berlin (2002). https://doi.org/10.1007/b101970
Dempe, S.: Bilevel optimization: theory, algorithms, applications and a bibliography. In: Dempe, S., Zemkoho, A. (eds.) Bilevel Optimization, Springer Optimization and Its Applications, pp. 581–672. Springer, Berlin (2020). https://doi.org/10.1007/9783030521196
Dempe, S., Dutta, J.: Is bilevel programming a special case of a mathematical program with complementarity constraints? Math. Program. 131, 37–48 (2012). https://doi.org/10.1007/s1010701003421
Dempe, S., Kalashnikov, V., Pérez-Valdés, G.A., Kalashnykova, N.: Bilevel Programming Problems: Theory, Algorithms and Applications to Energy Networks. Springer, Berlin (2015)
Dempe, S., Kue, F.M.: Solving discrete linear bilevel optimization problems using the optimal value reformulation. J. Glob. Optim. 68(2), 255–277 (2017)
DeNegre, S.T., Ralphs, T.K.: A branch-and-cut algorithm for integer bilevel linear programs. In: Operations Research and Cyber-Infrastructure, pp. 65–78. Springer, Berlin (2009). https://doi.org/10.1007/9780387888439_4
Djelassi, H., Glass, M., Mitsos, A.: Discretization-based algorithms for generalized semi-infinite and bilevel programs with coupling equality constraints. J. Glob. Optim. 75(2), 341–392 (2019). https://doi.org/10.1007/s10898019007643
Faísca, N.P., Dua, V., Rustem, B., Saraiva, P.M., Pistikopoulos, E.N.: Parametric global optimisation for bilevel programming. J. Glob. Optim. 38(4), 609–623 (2007). https://doi.org/10.1007/s1089800691006
Faísca, N.P., Saraiva, P.M., Rustem, B., Pistikopoulos, E.N.: A multiparametric programming approach for multilevel hierarchical and decentralised optimisation problems. Comput. Manag. Sci. 6(4), 377–397 (2009). https://doi.org/10.1007/s102870070062z
Fischetti, M., Ljubić, I., Monaci, M., Sinnl, M.: Bilevel (solver for mixed-integer bilevel linear problems) (2016). https://msinnl.github.io/pages/bilevel.html
Fischetti, M., Ljubić, I., Monaci, M., Sinnl, M.: Intersection cuts for bilevel optimization. In: International Conference on Integer Programming and Combinatorial Optimization, pp. 77–88. Springer, Berlin (2016). https://doi.org/10.1007/97833193346157
Fischetti, M., Ljubić, I., Monaci, M., Sinnl, M.: A new general-purpose algorithm for mixed-integer bilevel linear programs. Oper. Res. 65(6), 1615–1637 (2017). https://doi.org/10.1287/opre.2017.1650
Fischetti, M., Ljubić, I., Monaci, M., Sinnl, M.: On the use of intersection cuts for bilevel optimization. Math. Program. 172(1–2), 77–103 (2018). https://doi.org/10.1007/s1010701711895
Gamrath, G., Anderson, D., Bestuzheva, K., Chen, W.K., Eifler, L., Gasse, M., Gemander, P., Gleixner, A., Gottwald, L., Halbig, K., Hendel, G., Hojny, C., Koch, T., Le Bodic, P., Maher, S.J., Matter, F., Miltenberger, M., Mühmer, E., Müller, B., Pfetsch, M., Schlösser, F., Serrano, F., Shinano, Y., Tawfik, C., Vigerske, S., Wegscheider, F., Weninger, D., Witzig, J.: The SCIP Optimization Suite 7.0. Tech. Rep. 2010, ZIB, Takustr. 7, 14195 Berlin (2020). http://www.optimizationonline.org/DB_HTML/2020/03/7705.html
Grimm, V., Orlinskaya, G., Schewe, L., Schmidt, M., Zöttl, G.: Optimal design of retailerprosumer electricity tariffs using bilevel optimization. Omega 102327 (2020). https://doi.org/10.1016/j.omega.2020.102327
Gurobi Optimization, L.: Gurobi optimizer reference manual (2020). http://www.gurobi.com
Hooker, J., Ottosson, G.: Logic-based Benders decomposition. Math. Program. 96(1), 33–60 (2003). https://doi.org/10.1007/s1010700303759
Huppmann, D., Siddiqui, S.: An exact solution method for binary equilibrium problems with compensation and the power market uplift problem. Eur. J. Oper. Res. 266(2), 622–638 (2018)
Kleinert, T., Labbé, M., Ljubić, I., Schmidt, M.: A survey on mixed-integer programming techniques in bilevel optimization. Optimization Online (2021). http://www.optimizationonline.org/DB_HTML/2021/01/8187.html
Kleinert, T., Schmidt, M., Plein, F., Labbé, M.: There’s no free lunch: on the hardness of choosing a correct BigM in bilevel optimization. Oper. Res. 68, 1625–1931 (2020). https://doi.org/10.1287/opre.2019.1944
Kleniati, P.M., Adjiman, C.S.: Branchandsandwich: a deterministic global optimization algorithm for optimistic bilevel programming problems. Part I: theoretical development. J. Glob. Optim. 60(3), 425–458 (2014). https://doi.org/10.1007/s1089801301217
Kleniati, P.M., Adjiman, C.S.: Branchandsandwich: a deterministic global optimization algorithm for optimistic bilevel programming problems. Part II: convergence analysis and numerical results. J. Glob. Optim. 60(3), 459–481 (2014). https://doi.org/10.1007/s1089801301208
Kleniati, P.M., Adjiman, C.S.: A generalization of the branchandsandwich algorithm: from continuous to mixedinteger nonlinear bilevel problems. Comput. Chem. Eng. 72, 373–386 (2015)
Koch, T., Achterberg, T., Andersen, E., Bastert, O., Berthold, T., Bixby, R.E., Danna, E., Gamrath, G., Gleixner, A.M., Heinz, S., Lodi, A., Mittelmann, H., Ralphs, T., Salvagnin, D., Steffy, D.E., Wolter, K.: MIPLIB 2010. Math. Program. Comput. 3(2), 103–163 (2011). https://doi.org/10.1007/s1253201100259
Köppe, M., Queyranne, M., Ryan, C.T.: Parametric integer programming algorithm for bilevel mixed integer programs. J. Optim. Theory Appl. 146(1), 137–150 (2010). https://doi.org/10.1007/s1095701096683
Labbé, M., Marcotte, P., Savard, G.: A bilevel model of taxation and its application to optimal highway pricing. Manag. Sci. 44(12–part–1), 1608–1622 (1998)
Labbé, M., Violin, A.: Bilevel programming and price setting problems. 4OR 11(1), 1–30 (2013)
Mitsos, A.: Global solution of nonlinear mixed-integer bilevel programs. J. Glob. Optim. 47(4), 557–582 (2010). https://doi.org/10.1007/s10898-009-9479-y
Mitsos, A., Chachuat, B., Barton, P.I.: Towards global bilevel dynamic optimization. J. Glob. Optim. 45(1), 63 (2009). https://doi.org/10.1007/s10898-008-9395-6
Mitsos, A., Lemonidis, P., Barton, P.I.: Global solution of bilevel programs with a nonconvex inner program. J. Glob. Optim. 42(4), 475–513 (2010). https://doi.org/10.1007/s10898-007-9260-z
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006). https://doi.org/10.1007/978-0-387-40065-5
Paulavičius, R., Adjiman, C.S.: New bounding schemes and algorithmic options for the branch-and-sandwich algorithm. J. Glob. Optim. 77(2), 197–225 (2020). https://doi.org/10.1007/s10898-020-00874-3
Paulavičius, R., Gao, J., Kleniati, P.M., Adjiman, C.S.: BASBL: branch-and-sandwich bilevel solver. Implementation and computational study with the BASBLib test set. Comput. Chem. Eng. 132, 1–23 (2019). https://doi.org/10.1016/j.compchemeng.2019.106609
Pineda, S., Morales, J.M.: Solving linear bilevel problems using big-Ms: not all that glitters is gold. IEEE Trans. Power Syst. (2019)
Ralphs, T.K.: MibS (mixed integer bilevel solver) (2015). https://github.com/tkralphs/MibS
Ralphs, T.K., Adams, E.: Bilevel optimization problem library (2016). https://coral.ise.lehigh.edu/datasets/bilevelinstances/
Sahinidis, N.V.: BARON user manual v.2020.10.16 (2020). https://www.minlp.com/downloads/docs/baron%20manual.pdf
Stackelberg, H.V.: Theory of the Market Economy. Oxford University Press, Oxford (1952)
Tahernejad, S., Ralphs, T.K., DeNegre, S.T.: A branch-and-cut algorithm for mixed integer bilevel linear optimization problems and its implementation. Math. Program. Comput. 12(4), 529–568 (2020). https://doi.org/10.1007/s12532-020-00183-6
Tsoukalas, A., Rustem, B., Pistikopoulos, E.N.: A global optimization algorithm for generalized semi-infinite, continuous minimax with coupled constraints and bilevel problems. J. Glob. Optim. 44(2), 235–250 (2009). https://doi.org/10.1007/s10898-008-9321-y
Vicente, L., Savard, G., Júdice, J.: Discrete linear bilevel programming problem. J. Optim. Theory Appl. 89(3), 597–614 (1996). https://doi.org/10.1007/BF02275351
Weninger, D.: Solving mixed-integer programs arising in production planning. Ph.D. thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg (2016). https://opus4.kobv.de/opus4-fau/frontdoor/index/index/docId/8226
Wiesemann, W., Tsoukalas, A., Kleniati, P.M., Rustem, B.: Pessimistic bilevel optimization. SIAM J. Optim. 23(1), 353–380 (2013). https://doi.org/10.1137/120864015
Williams, H.P.: The dependency diagram of a mixed integer linear programme. J. Oper. Res. Soc. 68(7), 829–833 (2017). https://doi.org/10.1057/jors.2016.45
Williams, H.P., Hooker, J.: Integer programming as projection. Discret. Optim. 22, 291–311 (2016)
Wolfe, P.: A duality theorem for nonlinear programming. Q. Appl. Math. 19(3), 239–244 (1961)
Wolsey, L.A., Nemhauser, G.L.: Integer and Combinatorial Optimization. John Wiley & Sons, London (1999). https://doi.org/10.1002/9781118627372
Wood, R.K.: Bilevel network interdiction models: formulations and solutions. In: Wiley Encyclopedia of Operations Research and Management Science (2011). https://doi.org/10.1002/9780470400531.eorms0932
Yanıkoğlu, İ, Kuhn, D.: Decision rule bounds for twostage stochastic bilevel programs. SIAM J. Optim. 28(1), 198–222 (2018). https://doi.org/10.1137/16M1098486
Yue, D., Gao, J., Zeng, B., You, F.: A projection-based reformulation and decomposition algorithm for global optimization of a class of mixed integer bilevel linear programs. J. Glob. Optim. (2018). https://doi.org/10.1007/s10898-018-0679-1
Zeng, B., An, Y.: Solving bilevel mixed integer program by reformulations and decomposing. Preprint (Optimization Online) (2014). http://www.optimizationonline.org/DB_HTML/2014/07/4455.html
Zugno, M., Morales, J.M., Pinson, P., Madsen, H.: A bilevel model for electricity retailers’ participation in a demand response market environment. Energy Econ. 36, 182–197 (2013). https://doi.org/10.1016/j.eneco.2012.12.010
Acknowledgements
We thank Martin Schmidt for valuable suggestions on software and literature. The first author thanks the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) for their support within GRK 2297 MathCoRe. The research of the second author was performed partially within the Energie Campus Nürnberg and as such was supported by funding of the Bavarian State Government. Last but not least, we thank the anonymous reviewers for their helpful comments.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Appendix A.1: Example of an unfavorable instance
Example A.1
We solve the following bilevel problem with Algorithm 1 to show that, in the worst case, all HPR-feasible assignments of \(y^{\text {l}}\) are encountered:
For a graphical illustration we refer to Fig. 1, while the relevant details of each iteration are listed in Table .
1.2 Appendix A.2: Toy example with nonlinearities
Example A.2
We illustrate Algorithm 2 on a nonlinear toy example, which has bilinear terms in both objective functions and integer variables on both levels:
The relation between leader and follower is quite adversarial, since the product term \(x^{\text {u}} x^{\text {l}}\) appears with opposite signs in the two objective functions. The leader controls two binary variables and one continuous variable. The latter is implicitly integer due to the single master constraint and indirectly enters the right-hand side of some follower constraints via this representation. There is one continuous and one integer follower variable.
Table 5 summarizes the steps of Algorithm 2. As there are only four possible upper-level solutions, namely \((x^{\text {u}},y^{\text {u}_1},y^{\text {u}_2}) \in \{ (0,0,0), (1,1,0), (2,0,1), (3,1,1) \}\), the optimality of the solution found with Algorithm 2 can easily be verified.
Iteration 0 The first master problem is the High Point Relaxation. It has the unique optimal solution \((x_{0}^{\text {u},*},y_{0}^{\text {u}_1,*},y_{0}^{\text {u}_2,*},x_{0}^{\text {l},*},y_{0}^{\text {l},*}) = (3,1,1,10,0)\) with objective function value 30. However, solving (FO) shows that \((x_{0}^{\text {l}},y_{0}^{\text {l}}) = (10,0)\) is not an optimal response to \((x_{0}^{\text {u}},y_{0}^{\text {u}_1},y_{0}^{\text {u}_2}) = (3,1,1)\). A rational follower instead plays \((x_{0}^{\text {l}},y_{0}^{\text {l}}) = (0,4)\), which leads to a lower-level objective value of 40 (instead of \(30\) according to the HPR solution). Problem (BF) confirms that \((x_{0}^{\text {u}},y_{0}^{\text {u}_1},y_{0}^{\text {u}_2},x_{0}^{\text {l}},y_{0}^{\text {l}}) = (3,1,1,0,4)\) is a bilevel-feasible solution with objective value 0.
We create the projection test, implication, and optimality package for \(y^{\text {l},0}=4\). This amounts to adding two vectors \(\mu ^{0} = (\mu ^{0}_{1}, \mu ^{0}_{2}, \mu ^{0}_{3})^{\top }\) and \(\pi ^{0} = (\pi ^{0}_{1}, \pi ^{0}_{2}, \pi ^{0}_{3})^{\top }\) of continuous dual variables for (PT dual) and (3.13), respectively, to the master problem, together with the binary variable \(\psi ^{0}\). In addition, we need constraint (3.4a) and also add the nogood cut (3.6) up front.
Iteration 1 We solve the master problem (MP) with the newly added variables and constraints, obtaining a solution with \((x_{1}^{\text {u},*},y_{1}^{\text {u}_1,*},y_{1}^{\text {u}_2,*},x_{1}^{\text {l},*},y_{1}^{\text {l},*}) = (2,0,1,10,0)\) and \(\psi ^{0,*}=0\), i.e., the optimality package that has just been added is inactive. Solving (FO) shows that the follower replies \((x^{\text {l}},y^{\text {l}}) = (2,4)\), which (BF) confirms to be bilevel-feasible together with \((x^{\text {u}},y^{\text {u}_1},y^{\text {u}_2}) = (2,0,1)\). This might be surprising at first, since it means that the follower again responds with \(y^{\text {l}}=4\), the very value for which the optimality package was created.
However, setting \(\psi ^{0,*}=0\) was still possible in the master problem. On closer inspection, we see that \((0,1)\) is not in \(U\), but represents a boundary case. We thus add a corresponding nogood cut ensuring \(\left[ (y^{\text {u}_1}, y^{\text {u}_2}) = (0,1) \right] \implies \left[ \psi ^{0} = 1 \right] \). The bounds are updated to \(\textit{UB}=20\) and \(\textit{LB}=4\).
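Such an implication between binaries can be enforced by a single linear inequality. The sketch below is not taken from the paper; it assumes the standard linearization \(\psi ^{0} \ge y^{\text {u}_2} - y^{\text {u}_1}\), which is binding exactly at the boundary case \((y^{\text {u}_1}, y^{\text {u}_2}) = (0,1)\) and vacuous otherwise:

```python
from itertools import product

def nogood_ok(y1, y2, psi):
    # Hypothetical linearization of [(y1, y2) = (0, 1)] => [psi = 1]:
    # psi >= y2 - y1 is violated by psi = 0 only at (y1, y2) = (0, 1).
    return psi >= y2 - y1

# The cut must forbid psi = 0 exactly at (y1, y2) = (0, 1)
# and never cut off psi = 1.
for y1, y2 in product((0, 1), repeat=2):
    blocked = not nogood_ok(y1, y2, 0)
    assert blocked == ((y1, y2) == (0, 1))
    assert nogood_ok(y1, y2, 1)
```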
Iteration 2 Solving (MP) yields a solution with \((x_{2}^{\text {u},*},y_{2}^{\text {u}_1,*},y_{2}^{\text {u}_2,*},x_{2}^{\text {l},*},y_{2}^{\text {l},*}) = (1,1,0,8,1)\) and \(\psi ^{0,*}=0\). It has objective value 8, which becomes the new upper bound. This time, \(y^{\text {l}}=4\) is indeed impossible for the follower. Instead, (FO) has the optimal solution \((x^{\text {l}},y^{\text {l}}) = (3,3)\), again confirmed by (BF). The corresponding bilevel-feasible solution assigns objective values of 3 to the leader and 27 to the follower. Since the former is worse than the incumbent value of 4, the lower bound \(\textit{LB}\) is not updated.
We create the projection test, implication, and optimality package for \(y^{\text {l},1}=3\), introducing the further variables \(\mu ^{1} = (\mu ^{1}_{1}, \mu ^{1}_{2}, \mu ^{1}_{3})^{\top }\), \(\pi ^{1} = (\pi ^{1}_{1}, \pi ^{1}_{2}, \pi ^{1}_{3})^{\top }\), and \(\psi ^{1}\). With \(\text {M}_{\mu }=49\) and \(\text {M}_{\pi }=30\) being precomputed safe choices for the big-M parameters, the full master problem now reads:
Iteration 3 Solving (MP) now yields a solution with \((x_{3}^{\text {u},*},y_{3}^{\text {u}_1,*},y_{3}^{\text {u}_2,*},x_{3}^{\text {l},*},y_{3}^{\text {l},*}) = (2,0,1,2,4)\) and objective value 4. Feasible values for the remaining variables are \(\psi ^{0} = \psi ^{1} = 1\), \((\mu ^{0}_{1}, \mu ^{0}_{2}, \mu ^{0}_{3}) = (0,1,0)\), \((\mu ^{1}_{1}, \mu ^{1}_{2}, \mu ^{1}_{3}) = (0,0,1)\), \((\pi ^{0}_{1}, \pi ^{0}_{2}, \pi ^{0}_{3}) = (2,0,0)\), \((\pi ^{1}_{1}, \pi ^{1}_{2}, \pi ^{1}_{3}) = (0,0,0)\). Note that, with respect to \((x^{\text {u}},y^{\text {u}_1},y^{\text {u}_2},x^{\text {l}},y^{\text {l}})\), this is the same solution as obtained in Iteration 1 from (BF). Therefore, we now have \(\textit{LB}=\textit{UB}\). After completing the iteration, the algorithm terminates with the bilevel-optimal solution \((x^{\text {u},*},y^{\text {u}_1,*},y^{\text {u}_2,*},x^{\text {l},*},y^{\text {l},*}) = (2,0,1,2,4)\).
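The bound bookkeeping of the four iterations above can be replayed in a few lines. The master-problem value of Iteration 1 is not stated explicitly in the text and is inferred here from the reported update \(\textit{UB}=20\):

```python
# Replay of the bound evolution in Example A.2: in each iteration the
# master-problem value tightens UB and the bilevel-feasible leader
# value tightens LB; the algorithm stops once LB = UB.
iterations = [
    (30, 0),  # it. 0: MP value 30, bilevel-feasible value 0
    (20, 4),  # it. 1: MP value inferred from UB = 20, feasible value 4
    (8, 3),   # it. 2: MP value 8, candidate 3 does not improve LB
    (4, 4),   # it. 3: MP value 4, now LB = UB
]

UB, LB = float("inf"), float("-inf")
for mp_val, feas_val in iterations:
    UB = min(UB, mp_val)
    LB = max(LB, feas_val)
    if LB >= UB:
        break  # optimality certificate: bounds have met

assert (LB, UB) == (4, 4)
```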
1.3 Appendix A.3: Detailed computational results
Recall that the instance number refers to the number of the original instances miblp_20_15_50_0110_10_1 to miblp_20_15_50_0110_10_10 from [46], while \(m_Z\), \(m_R\), \(n_Z\), \(n_R\), \(r\) and \(s\) denote the number of upper-level discrete, upper-level continuous, lower-level discrete and lower-level continuous variables as well as the number of upper- and lower-level constraints, respectively. Together, these columns identify each instance uniquely and precisely define its size.
The master problem size in the last iteration can be inferred from the number of completed iterations and added optimality packages. Each iteration adds one nogood constraint irrespective of whether a new optimality package is introduced, and each optimality package (together with the associated projection test) increases the master problem as described in the penultimate paragraph of Sect. 3.
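As a rough illustration of this bookkeeping, the following hypothetical helper counts only what is explicit in Example A.2 (three duals \(\mu\), three duals \(\pi\), and one binary \(\psi\) per optimality package, plus one nogood cut per iteration); the exact per-package constraint counts from Sect. 3 are not reproduced here:

```python
# Hypothetical size bookkeeping consistent with Example A.2, where each
# optimality package introduced the dual vectors mu and pi (one entry
# per lower-level constraint) and one binary indicator psi, and every
# iteration contributed one nogood cut.
def master_growth(iterations, packages, n_lower_cons=3):
    extra_cont = 2 * n_lower_cons * packages  # mu and pi vectors
    extra_bin = packages                      # psi indicators
    extra_nogood = iterations                 # one nogood cut each
    return extra_cont, extra_bin, extra_nogood

# E.g., two packages and three completed iterations add
# 12 continuous variables, 2 binaries and 3 nogood cuts.
assert master_growth(3, 2) == (12, 2, 3)
```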
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Merkert, M., Orlinskaya, G. & Weninger, D. An exact projection-based algorithm for bilevel mixed-integer problems with nonlinearities. J Glob Optim 84, 607–650 (2022). https://doi.org/10.1007/s10898-022-01172-w
Keywords
 Bilevel optimization
 Mixedinteger nonlinear programming
 Strong duality
 Projection
 Global optimization
Mathematics Subject Classification
 90C11
 90C26
 90C46
 91A65