An exact projection-based algorithm for bilevel mixed-integer problems with nonlinearities

We propose an exact global solution method for bilevel mixed-integer optimization problems with lower-level integer variables and including nonlinear terms such as, e.g., products of upper-level and lower-level variables. Problems of this type are extremely challenging as a single-level reformulation suitable for off-the-shelf solvers is not available in general. In order to solve these problems to global optimality, we enhance an approximative projection-based algorithm for mixed-integer linear bilevel programming problems from the literature to become exact under one additional assumption. This assumption still allows for discrete and continuous leader and follower variables on both levels, but forbids continuous upper-level variables to appear in lower-level constraints and thus ensures that a bilevel optimum is attained. In addition, we extend our exact algorithm to make it applicable to a wider problem class. This setting allows nonlinear constraints and objective functions on both levels under certain assumptions, but still requires that the lower-level problem is convex in its continuous variables. We also discuss computational experiments on modified library instances.


Introduction
Hierarchical decision making by different agents occurs in a number of real life problems including, e.g., energy policy and markets [62], pricing schemes [37], supply chain management [60], infrastructure interdiction [10,58], taxation [36], and transportation network design [6]. The authors in particular were motivated by using bilevel modelling to determine optimal graph aggregations for network design problems as in [3], and introducing controllable energy generation facilities with corresponding discrete decisions on the lower level of energy supply tariff design problems described in [25]. In this paper we consider the setting with two agents, the leader on the upper level and the follower on the lower level, known in game theory as a Stackelberg game (see [48]). Furthermore, discrete decisions are allowed on both levels.
Thus, we want an exact solution method for problems from the following class: (1.1) Here we denote the leader's variables, objective function and constraints by x u ∈ R m R + and y u ∈ Z m Z + , F, and G(x u , y u , x l , y l ) ≤ 0, respectively. The follower's optimization problem, parameterized by upper-level variables (x u , y u ), constitutes maximizing objective function f over variables x l ∈ R n R + and y l ∈ Z n Z + under constraints g(y u , x l , y l ) ≤ 0. A detailed set of additional assumptions that we consider in this paper will be introduced later, see Assumptions 1-4 in the beginning of Sect. 2. It should be stressed in particular that a number of the applications mentioned above, e.g., pricing schemes, taxation, interdiction and energy policy problems, fall into the setting given by our set of assumptions. Note that the majority of these applications are special cases that typically include either nonlinearities or lower-level integer variables, but not both.

Contribution
We show on a very small example that the algorithm proposed in [60], due to its approximative nature, may lead to incorrect results just because some constraint is scaled in an unfortunate way.
We then propose an exact algorithm version under the additional assumption that continuous leader variables do not appear in follower constraints. Without this assumption a compact bilevel feasible set cannot be guaranteed from a theoretical point of view, i.e., a bilevel optimum may be unattainable and therefore finite termination with the exact optimum can not be expected. More restrictive assumptions on this point are present in [23], where continuous leader variables are not allowed to appear in the follower's problem at all. The assumption from [23] is also present in [49] and the corresponding software MibS [45].
Furthermore, we dispense with the assumption of feasibility of the bilevel problem required in [60], i.e., our algorithm can detect whether the bilevel problem is infeasible.
Moreover, we extend the algorithm to a nonlinear setting where any kind of MINLP which an off-the-shelf solver can handle is allowed as the leader problem, while the follower problem considered in lower-level continuous variables only has to be convex, bounded, and satisfy Slater's condition in case of feasibility. Thus we also provide a constructive proof that a finite optimum is always attained within this class of bilevel problems.
In two recent surveys on bilevel optimization [12,29], no exact global solution method for this problem class is mentioned, nor is any other existing method-apart from complete enumeration of the follower's integer variables-known to the authors of this paper. Neither are the authors aware of any other proof that an optimum is always attained for the class of bilevel problems considered.
Although there are ε-optimal global optimization methods for bilevel problems requiring partially different assumptions than ours and, e.g., allowing more general nonconvex follower problems (see Sect. 1.3 on existing literature below), the algorithm proposed in this paper is able to find exact global optimal solutions or prove infeasibility for more complex bilevel problem classes than methods with the same property suggested in the literature, e.g., [23,49].
Finally, we demonstrate the performance of the first implementation on newly constructed bilevel test instances. Note that the algorithm proposed in this paper builds upon a MINLP solver of the user's choice, which iteratively derives upper and lower bounds on the bilevel optimum by solving corresponding single-level optimization problems. Thus our method incorporates all merits of an established solver implementation (stability, efficiency, multithreading support, etc.) and automatically profits from any further development of MINLP solvers.

Paper structure
Section 1.3 contains a short overview of selected methods and existing literature for bilevel optimization problems, with a focus on problems with integer follower variables. Our setting including the necessary assumptions is presented in Sect. 2. In Sect. 2.1 we briefly state how the key features of Algorithm 1 described in Sect. 2 are realized in [60], and then describe and illustrate its approximative property in Sect. 2.2. In Sects. 3.1-3.2 we propose an exact realization which guarantees that a bilevel optimum of a feasible bilevel problem is always found. We initially restrict ourselves to the case in which the follower problem is linear. In the end of Sect. 3 we give a proof that our algorithm either finds a global optimal solution of a bilevel problem satisfying Assumptions 1-4 or proves infeasibility in finitely many iterations. Numerical results are presented in Sect. 4. In Sect. 5 we provide details on how to extend our algorithm to nonlinear follower problems. Finally, we conclude with Sect. 6.

Existing literature
Note that whenever the follower's optimal solution is not unique, the bilevel optimal solution as well as its objective value may be so too, see, e.g., [2]. As this issue is hard to deal with in practice, the majority of bilevel problem formulations rely on a so-called optimistic assumption, i.e., whenever multiple optimal lower-level solutions exist, the one allowing best results for the leader is chosen. The opposite approach is to consider the follower's optimal solution that is least advantageous to the leader, cf., e.g., [53,59] and the survey [12] for more references. In the remainder of the paper we consider only bilevel problem formulations with an optimistic assumption.
A common way of solving bilevel problems is reformulating them into a single-level optimization problem, which is obtained by adding optimality conditions for the lower-level problem as constraints to the so-called High Point Relaxation (HPR) of the original bilevel problem, see, e.g., [2,11]. However, this approach requires that general optimality conditions for the lower-level problem can be stated in such a way that some general global optimization solver is able to handle the reformulated single-level optimization problem. Consequently, this method is suitable only for special classes of bilevel problems and in particular does not allow for lower-level integer variables.
A number of methods have been proposed for handling bilevel problems with lower-level integer variables, most of them concentrating on various linear problem classes. Branch and cut approaches for approximating the optimal-value function have been used in [15,16,49], while [21][22][23] extend and apply intersection cuts to bilevel problems. An implementation of the approach published in [16] is publicly available in the solver MibS [45]. In addition, the algorithm presented in [21] is accessible as the software [20].
Global optimization approaches exploiting sensitivity analysis in the lower-level problem for the solution of various classes of bilevel programming problems are suggested in [18,19]. The idea is to solve the lower-level problem as a multi-parametric programming problem, with parameters being the variables of the upper-level problem. Then by inserting the obtained rational reaction sets in the upper-level problem, the overall problem is transformed into a set of independent quadratic, linear or mixed-integer linear programming problems, which can be solved to global optimality. Another parametric approach for linear bilevel problems is presented in [35], while [50] solves bilevel problems without integer variables by min-max reformulation.
There exist some algorithms for nonlinear bilevel problems that are proven to yield εoptimal solutions such as the methods proposed in [39,40] and the Branch-and-Sandwich algorithm from [31,32]. They have been extended to the mixed-integer case in [38] and [33], respectively. Recent advances for these approaches allow, e.g., to lift restrictions on coupling equality constraints [17], or improve algorithm performance [42]. Computational results for a bilevel solver based on the implementation of the Branch-and-Sandwich algorithm are presented in [43]. These algorithms have different sets of assumptions, but some allow continuous nonconvexities on the lower level and are thus more general than ours in this respect. However, in contrast to the ε-optimal methods mentioned above, our aim is an exact global optimal solution algorithm for nonlinear mixed-integer bilevel problems with a set of assumptions that we prove sufficient to guarantee attainment of the bilevel optimum. Note that ε in the definition of bilevel ε-optimality stands for bilevel feasibility as well as a bilevel optimality tolerance, since a bilevel optimum may not always be attained in the general case. For a more comprehensive list of currently available solution methods for bilevel optimization problems, the reader is referred to [12].
In [61] the authors adapt the above-mentioned classical approach of solving bilevel problems via formulating necessary and sufficient optimality conditions of the lower level to a setting where both levels are mixed-integer problems. They show that if a mixed-integer linear bilevel optimization problem has the so-called relatively complete response property, iteratively solving its HPR with successively added follower optimality conditions for some fixed values of lower-level integer variables produces an exact solution of the original bilevel problem. Problem (1.1) has the relatively complete response property if every combination of HPR-feasible (x u , y u ) and HPR-feasible y l can be extended to a HPR-feasible solution x u , y u , x l , y l by a suitable HPR-feasible x l . Henceforth we shall call the HPR with optimality conditions of the lower level for some y l,k with iteration index k the master problem (MP). The relatively complete response property guarantees that the master problem constructed in this way is a relaxation of the original bilevel problem. Successively adding follower optimality conditions for different y l,k to (MP) produces a more precise approximation of the optimal-value function of the original bilevel problem, which becomes exact if all HPR-feasible discrete decisions for the follower have been enumerated. Note that complete enumeration of all possible follower integer configurations is not always necessary for solving the bilevel problem via the procedure described above.
A similar approach is also used for computing binary quasi-equilibria in [28], where the authors propose a transformation of a mixed-integer equilibrium problem into a mixed-integer bilevel problem, which they then solve by enumerating all possible integer configurations and formulating corresponding sets of follower optimality conditions. In this case the feasibility of the follower problem is not affected by the leader's decisions, and the relatively complete response property holds.
In [60] the authors propose an extension of an algorithm from [61], which aims at handling lower-level integer variables in linear bilevel problems without the relatively complete response property. They consider a set of upper-level decisions which allow certain lowerlevel integer configurations, and add corresponding lower-level optimality conditions to the HPR only for this upper-level set.
Note that the idea of iteratively adding a bound for the lower-level objective function value to the HPR and making this bound valid only for a certain set of upper-level configurations was previously proposed in [38]. However, the particular formulation of this bound as well as the way of defining and finding the corresponding upper-level set and imposing the implied bound differs from the one suggested in [61].

General setting and algorithm structure
In this section we first state our assumptions on the class of bilevel problems the approach proposed in this paper can solve. Afterwards we introduce some necessary bilevel terminology in order to describe the general algorithm idea as proposed in [60]. Then, in Sect. 2.1 we state how the key features of the general projection algorithm are realized in [60]. Finally, in Sect. 2.2 we give an example on how this realization makes the algorithm prone to failure.
The algorithm proposed in the current paper requires the following assumptions: Assumption 1 is a usual assumption which ensures that all terms occurring in the formulations are bounded. We will make use of it when reformulating various indicator constraints as well as for showing that the proposed algorithm terminates after finitely many iterations.
Assumption 2 For any fixed upper-level continuous and integer decisionsx u andȳ u , respectively, and lower-level integer decisionsȳ l , the follower problem is convex and in case of feasibility satisfies Slater's condition, i.e., f (x u ,ȳ u , ·,ȳ l ) is concave, g(ȳ u , ·,ȳ l ) are convex, and there exists a strictly feasible point (satisfying all nonlinear constraints with strict inequality) for allx u ,ȳ u andȳ l if g(ȳ u , x l ,ȳ l ) ≤ 0 has a solution in x l . Moreover, functions f and g are continuously differentiable within lower-level variable bounds.
The main consequence of Assumption 2 is that, considering all upper-level variables as parameters, we can formulate necessary and sufficient optimality conditions for the follower problem with fixed lower-level integer variables. Note that although there are optimal-value function formulations [54,55] as well as optimality conditions based on extended duality [1] for ILPs/MILPs, they do not allow a single-level reformulation of the formulation (BLP) below that can be directly handed over to an off-the-shelf solver.

Assumption 3
The high point relaxation (HPR) of the original bilevel problem together with the necessary and sufficient optimality conditions for the lower level as assumed in Assumption 2 is of a problem class that can be handled by some off-the-shelf optimization solver.
The algorithm presented in this paper builds upon solvers for problems of the type given in Assumption 3. Hence we need this general assumption. In practice (and in our computational experiments), we mainly think of this problem class to be mixed-integer nonlinear problems (MINLPs) with polynomial nonlinearities. Note that Assumptions 1-3 are also present in [60].

Assumption 4 The lower-level constraints do not contain any continuous upper-level variables.
Without Assumption 4 we might not be able to attain the bilevel optimum despite Assumption 1, see [51]. Indeed, this assumption justifies using 'max' instead of 'sup' in the problem formulation. Assumptions with this effect are widely used in the literature on different levels of restrictiveness. For instance, the requirement of an all-integer leader and/or follower problem is frequently encountered. The main point of Assumption 4 is that continuous leader variables do not influence the follower's feasible set. This is very similar, yet slightly less restrictive, than Assumption 2 from [23] and Assumption 1 from [49], where continuous leader variables are also banned from the follower objective function, in addition to being banned from the follower constraints. This assumption has already been mentioned in earlier work by the same authors (e.g., see [22]) as being very important for their bilevel solver based on intersection cuts. Indeed, for the linear case, in which no continuous leader variables are present in the follower problem at all, [51] shows that a bilevel optimum is attained. To the best of the authors' knowledge, no such statement exists yet for nonlinear cases satisfying Assumption 4, but possibly with continuous leader variables appearing in the follower objective function. This paper offers a constructive proof that a bilevel optimum is attained in this setting too. See Corollary 3.17 in Sect. 3 for more details.
Using the optimistic assumption described at the beginning of Sect. 1.3, we can employ the following reformulation of the original bilevel problem (1.1): where θ x u , y u = max is an optimal-value function, cf. Section 1.2 in [14]. The High Point Relaxation (HPR) mentioned earlier is obtained from (BLP) by dropping the (optimal-value-function constraint).
Next, we provide some definitions from [60] that are needed to state their general algorithm, including a master problem and two subproblems. By k we denote the iteration index in the following.

Definition 2.1
For any lower-level feasible y l,k , let P y l,k be defined as

Definition 2.2
For any lower-level feasible y l,k , by Proj (y u ) P y l,k we denote We consider y l,k to be lower-level feasible if it can be extended to a feasible solution (x l , y l,k ) of the follower's optimization problem by suitable x l for some y u , and, similarly, y l,k to be HPR-feasible if it can be extended to a HPR-feasible solution (x u , y u , x l , y l,k ) by suitable x u , y u , x l . We slightly abuse the notion of lower-level and HPR feasibility in the same way at various places throughout the paper.
The HPR of the original bilevel problem together with optimality conditions of the lower level for temporary fixed lower-level feasible y l,k is our master problem, denoted by (MP): is the optimal objective function value of the lower-level problem where all but continuous lower-level variables x l are fixed, and Y L is a set of some lower-level feasible integer configurations, which formation will be shown in Algorithm 1. We deviate from notation purity for the sake of more intuitive understanding and use the same symbol θ as for the optimalvalue function defined in (2.2). Particular realizations of projection test, implication and optimality package will be discussed later in this paper.
The following subproblem provides the optimal objective function value of the lower level for some fixed upper-level decision (x u ,ȳ u ), i.e., the value of θ at the point (x u ,ȳ u ). We call it follower optimality (FO) problem: The last required subproblem checks whether for certain fixed (x u ,ȳ u ) there exists an optimal solution of the lower-level problem that together with (x u ,ȳ u ) satisfies the upperlevel constraints, i.e., whether (x u ,ȳ u ) can be extended to a bilevel feasible solution. In addition to that, if such a lower-level solution exists, the one that produces best results for the upper level will be chosen, thus realizing the optimistic assumption. We call this subproblem bilevel feasibility (BF) problem: Now we are ready to state a general form of the algorithm as proposed in [60] for feasible bilevel problems:

Algorithm 1 General algorithm for feasible bilevel problems with integer variables
Set UB to the optimal objective function value of (MP) Solve (FO) for x u, * k , y u, * k to get x l k ,ŷ l k and objective function value θ k x u, * k , y u, * k Solve (BF) for x u, * k , y u, * k and θ k x u, * k , y u, * k if (BF) is feasible then Denote optimal solution of (BF) as x l k ,ỹ l k LB ← max{LB, optimal objective function value of (BF)} y l,k ←ỹ l k else y l,k ←ŷ l k Add projection test, implication and optimality package for y l,k to (MP) Terminate and return the optimal solution All of our assumptions apart from Assumption 4 stem directly from this general algorithm form and are also assumed in [60]. They guarantee that we are able to solve the master problem (MP) or determine that it is infeasible (Assumption 3), formulate optimality packages for fixed y l,k (Assumption 2) and, if the bilevel problem is feasible, arrive at a bilevel optimal solution after finitely many iterations (Assumption 1). Assumption 4, that has already been motivated from a theoretical viewpoint, will be shown to be essential in practice too in Sect. 3.1.

Remark 2.3 For all HPR-feasible
The following statements-Lemmas 2.4, 2.6, and Theorem 2.7-were already shown in or follow directly from [60]. We list them here in our notation for completeness. Corollary 2.5 is not explicitly stated in [60] (due to their assumption that a feasible solution exists), but is a direct consequence of Lemma 2.4.  Lemma 2.6 Algorithm 1 generates some y l,k to be added to Y L k at the end of each iteration k. If thus generated y l,k is already contained in Y L k , i.e., it has been generated in some previous iteration, Algorithm 1 terminates with a bilevel optimal solution at the latest in iteration k + 1.

Theorem 2.7
Given a feasible bilevel problem (BLP) satisfying Assumptions 1-4, Algorithm 1 finds a bilevel optimal solution in finitely many iterations.
The proofs of statements analogous to Lemma 2.6 and Theorem 2.7 that are needed for the exact projection test as described in Sect. 3.1, can be found in Sect. 3.3. Note that in the worst case, Algorithm 1 can result in a complete enumeration of all lower-level feasible y l . See Example A.1 in the appendix for an unfavorable instance in this regard.
For practical purposes, in the remainder of this paper we focus on the case in which the follower problem is linear, i.e., f and g are linear functions for fixed upper-level decisions y u andx u . In this case we can write (a lower-level objective function term that is constant for fixed y u and x u would be meaningless) and for vector-valued functions f R , f Z and matrix-valued functions g R , g Z , g c of the upper-level decisions. Extensions to the general case characterized by Assumptions 1-4 will be discussed in Sect. 5.

Projection test, implications and optimality packages as realized in the literature
In this section we describe the setting and the way projection test, implication of optimality package and optimality package for y l,k necessary for Algorithm 1 are realized in [60].
Their set of assumptions include Assumptions 1-3 and an additional assumption that the inducible region is nonempty, i.e. the bilevel problem is feasible. However, Assumption 4 is not present and the setting treats only bilevel problems where the corresponding HPR is linear.
Projection test (PT) determining whether y u ∈ Proj (y u ) P y l,k is realized with the following subproblem formed at the end of each iteration for the resulting y l,k : where 1 s is an s-dimensional all-ones vector (1, . . . , 1) , s is the number of constraints in the follower's problem and x l,k are copies of the lower-level continuous variables x l , introduced specifically for projection test in iteration k.

Remark 2.8
Given y l,k generated by Algorithm 1 and a HPR-feasible y u . If the optimal value of (PT 0 ) is 0, then y u ∈ Proj (y u ) P y l,k applies, and otherwise y u / ∈ Proj (y u ) P y l,k .
Then (PT 0 ), its dual feasibility conditions as well as both types of complementarity constraints are added to (MP) in order to replace (2.7c) < complementarity constraint for (PT 0 ) and its dual > . (2.7d) The authors of [60] claim that constraint (2.7a) cannot be handled by off-the-shelf solvers directly and propose an approximative reformulation dependent on some ε ≥ 0: In Sect. 2.2 we elaborate on the theoretical meaning of this approximation and illustrate how it can lead to algorithm failure in practice. The algorithm from [60] then utilizes a feature for specifying indicator constraints provided by some mixed-integer solvers such as, e.g., CPLEX, in order to realize implication (2.8a).
In total the approximative projection test, implication and optimality package add one new binary as well as (2n R + 2s) new continuous variables, and (6n R + 4s + 2) new constraints to (MP) in each iteration.
All (3n R +2s) complementarity constraints from both projection test and optimality package are then linearized using a big-M technique, as the master problem has to remain linear in [60] in order to be able to employ the indicator constraints as mentioned above. Note that, unless upper bounds for dual variables of the lower level are known, the big-M linearization technique may lead to suboptimal solutions in the bilevel context, see, e.g., [44] for more information.

Consequences of the approximative projection test
In this section we first explain some theoretical background regarding constraint (2.7a). Then we show how an inappropriate choice of ε used for the approximation (2.8a)-(2.8b) can lead to algorithm failure even on a very simple bilevel problem satisfying Assumption 4. In particular, we show that a value for ε ensuring correctness of Algorithm 1 with projection test as described in [60] depends also on the constraint representation of the bilevel problem.
First of all, note that constraint (2.7a) to be approximated in projection test implies optimizing over non-closed sets, as will be explained later in more detail in Sect. 3, see (3.2) and (3.3) therein. Hence, its realization poses problems not just from a practical viewpoint of, e.g., unavailable solver features. Recall that, as stated in Sect. 2, sometimes no bilevel optimum can be attained precisely because the bilevel feasible region is not compact, cf. Example 6.2.1(ii) from [2]. Assumption 4 eliminates this theoretical problem, i.e. when upper-level variables do not influence the follower's feasible region, the bilevel optimum under the optimistic assumption can always be attained.
However, the approximation (2.8a)-(2.8b) can still cause problems even for bilevel problems satisfying Assumptions 1-4. An ε that is on the one hand small enough to ensure correctness of Algorithm 1 with projection test as described in [60], and on the other hand big enough so as not to lead to numerical intractability, may be hard to find in practice. The following example shows how all bilevel feasible points can be cut off and a problem instance is erroneously classified as infeasible due to constraint scaling.

Example 2.9
Let ν ≥ 0 be some scaling parameter. We solve the following bilevel problem parameterized by ν employing the algorithm from [60] and show that the outcome is incorrect for ν < ε: max is infeasible as the two last constraints imply 4 ≤ y l ≤ 11 4 . So we add y l,0 =ŷ l 0 = 4 to Y L 1 . Thus, approximative projection test (PT 0 ) reads: For any optimal solution t 0, * 1 , t 0, * 2 of (PT 0 ) we have Assuming t 0 1 = t 0, * 1 and t 0 2 = t 0, * 2 are optimal solutions of (PT 0 ), constraint (2.8a) for y l,0 = 4, given by is added to problem (MP). Observe that there are only three HPR-feasible y u in this example problem, namely {0, 1, 2}. For y u = 0 and y u = 1 the optimal objective function value of (PT 0 ) is 0 irrespective of the value of ν. In this case the above constraint correctly activates the corresponding optimality package, i.e. y l ≥ 4 for y u ∈ {0, 1}. In particular, it means that there are no bilevel feasible solutions for y u ∈ {0, 1}. The correct rational response of the follower to the only remaining HPR-feasible point y u = 2 is y l = 2. However, for ν < ε the above constraint will impose the implication y u = 2 → y l ≥ 4 , which cuts off the only bilevel feasible point (2, 2) and leads to the instance being classified as infeasible.
So, for any ε > 0 to be chosen for the approximative projection test there exists a half-space representation of the lower-level problem from Example 2.9 such that Algorithm 1 will erroneously classify the corresponding bilevel problem as infeasible.
For numerical reasons, optimization solvers usually employ some kind of scaling procedure on the problem before the actual solving process starts, see, e.g., [8]. This obviously changes the original representation of the problem and thus has, as shown on Example 2.9, an influence on the suitability of the choice of ε. Unfortunately, in most cases the exact scaling process of a solver when using the most performant parameter settings is neither known nor accessible to the user.

Exact algorithm realization suitable for nonlinearities
In this section we describe our realization of projection test, implication and optimality package for each y l,k ∈ Y L k in the master problem (MP), i.e., which makes Algorithm 1 exact and allows its extension to a nonlinear setting. Analogously to the implementation described in Sect. 2.1, we employ a binary variable ψ k for every y l,k ∈ Y L k to separate the above routines of the algorithm: Section 3.1 deals with (3.1a), and Sect. 3.2 handles (3.1b). At the end of this section we present our algorithm version with the proof of its correctness and finite termination.

Exact projection test
As has been seen in Example 2.9, approximative projection test applies optimality package for y l,k also to some y u / ∈ Proj (y u ) P y l,k . Here we aim at precise handling by adding the implication for y u ∈ Proj (y u ) P y l,k , but abstaining from adding it for any y u / ∈ Proj (y u ) P y l,k . Note that Proj (y u ) P y l,k is a discrete set, so the required implication (3.1a) could be imposed by some kind of enumeration procedure. However, this would mean that Algorithm 1 would enumerate not only y l , but combinations of y l and y u . Therefore we want to avoid enumeration of y u whenever possible and use dual formulations for projection test, similar to the technique described in Sect. 2.1.
For this we utilize the linear relaxation of P y l,k , denoted by P lin y l,k , i.e., (cf. Definition 2.1). Correspondingly, we denote the projection of P lin y l,k to the space of upper-level discrete variables by Note that while Proj (y u ) P y l,k and Proj (y u ) P lin y l,k contain exactly the same integer points, Proj (y u ) P y l,k is a discrete set whereas Proj (y u ) P lin y l,k is a continuous closed set.
The implication which is equivalent to the disjunction however, cannot be modeled directly, as the complement of Proj (y u ) P lin y l,k is an open set. Consequently, for linearly relaxed y u ∈ R m Z + the set on which (3.2) is true is not a closed set. One option to handle this obstacle is to use an approximation as it is done in [60], described in Sect. 2.1. However, as discussed in Sect. 2.2, we want to avoid this and obtain an exact algorithm. So our idea is the following: (a) Find an open subset U ⊆ Proj (y u ) P lin y l,k for which the implication y u ∈ U ⇒ ψ k = 1 can be modeled using standard duality theory. (b) Model the remaining requirement y u ∈ Proj (y u ) P lin y l,k \U ⇒ ψ k = 1 via disjunction over y u ∈ Proj (y u ) P lin y l,k \U .
Since (b) will involve some kind of enumeration, U should be chosen as large as possible such that (a) can still be modeled conveniently. We choose U = Proj (y u ) P lin y l,k • , i.e., the interior of P lin y l,k projected to the space of discrete upper-level variables. Note that the projection is an open map, so U is indeed an open set in the target space.

Remark 3.1
Since we would like U to be as large as possible, U = Proj • (y u ) P lin y l,k seems to be the canonical candidate as it is by definition the largest open subset of Proj (y u ) P lin y l,k . However, this set is more difficult to describe. In particular, note that Proj • (y u ) P lin y l,k = Proj (y u ) P lin y l,k • in general as can be seen from the following example: Suppose g : R 3 → R is a scalar function with g(y u , x l , y l,k ) = x l − (y u − 1) 2 . In this case, for any y l,k we have and hence its interior is equal to R >0 . However, for y u = 1 there is no x l ∈ R + such that x l < (y u − 1) 2 and therefore One may object that U = ∅ if P lin y l,k is not full-dimensional, which happens regularly in practical instances due to equations in the follower problem. Nonetheless, an adjustment for U in this case is possible and will be presented in Sect. 3.1.1.
W.l.o.g. we assume in Sects. 3.1 and 3.2 that there are no lower-level constraints which depend on y l only, i.e., where neither x l nor y u are present. Indeed, as such constraints are always satisfied for a y l,k that resulted from solving (FO) or (BF) and have no influence on other variables, they can be safely disregarded once discrete lower-level variables are fixed to y l,k .
We introduce the following auxiliary optimization problem for projection test: where t k s = t k , . . . , t k ∈ R s is the vector having t k in every component. It is easy to see that y u ∈ Proj (y u ) P lin y l,k if and only if (PT) has a feasible solution with t k ≥ 0. Furthermore, we have the following: Lemma 3.2 For any fixed y l,k andȳ u the problem (PT) has a nonempty feasible region. Moreover, the optimal solution part t k, * is unique, and the following equivalences hold: Proof Recall that due to Assumption 1 all variables of both levels have finite bounds. Since t k is a free slack variable present in every constraint of (PT), with x l,k = 0 it can be chosen such that all constraints are satisfied. Since the objective function of (PT) depends only on one decision variable t k , the optimal solution part t k, * is unique. For the optimal solution t k, * at least one of the constraints is satisfied with equality. Thus t k, * is per construction the maximal slack applicable to all lower-level constraints with lower-level variables fixed to y l,k . Since y l,k is a constant, Case 1 is equivalent to the existence of somex l,k such that if g does not contain any tautological equations for the particular y l,k . Recall that w.l.o.g. we have excluded the possibility of lower-level equations containing no other variables apart from y l , so this translates toȳ u ∈ Proj (y u ) P lin y l,k • . Consequently, Case 2 corresponds toȳ u ∈ Proj (y u ) P lin y l,k \U .
This completes the proof.
Let us now consider task (a), corresponding to the first case from Lemma 3.2. For some sufficiently large number M μ (details of which we discuss in Sect. 3.1.1), we introduce the following inequality: It ensures that ȳ u ∈ Proj (y u ) P lin y l,k • ⇒ ψ k = 1 due to Case 1 of Lemma 3.2. We can formulate the dual problem to (PT): Here, 1 s = (1, . . . , 1) is an s-dimensional all-ones vector.
We will show that we need only dual feasibility constraints to be able to reformulate (3.4) as 1} and M μ be some sufficiently large number. Let strong duality hold for (PT) and (PT dual). Adding (PT dual) and (3.4a) to the master problem (MP) implies Proof The weak duality theorem implies for every feasible solution of (PT) and (PT dual). Lemma 3.2 states that t k, * > 0 if and only if y u ∈ Proj (y u ) P lin y l,k • and, therefore, for every μ satisfying (PT dual). Then ψ k is set to 1 due to (3.4a). Note that in the master problem (MP) the variables μ appear only in constraints from (PT dual) and in (3.4a), which do not involve any other variables apart from ψ k . Thus ψ k is the only variable that links these constraints to the rest of (MP). Consequently, if μ can be chosen such that no restriction is imposed on the value of ψ k , i.e., such that the left-hand side of (3.4a) is nonpositive, these constraints exert no influence on (MP). This is indeed possible due to strong duality and Cases 2 and 3 of Lemma 3.2.

Remark 3.4
Note that unless strong duality holds between (PT) and (PT dual), the above approach may erroneously classify some upper-level variable values as belonging to a certain projection when in fact they do not. However, the required strong duality is satisfied due to Assumption 2.
Thus, adding (PT dual) and (3.4a) to (MP) ensures correct handling for the Cases 1 and 3 from Lemma 3.2 and completes task (a) from p. 14.
Next we consider task (b) handling Case 2. We are unable to add the implication directly to the formulation for the same reason we could not do so for implication (3.2). Instead, we may add a weaker implication to (MP) that considers only the optimal solution values y u, * k of the upper-level discrete variables in the current algorithm iteration: An obvious way to enforce implication (3.6) is to employ a so-called no-good cut [9,27], which cuts off a specific solution. A similar approach was pursued in [52] for a Benders' decomposition algorithm with a mixed-integer linear subproblem. In it, each assignment of discrete variables corresponds to a nonconvex set, which is the union of the complementary set and the boundary of a polyhedron. The no-good cut is used to exclude cases in further iterations that lead to points on faces of the polyhedron.
A way to realize a no-good cut for implication (3.6) is to use a quadratic constraint: In case all upper-level integer variables are binary, instead of (3.7) one may use the linear constraint 1 + Remark 3.5 Note that (3.5) can be imposed on the master problem gradually by adding suitable no-good constraints realizing the implication (3.6) for some y u,k ∈ Proj (y u ) P lin y l,k \U ∩ Z m Z + , where y u,k is the upper-level discrete part of the optimal solution of (MP) in iteration k.
The required implication (3.1a) can be imposed on the master problem by adding (PT dual), (3.4a) and a finite number of no-good constraints realizing (3.5), which at the same time imposes no implications on any y u / ∈ Proj (y u ) P y l,k . The number of required no-good constraints is finite because the number of HPR-feasible y u is finite. Note that Assumption 4 is essential for this to work as it ensures that only discrete upper-level variables influence the set of feasible follower responses.

Projection test with equalities
If the description of P lin y l,k contains equations, P lin y l,k • is empty, i.e., Case 1 from Lemma 3.2 never occurs. Thus, the primarily desired implication (3.2) would be imposed by (3.6) only, which would lead to adding a lot of no-good cuts and, consequently, a large number of algorithm iterations. In order to avoid this, we propose a special handling for the case of equality constraints being present in the lower-level problem formulation.
First, we distinguish between equality and inequality constraint functions, denoted by h and g, respectively. Matrix-valued functions h R , h c , h Z and g R , g c , g Z are used accordingly. We denote the number of inequality constraints of the lower level by s I ≤ s.
Our aim is to substitute where P I lin y l,k = y u , x l : g(y u , x l , y l,k ) ≤ 0, y u ∈ R m Z + , x l ∈ R n R + and g(y u , x l , y l ) ≤ 0 are the inequality constraints of the lower level. Hence, To achieve this we modify the primal projection test problem (PT) by abstaining from adding slack variables to equality constraints of the lower level: where t k s I = t k , . . . , t k ∈ R s I is a vector having t k in every component. Note that in contrast to (PT), (PTEq) is not necessarily feasible. However, we will show that its dual is always feasible.
With the dual variables λ and μ corresponding to equality and inequality constraints of (PTEq), respectively, we formulate the dual problem to (PTEq): Again, 1 s I = (1, . . . , 1) is an s I -dimensional all-ones vector.

Lemma 3.6
The optimization problem (PTEq dual) always has a nonempty feasible region.
Proof Recall that according to Lemma 3.2, (PT) always has a nonempty feasible region. Thus, although (PTEq) is not necessarily feasible, its infeasibility can arise only from violated equality constraints. Therefore (PTEq) without equality constraints is always feasible with finite optimal value, as consequently is the case for its dual For any feasible solutionμ of (PTwithoutEq dual), (μ, λ = 0) constitutes a feasible solution of (PTEq dual).

Corollary 3.7 If (PTEq) is infeasible, (PTEq dual) is unbounded.
Proof The dual of an infeasible linear optimization problem can be either infeasible or unbounded according to the LP theory, see Corollary 2.5 from [57]. Lemma 3.6 shows that the first case can not occur with the considered primal-dual pair. Therefore the second case is true and an infeasibility of (PTEq) implies unboundedness of (PTEq dual).
Analogously to (3.4a), we introduce the following constraint to complete the projection test (PTEq dual) with equalities: (3.4b) Now we are ready to provide the analogon of Lemma 3.3 for the case of equality constraints being present on the lower level: ȳ u ∈ Proj (y u ) P I lin y l,k • ∩ P lin y l,k ⇒ ψ k = 1 ȳ u / ∈ Proj (y u ) P I lin y l,k • ∩ P lin y l,k ⇒ no additional restrictions on the master problem.
Proof Ifȳ u ∈ U I = Proj (y u ) P I lin y l,k • ∩ P lin y l,k , all constraints of (PTEq) can be satisfied and the maximal slack t k, * for inequality constraints is strictly positive, analogously to the first case of Lemma 3.2. Thus the dual (PTEq dual) is also feasible and, due to weak duality, (3.4b) enforces ψ k = 1: Thus we obtain the first implication of the lemma and proceed with the second one. Ifȳ u / ∈ U I , then either (PTEq) is infeasible, or has t k, * ≤ 0. If (PTEq) is infeasible, then the minimization problem (PTEq dual) is unbounded according to Corollary 3.7. For t k, * ≤ 0, strong duality holds between (PTEq) and (PTEq dual): Thus, forȳ u / ∈ U I , a feasible solution (μ, λ) of (PTEq dual) can be chosen such that its objective function, which is also the left-hand side of (3.4b), is strictly less than or equal to 0 regardless of the feasibility of (PTEq). Therefore (3.4b) imposes no restrictions on the value of ψ k ifȳ u / ∈ U I , and this completes the proof analogously to the proof of Lemma 3.3.
Now we address the issue of computation of M μ . Note that it is sufficient to choose M μ big enough such that for every HPR-feasible y u there is a feasible solution of (PTEq dual) that satisfies (3.4b). Then an optimal solution of (PTEq dual) also satisfies (3.4b) for every HPR-feasible y u , and, consequently, adding (3.4b) to the master problem (MP) produces the effect desired by Lemma 3.8. In the proof of Lemma 3.6 we established that for any y u there always is a feasible solution for (PTEq dual) with λ = 0. Therefore, a suitable M μ is, e.g., the optimal objective function value of (PTwithoutEq dual) combined with the HPR constraint set and variables of (BLP). The resulting auxiliary optimization problem is always feasible, as (PTwithoutEq dual) is feasible for any HPR-feasible y u , and has a finite objective function value as all variables in there are bounded.

Remark 3.9
The above proposed projection test including no-good cuts allows an exact realization of Algorithm 1 and also results in some more changes compared to the approximative projection test described in Sect. 2.1: • We need only dual feasibility constraints together with (3.4a) or (3.4b), respectively, instead of all necessary optimality conditions for (PT) or (PTEq), respectively. • Consequently, neither x l,k as 'copies' of continuous lower-level variables, nor t k for each y l,k are needed. • We introduce one implication, i.e., one no-good constraint, more per iteration k of the algorithm. Altogether, we introduce one new binary and s continuous variables as well as n R + 3 constraints for the exact projection test in each iteration. • We may have to enumerate a lot of or, in the worst case, all integer points of some sets Proj (y u ) P lin y l,k \U I . Example 3. 10 We revisit Example 2.9 using the improved algorithm proposed above. For k = 0, the steps solving (MP), (FO) and (BF) are identical, so we start from (PT). As the lower level does not contain equations, we employ projection test with U : Let us consider all three possible cases regarding Proj (y u ) P lin y l,0 : The dual problem (PT dual) reads: Taking dual feasibility constraints together with the proposed inequality (3.4a) we obtain that ψ 0 = 1 is implied exactly for y u = 0 corresponding to Case 1, and ψ 0 ∈ {0, 1} for larger values of y u . Indeed, consider all three possible cases regarding Proj (y u ) P lin y l,0 : Case 1: for y u = 0 we have νμ 0 1 + 6μ 0 2 ≤ ψ 0 M μ which implies ψ 0 = 1 since μ 0 1 + μ 0 2 = 1 and μ 0 1 , μ 0 2 ≥ 0. Case 2: for y u = 1 we have 4μ 0 2 ≤ ψ 0 M μ . Hence, ψ 0 = 0 is possible as (μ 0 1 , μ 0 2 ) = (1, 0) is a feasible solution for (PT dual). Case 3: for y u ≥ 2 the objective coefficient of μ 0 1 is negative such that the feasible solution (μ 0 1 , μ 0 2 ) = (1, 0) even gives a negative objective value for (PT dual), again allowing ψ 0 = 0.
By adding the no-good cut corresponding to y u, * 0 , i.e., the implication y u = y u, * 0 = 1 ⇒ ψ 0 = 1 , Case 2 is covered as well. Altogether, projection test, implication and optimality package for y l,0 to be added to the master problem (MP) comprise 0 ≤ y u ≤ 1 → y l ≥ 4 .
As the point (2, 2) is bilevel-feasible, LB resulting from (FO) and (BF) is also 0, and the algorithm terminates with an optimal solution.

Implications and optimality packages
Recall that optimality package for y l,k in the master problem (MP) consists of the following constraint: where θ x u , y u , y l,k is the optimal objective function value of the follower problem with all but continuous lower-level variables x l fixed. Assumption 2 ensures that θ x u , y u , y l,k can be formulated by using necessary and sufficient optimality conditions of the lower level with fixed y l , such as, e.g., KKT conditions. Note that we are only interested in globally optimal solutions of the original bilevel problem and thus do not require reformulations to be equivalent also in terms of locally optimal solutions. For the latter, the situation is actually more complicated and equivalence may not hold [13]. Let us denote the lower-level dual variables in iteration k by π k . For a linear follower problem we can choose optimality package comprised of primal and dual feasibility constraints for the lower level as well as the strong duality equality with fixed integer variables y l,k : (3.9) For our approach we need to implement logical implications of the following form: ψ k = 1 ⇒ optimality package for y l,k is active, i.e., (3.9) is added to (MP), ψ k = 0 ⇒ optimality package for y l,k is inactive, i.e., no additional restrictions are imposed on the master problem.
(3.10) Implications as (3.10) are often realized by so-called indicator constraints [5,7]. An indicator constraint is a way to express logical relationships among variables by designating a binary variable to control whether a specified constraint is active or not. Some solvers provide facilities for using indicator constraints, e.g., CPLEX, which are utilized in [60]. It is also common to implement indicator constraints by using SOS1 conditions [4], which is the approach taken by Gurobi, for example. So far, solvers do not handle arbitrary nonlinearities together with indicator constraints, and since indicator constraints usually lead to weaker relaxations, we follow a big-M approach. If we use big-M formulations for (3.10), we have to deduce a suitable big-M coefficient for arbitrary lower-level problems. A similar problem arises in a popular solution method for linear bilevel optimization problems where the lower level is reformulated using KKT optimality conditions, which then are linearized using big-M formulations [60]. Finding a suitable big-M coefficient in this case is already challenging as is indicated in [30,44]. One possibility is to derive the correct big-M coefficient based on bound propagation. The dual variables of the lower level, however, do not have finite bounds in general. Nevertheless, in the following we show that we do not require bounds for dual variables π k of the lower level in iteration k in order to realize the implication (3.10).
Consider the following reformulation of the implication (3.10): where e i are the corresponding standard basis vectors. Similarly to the calculation of M μ described in Sect. 3.1.1, the calculation of M π can be done by solving three auxiliary optimization problems with the objectives as given in (3.12) and constraints and variables from the HPR of the original bilevel problem. Note that in these auxiliary problems y l,k are variables from a set of all lower-level feasible integer configurations. Terms appearing in the M π formula (3.12) that do not include any variables can be multiplied by ψ k in (3.11). Then these terms can be disregarded in the M π calculation.
Proof The first part of the implication (3.10), i.e., the one for ψ k = 1, is fulfilled trivially.
In case of ψ k = 0, optimality package corresponding to y l,k is inactive, i.e., constraints (3.11) have to be satisfied for any HPR-feasible x u , y u , x l , y l . This is true with, e.g., x l,k = 0 and π k = 0. As both x l,k and π k appear only in optimality package for y l,k and nowhere else in the master problem, in an inactive optimality package their value can be chosen freely if the first constraint of (3.11) is then valid for all HPR-feasible x u , y u , x l , y l . optimality package formulation (3.11) can be simplified even further: Lemma 3.12 Assume that strong duality holds for the follower problem in continuous lowerlevel variables. Then, for ψ k ∈ {0, 1} the inequality system (3.13) is equivalent to (3.10).
Proof The case of ψ k = 0 can be treated analogously to Lemma 3.11. All inequalities of (3.13) are satisfied for π k = 0. As π k appears only in the optimality package corresponding to y l,k , which is inactive for ψ k = 0, the values of π k can be chosen freely without influencing the result of the overall optimization problem.
Then consider the case ψ k = 1, where the idea of the proof is similar to Lemma 3.2. Due to weak duality the relation is always true and is even satisfied with equality for each optimal solution pair of the primal and dual follower problem. For ψ k = 1, the first inequality of (3.11) as well as of (3.13) already ensures lower-level optimality for fixed y l,k by imposing a lower bound on the lowerlevel objective function value. Consequently, in the optimal solution of the master problem, π k must constitute an optimal solution of the dual lower-level problem with fixed y l,k . Indeed, for ψ k = 1, the master problem is feasible only if (3.14) is satisfied with equality, which is possible due to strong duality. Thus only the dual feasibility conditions are required to correctly impose the lower bound on the lower-level objective function for fixed y l,k .

Remark 3.13
The optimality package formulation (3.13) dispenses with the copies of lower-level continuous variables x l,k and corresponding primal feasibility constraints together with the explicit strong duality constraint. Thus, only s continuous variables and n R + 1 constraints are added to the master problem in each iteration to realize optimality package.
Note that complete elimination of x l,k from the problem formulations is possible only in combination with the specific form of the projection test described in Sect. 3.1. Absence of the primal lower-level feasibility constraints makes calculation of M π easier and the number itself potentially smaller.

Algorithm form, correctness and finite termination
In the following we present Algorithm 2, a modification of Algorithm 1 with our exact projection test, that also can decide feasibility of (BLP). In order to handle Case 2 from Lemma 3.2, we need to keep track of all encountered upper-level integer solutions y u,k corresponding to every generated lower-level integer configuration y l,k , denoted by Y U y l,k . For each of these y u,k a no-good constraint of the form (3.7) or, if all leader integer variables are binary, (3.8), is added to the master problem (MP) as part of exact projection test. The rest of projection test is composed of (PTEq dual) and (3.4b), or (PT dual) and (3.4a), depending on whether a special equality constraint treatment is needed or not. Optimality package comprises (3.13).
Analogously to Lemma 2.4, we formulate the following statements:

Algorithm 2
Algorithm for bilevel problems with exact projection test Solve master problem (MP) for Y L k and Y U y l,k , y l,k ∈ Y L k 4: if (MP) is feasible then 5: Denote its optimal solution by x u, * k , y u, * k , x l, * k , y l, * k 6: Set UB to the optimal objective function value of (MP) 7: Solve (FO) for x u, * k , y u, * k , to get x l k ,ŷ l k and θ k x u, * k , y u, * k 8: Solve bilevel feasibility problem (BF) for x u, * k , y u, * k and θ k x u, * k , y u, * k 9: if (BF) is feasible then 10: Denote optimal solution of (BF) as x l k ,ỹ l k 11: LB ← max{LB, optimal objective function value of (BF)} 12: y l,k ←ỹ l k 13: else 14: y l,k ←ŷ l k 15: Create Y U y l,k = {y u,k }

18:
Create projection test and optimality package for y l,k and add them to (MP) 19:

22:
For y u,k create implication of optimality package for y l,k , i.e., add a no-good constraint (3.7) to (MP) 23: k ← k + 1 24: else 25: The original bilevel problem (BLP) is infeasible, terminate 26: Terminate and return the optimal solution Lemma 3.14 For any set Y L of lower-level feasible integer variable configurations and corresponding sets Y U (y l,k ), the master problem (MP) is a relaxation of the original bilevel problem (BLP).
If Y L comprises a complete set of lower-level feasible integer variable configurations while for the corresponding sets Y U (y l,k ) the inclusion Proj (y u ) P lin y l,k \U ∩ Z m Z + ⊆ Y U (y l,k ) holds, the master problem (MP) is equivalent to the original bilevel problem (BLP).
Proof Lemma 2.4 shows the first statement for the master problem which incorporates As Lemma 3.8 and the first part of Remark 3.5 indicate, exact projection test imposes f (x u , y u , x l , y l ) ≥ θ x u , y u , y l,k for a subset of Proj (y u ) P y l,k . Therefore, for any set Y L of lower-level feasible integer variable configurations and corresponding sets Y U (y l,k ), our master problem (MP) with exact projection test is a relaxation of the master problem from Lemma 2.4, and as such a relaxation of (BLP).
The second statement of this lemma is inferred from the second part of Lemma 2.4 and the second part of Remark 3.5. Note that not necessarily allȳ u ∈ Proj (y u ) P lin y l,k \U ∩Z m Z + for a complete set of lower-level feasible integer variable configurations y l,k have to be enumerated in order to construct a single-level reformulation of the (BLP).
Even less no-good constraints may be needed to find a bilevel optimal solution of (BLP), as Algorithm 2 needs to add no-good constraints only forȳ u which form part of an optimal solution of (MP) in some iteration.
To show correctness and finite termination of Algorithm 2 we need the following pendant to Lemma 2.6: Lemma 3.15 As long as the master problem (MP) remains feasible, Algorithm 2 generates some y l,k and y u,k to be added to Y L k and Y U y l,k , respectively, at the end of each iteration k of the while-loop. If thus generated y l,k is already contained in Y L k and y u,k is already contained in Y U y l,k , i.e., the pair y u,k , y l,k has been generated in some previous iteration, Algorithm 2 terminates with a bilevel optimal solution not later than in iteration k + 1.
Proof Let x u, * k , y u, * k , x l, * k , y l, * k be the solution of (MP) in iteration k. As the follower optimality subproblem (FO) is always feasible given HPR-feasible x u, * k , y u, * k , a y u,k = y u, * k and y l,k are always generated in each iteration as long as the master problem (MP) remains feasible. Now we prove the second part of the Lemma. If the bilevel feasibility subproblem (BF) is infeasible in iteration k, then y l,k =ŷ l k is part of the optimal solution of (FO) computed before, and consequently θ x u, * k , y u, * k = θ x u, * k , y u, * k , y l,k holds. If (BF) is feasible in iteration k, then y l,k =ỹ l k is part of its optimal solution and as such also satisfies θ x u, * k , y u, * k = θ x u, * k , y u, * k , y l,k . If thus generated y l,k is already contained in Y L k and y u,k is already contained in Y U y l,k , then the corresponding projection test together with implication and optimality package for y l,k are also already present in the master problem (MP) in iteration k. In particular, the no-good constraint for y u,k , y l,k as a part of projection test implies optimality package for y l,k , i.e., the constraint f (x u, * k , y u, * k , x l, * k , y l, * k ) ≥ θ x u, * k , y u, * k , y l,k is active in (MP) in iteration k. As θ x u, * k , y u, * k = θ x u, * k , y u, * k , y l,k , for the upper-level decision variables x u, * k , y u, * k this optimality package corresponding to y l,k is exactly the (optimal-value-function constraint). Therefore x u, * k , y u, * k , x l, * k , y l, * k is a bilevel feasible solution. Since according to the first part of Lemma 3.14 it is also an optimal solution of a relaxation of the original bilevel problem (BLP) and thus it is a bilevel optimal solution of (BLP).

Theorem 3.16
Algorithm 2 with projection test as described in Sect. 3.1 and implications and optimality packages as described in Sect. 3

.2 either finds a bilevel optimal solution or shows infeasibility of the original problem (BLP) in finitely many iterations.
Proof From Lemma 3.15 we can see that Algorithm 2 has three possible outcomes in each iteration of the while-loop: • Master problem (MP) is infeasible, which by Corollary 2.5 means the infeasibility of the original bilevel problem (BLP). • Some y l,k and y u,k generated in iteration k are already in Y L k and Y U y l,k , respectively, which by the last statement of Lemma 3.15 leads to termination of the algorithm with a bilevel optimal solution. • An y l,k generated in iteration k is not yet in Y L k , or y u,k / ∈ Y U y l,k .
This means that after each iteration the algorithm either terminates or generates a pair y u,k , y l,k that has not been encountered before. Unless the algorithm terminates earlier according to the first two cases listed above, it enumerates all lower-level feasible integer configurations y l,k and, according to Remark 3.5 some of the upper-level feasible integer configurations y u,k , thus constructing a master problem (MP) which is equivalent to the original bilevel problem (BLP) due to to Part 2 of Lemma 3.14. As the number of lower-level feasible integer configurations y l,k as well as the number of HPR-feasible integer configurations y u,k is finite by Assumption 1, Algorithm 2 terminates after finitely many iterations.

Corollary 3.17
An optimum is attained for a feasible bilevel problem satisfying Assumptions 1-4.
Suitable realizations of projection test, implication and optimality package that are necessary to confirm this result for nonlinear follower problems will be given in Sect. 5.
Altogether, projection test, implication and optimality package as described in this section add to the master problem 1 binary and 2s continuous variables as well as 2n R +4 constraints per iteration. Note that this calculation is based on bilevel problems with only inequality constraints on the lower level. In case the lower level has some equality constraints, the growth of the master problem reduces further with application of projection test as described in Sect. 3.1.1. Note that via elimination of x l,k as suggested in this paper, the number of additionally introduced nonlinearities is reduced compared to the realization from [60]. Also, in the current paper 2n R continuous variables and 4n R + 4s − 2 less constraints are added to the master problem per iteration of the algorithm.
However, the number of iterations can be higher due to possibly enumerating a lot of or, in the worst case, all integer points of some sets Proj (y u ) P lin y l,k \U I . If such enumeration occurs, only one constraint is added per additional iteration of the algorithm, namely a no-good constraint for the encountered upper-level integer configuration. Thus, unless their number is very high, the impact of these additional iterations on the size and complexity of the master problem is moderate compared to projection test, implication and optimality package routines for each newly encountered lower-level integer configuration.

Computational results
Given a suitable MINLP solver to be used for the master problem, our implementation is able to handle a MINLP on the upper level and a follower problem which is linear in continuous lower-level variables. To the best knowledge of the authors, no library of bilevel instances exists where on both levels discrete variables and nonlinearities such as products of upperand lower-level variables are present. Therefore, instances of the desired class were created based on the first 10 MIPS instances from [46].
The original instances miblp_20_15_50_0110_10_1 to miblp_20_15_50_0110_10_10 are all-integer with 5 upper-level variables and 10 lower-level variables each. There are 20 lowerlevel and no upper-level constraints in each instance, all constraints as well as both objective functions are linear. Notice that as the main difficulty in solving bilevel problems comes from the structure of the lower level, i.e., partial construction of its optimal-value function, absence of upper-level constraints does not impede representativity of the computations. However, as the ability of the proposed algorithm to detect infeasible instances has to be tested as well, some instances with upper-level constraints were constructed too.
We also included a nonlinear toy example (Example A.2) in the appendix with constraints and integer variables on both levels, where the behavior of our implementation can be clearly retraced and the solution found can easily be verified.
Computational results given in this section comprise altogether 120 instances derived from [46]. In order to obtain mixed-integer nonlinear bilevel problems, each of the original 10 instances was modified as follows: • adding m R ∈ {5, 10, 20} continuous variables to the upper level and redeclaring every second lower-level integer variable as a continuous variable, Regardless of the way original instances have been made mixed-integer, the following bilinear term is added to the upper-level objective function of each instance: with B a matrix comprised of an identity matrix I min{m R ,n R } extended with 0-entries to fit the required dimensions, and ub(x u ) the largest upper bound of all x u variables. The lower-level objective function of each instance receives the above bilinear term with a minus sign. As none of the instances described so far proved to be infeasible, bilevel optimization problems with upper-level constraints were constructed, again based on instances miblp_20_15_50_0110_10_1 to miblp_20_15_50_0110_10_10 from [46]. For each of the  original 10 instances, three instances were produced by shifting all but every second, fifth or tenth lower-level constraint to the upper level. No other modifications were done to obtain these 30 new instances, which therefore remain all-integer and linear on both levels. We used Gurobi 9.0 [26] as MINLP solver, Pyomo 5.6.9 for modeling and CPython 3.7.6 for the implementation of Algorithm 2. Notice that Gurobi allows only products of variables as nonlinearities. To extend Algorithm 2 to more general nonconvex MINLPs, other global nonlinear solvers such as, e.g., SCIP [24] or BARON [47] can be used.
Computations were performed on Xeon E3-1240 v6 CPUs (4 cores, HT disabled, 3.7 GHz base frequency) with 32 GB RAM. Runtimes are stated excluding instance loading from MPS and AUX files as well as their modification, but including big-M calculations. The time limit for each instance is 2 h.
The duality gap measures the maximum relative deviation from optimality of the best feasible solution found. As proposed in [34], we have calculated the gap for a given lower bound LB and upper bound UB by the following formula: if LB · UB ≤ 0 and not LB = UB = 0, The original 10 instances from [46] were solved to optimality, and key characteristics of the runs are listed in Table 1 separately for each instance.
From the 80 MINLP instances all but 5 were solved to optimality, while the remaining 5 instances had a relative optimality gap under 0.35%. Consolidated statistics of the runs on all 80 MINLP instances are given in Table 2 with the corresponding minimum, maximum, arithmetic mean and standard deviation for the runtime. Detailed data for each individual instance is listed separately in Table 6 in Appendix A.3.
From the 30 instances with upper-level constraints 14 were solved to optimality and 12 were proven infeasible. For 3 of the 4 remaining instances no decision on the feasibility could be made, while the last instance was proven feasible but not solved to optimality within the time limit. Consolidated statistics of the runs on these 30 instances are given in Table 3, and the full data for each instance can be found in Table 7 in Appendix A.3.  Each of altogether 120 instances is uniquely identified by the columns instance number, m Z , m R , n Z , n R , r and s of the detailed computational result Tables 1, 6 and 7. The instance number refers to the number of the original instances miblp_20_15_50_0110_10_1 to miblp_20_15_50_0110_10_10 from [46], while m Z , m R , n Z , n R , r and s denote the number of upper-level discrete, upper-level continuous, lower-level discrete and lower-level continuous variables as well as upper-and lower-level constraints, respectively.
From altogether 120 instances, thereof 10 original library ILP, 80 MINLP and 30 ILP with upper-level constraints, only 9 were neither solved to optimality nor proven infeasible within the time limit. In the case of these 9 instances we can observe two possible causes of the algorithm not terminating within the time limit.
First, the number of no-good cuts can be too large, which raises the iteration count to several thousands. This behavior in particular is present in all 5 MINLP instances which were not solved to optimality and the penultimate ILP instance with upper-level constraints. The exact number of no-good cuts for each instance can be inferred by subtracting the number of optimality packages from the total number of iterations listed in Appendix A.3. However, for all 5 instances the algorithm found a solution with a gap less than or equal to 0.35%, which can be considered an acceptable result for such a challenging problem type.
Second, the number of optimality packages can cause the master problem (MP) to become too hard and thus demand too much time to solve. This is the case with the remaining 3 instances, which are all linear on both levels and with upper-level constraints. Two of these instances acquired more than 30 optimality packages, and in one case the last completed master problem solve took over an hour. See Appendix A.3 for details on each instance.
Mean run time of the algorithm for all 120 instances is 10 min 20 s, median run time is less than 10 s. So more than 90% of the test instances were either solved to optimality or proven infeasible by Algorithm 2 within 2 h. Over 95% of the instances were either solved up to a relative optimality gap under 1% or proven infeasible before hitting the time limit of 2 h.

Algorithm extension for nonlinear follower
For the presentation of the results so far we have assumed the follower problem to be linear for all x u , y u . In this section, we describe how to extend Algorithm 2 to the more general setting described by Assumptions 1-4. The pseudo-code description of Algorithm 2 on p. 24 stays exactly the same, but we have to generalize projection test, implication and optimality package for the nonlinear setting. In order to do this, we will use the Wolfe dual for replacing linear programming duality, which also provides us with the required strong duality statements.

Exact projection test for nonlinear follower
The general concept behind our exact projection test as described in Sect. 3.1 remains the same, and the adaptation of (PT) to the nonlinear setting is straightforward: Note, however, that this is now a convex continuous problem-by Assumption 2-instead of a linear program. Its Wolfe dual is given as follows: min x l,k ,μ −μ g(y u , x l,k , y l,k ) + μ ∇ x l g(y u , x l,k , y l,k )x l,k (PT W-dual) s.t. ∇ x l g(y u , x l,k , y l,k ) μ ≥ 0 n R μ 1 s = 1 μ ∈ R s + , where the partial derivative ∇ x l g exists by Assumption 2. Note that the Lagrangian multipliers for the non-negativity conditions for x l,k have been eliminated from the formulation already, as well as the slack variable t k .
Eliminating all primal variables from the Wolfe dual is unfortunately not possible in general, but only for special cases, e.g., problems with linear constraints and strictly convex quadratic objective [41,Example 12.12]. It requires regularity of the partial derivative of the Lagrangian w.r.t. the primal variables, thus enabling usage of the implicit function theorem in order to consider x l,k as a function of μ. Thus, in contrast to (PT dual) for the case of a linear follower problem discussed in Sect. 3.1, primal variable copies are in general needed for (PT W-dual).
The statement of Lemma 3.2 holds also for the nonlinear version of projection test (PT) since the arguments in its proof do not depend on the linearity of the lower-level constraints. In particular, (PT) is feasible for any fixed y l,k , y u , and x l,k . Additionally, Assumption 1 guarantees that its optimum is finite and attained by some (x l,k 0 , t k 0 ). Furthermore, (PT) satisfies Slater's condition in case of feasibility due to Assumption 2. Therefore there exists μ 0 ∈ R s + such that (x l,k 0 , μ 0 ) is optimal for (PT W-dual), i.e., we have strong duality; cf. [56]. Thus we obtain Lemma 3.3 for an accordingly adapted version of (3.4a). Hence, as no-good cuts handling Case 2 from Lemma 3.2 are independent of the lower-level problem class, Remark 3.5 holds for the nonlinear follower problem when (PT W-dual) and the corresponding analogon of (3.4a) are employed. Note that using the Wolfe dual for solving bilevel problems with nonlinear lower level is not limited to our particular algorithm framework. The Wolfe dual can be employed to obtain single-level reformulations of bilevel optimization problems if, e.g., all variables have finite bounds on their respective levels, and the lower level is convex and satisfies Slater's condition. So far the majority of the single-level reformulations of bilevel problems in the literature rely on optimality conditions of the lower level expressed either with strong duality in the linear case or the full KKT system in the nonlinear case. In contrast, it seems that using the Wolfe dual in bilevel optimization has been explored very little.

Projection test with equalities for nonlinear follower
We can implement special handling of equations as discussed in Sect. 3.1.1 also for the more general case. However, the situation is slightly more complicated.
Let g and h denote the functions defining the lower-level inequalities and equations, respectively. Note that by convexity due to Assumption 2, any equations must be linear in the continuous follower variables, so we can write h(y u , x l , y l ) as h(y u , x l , y l ) = h R (y u , y l )x l + h c (y u , y l ) for coefficient functions h R and h c .
We again modify the primal problem (PT) by applying slack variables only to the inequality constraints: The Wolfe dual of this problem is given by min x l,k ,μ,λ − μ g(y u , x l,k , y l,k ) + μ ∇ x l g(y u , x l,k , y l,k )x l,k − λ h c (y u , y l,k ) We again face the problem that (PTEq) could-in contrast to (PT)-be infeasible. We have to show that also in this case, (PTEq W-dual) is feasible and admits a solution with objective value ≤ 0. This would ensure that the strong duality constraint of the form of (3.4b) does not impose any restriction to the master problem if (PTEq) is infeasible. In fact, we will be able to show that (PTEq W-dual) is unbounded in that case, matching Corollary 3.7.

Lemma 5.1 If (PTEq) is infeasible, (PTEq W-dual) is unbounded.
Proof We can show that (PTEq W-dual) is feasible in a way that is completely analogous to the proof of Lemma 3.6, obtaining a feasible solution of the form x l,k ,μ, λ = 0 . However, this does not directly imply the statement since the Wolfe dual can in general have a finite optimum even if the primal is infeasible [56].
To complete the proof, we will find an unbounded ray of (PTEq W-dual). Consider a version of the primal problem without the inequalities g(y u , x l,k , y l,k ) + t k s ≤ 0. It is a linear problem and still infeasible if (PTEq) is. Due to the results in Sect. 3.1.1, its dual min is unbounded. Letλ be an unbounded ray, i.e., −(λ) h c (y u , y l ) < 0 and h R (y u , y l ) λ ≥ 0.
Therefore, we also have the result of Lemma 3.8 for the nonlinear case. Note that we may assumex l,k = 0 in the above proof, which will allow us to reuse the primal variable copies x l,k in the optimality package for y l,k .
Choosing M μ works similar to the situation in Sect. 3. For example, we can choose M μ to be equal to the optimal objective function value of the dual of (PTEq) with inequalities only, combined with the HPR constraint set and variables of (BLP). It still holds that this problem always has a finite optimum and admits a feasible solution for (PTEq dual) with λ = 0.

Implications and optimality packages for nonlinear follower
Assumption 2 ensures that we can still express optimality of the follower's continuous decisions via strong duality. As before, we denote the lower-level dual variables in iteration k by π k . optimality package consists of primal and dual feasibility constraints for the lower level as well as the strong duality equation for fixed integer variables y l,k : f (x u , y u , x l , y l ) ≥ f (x u , y u , x l,k , y l,k ) g(y u , x l,k , y l,k ) ≤ 0 ∇ x l g(y u , x l,k , y l,k ) π k ≥ ∇ x l f (x u , y u , x l,k , y l,k ) ∇ x l f (x u , y u , x l,k , y l,k )x l,k = −π k g(y u , x l,k , y l ) + π k ∇ x l g(y u , x l,k , y l )x l,k x l,k ∈ R n R + , π k ∈ R s + .
Note that we consider the gradient ∇ x l f to be a row vector. This is for reasons of consistency with the Jacobian matrix ∇ x l g, which has gradients as its rows. Recall that our formulation needs to implement the following implications: ψ k = 1 ⇒ optimality package for y l,k is active, i.e., (5.1) is added to (MP) ψ k = 0 ⇒ optimality package for y l,k is inactive, i.e., no additional restrictions are imposed on the master problem. (5.2) In order to do this we again use a big-M formulation, adding (or, respectively, subtracting) a term (1 − ψ k )M π to (from) each inequality in (5.1).
Just as described in Sect. 3.2, a sufficiently large M π can then be found by solving auxiliary problems for the maximal constraint violations under the given bounds for x u , y u , x l , y l , while the package-specific variable copies x l,k and π k are fixed to 0. This way, bounds for the dual variables of the lower level (which in general are not available) are not required for realizing the desired implications.
With similar arguments as in Lemma 3.12, the optimality package can be simplified to explicitly enforcing only dual feasibility and exploiting strong duality. Recall that we were able to eliminate all primal variables from (3.11), which, unfortunately, is not possible in general for (5.3). This, however, does not impede the reduction step to (5.3). One should just be aware that x l,k are not necessarily optimal solutions of the primal follower problem with fixed y l = y l,k . Note that we can use the same primal variable copies x l,k as in (PT W-dual). This is because setting x l,k to any optimal follower response given fixed y l,k and y u will work for both (PT W-dual) and (5.3) at the same time if y u ∈ Proj (y u ) P lin y l,k , i.e., if the optimality package for y l,k is supposed to be active. Otherwise, x l,k = 0 is always possible in both problems without imposing any relevant implications.

Remark 5.2
Note that the KKT-based tightening that was proposed in [38] and also used in [61] can be incorporated into the algorithm presented in the current paper too. Indeed, any suitable necessary optimality conditions for the lower-level problem can be added to the master problem (MP) in order to obtain a tighter relaxation of the original bilevel problem.

Conclusions
In this work, we proposed an exact algorithm for solving problems from the challenging class of mixed-integer nonlinear bilevel optimization problems where both leader and follower integer variables are present on both levels. Our method is based on recent work of Yue, Gao, Zeng and You [60], following the same projection-based scheme described in Algorithm 1. We turned it into an exact method under an additional Assumption 4, which bans continuous upper-level variables from lower-level constraints. In conjunction with the other assumptions it therefore guarantees that a bilevel optimum is attained if the problem is feasible in the first place-a fact for which our algorithm also contributes a constructive proof. Assumption 4 is relatively mild compared to other assumptions with the above-mentioned effect that are commonly made in the literature. The key enhancement of our algorithm is to separately realize implication (3.2) for an open subset U of the relevant projections, chosen to be as large as possible, and the remaining boundary cases-as outlined on p. 14. Furthermore, we extend the algorithm from [60] from a purely linear setting to a more general bilevel problem class allowing nonlinearities. The limiting requirements are given by Assumptions 2-3, which essentially ensure that optimality of the continuous follower decisions can be expressed via strong duality, and that the HPR with these optimality conditions can be handled by an off-the-shelf solver. The nonlinear version of our method as described in Sect. 5 may be particularly attractive if primal variable copies can be eliminated from the Wolfe dual, though this is not a strict requirement for using it.
Proof-of-concept computational results have been presented for the case in which the lower level is linear in the follower variables, but products of lower-and upper-level variables may be present in the objective functions of both levels. Therefore our implementation covers a problem class for which currently no solver exists to the best of the authors' knowledge. Our method is able to solve many bilevel library instances that have been modified to also include continuous variables and nonlinear terms on both levels (unfortunately, no established library exists for this problem class yet). Still, there are clear limitations in terms of instance size, which is not surprising given the extremely challenging problem class. We think that the experiments are quite encouraging, especially considering the number of optimality packages that still allow the master problem to be solved. It is important to note that our framework relies on established MINLP solvers, so any performance improvements in the underlying MINLP solver will also automatically further benefit our approach. However, the master problem growing due to additional optimality packages is not the only limiting factor. We also observed instances for which optimum solutions of the master problem regularly ended up being boundary cases, i.e., in the relevant projection but not in U . Hence, they required a large number of no-good cuts, showing that our modification for exactness in general comes at a price. Moreover, this observation highlights the importance of avoiding such situations and consequently of our equation-handling modifications from Sects. 3.1.1 and 5.2 for solving problems with equations on the lower level.
In future work, further performance gains might be achieved using cutting planes specifically designed for the structure of the master problem as it evolves during the solution process and/or by warmstarting. Perturbation of the follower problem could increase the chance for the master problem solution being in U . However, problem-specific knowledge will be necessary in order not to run into the very same problem that is illustrated in Example 2.9. In preliminary computations, we have observed that having good bounds for the dual variables can help the solver immensely. While such bounds cannot be given in general, dual variables often have a nice interpretation in applications, which might allow for a suitable estimation.
article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A.1: Example of an unfavorable instance
Example A. 1 We solve the following bilevel problem employing Algorithm 1, to show that sometimes all HPR-feasible y l assignments are encountered: For a graphical illustration we refer to Fig. 1, while relevant details of each iteration are listed in Table 4.
The relation between leader and follower is quite adversarial since the product term x u x l appears with different sign in the two objective functions. The leader controls two binary variables and one continuous variable. The latter is implicitly integer due to the single master constraint and indirectly enters the right-hand side of some follower constraints via this representation. There is one continuous and one integer follower variable.   Table 5 summarizes the steps of Algorithm 2. As there are only four different possible upper-level solutions, i.e. (x u , y u 1 , y u 2 ) ∈ {(0, 0, 0), (1, 1, 0), (2, 0, 1), (3, 1, 1)}, optimality of the solution found with Algorithm 2 can easily be verified.
However, setting ψ 0, * = 0 was still possible in the master problem. We see at closer inspection that (0, 1) is not in U , but represents a boundary case. We thus add a corresponding no-good cut ensuring (y u 1 , y u 2 ) = (0, 1) ⇒ ψ 0 = 1 . Bounds are updated to UB = 20 and LB = 4. Iteration 2 Solving (MP) yields a solution with (x u, * 2 , y u 1 , * 2 , y u 2 , * 2 , x l, * 2 , y l, * 2 ) = (1, 1, 0, 8, 1) and ψ 0, * = 0. It has objective value 8, which is the new upper bound. This time, y l = 4 is indeed impossible for the follower. Instead, (FO) has the optimal solution (x l , y l ) = (3, 3), again confirmed by (BF). The corresponding bilevel-feasible solution assigns objective values of 3 to the leader and 27 for the follower, respectively. Since the former is worse than the incumbent value of 4 for LB, there is no update on the lower bound.