On a Computationally Ill-Behaved Bilevel Problem with a Continuous and Nonconvex Lower Level

It is well known that bilevel optimization problems are hard to solve both in theory and in practice. In this paper, we highlight a further computational difficulty when it comes to solving bilevel problems with continuous but nonconvex lower levels. Even if the lower-level problem is solved to ε-feasibility regarding its nonlinear constraints for an arbitrarily small but positive ε, the obtained bilevel solution as well as its objective value may be arbitrarily far away from the actual bilevel solution and its actual objective value. This result even holds for bilevel problems for which the nonconvex lower level is uniquely solvable, for which the strict complementarity condition holds, for which the feasible set is convex, and for which Slater's constraint qualification is satisfied for all feasible upper-level decisions. Since the consideration of ε-feasibility cannot be avoided when solving nonconvex problems to global optimality, our result shows that computational bilevel optimization with continuous and nonconvex lower levels needs to be done with great care.
Finally, we illustrate that the nonlinearities in the lower level are the key reason for the observed bad behavior by showing that linear bilevel problems behave much better, at least on the level of feasible solutions.


Introduction
Bilevel optimization problems are known to be notoriously hard to solve, and this holds true both in theory and in practice. In theory, bilevel problems are strongly NP-hard even if all objective functions and constraints are linear and all variables are continuous; see Hansen et al. (1992). This, of course, is also reflected in computational practice since linear bilevel problems are inherently nonsmooth and nonconvex problems. Moreover, the single-level reformulations used to solve linear bilevel problems in practice are nonconvex, complementarity-constrained problems. Their linearization requires big-M parameters that are hard to obtain in general (Kleinert et al. 2020) and that often lead to numerically ill-posed problems, which are hard to tackle even for state-of-the-art commercial solvers.
In recent years, algorithmic research on bilevel optimization has focused on more and more complicated lower-level problems such as mixed-integer linear models (Fischetti et al. 2017, 2018), nonlinear but still convex models (Kleinert et al. 2021a), or problems in the lower level with uncertain data (Beck et al. 2023; Buchheim and Henke 2022; Burtscheidt and Claus 2020). When it comes to the situation of a lower-level problem with continuous nonlinearities, there is not too much literature, in particular in comparison to the case in which the lower-level problem is convex; see, e.g., Kleniati and Adjiman (2011, 2014a,b, 2015), Mitsos (2010), Mitsos et al. (2008), Paulavičius and Adjiman (2020), Paulavičius et al. (2020), and Paulavičius et al. (2016). Due to the brevity of this article, we do not go into the details of the literature but refer to the seminal book by Dempe (2002) as well as the recent survey by Kleinert et al. (2021b) for further discussions of the relevant literature.
There is one important difference when crossing the border from (mixed-integer) convex to (mixed-integer) nonconvex lower-level problems. The lower-level problem can, in general, not be solved to global optimality anymore in an exact sense in finite time since we need to exploit techniques such as spatial branching to tackle nonconvexities. These techniques only lead to finite algorithms for prescribed and strictly positive feasibility tolerances; see, e.g., Locatelli and Schoen (2013) for more detailed discussions. Note that this is in clear contrast to, e.g., linear optimization. Here, the simplex method is an exact method in the sense that, if applied using exact arithmetic, the method computes a globally optimal solution without any errors; see, e.g., Applegate et al. (2007). The same applies to simplex-based branch-and-bound methods for solving mixed-integer linear optimization problems. Algorithms as such are not available for continuous but nonconvex problems. This means that, just due to algorithmic necessities, we cannot expect to get exact feasible solutions of the lower-level problem anymore when doing computations for continuous but nonconvex lower-level problems. Instead, we have to deal with ε-feasible solutions, at least for the nonlinear constraints of the lower-level problem.
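The role of a strictly positive tolerance for finite termination can be illustrated with a minimal branch-and-bound sketch. This is not the algorithm of any particular solver; the objective function, the Lipschitz constant, and all names below are illustrative assumptions.

```python
import heapq

def lipschitz_bnb(f, lip, lo, hi, eps):
    """Minimize f on [lo, hi] by interval branching.

    lip is a global Lipschitz constant of f on [lo, hi]; on an
    interval [a, b] with midpoint m, f(m) - lip * (b - a) / 2 is a
    valid lower bound. The loop stops once the global bound gap
    drops below eps; a strictly positive tolerance is exactly what
    makes the method finite.
    """
    mid = 0.5 * (lo + hi)
    best_x, best_val = mid, f(mid)
    heap = [(best_val - lip * (hi - lo) / 2, lo, hi)]
    while heap:
        bound, a, b = heapq.heappop(heap)
        if bound >= best_val - eps:
            break  # remaining boxes cannot improve by more than eps
        m = 0.5 * (a + b)
        for a2, b2 in ((a, m), (m, b)):
            m2 = 0.5 * (a2 + b2)
            val = f(m2)
            if val < best_val:
                best_x, best_val = m2, val
            heapq.heappush(heap, (val - lip * (b2 - a2) / 2, a2, b2))
    return best_x, best_val

# Nonconvex example: f(x) = x**4 - x**2, minima at x = +-1/sqrt(2).
# |f'(x)| = |4x**3 - 2x| <= 36 on [-2, 2].
xb, vb = lipschitz_bnb(lambda t: t**4 - t**2, lip=36.0, lo=-2.0, hi=2.0, eps=1e-4)
```

For eps = 0, the loop above would never prune and thus never stop, which mirrors the discussion in the text.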
The aim of this paper is to present an exemplary bilevel optimization problem with continuous variables and a nonconvex lower-level problem for which the latter algorithmic aspect leads to the following severe issue: Even if the feasibility tolerance for the lower level can be made extremely small, the exact bilevel solution can be arbitrarily far away from the bilevel solution that one obtains for ε-feasibility in the lower level, which in particular can be superoptimal out of proportion to ε. The same is true for the optimal objective function values. The main idea for the construction of this exemplary bilevel problem is based on a constraint set presented first by Bienstock et al. (2021). We explicitly note here that this construction does not make use of large constraint coefficient ranges (all coefficients are 1) or arbitrarily large degrees of polynomials (we only use quadratic or linear terms). Moreover, when considered in an exact sense, (i) the example's lower-level problem is uniquely solvable, (ii) strict complementarity holds, (iii) its convex constraint set satisfies Slater's constraint qualification for all feasible upper-level decisions, (iv) the upper level does not contain coupling constraints, and (v) the overall problem has a unique global solution as well. Thus, the bilevel program does not look like a badly modeled problem but is shown to behave very badly in a computational sense, i.e., if only ε-feasible points for the nonlinear constraints of the lower-level problem can be considered. We also show that the observed pathological behavior arises due to the nonlinearities by showing that linear bilevel problems behave much better at least on the level of feasible points.
The example is presented in Section 2 and discussed in an exact sense in Section 3. Afterward, the example is analyzed in Section 4 for the case of ε-feasibility of nonlinear constraints. Section 5 presents an analysis of the linear bilevel case. Our final conclusions are drawn in Section 6.

Problem Statement
Let us consider the bilevel problem (1), where x̲, x̄ ∈ R² with 1 ≤ x̲_i < x̄_i, i ∈ {1, 2}, denote lower and upper bounds on the variables x. Here, S(x) is the set of optimal solutions of the x-parameterized lower-level problem (2). Problem (1) is a linear problem in both the leader's and the follower's variables. The only constraints that occur in this problem are variable bounds for the leader's variables x. In particular, there are no upper-level constraints that explicitly depend on the follower's variables y, i.e., there are no coupling constraints.
Moreover, the feasible set of the lower-level problem is bounded due to the following. From (2d) and (2b), we obtain 0 ≤ y_1 ≤ 1/2 as well as 0 ≤ y_n ≤ 1/2 for any feasible follower's decision y. Using Constraints (2c), we further obtain 0 ≤ y_i ≤ 1 for all i ∈ {1, ..., n}. Finally, we have 0 ≤ y_{n+1} ≤ x̄_1 as well as −x̄_2 ≤ y_{n+2} ≤ x̄_2 by (2e) and (2f) because the leader's variables x are bounded. Since all finitely many lower-level constraints are continuous, the feasible set of the follower's problem is compact. In addition to the compactness, the feasible set of the lower-level problem (2) is non-empty for every feasible leader's decision x. For instance, one can construct a point ȳ with ȳ_{n+2} = 0 that is strictly feasible w.r.t. the inequality constraints (2d), (2e) as well as (2f) and feasible w.r.t. the equality constraint (2b). Here, we exploit the assumption that 1 ≤ x_1, x_2 holds to obtain strict feasibility w.r.t. the variable bounds in (2e) and (2f). Moreover, ȳ can also be chosen to be strictly feasible w.r.t. the inequality constraints (2c), which can be verified by a direct computation for all i ∈ {1, ..., n − 1}. In particular, this means that the problem satisfies Slater's constraint qualification. Moreover, the gradient of the single equality constraint (2b) is not the null vector. Hence, the Mangasarian-Fromovitz constraint qualification (MFCQ) is also satisfied at every feasible decision of the follower. Let us further point out that all lower-level constraints are linear except for the quadratic but convex inequality constraints in (2c). Therefore, the feasible set of the lower-level problem (2) is convex. Nevertheless, the overall lower-level problem is nonconvex since the follower's objective function contains bilinear terms. Before we solve the bilevel problem (1) and (2) in the following sections, let us briefly summarize the nice properties of the problem. The upper-level problem is linear and does not contain coupling constraints. The feasible set of the lower-level problem is convex and compact. For every feasible leader's decision, the lower-level problem further satisfies Slater's constraint qualification, and the MFCQ is satisfied for every feasible follower's decision.
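The structure of the lower-level feasible set can be made concrete in a short feasibility check. Since the displayed formulation is not reproduced here, the right-hand sides below are assumptions on our part: we take (2b) to read y_1 + y_n = 1/2, (2c) to read y_{i+1} ≥ y_i² for i ∈ {1, ..., n − 1}, (2d) to read y ≥ 0, and (2e) and (2f) to read 0 ≤ y_{n+1} ≤ x_1 and −x_2 ≤ y_{n+2} ≤ x_2.

```python
def is_eps_feasible(y, x, eps=0.0):
    """Check (eps-)feasibility of y for the lower-level constraint set.

    y = (y_1, ..., y_n, y_{n+1}, y_{n+2}) and x = (x_1, x_2).
    Only the nonlinear constraints (2c) are relaxed by eps; the
    linear constraints must hold exactly (up to roundoff for the
    equality (2b)). The right-hand sides encode the assumed
    formulation stated in the lead-in.
    """
    n = len(y) - 2
    y_main, y_np1, y_np2 = y[:n], y[n], y[n + 1]
    ok = abs(y_main[0] + y_main[-1] - 0.5) <= 1e-12          # (2b)
    ok = ok and all(v >= 0.0 for v in y_main)                # (2d)
    ok = ok and 0.0 <= y_np1 <= x[0]                         # (2e)
    ok = ok and -x[1] <= y_np2 <= x[1]                       # (2f)
    # quadratic constraints (2c), relaxed: y_{i+1} >= y_i**2 - eps
    ok = ok and all(y_main[i + 1] >= y_main[i] ** 2 - eps
                    for i in range(n - 1))
    return ok
```

Under these assumed right-hand sides, for example, the point with all y_i = 1/4 and (y_{n+1}, y_{n+2}) = (1/2, 0) passes the check for every x with x_1, x_2 ≥ 1.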

Exact Feasibility
In this section, we determine the unique exact solution of the bilevel problem (1) and (2). To this end, we start by solving the lower-level problem (2) analytically for an arbitrary but fixed feasible leader's decision x. First, we note that any feasible follower's decision y satisfies y_n > 0. The reasons are as follows. Let us contrarily assume that y_n = 0 holds. Then, Constraint (2b) yields y_1 = 1/2. From y_n = 0 and (2c), it follows that y_i = 0 holds for all i ∈ {1, ..., n}, which contradicts y_1 = 1/2. Consequently, y_n > 0 holds. For later reference, let us briefly summarize the previous observation.

Result 1. For every feasible leader's decision, any feasible follower's decision y satisfies y_n > 0.
The equality constraint (2b) thus yields y_1 < 1/2. From (2e) and (2f), we additionally obtain bounds on the terms of the follower's objective that involve y_{n+1} and y_{n+2}. In particular, the latter term is minimized for (y_{n+1}, y_{n+2}) = (x_1, x_2), so that the lower-level objective function value can be bounded from above accordingly. It is thus evident that an optimal follower's decision y* satisfies (y*_{n+1}, y*_{n+2}) = (x_1, x_2). Here, we can fix (y*_{n+1}, y*_{n+2}) since these variables are subject to simple bound constraints and, in particular, they are not coupled to the other variables of the follower. Hence, the follower's problem can be reduced to the convex problem (3). As shown above, Problem (3) satisfies Slater's constraint qualification.¹ Again as shown above, the feasible set is compact. Therefore, Problem (3) has an optimal solution y*. Because of the equality constraint (3b), the lower-level objective function value y*_1 is maximized by minimizing y*_n. From Constraints (3c) and the optimality of y*, we obtain that y*_1 is given as a root of the function h defined in (4). In particular, one can show that y*_1 is the unique root of (4): The function h is continuous and strictly increasing on [0, 1/2]. Moreover, we have h(0) < 0 and h(1/2) > 0. Consequently, there is a unique point y*_1 ∈ (0, 1/2) such that h(y*_1) = 0 holds. Furthermore, the follower's decision y* is the unique solution of Problem (3). To see this, let us assume that there is another feasible follower's decision ŷ ≠ y* for which the optimal objective function value y*_1 is obtained, i.e., ŷ_1 = y*_1. Then, there must be at least one quadratic inequality constraint in (3c) that is not satisfied with equality for ŷ since, otherwise, we have y* = ŷ. However, if there is slack in Constraints (3c), we obtain ŷ_n > y*_n. Then, (3b) yields ŷ_1 < y*_1, which is a contradiction to the optimality of ŷ.
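The root y*_1 can be approximated by bisection. The concrete form of h below is a reconstruction under the assumptions that (2b) reads y_1 + y_n = 1/2 and that all quadratic constraints (2c) are active in the optimum, so that y_n = y_1^{2^{n−1}}; this form matches the stated properties h(0) < 0 < h(1/2) and the strict monotonicity of h on [0, 1/2].

```python
def h(t, n):
    # Assumed reconstruction: with all quadratic constraints (2c)
    # active, y_n = t**(2**(n - 1)), and (2b) reads y_1 + y_n = 1/2.
    return t + t ** (2 ** (n - 1)) - 0.5

def root_bisection(n, tol=1e-15):
    """Unique root of h in (0, 1/2); h(0) < 0 < h(1/2) and h is
    strictly increasing on [0, 1/2]."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid, n) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y1_star = root_bisection(n=6)
# y1_star lies just below 1/2, roughly 1/2 - 2**(-2**(n-1))
```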
Result 2. For every feasible leader's decision, the set of optimal solutions of the lower-level problem (2) is a singleton.
In particular, Result 2 means that there is no need to distinguish between the optimistic and the pessimistic approach to bilevel optimization; see, e.g., Dempe (2002). Thus, we can finally determine an optimal leader's decision for the overall bilevel problem (1) and (2). As (y*_{n+1}, y*_{n+2}) = (x_1, x_2) holds in the optimal follower's decision y*, the leader actually solves a linear problem in the variables x only. The unique optimal solution is given by x* = (x̲_1, x̄_2).
To sum up, the bilevel problem (1) and (2) not only has nice properties such as a convex and bounded lower-level feasible set as well as a lower-level problem that satisfies Slater's constraint qualification, but also has a unique optimal solution. Moreover, the strict complementarity condition holds, for which we give a proof in Appendix B. Overall, the bilevel problem (1) and (2) is thus well-behaved.

ε-Feasibility
In what follows, we determine an optimal solution of the bilevel problem (1) and (2) under the assumption that we allow for small violations of the nonlinear lower-level constraints according to the following notion, which is motivated by the necessary special treatment of nonlinear (and, in particular, nonconvex) constraints in global optimization as discussed in the introduction.

Consider a follower's decision of the form stated in (5), for which all linear constraints as well as all but the last of the quadratic constraints (2c) are (exactly) satisfied, whereas only the last quadratic constraint is violated by at most 2^{−2^{n−1}}. Hence, if ε ≥ 2^{−2^{n−1}}, there is an ε-feasible follower's decision y with y_n = 0 for every feasible leader's decision. It can easily be seen that, by increasing n, we can obtain arbitrarily small values for this threshold on ε. In particular, there is no ε-feasible follower's decision that yields a better objective function value than 1/2. The reasons are as follows. Using the equality constraint (2b), the lower-level objective function can be re-written in terms of y_n. For all ε-feasible follower's decisions, the remaining terms are bounded because of the linear constraints (2e) and (2f). Consequently, a lower-level objective function value larger than 1/2 could only be obtained if y_n < 0. However, this is not ε-feasible w.r.t. the variable bounds (2d). In addition, a follower's decision of the form stated in (5) is thus an ε-feasible solution of the lower-level problem (2).
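A sketch of this construction, assuming again that (2b) reads y_1 + y_n = 1/2 and that (2c) consists of the constraints y_{i+1} ≥ y_i²: following the quadratic chain with equality and then replacing y_n by 0 violates only the last quadratic constraint, by exactly 2^{−2^{n−1}}.

```python
n = 6

# Exact chain with equality: y_i = (1/2)**(2**(i-1)) for i < n.
y = [2.0 ** -(2 ** i) for i in range(n - 1)]  # y_1, ..., y_{n-1}
y.append(0.0)                                  # set y_n := 0

# All constraints y_{i+1} >= y_i**2 hold with equality except the
# last one, whose violation is exactly y_{n-1}**2 = 2**(-2**(n-1)).
violations = [max(0.0, y[i] ** 2 - y[i + 1]) for i in range(n - 1)]
worst = max(violations)
```

Note that the point still satisfies the assumed equality y_1 + y_n = 1/2 exactly, so its lower-level objective contribution of y_1 = 1/2 is attained.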
Let us point out that, in contrast to the exact case, the follower's variables y_{n+1} and y_{n+2} do not affect the lower-level objective function value in this setting and can thus be chosen arbitrarily. Therefore, the set of ε-feasible follower's solutions is not a singleton anymore.
Result 5. If ε ≥ 2^{−2^{n−1}}, the set of ε-feasible follower's solutions is not a singleton for every feasible leader's decision. Due to Result 5, we need to distinguish between optimistic and pessimistic solutions. Following the optimistic approach, the follower chooses y_{n+1} = 0 as well as y_{n+2} = x_2 so as to favor the leader w.r.t. the leader's objective function value. Therefore, the leader actually solves a linear problem in the variables x only. The optimistic optimal leader's decision is thus given by x* = (x̄_1, x̄_2). In the pessimistic case, the follower chooses y_{n+1} = x_1 as well as y_{n+2} = −x_2 so as to adversely affect the leader's decision. In this setting, the leader again solves a linear problem in the variables x only. Hence, the pessimistic optimal leader's decision is given by x* = (x̲_1, x̲_2). To sum up, let us state the main observations of this section.
Result 6. Let ε ≥ 2^{−2^{n−1}} and suppose that we allow for ε-feasible follower's solutions. Then, the optimistic optimal solution of the bilevel problem (1) and (2) is given by x*_o = (x̄_1, x̄_2), and the pessimistic optimal solution is given by x*_p = (x̲_1, x̲_2) with an optimal objective function value of F*_p = −x̲_1 − x̲_2. We now finally compare the exact bilevel solution with the results for the optimistic and the pessimistic setting in the case of only ε-feasibility of the lower level. In the optimistic setting, the distance between the solutions is x̄_1 − x̲_1 and the difference between the corresponding objective function values is x̄_1 + x̲_1. Two aspects are remarkable. First, by enlarging the feasible interval for the variable x_1, we get an arbitrarily large error and, second, this error is independent of ε, i.e., this arbitrarily large error occurs independently of how accurately one solves the lower-level problem.
For the pessimistic setting, the distance between the solutions is x̄_2 − x̲_2 and the difference between the objective function values is x̄_2 + x̲_2. Hence, we obtain the same qualitative behavior but now in dependence on the variable x_2 instead of x_1.
In summary, we obtain the following two main observations. First, we can be arbitrarily far away from the overall exact bilevel solution. Second, we also obtain arbitrarily large errors regarding the optimal objective function value of the leader. The latter is very much in contrast to the situation in single-level optimization, for which sensitivity results are available; see, e.g., Proposition 4.2.2 in Bertsekas (2016). This is particularly the case for linear optimization problems, where standard sensitivity analysis results (see, e.g., Theorem 5.5 in Chvátal (1983)) apply as well and state that a small change in the right-hand side of the problem's constraints can only lead to a small change in the optimal objective function value.
Lastly, let us comment on the fact that only very moderate values of n are required to get the wrong solution. Taking the inequality for ε from Result 6, it is easy to see that, for a given tolerance ε, the parameter n needs to satisfy n ≥ log_2(log_2(1/ε²)) so that numerically computed solutions do not coincide with the exact solution for the given ε. For instance, a tolerance of ε = 10^{−8} already leads to a wrong result for n = 6. This particularly means that the considered bilevel problem is moderate in size w.r.t. the number of constraints and variables. For n = 6, we only have 16 constraints and 8 variables on the lower level. We further note that the used constraint coefficients are all 1 and that the coefficients are independent of n and the given tolerance ε.
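The arithmetic of this paragraph can be checked directly:

```python
import math

def min_n(eps):
    """Smallest n with 2**(-2**(n - 1)) <= eps, i.e., the smallest
    integer n satisfying n >= log2(log2(1 / eps**2))."""
    return math.ceil(math.log2(math.log2(1.0 / eps ** 2)))

# A tolerance of 1e-8 already suffices for n = 6:
n_wrong = min_n(1e-8)
```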
A Python code for the example considered in this paper is publicly available at https://github.com/m-schmidt-math-opt/ill-behaved-bilevel-example and can be used to verify the discussed results.

Analysis of the ε-Feasible Linear Case
In this section, we analyze the linear bilevel case, i.e., we study Problem (6) with right-hand side b ∈ R^ℓ. We assume that the set {(x, y) ∈ R^{n_x} × R^{n_y} : Ax ≥ a, Cx + Dy ≥ b} is non-empty and compact and that, for every feasible upper-level decision x, there exists a feasible lower-level decision y. This implies that the lower-level problem is bounded for every feasible upper-level decision and that the dual problem of the lower level is feasible. We also assume that the set {x ∈ R^{n_x} : Ax ≥ a} is bounded. Moreover, we consider the setting in which the underlying linear algebra and linear optimization routines are of finite precision only.
When finite-precision procedures are used, an algorithm that solves Problem (6) will output a pair (x̂, ŷ) that may be slightly infeasible. The concern, should that happen, is that the output solution can be superoptimal to a degree that is not proportional to its infeasibility. As discussed in the previous section, such an outcome can be observed for general, i.e., nonlinear, bilevel problems. In this section, however, we show that linear bilevel problems behave better in some sense. To this end, we assume that our underlying solver can ensure the following properties. Here and in what follows, 0 < ε < 1 is a given tolerance, e_k ∈ R^k is the vector of all ones, and (x̂, ŷ) is used to denote a nearly feasible solution of the bilevel problem (6).
Prior to our analysis, we present a general result that will be used below. This result can be read from Theorem 3.38 (page 112) of Conforti et al. (2014). It can also be obtained from Corollary 3.2b (page 20) or from Theorem 10.2 (page 121) of Schrijver (1986). We will use the term size to refer to the (bit) encoding length of a matrix, vector, or formulation, as appropriate.
If, in addition, z ≥ 0 holds, we say that z is basic feasible.
Remark 1.Let P be as in Definition 2. The extreme points of P are precisely the vectors z that are basic feasible.
Theorem 1. There is a constant κ(H) > 0 of size polynomial in the size (of the bit encoding) of H such that, for any basic vector v, we have ‖v‖_∞ ≤ κ(H) ‖h‖_∞.

Proof. Let v be basic and set J = {j : v_j ≠ 0}. Since v is basic, there is a subset of rows I of H with |I| = |J| such that the following holds: (i) The submatrix H_{I,J} of H indexed by rows I and columns J is invertible.
(ii) As a consequence, it holds that v_J = H_{I,J}^{−1} h_I, where v_J is the subvector of v indexed by J and h_I is the subvector of h indexed by I. Using the submultiplicativity of the norm, we get ‖v‖_∞ = ‖v_J‖_∞ ≤ ‖H_{I,J}^{−1}‖_∞ ‖h_I‖_∞. The result now follows by defining κ(H) to be the maximum of ‖B^{−1}‖_∞ over all invertible submatrices B of H.
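For small instances, the constant κ(H) from this proof can be computed by brute force, which allows one to verify the bound of Theorem 1 numerically. The concrete system below is an illustrative assumption.

```python
import itertools
import numpy as np

def kappa(H, tol=1e-9):
    """Brute-force kappa(H): the maximum of the infinity norm of
    B**(-1) over all invertible square submatrices B of H. Only
    sensible for tiny instances."""
    m, n = H.shape
    best = 0.0
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                B = H[np.ix_(rows, cols)]
                if abs(np.linalg.det(B)) > tol:
                    best = max(best, np.linalg.norm(np.linalg.inv(B), np.inf))
    return best

# Tiny standard-form system Hz = h, z >= 0 (illustrative data).
H = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
h = np.array([1.0, 2.0])

# Basic feasible vector with support in the last two columns.
v = np.array([0.0, 1.0, 1.0])
assert np.allclose(H @ v, h)

bound = kappa(H) * np.linalg.norm(h, np.inf)  # Theorem 1 bound
```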
In Theorem 1, we use what is usually termed the standard representation of a polyhedron. Similar statements can be derived for other representations of polyhedra, e.g., {x ∈ R^n : Hx ≤ h}, via well-known reformulations.

5.1. Linear Optimization with Errors

We start with some simple observations for classic, i.e., single-level, linear problems of the form (7) with constraint matrix M and right-hand side f ∈ R^m. Throughout this section, we assume that the feasible region of Problem (7) is non-empty and bounded. Moreover, we will also work with the corresponding dual problem. Next, we will derive estimates involving near-feasible and near-optimal points for Problem (7).
Lemma 1. Suppose that there is a point x̂ ∈ R^{n_x} that is nearly feasible for Problem (7), i.e., M x̂ ≥ f − ε e_m. Then, the following holds for a certain constant κ_1(M) > 0 of polynomial size.

Proof.
(a) Let z* be an optimal solution of the dual problem of (7). Then, the corresponding duality estimate holds. In particular, this equation applies to any dual optimal z*. Since z ≥ 0 is a constraint of the dual problem, the dual feasible region is a pointed polyhedron and, w.l.o.g., z* is an extreme point. The result now follows from Theorem 1. (b) Consider the linear optimization problem (8). That this is indeed a linear program follows by reformulating (8b) accordingly. Clearly, the resulting problem is both feasible and bounded since (7) is. Moreover, (x̂, 0) satisfies the constraints of this problem with an additive error of at most ε. We can therefore apply (a) to this problem to obtain a point x* that is feasible for Problem (7) and such that the claimed estimate holds, where κ_1(M) is the κ-constant (of polynomial size in M) that applies to the matrix of Constraints (9) and (8c).
Let us emphasize that the result in Lemma 1 applies for any ε > 0, no matter how large.In particular, it is not required that ε is "sufficiently small".
Lemma 2. Suppose that there is a nearly primal-dual feasible and nearly primal-dual optimal pair (x̂, ẑ) ∈ R^{n_x} × R^m for Problem (7), i.e., (x̂, ẑ) satisfies the near-feasibility and near-optimality conditions (i)-(iii). Then, there exists an optimal solution x* for Problem (7) such that ‖x* − x̂‖_∞ ≤ ε κ_3(M, v) holds for a certain constant κ_3(M, v) > 0, whose size is polynomial in the size of the input data M and v.
Proof. First, we note that Condition (ii) simply states that ẑ is feasible for the dual of (7) up to an error of ε. We can thus apply Part (a) of Lemma 1 to obtain a corresponding estimate, where κ_2(M) is the κ-constant for the dual of (7), which is of polynomial size in M. Together with (iii), this implies a bound on the optimality gap of x̂. Next, we consider the polyhedron given by (12), which is non-empty and bounded. By (i) and (11), x̂ satisfies these inequalities with a feasibility error of at most a polynomial-size multiple of ε. By applying Part (b) of Lemma 1, we obtain that there is a feasible point x* for (12), i.e., an optimal solution x* for (7), such that the claimed estimate holds.

5.2. Application to Linear Bilevel Problems

We now return to the bilevel setup as stated in (6). To this end, note that, for a given upper-level decision x ∈ R^{n_x}, the dual of the lower-level problem (6c) reads as in (13).

Lemma 3. Suppose that ŷ and a corresponding dual point satisfy Conditions (i)-(iv). Then, there exists an optimal solution y* for the x-parameterized lower-level problem (6c) such that ‖y* − ŷ‖_∞ ≤ ε κ_4(A, C, D, a, b, d) (14) holds for a constant κ_4(A, C, D, a, b, d) > 0, whose size is polynomial in the size of the input data A, C, D, a, b, and d.

Proof.
By assumption, for a given x, the lower-level problem is feasible and bounded. We can thus apply Lemma 2 since Conditions (ii)-(iv) correspond to Conditions (i)-(iii) of Lemma 2. Thus, there exists an optimal point y* for the lower-level problem such that the estimate of Lemma 2 holds. Using the triangle inequality and the submultiplicativity of the norm, we obtain the claimed bound. Since the feasible region of the upper-level problem is bounded, ‖x‖_∞ is bounded from above by the ∞-norm of some extreme point. We can apply Theorem 1 to obtain ‖x‖_∞ ≤ κ′(A) ‖a‖_∞, where κ′(A) is the κ-constant (of polynomial size in A) for the system Ax ≥ a. The proof is now concluded by appropriately defining κ_4(A, C, D, a, b, d). Note that z is dual basic and feasible (i.e., z ≥ 0) if and only if z is an extreme point of the dual polyhedron of the lower-level problem.
for some dual basic z. Then, there exists a pair (x*, y*) that is feasible for the bilevel problem (6) such that the stated estimates hold, where κ_7(A, C, D) is the κ-constant for System (15). Next, we use (iv) and obtain a bound by the definition of the infinity norm of a matrix. Hence, using (17), we obtain the corresponding estimate. These facts, together with (16) and (18), yield the desired inequality with κ_8(A, C, D, d) ≥ 1 being appropriately defined and of polynomial size. To sum up, x*, y′, and ẑ satisfy (a) Ax* ≥ a as well as (b) the remaining near-feasibility conditions. Thus, by Lemma 3 applied with the error εκ_8(A, C, D, d) ≥ ε, there exists an optimal solution y* for the x*-parameterized lower-level problem (6c) such that the claimed bound holds. Using this inequality and (16) concludes the proof.
Remark 2. Assumption (v) in Theorem 2 states that the distance between a nearly feasible and nearly optimal solution for the dual of the lower-level problem and a basic solution for the dual is small, which is a reasonable assumption in our setting.
To summarize the statement of the theorem: The distance to feasibility and the superoptimality of a nearly feasible pair (x̂, ŷ) for the bilevel problem (6) are linear in ε with coefficients κ that have polynomial size in the input data. This type of guarantee with polynomial-size coefficients is simply unavailable in the nonlinear case, as we have seen in the previous sections.

Conclusion
In this paper, we consider an exemplary bilevel problem with continuous variables and a nonconvex lower-level problem and illustrate that numerically obtained solutions can be arbitrarily far away from an exact solution. The discrepancy between exact and numerically computed solutions is based on the fact that we cannot exactly satisfy all constraints of the nonconvex lower level when using global optimization techniques such as spatial branching. The considered problem itself is well-posed in the sense that we do not use large constraint coefficient ranges or high-degree polynomials. Moreover, we show that the constraint set of the lower-level problem is convex and compact and that it satisfies Slater's constraint qualification. In an exact sense, we prove that the lower-level problem as well as the overall bilevel problem possess unique solutions. It is further established that the LICQ holds in every follower's solution for every feasible leader's decision. When working computationally, however, we can only expect to obtain ε-feasible solutions of the nonconvex lower-level problem. Furthermore, the set of ε-feasible follower's solutions is not a singleton anymore. Thus, we determine an optimal solution for both the optimistic and the pessimistic variant of the bilevel problem. By doing so, we establish that not only can the obtained ε-feasible bilevel solutions be arbitrarily far away from the overall exact bilevel solution but that there can also be an arbitrarily large error in the objective function value of the leader.
We also show that the pathological behavior observed for nonlinear lower-level problems seems to be due to the nonlinearities by showing that linear bilevel problems behave better at least on the level of feasible points. As an important question for future research, it is still open whether one can prove that the bad behavior also cannot appear for more general problems than linear ones, such as convex problems, in the lower level.
Finally, our results show that computational bilevel optimization with continuous but nonconvex lower levels needs to be done with great care and that ex-post checks may be needed to avoid considering arbitrarily bad points as "solutions" of the given bilevel problem.
Let the leader's decision x be arbitrary but fixed. Further, let y* be the exact optimal solution of the follower for the given leader's decision x. As shown in Section 3, a follower's solution y* satisfies y*_i > 0 for all i ∈ {1, ..., n + 2}. This means that the non-negativity constraints (2d) as well as the lower bound constraints in (2e) and (2f) are inactive in an optimal follower's decision. Conversely, all quadratic constraints (2c) as well as the upper bound constraints in (2e) and (2f) are active. Hence, the Jacobian matrix of the single equality constraint and the active inequality constraints in an optimal decision of the follower can be stated explicitly; all matrix entries that are left blank there correspond to zeros. It is easy to verify that this Jacobian matrix has full rank, i.e., the linear independence constraint qualification holds. Consider the corresponding Lagrangian with the Lagrange multipliers α ∈ R^{n−1}_{≥0}, β ∈ R^{n+1}_{≥0}, γ, δ^± ∈ R_{≥0}, and π ∈ R. The KKT complementarity conditions of Problem (2) include

α_i (y_{i+1} − y_i²) = 0, i ∈ {1, ..., n − 1}, (20a)
δ^− (y_{n+2} + x_2) = 0. (20d)

Let y* be the exact optimal solution of the follower for the given leader's decision x.
Now, we consider the entire bilevel problem (6) and recall a basic definition from linear optimization.

Definition 3. Let z ∈ R^ℓ satisfy D^⊤z = d and define B = B(z) = {j : z_j ≠ 0}. We say that z is dual basic if the submatrix D^⊤_B of D^⊤ corresponding to the columns in B has rank |B|.
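Definition 3 translates into a short numerical check; the concrete data below are illustrative.

```python
import numpy as np

def is_dual_basic(D, d, z, tol=1e-9):
    """Definition 3: z with D^T z = d is dual basic if the columns
    of D^T indexed by the support B = {j : z_j != 0} are linearly
    independent, i.e., rank(D^T[:, B]) = |B|."""
    DT = D.T
    if not np.allclose(DT @ z, d, atol=tol):
        return False
    support = [j for j in range(len(z)) if abs(z[j]) > tol]
    if not support:
        return True  # the empty column set is trivially independent
    return np.linalg.matrix_rank(DT[:, support]) == len(support)

# Illustrative data: D^T = [[1, 0, 1], [0, 1, 1]].
D = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
d = np.array([1.0, 1.0])
```

For these data, z = (1, 1, 0) is dual basic, whereas z = (1/2, 1/2, 1/2) satisfies D^⊤z = d but is not dual basic since its support selects three columns of a rank-2 matrix.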
hold for certain constants κ_5(A, C, D, a, b, d) and κ_6(A, C, D, a, b, c, d) > 0, whose sizes are polynomial in the size of the input data.

Proof. By Assumptions (i) and (ii), the pair (x̂, ŷ) ∈ R^{n_x} × R^{n_y} is nearly feasible for the upper- and the lower-level problem of (6). Applying Part (b) of Lemma 1 to (x̂, ŷ) and the corresponding system yields a pair (x*, y′) with Ax* ≥ a, Dy′ ≥ b − Cx*, and
We refer to Problem (1) as the upper-level (or the leader's) problem and to Problem (2) as the lower-level (or the follower's) problem. Let us point out that the lower-level constraints (2b) and (2c) together with y_1 ≥ 0 have already been considered in Bienstock et al. (2021) in the context of approximately feasible solutions for single-level optimization problems. Let us further emphasize that the number of variables and constraints of the lower-level problem is linear in n.