1 Introduction

Bilevel optimization problems are notoriously hard to solve, both in theory and in practice. In theory, bilevel problems are strongly NP-hard even if all objective functions and constraints are linear and all variables are continuous; see [12]. This is also reflected in computational practice since linear bilevel problems are inherently nonsmooth and nonconvex. Moreover, the single-level reformulations used to solve linear bilevel problems in practice are nonconvex, complementarity-constrained problems. Their linearization requires big-M parameters that are hard to obtain in general [15] and that often lead to numerically ill-posed problems, which are hard to tackle even for state-of-the-art commercial solvers.

In recent years, algorithmic research on bilevel optimization has focused on more and more complicated lower-level problems such as mixed-integer linear models [10, 11], nonlinear but still convex models [13], or lower-level problems with uncertain data [2, 5, 6]. For lower-level problems with continuous nonlinearities, there is comparatively little literature, in particular compared with the case in which the lower-level problem is convex; see, e.g., [16,17,18,19,21,22,23,24,25]. Due to the brevity of this article, we do not go into the details of the literature but refer to the seminal book [9] as well as the recent survey [14] for further discussions of the relevant literature.

There is one important difference when crossing the border from (mixed-integer) convex to (mixed-integer) nonconvex lower-level problems. In general, the lower-level problem can no longer be solved to global optimality in an exact sense in finite time, since techniques such as spatial branching need to be exploited to tackle the nonconvexities. These techniques only lead to finite algorithms for prescribed and strictly positive feasibility tolerances; see, e.g., [20] for more detailed discussions. Note that this is in clear contrast to, e.g., linear optimization. Here, the simplex method is an exact method in the sense that, if applied using exact arithmetic, it computes a globally optimal solution without any errors; see, e.g., [1]. The same applies to simplex-based branch-and-bound methods for solving mixed-integer linear optimization problems. Such exact algorithms are not available for continuous but nonconvex problems. This means that, just due to algorithmic necessities, we can no longer expect to obtain exactly feasible solutions of the lower-level problem when doing computations for continuous but nonconvex lower-level problems. Instead, we have to deal with \(\varepsilon \)-feasible solutions, at least for the nonlinear constraints of the lower-level problem.

The aim of this paper is to present an exemplary bilevel optimization problem with continuous variables and a nonconvex lower-level problem, where the latter algorithmic aspect leads to the following severe issue:

Even if the feasibility tolerance for the lower level is made extremely small, the exact bilevel solution can be arbitrarily far away from the bilevel solution that one obtains under \(\varepsilon \)-feasibility in the lower level; in particular, the latter can be superoptimal by an amount that is out of all proportion to \(\varepsilon \). The same is true for the optimal objective function values.

The main idea for the construction of this exemplary bilevel problem is based on a constraint set first presented in [4]. We explicitly note that this construction does not make use of large constraint coefficient ranges (all coefficients are 1) or of arbitrarily large polynomial degrees (we only use quadratic or linear terms). Moreover, when considered in an exact sense, (i) the example's lower-level problem is uniquely solvable, (ii) strict complementarity holds, (iii) its convex constraint set satisfies Slater's constraint qualification for all feasible upper-level decisions, (iv) the upper level does not contain coupling constraints, and (v) the overall problem has a unique global solution as well. Thus, the bilevel program does not look like a badly modeled problem but is shown to behave very badly in a computational sense, i.e., if only \(\varepsilon \)-feasible points for the nonlinear constraints of the lower-level problem can be considered. We also show that the observed pathological behavior indeed arises from the nonlinearities by proving that linear bilevel problems behave much better, at least on the level of feasible points.

The example is presented in Sect. 2 and discussed in an exact sense in Sect. 3. Afterward, the example is analyzed in Sect. 4 for the case of \(\varepsilon \)-feasibility of nonlinear constraints. Section 5 presents an analysis of the linear bilevel case. Our final conclusions are drawn in Sect. 6.

2 Problem Statement

Let us consider the bilevel problem

$$\begin{aligned} \max _{x \in \mathbb {R}^2} \quad&F(x,y) = x_1 - 2y_{n+1} + y_{n+2} \end{aligned}$$
(1a)
$$\begin{aligned} \text {s.t.}\quad&(x_1,x_2) \in [\underline{x}_1,{\bar{x}}_1] \times [\underline{x}_2,{\bar{x}}_2], \end{aligned}$$
(1b)
$$\begin{aligned}&y \in S(x), \end{aligned}$$
(1c)

where \(\underline{x},\, {\bar{x}} \in \mathbb {R}^2\) with \(1 \le \underline{x}_i < {\bar{x}}_i\), \(i \in \{1,2\}\), denote lower and upper bounds on the variables x. Here, S(x) is the set of optimal solutions of the x-parameterized problem

$$\begin{aligned} \max _{y \in \mathbb {R}^{n+2}} \quad&f(x,y) = y_1 - y_n \left( x_1 + x_2 - y_{n+1} - y_{n+2}\right) \end{aligned}$$
(2a)
$$\begin{aligned} \text {s.t.}\quad&y_1 + y_n = \frac{1}{2}, \end{aligned}$$
(2b)
$$\begin{aligned}&y_i^2 \le y_{i+1}, \quad i \in \{1,\ldots ,n-1\}, \end{aligned}$$
(2c)
$$\begin{aligned}&y_i \ge 0, \quad i \in \{1,\ldots ,n\}, \end{aligned}$$
(2d)
$$\begin{aligned}&y_{n+1} \in [0,x_1], \end{aligned}$$
(2e)
$$\begin{aligned}&y_{n+2} \in [-x_2,x_2]. \end{aligned}$$
(2f)

We refer to Problem (1) as the upper-level (or the leader’s) problem and to Problem (2) as the lower-level (or the follower’s) problem. Let us point out that the lower-level constraints (2b) and (2c) together with \(y_1 \ge 0\) have already been considered in [4] in the context of approximately feasible solutions for single-level optimization problems. Let us further emphasize that the number of variables and constraints of the lower-level problem is linear in n.

Problem (1) is a linear problem in both the leader’s and the follower’s variables. The only constraints that occur in this problem are variable bounds for the leader’s variables x. In particular, there are no upper-level constraints that explicitly depend on the follower’s variables y, i.e., there are no coupling constraints.

Moreover, the feasible set of the lower-level problem is bounded due to the following. From (2d) and (2b), we obtain \(0 \le y_1 \le 1/2\) as well as \(0 \le y_n \le 1/2\) for any feasible follower's decision y. Using Constraints (2c) and backward induction starting from \(y_n \le 1/2\), we further obtain \(0 \le y_i \le 1\) for all \(i \in \{1,\ldots ,n\}\). Finally, we have \(0 \le y_{n+1} \le {\bar{x}}_1\) as well as \(-{\bar{x}}_2 \le y_{n+2} \le {\bar{x}}_2\) by (2e) and (2f) because the leader's variables x are bounded. Since all finitely many lower-level constraints are given by continuous functions, the feasible set of the follower's problem is also closed and thus compact. In addition to the compactness, the feasible set of the lower-level problem (2) is non-empty for every feasible leader's decision \((x_1,x_2) \in [\underline{x}_1,{\bar{x}}_1] \times [\underline{x}_2,{\bar{x}}_2]\). For instance, the point

$$\begin{aligned} y_i = \frac{i}{2^{2^n}},\, i \in \{1,\ldots ,n-1\}, \quad y_n = \frac{1}{2} - \frac{1}{2^{2^n}}, \quad y_{n+1} = \frac{1}{2}, \quad \text {and} \quad y_{n+2} = 0 \end{aligned}$$

is strictly feasible w.r.t. the inequality constraints (2d), (2e) as well as (2f) and feasible w.r.t. the equality constraint (2b). Here, we exploit the assumption \(1 \le \underline{x}_i\), \(i \in \{1,2\}\), which implies \(x_1, x_2 \ge 1\) and thus strict feasibility w.r.t. the variable bounds in (2e) and (2f). Moreover, y is also strictly feasible w.r.t. the inequality constraints (2c) due to the following. For all \(i \in \{1,\ldots ,n-2\}\), we have

$$\begin{aligned} y_{i+1} - y^2_i = \frac{i+1}{2^{2^n}} - \left( \frac{i}{2^{2^n}}\right) ^2 = \frac{2^{2^n}(i+1) - i^2}{\left( 2^{2^n} \right) ^2 } = \frac{2^{2^n} + 2^{2^n}i \left( 1 - \frac{i}{2^{2^n}} \right) }{ \left( 2^{2^n} \right) ^2} > 0. \end{aligned}$$

Furthermore, we have

$$\begin{aligned} y_n - y^2_{n-1}&= \frac{1}{2} - \frac{1}{2^{2^n}} - \left( \frac{n-1}{2^{2^n}} \right) ^2 = \frac{\left( 2^{2^n} \right) ^2 - 2\cdot 2^{2^n} - 2(n-1)^2}{ 2 \cdot \left( 2^{2^n} \right) ^2} \\&= \frac{2^{2^n} \left( 2^{2^n} - 2 - \frac{2(n-1)^2}{2^{2^n}}\right) }{ 2 \cdot \left( 2^{2^n} \right) ^2} > 0. \end{aligned}$$

In particular, this means that the problem satisfies Slater’s constraint qualification. Moreover, the gradient of the single equality constraint (2b) is not the null vector. Hence, the Mangasarian–Fromovitz constraint qualification (MFCQ) is also satisfied at every feasible decision of the follower. Let us further point out that all lower-level constraints are linear except for the quadratic but convex inequality constraints in (2c). Therefore, the feasible set of the lower-level problem (2) is convex. Nevertheless, the overall lower-level problem is nonconvex since the follower’s objective function contains bilinear terms.
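The strict feasibility of the point above can also be checked in exact rational arithmetic. The following Python snippet is a minimal sketch of such a check; it is our own illustration with a hypothetical helper name and is not taken from the code accompanying this paper. It verifies the equality constraint (2b) and the strict inequalities in (2c).

```python
from fractions import Fraction

# A minimal sketch (our own illustration): build the strictly feasible point
# from the text in exact rational arithmetic and verify (2b) and (2c).
def slater_point(n):
    d = Fraction(1, 2 ** (2 ** n))          # d = 1 / 2^(2^n)
    y = [i * d for i in range(1, n)]        # y_1, ..., y_{n-1} with y_i = i * d
    y.append(Fraction(1, 2) - d)            # y_n = 1/2 - 1/2^(2^n)
    return y

n = 5
y = slater_point(n)
assert y[0] + y[n - 1] == Fraction(1, 2)                  # equality (2b)
assert all(y[i] ** 2 < y[i + 1] for i in range(n - 1))    # strict (2c)
print("strict feasibility w.r.t. (2c) verified for n =", n)
```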

Before we solve the bilevel problem (1) and (2) in the following sections, let us briefly summarize the nice properties of the problem. The upper-level problem is linear and does not contain coupling constraints. The feasible set of the lower-level problem is convex and compact. For every feasible leader’s decision, the lower-level problem further satisfies Slater’s constraint qualification and the MFCQ is satisfied for every feasible follower’s decision.

3 Exact Feasibility

In this section, we determine the unique exact solution of the bilevel problem (1) and (2). To this end, we start by solving the lower-level problem (2) analytically for an arbitrary but fixed feasible leader's decision \((x_1,x_2) \in [\underline{x}_1,{\bar{x}}_1] \times [\underline{x}_2,{\bar{x}}_2]\).

First, we note that any feasible follower's decision y satisfies \(y_n > 0\). The reason is as follows. Let us assume, to the contrary, that \(y_n = 0\) holds. Then, Constraint (2b) yields \(y_1 = 1/2\). From \(y_{n} = 0\), the nonnegativity constraints (2d), and (2c), it follows by backward induction that \(y_i = 0\) holds for all \(i \in \{1,\ldots ,n\}\), which contradicts \(y_{1} = 1/2\). Consequently, \(y_{n} > 0\) holds. For later reference, let us briefly summarize the previous observation.

Result 1

For every feasible leader’s decision \((x_1,x_2) \in [ {x}_1,{\bar{x}}_1] \times [ {x}_2,{\bar{x}}_2]\), a feasible follower’s decision y satisfies \(y_n > 0\).

The equality constraint (2b) thus yields \(y_1 < 1/2\). From (2e) and (2f), we additionally obtain

$$\begin{aligned} y_n \left( x_1 + x_2 - y_{n+1} - y_{n+2} \right) \ge 0. \end{aligned}$$

In particular, the latter term is minimized for \((y_{n+1},y_{n+2}) = (x_1,x_2)\). Therefore, the lower-level objective function value can be bounded from above by

$$\begin{aligned} f(x,y) = y_1 - y_n \left( x_1 + x_2 - y_{n+1} - y_{n+2} \right) \le y_1 < \frac{1}{2}. \end{aligned}$$

It is thus evident that an optimal follower’s decision \(y^*\) satisfies \((y^*_{n+1},y^*_{n+2}) = (x_1,x_2)\). Here, we can fix \((y^*_{n+1},y^*_{n+2})\) since these variables are subject to simple bound constraints and, in particular, they are not coupled to the other variables of the follower. Hence, the follower’s problem can be reduced to the convex problem

$$\begin{aligned} \max _y \quad&y_1 \end{aligned}$$
(3a)
$$\begin{aligned} \text {s.t.}\quad&y_1 + y_n = \frac{1}{2}, \end{aligned}$$
(3b)
$$\begin{aligned}&y_i^2 \le y_{i+1}, \quad i \in \{1,\ldots ,n-1\}, \end{aligned}$$
(3c)
$$\begin{aligned}&y_i \ge 0, \quad i \in \{1,\ldots ,n\}. \end{aligned}$$
(3d)

As shown above, Problem (3) satisfies Slater's constraint qualification. Again as shown above, the feasible set is compact. Therefore, Problem (3) has an optimal solution \(y^*\). Because of the equality constraint (3b), the lower-level objective function value \(y^*_1\) is maximized by minimizing \(y^*_n\). From Constraints (3c) and the optimality of \(y^{*}\), we obtain

$$\begin{aligned} {y^*_i = \left( y^*_1 \right) ^{2^{i-1}}} \quad \text { for all } i \in \{2,\ldots ,n\}, \end{aligned}$$

where \(y^*_1\) denotes the root of the function

$$\begin{aligned} h: \left[ 0,\frac{1}{2} \right] \rightarrow \mathbb {R}, \quad z \mapsto z + {z^{2^{n-1}}} - \frac{1}{2}. \end{aligned}$$
(4)

In particular, one can show that \(y^*_1\) is the unique root of (4). The function h is continuous and strictly increasing on [0, 1/2]. Moreover, we have \(h(0) < 0\) and \(h(1/2) > 0\). Consequently, there is a unique point \(y^*_1 \in (0,1/2)\) such that \(h(y^*_1) = 0\) holds. Furthermore, the follower's decision \(y^*\) is the unique solution of Problem (3). To see this, let us assume that there is another feasible follower's decision \({\hat{y}} \ne y^*\) that attains the optimal objective function value, i.e., \({\hat{y}}_1 = y^*_1\). Then, there must be at least one quadratic inequality constraint in (3c) that is not satisfied with equality for \({\hat{y}}\), since otherwise \({\hat{y}} = y^*\) would follow. However, if there is slack in at least one of the Constraints (3c), we obtain \({\hat{y}}_n > ({\hat{y}}_1)^{2^{n-1}} = (y^*_1)^{2^{n-1}} = y^*_n\). Then, (3b) yields

$$\begin{aligned} y^*_1 = \frac{1}{2} - y^*_n > \frac{1}{2} - {\hat{y}}_n = {\hat{y}}_1, \end{aligned}$$

which is a contradiction to the optimality of \({\hat{y}}\).
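Numerically, the root \(y^*_1\) of h can be approximated, e.g., by bisection, since h is continuous and strictly increasing with a sign change on [0, 1/2]. The following Python snippet is a minimal sketch of this computation; it is our own illustration with a hypothetical function name and is not taken from the code accompanying this paper.

```python
# A minimal sketch (our own illustration): approximate the unique root y_1^* of
# h(z) = z + z^(2^(n-1)) - 1/2 on [0, 1/2] by bisection.
def find_y1_star(n, tol=1e-15):
    exponent = 2 ** (n - 1)
    h = lambda z: z + z ** exponent - 0.5
    lo, hi = 0.0, 0.5                  # h(lo) < 0 and h(hi) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 6
y1 = find_y1_star(n)
yn = y1 ** (2 ** (n - 1))              # y_n^* = (y_1^*)^(2^(n-1))
print(y1, yn, y1 + yn)                 # y_1^* + y_n^* is (numerically) 1/2
```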

Result 2

For every feasible leader’s decision \((x_1,x_2) \in [ {x}_1,{\bar{x}}_1] \times [ {x}_2,{\bar{x}}_2]\), the set of optimal solutions of the lower-level problem (2) is a singleton.

In particular, Result 2 means that there is no need to distinguish between the optimistic and the pessimistic approach to bilevel optimization; see, e.g., [9]. Thus, we can finally determine an optimal leader’s decision for the overall bilevel problem (1) and (2). As \((y^*_{n+1},y^*_{n+2}) = (x_1,x_2)\) holds in the optimal follower’s decision \(y^*\), the leader actually solves the linear problem

$$\begin{aligned} \max _x \quad -x_1 + x_2 \quad \text {s.t.}\quad (x_1,x_2) \in [\underline{x}_1,{\bar{x}}_1] \times [\underline{x}_2,{\bar{x}}_2]. \end{aligned}$$

The unique optimal solution is given by \(x^* = (\underline{x}_1,{\bar{x}}_2)\).

Result 3

The bilevel problem (1) and (2) has a unique solution given by \(x^* = (\underline{x}_1,{\bar{x}}_2)\) with an optimal objective function value of \(F^* = -\underline{x}_1 + {\bar{x}}_2\).

To sum up, the bilevel problem (1) and (2) not only has nice properties such as a convex and bounded lower-level feasible set as well as a lower-level problem that satisfies Slater's constraint qualification, but it also has a unique optimal solution. Moreover, the strict complementarity condition holds; a proof is given in Appendix B. Overall, the bilevel problem (1) and (2) is thus well-behaved.

4 \(\varepsilon \)-Feasibility

In what follows, we determine an optimal solution of the bilevel problem (1) and (2) under the assumption that small violations of the nonlinear lower-level constraints are allowed according to the following notion, which is motivated by the necessary special treatment of nonlinear (and, in particular, nonconvex) constraints in global optimization as discussed in the introduction.

Definition 4.1

Let \(0 < \varepsilon \in \mathbb {R}\), \(f: \mathbb {R}^n \rightarrow \mathbb {R}\), \(g: \mathbb {R}^n \rightarrow \mathbb {R}^m\), and \(h: \mathbb {R}^n \rightarrow \mathbb {R}^p\) be given. A point  \(x \in \mathbb {R}^n\) is called \(\varepsilon \)-feasible for the problem \(\max _{x \in \mathbb {R}^n} \{f(x):g(x) \le 0, h(x) = 0\}\) if \(g_i(x) \le 0\) and \(h_j(x) = 0\) holds for all \(i \in \{1,\ldots ,m\} {\setminus } N\) as well as for all \(j \in \{1,\ldots ,p\} {\setminus } M\) and if \(\max \{\max \{g_i(x):i \in N\}, \max \{|h_j(x)|:j \in M\}\} \le \varepsilon \) holds, where \(N \subseteq \{1,\ldots ,m\}\) and \(M \subseteq \{1,\ldots ,p\}\) denote the index sets of all nonlinear inequality and equality constraints.
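For illustration, the following Python snippet sketches a check of Definition 4.1 for a given point: the linear constraints have to be satisfied exactly, whereas the maximal violation over the nonlinear constraints must not exceed \(\varepsilon \). The snippet is our own illustration with hypothetical function and parameter names and is not taken from the code accompanying this paper.

```python
# A minimal sketch of Definition 4.1 (our own illustration): g and h are lists
# of constraint functions for g(x) <= 0 and h(x) = 0, and N and M are the index
# sets of the nonlinear inequality and equality constraints, respectively.
def is_eps_feasible(x, g, h, N, M, eps):
    # Linear constraints have to be satisfied exactly.
    if any(g[i](x) > 0.0 for i in range(len(g)) if i not in N):
        return False
    if any(h[j](x) != 0.0 for j in range(len(h)) if j not in M):
        return False
    # Nonlinear constraints may be violated by at most eps.
    violations = [max(g[i](x), 0.0) for i in N] + [abs(h[j](x)) for j in M]
    return max(violations, default=0.0) <= eps
```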

A follower’s decision of the form

$$\begin{aligned} y_i = 2^{-2^{i-1}},\, i \in \{1,\ldots ,n-1\}, \ y_n = 0, \ y_{n+1} \in [0,x_1], \ \text {and} \ y_{n+2} \in [-x_2,x_2] \end{aligned}$$
(5)

is \(\varepsilon \)-feasible with \(\varepsilon = 2^{-2^{n-1}}\) for every feasible leader's decision \((x_1,x_2) \in [\underline{x}_1,{\bar{x}}_1] \times [\underline{x}_2,{\bar{x}}_2]\) due to the following. The constraints

$$\begin{aligned} y_1 + y_n&= \frac{1}{2}, \\ y^2_i&\le y_{i+1}, \quad i \in \{1,\ldots ,n-2\}, \\ y_i&\ge 0, \quad i \in \{1,\ldots ,n\}, \\ y_{n+1}&\in [0,x_1], \\ y_{n+2}&\in [-x_2,x_2] \end{aligned}$$

are (exactly) satisfied, whereas only the constraint \(y^2_{n-1} \le y_n\) is violated by \(\varepsilon = 2^{-2^{n-1}}\). Moreover, the lower-level objective function value is 1/2.
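The following Python snippet constructs a point of the form (5) and evaluates both the maximal violation of the quadratic constraints (2c) and the lower-level objective function value. It is our own illustration with hypothetical function names and is not taken from the code accompanying this paper.

```python
# A minimal sketch (our own illustration): build a point of the form (5) for
# given n and a fixed leader's decision (x1, x2) and check violation/objective.
def build_point_from_eq5(n, x1, x2):
    y = [2.0 ** (-(2 ** (i - 1))) for i in range(1, n)]   # y_1, ..., y_{n-1}
    y += [0.0, 0.0, x2]   # y_n = 0; y_{n+1} = 0 and y_{n+2} = x2 are admissible
    return y

n, x1, x2 = 6, 1.0, 2.0
y = build_point_from_eq5(n, x1, x2)
violations = [y[i - 1] ** 2 - y[i] for i in range(1, n)]  # y_i^2 - y_{i+1}
print(max(violations))                                    # 2^(-2^(n-1)) = 2^(-32)
print(y[0] - y[n - 1] * (x1 + x2 - y[n] - y[n + 1]))      # objective value 0.5
```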

Result 4

If \(\varepsilon \ge 2^{-2^{n-1}}\), there is an \(\varepsilon \)-feasible follower's decision y with \(y_n = 0\) for every feasible leader's decision \((x_1,x_2) \in [\underline{x}_1,{\bar{x}}_1] \times [\underline{x}_2,{\bar{x}}_2]\).

It can easily be seen that by increasing n, we can obtain arbitrarily small values for \(\varepsilon \). In particular, there is no \(\varepsilon \)-feasible follower’s decision that yields a better objective function value than 1/2. The reasons are as follows. Using the equality constraint (2b), the lower-level objective function can be re-written as

$$\begin{aligned} f(x,y) = \frac{1}{2} - y_n - y_n \left( x_1 + x_2 - y_{n+1} - y_{n+2}\right) . \end{aligned}$$

For all \(\varepsilon \)-feasible follower’s decisions, we have

$$\begin{aligned} x_1 + x_2 - y_{n+1} - y_{n+2} \ge 0 \end{aligned}$$

because of the linear constraints (2e) and (2f). Consequently, a lower-level objective function value larger than 1/2 could only be obtained if \(y_n < 0\) held. However, this is not \(\varepsilon \)-feasible w.r.t. the variable bounds (2d) since these bounds are linear. A follower's decision of the form stated in (5) is thus optimal among all \(\varepsilon \)-feasible points of the lower-level problem (2). Let us point out that, in contrast to the exact case, the follower's variables \(y_{n+1}\) and \(y_{n+2}\) do not affect the lower-level objective function value in this setting (since \(y_n = 0\)) and can thus be chosen arbitrarily. Therefore, the set of optimal \(\varepsilon \)-feasible follower's solutions is not a singleton anymore.

Result 5

If \(\varepsilon \ge 2^{-2^{n-1}}\), the set of optimal \(\varepsilon \)-feasible follower's solutions is not a singleton for every feasible leader's decision \((x_1,x_2) \in [\underline{x}_1,{\bar{x}}_1] \times [\underline{x}_2,{\bar{x}}_2]\).

Due to Result 5, we need to distinguish between optimistic and pessimistic solutions. Following the optimistic approach, the follower chooses \(y_{n+1} = 0\) as well as \(y_{n+2} = x_2\) so as to favor the leader w.r.t. the leader's objective function value. Therefore, the leader actually solves the linear problem

$$\begin{aligned} \max _x \quad x_1 + x_2 \quad \text {s.t.}\quad (x_1,x_2) \in [\underline{x}_1,{\bar{x}}_1] \times [\underline{x}_2,{\bar{x}}_2]. \end{aligned}$$

The optimistic optimal leader’s decision is thus given by \(x^* = ({\bar{x}}_1,{\bar{x}}_2)\). In the pessimistic case, the follower chooses \(y_{n+1} = x_1\) as well as \(y_{n+2} = -x_2\) such as to adversely affect the leader’s decision. In this setting, the leader solves the linear problem

$$\begin{aligned} \max _x \quad -x_1 - x_2 \quad \text {s.t.}\quad (x_1,x_2) \in [\underline{x}_1,{\bar{x}}_1] \times [\underline{x}_2,{\bar{x}}_2]. \end{aligned}$$

Hence, the pessimistic optimal leader's decision is given by \(x^* = (\underline{x}_1,\underline{x}_2)\). To sum up, let us state the main observations of this section.

Result 6

Let \(\varepsilon \ge 2^{-2^{n-1}}\) and suppose that we allow for \(\varepsilon \)-feasible follower's solutions. Then, the optimistic optimal solution of the bilevel problem (1) and (2) is given by \(x_{\text {o}}^* = ({\bar{x}}_1,{\bar{x}}_2)\) with an optimal objective function value of \(F^*_{\text {o}} = {\bar{x}}_1 + {\bar{x}}_2\). The pessimistic optimal solution is given by \(x_{\text {p}}^* = (\underline{x}_1,\underline{x}_2)\) with an optimal objective function value of \(F^*_{\text {p}} = -\underline{x}_1 - \underline{x}_2\).

We now finally compare the exact bilevel solution with the solutions obtained in the optimistic and the pessimistic setting for the case of only \(\varepsilon \)-feasibility in the lower level. In the optimistic setting, the distance between the solutions is \({\bar{x}}_1 - \underline{x}_1\) and the difference between the corresponding objective function values is \({\bar{x}}_1 + \underline{x}_1\). Two aspects are remarkable. First, by enlarging the feasible interval for the variable \(x_1\), we get an arbitrarily large error and, second, this error is independent of \(\varepsilon \), i.e., this arbitrarily large error occurs independently of how accurately one solves the lower-level problem.

For the pessimistic setting, the distance between the solutions is \({\bar{x}}_2 - \underline{x}_2\) and the difference between the objective function values is \({\bar{x}}_2 + \underline{x}_2\). Hence, we obtain the same qualitative behavior, now depending on the variable \(x_2\) instead of \(x_1\).

In summary, we obtain the following two main observations. First, we can be arbitrarily far away from the overall exact bilevel solution. Second, we also obtain arbitrarily large errors regarding the optimal objective function value of the leader. The latter is very much in contrast to the situation in single-level optimization for which sensitivity results are available; see, e.g., Proposition 4.2.2 in [3]. This is particularly the case for linear optimization problems, where standard sensitivity analysis results (see, e.g., Theorem 5.5 in [7]) apply as well and state that a small change in the right-hand side of the problem’s constraints can only lead to a small change in the optimal objective function value.

Lastly, let us point out that only very moderate values of n are required to obtain the wrong solution. Taking the inequality for \(\varepsilon \) from Result 6, it is easy to see that, for a given tolerance \(\varepsilon \), the parameter n only needs to satisfy \(n \ge \log _2(\log _2(1/\varepsilon ^2))\) so that numerically computed solutions do not coincide with the exact solution. For instance, a tolerance of \(\varepsilon = 10^{-8}\) already leads to a wrong result for \(n = 6\). This particularly means that the considered bilevel problem is moderate in size w.r.t. the number of constraints and variables. For \(n=6\), we only have 16 constraints and 8 variables on the lower level. We further note that the constraint coefficients used are all 1 and that they are independent of n and of the given tolerance \(\varepsilon \).
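The threshold for n can be evaluated directly. The following Python snippet is a minimal sketch of this computation; it is our own illustration and is not part of the code referenced below.

```python
import math

# Smallest n with 2^(-2^(n-1)) <= eps, i.e., the smallest integer n satisfying
# n >= log2(log2(1/eps^2)); a minimal sketch (our own illustration).
def minimal_n(eps):
    return math.ceil(math.log2(math.log2(1.0 / eps ** 2)))

print(minimal_n(1e-8))   # prints 6, matching the discussion above
```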

A Python code for the example considered in this paper is publicly available at https://github.com/m-schmidt-math-opt/ill-behaved-bilevel-example and can be used to verify the discussed results.

5 Analysis of the \(\varepsilon \)-Feasible Linear Case

In this section, we analyze the linear bilevel case, i.e., we study the problem

$$\begin{aligned} \min _{x, y} \quad&c_x^\top x + c_y^\top y \end{aligned}$$
(6a)
$$\begin{aligned} \text {s.t.}\quad&A x \ge a, \end{aligned}$$
(6b)
$$\begin{aligned}&y \in {{\,\mathrm{arg\,min}\,}}_{{\bar{y}}}\left\{ d^\top {\bar{y}}:Cx + D{\bar{y}} \ge b\right\} \end{aligned}$$
(6c)

with \(c_x \in \mathbb {R}^{n_x}\), \(c_y,\, d \in \mathbb {R}^{n_y}\), \(A \in \mathbb {R}^{m \times n_x}\), \(a \in \mathbb {R}^m\), \(C \in \mathbb {R}^{\ell \times n_x}\), \(D \in \mathbb {R}^{\ell \times n_y}\), and \(b \in \mathbb {R}^{\ell }\). We assume that the set \(\{(x,y) \in \mathbb {R}^{n_x} \times \mathbb {R}^{n_y}:Ax \ge a,\, Cx + Dy \ge b\}\) is non-empty and compact and that for every feasible upper-level decision x, there exists a feasible lower-level decision y. This implies that the lower-level problem is bounded for every feasible upper-level decision and that the dual problem of the lower level is feasible. We also assume that the set \(\{x \in \mathbb {R}^{n_x}:Ax \ge a\}\) is bounded. Moreover, we consider the setting in which the underlying linear algebra and linear optimization routines are of finite precision only.

When finite-precision procedures are used, an algorithm that solves Problem (6) will output a pair \(({\hat{x}}, {\hat{y}})\) that may be slightly infeasible. The concern, should that happen, is that the solution being output can be superoptimal to a degree that is not proportional to its infeasibility. As discussed in the previous section, such an outcome can be observed for general, i.e., nonlinear, bilevel problems. In this section, however, we show that linear bilevel problems behave better in some sense. To this end, we assume that our underlying solver can ensure the following properties:

  • \(A{\hat{x}} \ge a - \varepsilon e_m\) and \(C{\hat{x}} + D{\hat{y}} \ge b - \varepsilon e_{\ell }\),

  • \(d^\top {\hat{y}} \ge \min \{d^\top y:C{\hat{x}} + Dy \ge b\} - \varepsilon \).

Here and in what follows, \(0< \varepsilon < 1\) is a given tolerance, \(e_k \in \mathbb {R}^k\) is the vector of all ones, and \(({\hat{x}},{\hat{y}})\) is used to denote a nearly feasible solution of the bilevel problem (6).

Prior to our analysis, we present a general result that will be used below. This result can be read from Theorem 3.38 (Page 112) of [8]. It can also be obtained from Corollary 3.2b (Page 20) of [26] or from Theorem 10.2 (Page 121) of [26]. We will use the term size to refer to the (bit) encoding length of a matrix, vector, or formulation, as appropriate.

Definition 5.1

Let \(P = \{x \in \mathbb {R}^n:Hx = h, \, x \ge 0\}\) with \(H \in \mathbb {R}^{m \times n}\) and  \(h \in \mathbb {R}^m\). Given \(z \in \mathbb {R}^n\), we say that z is basic if \(H z = h\) and, defining \(B = B(z) = \{j:z_j \ne 0\}\), the submatrix \(H_B\) of H corresponding to the columns in B has rank |B|. Furthermore, if in addition \(z \ge 0\), we say that z is basic feasible.

Remark 5.1

Let P be as in Definition 5.1. The extreme points of P are precisely the vectors z that are basic feasible.

Theorem 5.1

Let \(P = \{x \in \mathbb {R}^n:Hx = h, \, x \ge 0\}\) with \(H \in \mathbb {R}^{m \times n}\) and  \(h \in \mathbb {R}^m\). There is a constant  \(\kappa (H) > 0\) of size polynomial in the size (of the bit-encoding) of H such that, for any basic vector v, we have

$$\begin{aligned} \Vert v\Vert _{\infty } \le \kappa (H) \Vert h\Vert _{\infty }. \end{aligned}$$

Proof

Let v be basic and set \(J = \{j:v_j \ne 0\}\). Since v is basic, there is a subset of rows I of H with \(|I| = |J|\) such that the following holds:

  1. (i)

    The submatrix \(H_{I,J}\) of H indexed by rows I and columns J is invertible.

  2. (ii)

    As a consequence, it holds \(v_J = H_{I,J}^{-1} h_I\), where \(v_J\) is the subvector of v indexed by J and \(h_I\) is the subvector of h indexed by I.

Using that the entries of v outside of J are zero, i.e., \(\Vert v\Vert _{\infty } = \Vert v_J\Vert _{\infty }\), as well as the submultiplicativity of the norm, we get

$$\begin{aligned} \Vert v\Vert _{\infty } \le \Vert H_{I,J}^{-1}\Vert _{\infty } \Vert h_I \Vert _{\infty }. \end{aligned}$$

The result now follows by defining \( \kappa (H)\) to be the maximum over all \(\Vert B^{-1} \Vert _{\infty }\) for B being an invertible submatrix of H. \(\square \)

In Theorem 5.1, we use what is usually termed the standard representation of a polyhedron. Similar statements can be derived using other representations of polyhedra, e.g., \(\{x \in \mathbb {R}^n:Hx \le h\}\), via well-known reformulations.
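For small matrices, the constant \(\kappa (H)\) from the proof of Theorem 5.1 can be evaluated by brute force as the maximum \(\infty \)-norm of the inverses of all invertible square submatrices of H. The following Python snippet is a minimal sketch of this enumeration; it is our own illustration using NumPy, is not taken from the code accompanying this paper, and is meant purely to illustrate the definition, since the number of submatrices grows exponentially.

```python
import itertools
import numpy as np

# Brute-force evaluation of kappa(H) from the proof of Theorem 5.1: the maximum
# of the matrix infinity-norm of B^{-1} over all invertible square submatrices
# B of H.  For illustration on small matrices only.
def kappa(H):
    m, n = H.shape
    best = 0.0
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                B = H[np.ix_(rows, cols)]
                if abs(np.linalg.det(B)) > 1e-12:    # numerically invertible
                    best = max(best, np.linalg.norm(np.linalg.inv(B), np.inf))
    return best

H = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
print(kappa(H))
```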

5.1 Linear Optimization with Errors

We start with some simple observations for classic, i.e., single-level, linear problems of the form

$$\begin{aligned} v^* \mathrel {{\mathop :}{=}}\min _{x \in \mathbb {R}^{n_x}} \left\{ v^\top x:Mx \ge f\right\} \end{aligned}$$
(7)

with \(v \in \mathbb {R}^{n_x}\), \(0 \ne M \in \mathbb {R}^{m \times n_x}\), and \(f \in \mathbb {R}^m\). Throughout this section, we assume that the feasible region for problem (7) is non-empty and bounded. Moreover, we denote the corresponding dual problem by

$$\begin{aligned} \max _{z \in \mathbb {R}^{m}} \left\{ f^{\top } z:M^{\top }z = v, \ z \ge 0\right\} . \end{aligned}$$

Next, we will derive estimates involving near-feasible and near-optimal points for Problem (7).

Lemma 5.1

Suppose that there is a point \({\hat{x}} \in \mathbb {R}^{n_x}\) that is nearly feasible for Problem (7), i.e., \(M {\hat{x}} \ge f - \varepsilon e_{m}\). Then, the following holds.

  1. (a)

    It holds

    $$\begin{aligned} v^\top {\hat{x}} \ge v^* - \varepsilon \kappa (M)\Vert v\Vert _{\infty }, \end{aligned}$$

    where \(\kappa (M) > 0\) is a constant of polynomial size.

  2. (b)

    There exists \(x^*\) feasible for Problem (7) such that

    $$\begin{aligned} \Vert x^* - {{\hat{x}}}\Vert _{\infty } \le \varepsilon \kappa _1(M) \end{aligned}$$

    holds for a certain constant \(\kappa _1(M) > 0\) of polynomial size.

Proof

  1. (a)

    Let \(z^*\) be an optimal solution of the dual problem of (7). Then,

    $$\begin{aligned} v^\top {\hat{x}} = (z^*)^\top M {\hat{x}} \ge (z^*)^\top (f - \varepsilon e_m) = v^* - \varepsilon \Vert z^* \Vert _1 \end{aligned}$$

    holds. In particular, this estimate applies to any dual optimal \(z^*\). Since \(z \ge 0\) is a constraint of the dual problem, the dual feasible region is a pointed polyhedron and, w.l.o.g., \(z^*\) is an extreme point. The result now follows from Theorem 5.1 applied to the dual feasible region, using \(\Vert z^* \Vert _1 \le m \Vert z^* \Vert _{\infty }\) and absorbing the factor m into \(\kappa (M)\).

  2. (b)

    Consider the linear optimization problem

    $$\begin{aligned} \min _{x, \delta } \quad&\delta \end{aligned}$$
    (8a)
    $$\begin{aligned} \text {s.t.}\quad&\Vert x - {\hat{x}}\Vert _{\infty } \le \delta , \end{aligned}$$
    (8b)
    $$\begin{aligned}&M x \ge f. \end{aligned}$$
    (8c)

    That this is indeed a linear program follows by reformulating (8b) as

    $$\begin{aligned} x_j - \delta \, \le \, {\hat{x}}_j, \quad -x_j - \delta \, \le \, -{\hat{x}}_j, \quad 1 \le j \le n_x. \end{aligned}$$
    (9)

    Clearly, the resulting problem is both feasible and bounded since (7) is. Moreover, \(({{\hat{x}}}, 0)\) satisfies the constraints of this problem with an additive error of at most \(\varepsilon \). Since the objective vector of (8) has \(\infty \)-norm 1, applying (a) to this problem shows that its optimal value is at most \(\varepsilon \kappa _1(M)\). Hence, an optimal solution of (8) yields a point \(x^*\) that is feasible for Problem (7) and such that

    $$\begin{aligned} \Vert x^* - {{\hat{x}}}\Vert _\infty \le \varepsilon \kappa _1(M) \end{aligned}$$

    holds, where \(\kappa _1(M)\) is the \(\kappa \)-constant (of polynomial size in M) that applies to the matrix for Constraints (9) and (8c). \(\square \)

Let us emphasize that the result in Lemma 5.1 applies for any \(\varepsilon > 0\), no matter how large. In particular, it is not required that \(\varepsilon \) is “sufficiently small”.
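The projection step in Part (b) of Lemma 5.1 can be implemented directly as the linear program (8). The following Python snippet is a minimal sketch using scipy.optimize.linprog; it is our own illustration and is not taken from the code accompanying this paper. It computes a point of \(\{x: Mx \ge f\}\) that is closest to a nearly feasible \({\hat{x}}\) in the \(\infty \)-norm.

```python
import numpy as np
from scipy.optimize import linprog

# A minimal sketch of Problem (8) (our own illustration): given a nearly
# feasible x_hat for {x : M x >= f}, compute a feasible point minimizing
# ||x - x_hat||_inf.  The decision variables are (x, delta).
def nearest_feasible_point(M, f, x_hat):
    m, n = M.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                       # minimize delta
    # Reformulation (9): x_j - delta <= x_hat_j and -x_j - delta <= -x_hat_j.
    A_box = np.block([[np.eye(n), -np.ones((n, 1))],
                      [-np.eye(n), -np.ones((n, 1))]])
    b_box = np.concatenate([x_hat, -x_hat])
    # Constraint (8c): M x >= f, rewritten as -M x <= -f.
    A_feas = np.hstack([-M, np.zeros((m, 1))])
    res = linprog(c,
                  A_ub=np.vstack([A_box, A_feas]),
                  b_ub=np.concatenate([b_box, -f]),
                  bounds=[(None, None)] * n + [(0, None)])
    return res.x[:n], res.x[-1]                       # x*, ||x* - x_hat||_inf

M = np.array([[1.0, 0.0], [0.0, 1.0]])
f = np.array([1.0, 1.0])
x_star, dist = nearest_feasible_point(M, f, np.array([0.999, 1.2]))
print(x_star, dist)                                   # distance is about 0.001
```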

Lemma 5.2

Suppose that there is a nearly primal-dual feasible and nearly primal-dual optimal pair \(({\hat{x}}, {\hat{z}}) \in \mathbb {R}^{n_x} \times \mathbb {R}^m\) for Problem (7), i.e., \(({\hat{x}}, {\hat{z}})\) satisfies

  1. (i)

    \(M {\hat{x}} \ge f - \varepsilon e_{m}\),

  2. (ii)

    \( \Vert M^\top {\hat{z}} - v \Vert _{\infty } \le \varepsilon \), \({\hat{z}} \ge -\varepsilon e_m\), and

  3. (iii)

    \(v^\top {\hat{x}} - f^\top {\hat{z}} \le \varepsilon \).

Then, there exists an optimal solution \(x^*\) for Problem (7) such that

$$\begin{aligned} \Vert x^* - {\hat{x}} \Vert _{\infty } \le \varepsilon \kappa _3(M, v) \max \{1,\Vert f \Vert _{\infty }\} \end{aligned}$$

holds for a certain constant \(\kappa _3(M,v) > 0\), whose size is polynomial in the size of the input data M and v.

Proof

First, we note that Condition (ii) simply states that \({{\hat{z}}}\) is feasible for the dual of (7) up to an error of \(\varepsilon \). We can thus apply Part (a) of Lemma 5.1 to obtain

$$\begin{aligned} f^\top {\hat{z}} \le v^* + \varepsilon \kappa _2(M) \Vert f\Vert _{\infty }, \end{aligned}$$
(10)

where \(\kappa _2(M)\) is the \(\kappa \)-constant for the dual of (7), which is of polynomial size in M. Together with (iii), this implies

$$\begin{aligned} v^{\top } {{\hat{x}}} \le v^* + \varepsilon (1 + \kappa _2(M) \Vert f\Vert _{\infty }). \end{aligned}$$
(11)

Next, we consider the polyhedron given by

$$\begin{aligned} M x&\ge f, \end{aligned}$$
(12a)
$$\begin{aligned} v^\top x&\le v^*, \end{aligned}$$
(12b)

which is feasible and bounded. By (i) and (11), \({{\hat{x}}}\) satisfies these inequalities with feasibility error of at most \(\varepsilon (1 + \kappa _2(M) \Vert f\Vert _{\infty })\). By applying Part (b) of Lemma 5.1, we obtain that there is a feasible point \(x^*\) for (12), i.e., an optimal solution \(x^*\) for (7), such that

$$\begin{aligned} \Vert x^* - {\hat{x}} \Vert _\infty \le \varepsilon \kappa _1(M, v)(1 + \kappa _2(M)\Vert f\Vert _{\infty }) \end{aligned}$$

holds. \(\square \)

5.2 Application to Linear Bilevel Problems

We now return to the bilevel setup as stated in (6). To this end, note that for a given upper-level decision \(x \in \mathbb {R}^{n_x}\), the dual of the lower-level problem (6c) reads

$$\begin{aligned} \max _{z} \quad&(b - Cx)^\top z \end{aligned}$$
(13a)
$$\begin{aligned} \text {s.t.}\quad&D^\top z = d, \end{aligned}$$
(13b)
$$\begin{aligned}&z \ge 0. \end{aligned}$$
(13c)

Lemma 5.3

Let \(x \in \mathbb {R}^{n_x}\), \({\hat{y}} \in \mathbb {R}^{n_y}\), and \({\hat{z}} \in \mathbb {R}^{\ell }\) be such that

  1. (i)

    \(Ax \ge a\),

  2. (ii)

    \(D{\hat{y}} \ge b - C {x} - \varepsilon e_{\ell }\),

  3. (iii)

    \(\Vert D^\top {\hat{z}} - d\Vert _{\infty } \le \varepsilon \), \({\hat{z}} \ge -\varepsilon e_\ell \), and

  4. (iv)

    \( {d^\top {\hat{y}} - (b - Cx)^\top {\hat{z}}} \le \varepsilon \).

Then, there exists an optimal solution \(y^*\) for the  x-parameterized lower-level problem(6c) such that

$$\begin{aligned} \Vert y^* - {\hat{y}}\Vert _\infty \le \varepsilon \kappa _4(A,C,D,a,b,d) \end{aligned}$$
(14)

holds for a constant \(\kappa _4(A,C,D,a,b,d) > 0\), whose size is polynomial in the size of the input data A, C, D, a, b, and d.

Proof

By assumption, for a given x, the lower-level problem is feasible and bounded. We can thus apply Lemma 5.2 since Conditions (ii)–(iv) correspond to Conditions  (i)–(iii) of Lemma 5.2. Thus, there exists an optimal point \(y^*\) for the lower-level problem such that

$$\begin{aligned} \Vert y^* - {\hat{y}}\Vert _\infty \le \varepsilon \kappa _3(D, d) \max \{1,\Vert b - Cx \Vert _{\infty } \} \end{aligned}$$

holds. Using the triangle inequality and the submultiplicativity of the norm, we obtain

$$\begin{aligned} \Vert b - Cx \Vert _{\infty } \, \le \, \Vert b \Vert _{\infty } + \Vert C \Vert _{\infty } \Vert x\Vert _{\infty }. \end{aligned}$$

Since the feasible region for the upper-level problem is bounded, \(\Vert x \Vert _{\infty }\) is upper bounded by the \(\infty \)-norm of some extreme point. We can apply Theorem 5.1 to obtain

$$\begin{aligned} \Vert x \Vert _{\infty } \le \kappa '(A) \Vert a \Vert _{\infty }, \end{aligned}$$

where \(\kappa '(A)\) is the \(\kappa \)-constant (of polynomial size in A) for the system \(Ax \ge a\). The proof is now concluded by appropriately defining \(\kappa _4(A,C,D,a,b,d)\). \(\square \)

Now, we consider the entire bilevel problem (6) and recall a basic definition from linear optimization.

Definition 5.2

Let \(z \in \mathbb {R}^\ell \) satisfy \(D^{\top } z = d\) and define  \(B = B(z) = \{j:z_j \ne 0\}\). We say that z is dual basic if the submatrix \(D^{\top }_B\) of \(D^{\top }\) corresponding to the columns in B has rank |B|.

Note that z is dual basic and feasible (i.e., \(z \ge 0\)) if and only if z is an extreme point of the dual feasible region given by (13b) and (13c), which is the same for every lower-level problem, i.e., independent of x.

Theorem 5.2

Let \({\hat{x}} \in \mathbb {R}^{n_x}\), \({\hat{y}} \in \mathbb {R}^{n_y}\), and \({\hat{z}} \in \mathbb {R}^{\ell }\) be such that

  1. (i)

    \(A{\hat{x}} \ge a - \varepsilon e_m\),

  2. (ii)

    \(D{\hat{y}} \ge b - C{\hat{x}} - \varepsilon e_{\ell }\),

  3. (iii)

    \(\Vert D^\top {\hat{z}} - d\Vert _{\infty } \le \varepsilon \), \({\hat{z}} \ge -\varepsilon e_\ell \),

  4. (iv)

    \(d^\top {\hat{y}} - (b - C{\hat{x}})^\top {\hat{z}} \le \varepsilon \),

  5. (v)

    \(\Vert {\hat{z}} - {\tilde{z}}\Vert _\infty \le \varepsilon \) for some dual basic \({\tilde{z}}\).

Then, there exists a pair \((x^*,y^*)\) that is feasible for the bilevel problem (6) such that

$$\begin{aligned} \Vert (x^*,y^*)^\top - ({\hat{x}},{\hat{y}})^\top \Vert _\infty&\le \varepsilon {\kappa _5(A,C,D,a,b,d)}, \\ |c_x^\top x^* + c_y^\top y^* - (c_x^\top {\hat{x}} + c_y^\top {\hat{y}}) |&\le \varepsilon {\kappa _6(A,C,D,a,b,c,d)} \end{aligned}$$

hold for certain constants \(\kappa _5(A,C,D,a,b,d)\) and \(\kappa _6(A,C,D,a,b,c,d) > 0\), whose sizes are polynomial in the size of the input data.

Proof

By Assumptions  (i) and (ii), the pair \(({\hat{x}},{\hat{y}}) \in \mathbb {R}^{n_x} \times \mathbb {R}^{n_y}\) is nearly feasible for the upper- and the lower-level problem of (6).

Applying Part (b) of Lemma 5.1 to \(({\hat{x}},{\hat{y}})\) and the system

$$\begin{aligned} \begin{bmatrix} A & 0\\ C & D \end{bmatrix} \begin{pmatrix} x\\ y \end{pmatrix} \ge \begin{pmatrix} a\\ b \end{pmatrix} \end{aligned}$$
(15)

yields \((x^*,y')\) with \(Ax^* \ge a\), \(Dy' \ge b - Cx^*\), and

$$\begin{aligned} \Vert (x^*,y')^\top - ({\hat{x}},{\hat{y}})^\top \Vert _\infty \le \varepsilon {\kappa _7(A,C,D)}, \end{aligned}$$
(16)

where  \(\kappa _7(A,C,D)\) is the \(\kappa \)-constant for System (15). Next, we use (iv) and obtain

$$\begin{aligned} \begin{aligned}&\ d^\top y' - (b - Cx^*)^\top {\hat{z}} \\ \le&\ \varepsilon + d^\top (y' - {\hat{y}}) - {\left( C({\hat{x}} - x^*)\right) ^\top {\hat{z}}} \\ \le&\ \varepsilon + \left|d^\top ({\hat{y}} - y')\right| + {\left|\left( C({\hat{x}} - x^*)\right) ^\top {\hat{z}}\right|}. \end{aligned} \end{aligned}$$
(17)

Note that, if \(u,\, v \in \mathbb {R}^n\), then \(|u^\top v| \le \sum _{j = 1}^n |u_j v_j| \le n \Vert u\Vert _\infty \Vert v\Vert _\infty \) holds. Moreover, if \(Q \in \mathbb {R}^{m \times n}\) and \(u \in \mathbb {R}^n\), then, for \(1 \le i \le m\), \(| (Q u)_i| \le (\sum _{j = 1}^n |q_{ij}|) \Vert u \Vert _\infty \, \le \, \Vert Q \Vert _\infty \Vert u \Vert _\infty \) by definition of the infinity-norm of a matrix. Hence, using (17) we obtain

$$\begin{aligned} d^\top y' - (b - Cx^*)^\top {{\hat{z}}} \le \ \varepsilon + n_y \Vert d\Vert _\infty \Vert {\hat{y}}-y'\Vert _\infty + \ell \Vert C\Vert _\infty \Vert {\hat{x}}-x^*\Vert _\infty \Vert {\hat{z}}\Vert _\infty . \end{aligned}$$
(18)

Further, since \({\tilde{z}}\) is dual basic, \(\Vert {\tilde{z}} \Vert _{\infty }\) is upper bounded by \(\kappa (D)\Vert d\Vert _\infty \) due to Theorem 5.1. Thus, using (v) yields \( \Vert {\hat{z}} \Vert _\infty \le \varepsilon + {\kappa (D)\Vert d\Vert _\infty }\). These facts, together with (16) and (18), yield

$$\begin{aligned} d^\top y' - (b - Cx^*)^\top {{\hat{z}}} \le \varepsilon {\kappa _8(A,C,D,d)}, \end{aligned}$$

with \(\kappa _8(A,C,D,d) \ge 1\) being appropriately defined and of polynomial size; here, the assumption \(\varepsilon < 1\) is used to absorb the higher-order terms in \(\varepsilon \). To sum up, \(x^*\), \(y'\), and \({{\hat{z}}}\) satisfy

  1. (a)

    \(A x^* \ge a\),

  2. (b)

    \(D y' \ge b - C x^*\),

  3. (c)

    \(\Vert D^\top {{\hat{z}}} - d\Vert _{\infty } \le \varepsilon \), \({\hat{z}} \ge -\varepsilon e_\ell \),

  4. (d)

    \(d^\top y' - (b - C x^*)^\top {{\hat{z}}} \le \varepsilon {\kappa _8(A,C,D,d)}\).

Thus, by Lemma 5.3 applied to the error \(\varepsilon \kappa _8(A,C,D,d) \ge \varepsilon \), there exists an optimal solution \(y^*\) for the \(x^*\)-parameterized lower-level problem (6c) such that

$$\begin{aligned} \Vert y^* - y'\Vert _\infty \le \varepsilon {\kappa _8(A,C,D,d)} \kappa _4(A,C,D,a,b,d) \end{aligned}$$
(19)

holds. Using this inequality together with (16) and the triangle inequality yields the claimed bound on \(\Vert (x^*,y^*)^\top - ({\hat{x}},{\hat{y}})^\top \Vert _\infty \). The bound on the objective function values then follows from \(|c_x^\top (x^* - {\hat{x}}) + c_y^\top (y^* - {\hat{y}})| \le (n_x + n_y) \Vert (c_x,c_y)\Vert _\infty \Vert (x^*,y^*)^\top - ({\hat{x}},{\hat{y}})^\top \Vert _\infty \). \(\square \)

Remark 5.2

Assumption (v) in Theorem 5.2 states that the distance between a nearly feasible and nearly optimal solution for the dual of the lower-level problem and a basic solution for the dual is small, which is a reasonable assumption in our setting.

To summarize the statement of the theorem, the distance to feasibility and the superoptimality of a nearly feasible pair \(({\hat{x}}, {\hat{y}})\) for the bilevel problem (6) are linear in \(\varepsilon \) with coefficients \(\kappa \) of polynomial size in the input data. This type of guarantee with polynomially sized coefficients is simply unavailable in the nonlinear case, as we have seen in the previous sections.

6 Conclusion

In this paper, we consider an exemplary bilevel problem with continuous variables and a nonconvex lower-level problem and illustrate that numerically obtained solutions can be arbitrarily far away from an exact solution. The discrepancy between exact and numerically computed solutions stems from the fact that we cannot exactly satisfy all constraints of the nonconvex lower level when using global optimization techniques such as spatial branching. The considered problem itself is well-posed in the sense that we do not use large constraint coefficient ranges or high-degree polynomials. Moreover, we show that the constraint set of the lower-level problem is convex and compact and that it satisfies Slater's constraint qualification. In an exact sense, we prove that the lower-level problem as well as the overall bilevel problem possess unique solutions. It is further established that LICQ holds in every follower's solution for every feasible leader's decision. When working computationally, however, we can only expect to obtain \(\varepsilon \)-feasible solutions of the nonconvex lower-level problem. In this case, the set of optimal \(\varepsilon \)-feasible follower's solutions is not a singleton anymore. Thus, we determine an optimal solution for both the optimistic and the pessimistic variant of the bilevel problem. By doing so, we establish not only that the obtained \(\varepsilon \)-feasible bilevel solutions can be arbitrarily far away from the overall exact bilevel solution but also that there can be an arbitrarily large error in the leader's objective function value.

We also show that the pathological behavior observed for nonlinear lower-level problems seems to be due to the nonlinearities, since linear bilevel problems are proven to behave better, at least on the level of feasible points. An important question for future research is whether the pathological behavior can also be excluded for lower-level problems that are more general than linear ones, such as convex ones.

Finally, our results show that computational bilevel optimization with continuous but nonconvex lower levels needs to be carried out with great care and that ex-post checks may be required to avoid considering arbitrarily bad points as "solutions" of the given bilevel problem.