Solving bilevel programs with the KKT-approach

Bilevel programs (BL) form a special class of optimization problems. They appear in many models in economics, game theory and mathematical physics. BL programs show a more complicated structure than standard finite problems. We study the so-called KKT-approach for solving bilevel problems, where the lower level minimality condition is replaced by the KKT- or the FJ-condition. This leads to a special structured mathematical program with complementarity constraints. We analyze the KKT-approach from a generic viewpoint and reveal the advantages and possible drawbacks of this approach for solving BL problems numerically.


Introduction
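In the present article we consider bilevel problems (BL) of the form

$$P_{BL}:\quad \min_{x,y}\ f(x,y)\quad \text{s.t.}\quad g_j(x,y)\ge 0,\ j\in J:=\{1,\dots,q\},\quad \text{and } y \text{ is a solution of } Q(x),$$

where, for given $x$, $Q(x)$ denotes the lower level problem

$$Q(x):\quad \min_{y}\ \varphi(x,y)\quad \text{s.t.}\quad y\in Y(x):=\{y\in\mathbb{R}^m \mid v_i(x,y)\ge 0,\ i\in I:=\{1,\dots,l\}\}.$$

The feasible set of $P_{BL}$ is denoted by $M_{BL}$, and we assume $(f,g_1,\dots,g_q)\in[C^2]^{1+q}_{n+m}:=C^2(\mathbb{R}^{n+m},\mathbb{R}^{1+q})$ and $(\varphi,v_1,\dots,v_l)\in[C^3]^{1+l}_{n+m}$. Note that we consider local minimizers in the upper and global minimizers in the lower level.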
Bilevel problems form an important class of mathematical programs. They appear, for example, in equilibrium models, in Stackelberg games (cf. [2]), and in semi-infinite programming (see [19,20]). The bilevel structure makes BL difficult to solve: even checking the feasibility of a point $(x,y)$ requires solving a finite program (in $y$). During the last 20 years, books and many papers have been dedicated to this topic, see e.g., [2,6,13] and the references therein.
Also from a topological viewpoint, BL is more complicated than standard finite programming. The feasible set of a BL may for example not be closed (see e.g., [20]). This phenomenon arises when the feasible set Y (x) of the lower level problem does not depend continuously on x, and non-closedness can even be stable with respect to (wrt.) small, smooth perturbations of the problem functions.
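An appealing way to deal with general BLs is the so-called Karush-Kuhn-Tucker (KKT) approach, where the lower level constraint, that $y$ is a global minimizer of the program $Q(x)$, is first relaxed to the condition that $y$ is a local minimizer of $Q(x)$. The latter condition is then replaced by the KKT-conditions

$$\nabla_y\varphi(x,y)-\sum_{i\in I}\lambda_i\nabla_y v_i(x,y)=0,\qquad \lambda_i v_i(x,y)=0,\quad \lambda_i\ge 0,\ v_i(x,y)\ge 0,\ i\in I. \tag{1.2}$$

In this exposition, we more generally replace the lower level constraint by the Fritz-John (FJ) (necessary) conditions

$$\lambda_0\nabla_y\varphi(x,y)-\sum_{i\in I}\lambda_i\nabla_y v_i(x,y)=0,\qquad \lambda_i v_i(x,y)=0,\ v_i(x,y)\ge 0,\ i\in I,\qquad \lambda_i\ge 0,\ i\in I\cup\{0\},\quad \sum_{i=0}^{l}\lambda_i=1. \tag{1.3}$$

This leads to the program

$$P_{FJBL}:\quad \min_{x,y,\lambda}\ f(x,y)\quad\text{s.t.}\quad (x,y,\lambda)\in M_{FJBL}, \tag{1.4}$$

where $M_{FJBL}=\{(x,y,\lambda)\in\mathbb{R}^{n+m+l+1}\mid \text{(1.3) holds and } g_j(x,y)\ge 0,\ j\in J\}$. Similarly we define the program $P_{KKTBL}$ with corresponding feasible set $M_{KKTBL}=\{(x,y,\lambda)\in\mathbb{R}^{n+m+l}\mid \text{(1.2) holds and } g_j(x,y)\ge 0,\ j\in J\}$. Note that each minimizer $y(x)$ of $Q(x)$ necessarily has to solve the FJ-conditions (1.3), so that the inclusion

$$M_{BL}\subset M_{FJBL}|_{\mathbb{R}^n\times\mathbb{R}^m} \tag{1.5}$$

must hold. Here, $M_{FJBL}|_{\mathbb{R}^n\times\mathbb{R}^m}$ denotes the projection of $M_{FJBL}$ onto the space $\mathbb{R}^n\times\mathbb{R}^m$. So, the main purpose of the KKT-approach is to find (local) minimizers of the original BL program by computing (local) minimizers of the relaxation $P_{FJBL}$.

Remark 1.1 Note that if the lower level problem $Q(x)$ is convex and satisfies a constraint qualification, then $y$ is a global minimizer of $Q(x)$ iff the KKT-condition (or equivalently the FJ-condition) is satisfied. So, under this condition, $P_{BL}$ and $P_{KKTBL}$ (as well as $P_{FJBL}$) are equivalent.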
However, in general, $P_{BL}$ and $P_{FJBL}$ are not equivalent, since (1.4) does not guarantee that $y$ is a solution of $Q(x)$. So, the genericity results on $P_{FJBL}$ in this paper do not (directly) allow conclusions on the generic structure of bilevel programming.
By (1.5), however, $P_{FJBL}$ does yield a valid relaxation of the original problem $P_{BL}$. We emphasize that in general a solution of $Q(x)$ does not necessarily satisfy the KKT-conditions (1.2). So, the inclusion $M_{BL}\subset M_{KKTBL}|_{\mathbb{R}^n\times\mathbb{R}^m}$ is not true in general. It is therefore preferable to consider $P_{FJBL}$ instead of $P_{KKTBL}$. Note that in [12] it has been shown (for $n=1$) that the inclusion $M_{BL}\subset M_{KKTBL}|_{\mathbb{R}^n\times\mathbb{R}^m}$ holds generically.
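A small worked example, added here for illustration and not taken from the source, may help to see how the KKT-conditions can fail at a lower level solution while the FJ-conditions still hold. It assumes lower level constraints of the form $v_i\ge 0$ and FJ multipliers normalized by $\lambda_0+\lambda_1=1$; the functions $\varphi$ and $v_1$ below are hypothetical choices.

```latex
\begin{align*}
&Q(x):\quad \min_{y\in\mathbb{R}}\ \varphi(x,y)=y \quad\text{s.t.}\quad v_1(x,y)=-y^2\ge 0 .\\
&\text{Feasible set: } Y(x)=\{0\},\ \text{so } \bar y=0 \text{ is the global minimizer of } Q(x) \text{ for every } x.\\
&\text{At } \bar y=0:\quad \nabla_y\varphi(x,\bar y)=1,\qquad \nabla_y v_1(x,\bar y)=-2\bar y=0 .\\
&\text{KKT:}\quad 1-\lambda_1\cdot 0=0 \ \text{ has no solution } \lambda_1\ge 0 .\\
&\text{FJ:}\quad \lambda_0\cdot 1-\lambda_1\cdot 0=0,\ \ \lambda_0+\lambda_1=1
 \ \Longrightarrow\ (\lambda_0,\lambda_1)=(0,1),\qquad \lambda_1 v_1(x,\bar y)=0 .
\end{align*}
```

In this (degenerate, non-generic) example every lower level solution satisfies the FJ-conditions with $\lambda_0=0$, but none satisfies the KKT-conditions, so a relaxation built on the KKT-conditions alone would lose all feasible points of the bilevel problem.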
Both problems $P_{KKTBL}$ and $P_{FJBL}$ represent specially structured mathematical programs with complementarity constraints (MPCC). These MPCC problems have a less complicated structure than the original BL. In particular, the feasible sets $M_{KKTBL}$, $M_{FJBL}$ are always closed. For literature on MPCC we refer the reader e.g., to [4,5,7,14] and [16]. To solve $P_{KKTBL}$, $P_{FJBL}$ numerically, we can e.g., apply a smoothing procedure, where the complementarity constraints $\lambda_i v_i=0$ are replaced by the perturbed relations $\lambda_i v_i=\tau$ with small $\tau>0$. In this way, we obtain a perturbed problem $P_{KKTBL}(\tau)$ or $P_{FJBL}(\tau)$ which can be solved with methods from standard nonlinear programming. This approach has been successfully applied to the numerical solution of semi-infinite programs (see [21]).
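The effect of the perturbation $\lambda_i v_i=\tau$ can be seen on a one-dimensional toy lower level problem. The sketch below is an illustration added here, not an example from the source: it assumes the hypothetical lower level problem $\min_y (y-x)^2$ s.t. $v(x,y)=y\ge 0$ and the sign convention $\nabla_y\varphi-\lambda\nabla_y v=0$ for the KKT system, for which the smoothed system $2(y-x)-\lambda=0$, $\lambda y=\tau$ can be solved in closed form.

```python
import math


def smoothed_kkt_point(x, tau):
    """Solve 2*(y - x) - lam = 0 and lam * y = tau with y > 0, lam > 0.

    Eliminating lam gives 2*y**2 - 2*x*y - tau = 0; we take the positive root.
    """
    root = math.sqrt(x * x + 2.0 * tau)
    y = 0.5 * (x + root)
    lam = root - x
    return y, lam


def exact_kkt_point(x):
    """KKT point of the unperturbed toy problem (tau = 0)."""
    y = max(x, 0.0)            # projection of x onto y >= 0
    lam = 2.0 * max(-x, 0.0)   # multiplier is positive only if the bound is active
    return y, lam


if __name__ == "__main__":
    for x in (-1.0, 2.0):
        print("x =", x, "exact (y, lam) =", exact_kkt_point(x))
        for tau in (1e-1, 1e-3, 1e-6):
            y, lam = smoothed_kkt_point(x, tau)
            print("  tau =", tau, "y =", y, "lam =", lam, "lam*y =", lam * y)
```

For each fixed $x$ the smoothed pair $(y,\lambda)$ stays strictly positive and approaches the exact KKT pair as $\tau\to 0$. In a full bilevel problem the perturbed relations would enter a nonlinear program in $(x,y,\lambda)$ as ordinary equality constraints.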
The aim of the present article is to analyze this KKT-approach for solving BL. For that purpose the generic structure of the MPCC problem $P_{FJBL}$ is studied. It appears that some of the difficulties of BL disappear in the KKT formulation $P_{FJBL}$, but a part of the singular behavior also persists in $P_{FJBL}$. A main result of our paper (Theorem 3.2), however, reveals that at a local solution of BL where the KKT approach leads to a singular system, generically, the minimizer can be computed by a (non-singular) reduced system. This suggests a (conceptual) computational approach which is able to overcome the singular behavior of the original BL. We emphasize that our genericity analysis exhibits the intrinsic structure of the KKT approach and precisely describes the situations any generic solution method for $P_{FJBL}$ should be able to deal with.
The article is organized as follows. In Sect. 2 we sketch the results on MPCC problems needed later on. Section 3 considers the MPCC program $P_{FJBL}$ and analyzes the generic properties of its feasible set and its critical points. The consequences of these results in terms of the original BL problem are discussed in Sect. 4. Altogether, these investigations lead to an algorithmic approach for solving BL which is described in Sect. 5 along with some numerical experiments.

Preliminaries
In this section, we sketch concepts and results on MPCC needed later on, such as stationarity, optimality conditions, the smoothing method and certain genericity results (see [3] for details). Let us first recall some definitions for standard finite programs with $C^2$ problem functions: for example, the active index set $J_0(x)$, the Lagrangian function $L(x,\lambda,\mu)$ with Lagrangian multipliers $(\lambda,\mu)$, the LICQ- and MFCQ-condition, the tangent space $T_xM$, the Karush-Kuhn-Tucker condition as well as SC (strict complementarity) and the second order condition (SOC) ($\nabla_x^2L(x,\lambda,\mu)|_{T_xM}$ is regular). We refer the reader to [10], [3] for details. Note that in this section we use the symbols $f$, $g_j$ to denote functions in a different context than in the other sections.
We now consider MPCC problems of the form

$$P_{CC}:\quad \min_x\ f(x)\quad\text{s.t.}\quad x\in M_{CC}:=\{x\in\mathbb{R}^n\mid h_i(x)=0,\ i=1,\dots,q_0,\ \ g_j(x)\ge 0,\ j=1,\dots,q,\ \ r_i(x)s_i(x)=0,\ r_i(x),s_i(x)\ge 0,\ i\in I\} \tag{2.2}$$

with $I=\{1,\dots,l\}$. For this class of programs the standard concepts have to be adapted as follows (see e.g., [13,15]). For a feasible point $\bar x\in M_{CC}$ we introduce the active index sets

The Lagrangian function of FJ-type (near $\bar x$) is given by:

We say that MPCC-LICQ holds at

In the sequel, $h$, $g$, $r$ and $s$ stand for $(h_1,\dots,h_{q_0})$, $(g_1,\dots,g_q)$, $(r_1,\dots,r_l)$ and $(s_1,\dots,s_l)$, respectively. For $y\in\mathbb{R}^m$ and some index set $I_0\subset\{1,\dots,m\}$ we use the abbreviation $y_{I_0}$ to denote the subvector $(y_i,\ i\in I_0)$ and write $\nabla s_{I_0}$ instead of $[\nabla s_i,\ i\in I_0]$.
Note that in the MPCC literature mostly the concept of weakly stationary points is used ("critical points that need not satisfy MPCC-LICQ").

Proposition 2.1 (cf. [8,15]) If $\bar x$ is a local minimizer where MPCC-LICQ is satisfied, then $\bar x$ is a strongly stationary point.

Definition 2.2 Let $\bar x$ be a critical point of $P_{CC}$ with associated multiplier
3) The MPCC-second order condition (MPCC-SOC) is satisfied if V is a matrix with as columns a basis of the tangent space T

If $\bar x$ is a non-degenerate critical point in the MPCC-sense such that $\mu_j>0$, $j\in J_0(\bar x)$, and $\rho_i,\sigma_i>0$, $i\in I_{rs}(\bar x)$, are fulfilled and the matrix $\nabla_x^2L(\bar x,1,\lambda,\mu,\rho,\sigma)|_{T_{\bar x}M_{CC}}$ is positive definite, then it can be seen that $\bar x$ is a local minimizer of $P_{CC}$.

Remark 2.1
To show that our MPCC program $P_{FJBL}$ in Sect. 1 generically satisfies MPCC-SOC we will need the following fact: It is well-known (see e.g., [9]) that is nonsingular and B has full rank

If MPCC-LICQ is fulfilled, then it is easy to see that MPCC-SOC holds (at $(\bar x,1,\lambda,\mu,\rho,\sigma)$) for $P_{CC}$ if and only if the following matrix is non-singular ($J_0:=J_0(\bar x)$ etc.):

We also recall some basics in genericity theory. To denote the space $C^\kappa(\mathbb{R}^N,\mathbb{R}^M)$ we use the shorthand notation $[C^\kappa]^M_N$. This space can be endowed with the so-called strong $C^\tau_S$-topology ($\tau\le\kappa$) (see [10] for details). We say that a property is generic in $[C^\kappa]^M_N$ if it holds on a countable intersection of subsets of $[C^\kappa]^M_N$ which are open and dense wrt. the $C^\tau_S$-topology. By identifying the MPCC program with its problem functions $(f,h,r,s,g)$, the set of MPCCs can be identified, e.g., with the set $[C^2]^{1+q_0+2l+q}_n$. We now give genericity results for MPCC in two different forms: is regular in the MPCC-sense (almost all is to be understood in the sense of the Lebesgue measure).
Moreover, generically in $[C^2]^{1+q_0+2l+q}_n$ wrt. the $C^2_S$-topology, the problems $P_{CC}$ are regular in the MPCC-sense.
The second statement is proven in [17] (and [4]) based on the genericity results of Jongen-Jonker-Twilt for standard programs (see e.g., [9]). The first statement can be shown by using

Lemma 2.1 (Parameterized Sard Lemma, cf. [9]) Let $F(z,u)$ be in $[C^\kappa]^l_{n+p}$, with $\kappa>\max\{0,n-l\}$ and $z\in\mathbb{R}^n$, $u\in\mathbb{R}^p$. Let us assume that 0 is a regular value of $F$ (i.e., for all $(z,u)$: $F(z,u)=0\Rightarrow\nabla_{(z,u)}F(z,u)$ has rank $l$). Then for almost every $u\in\mathbb{R}^p$, 0 is a regular value of the function $F_u:\mathbb{R}^n\to\mathbb{R}^l$, $F_u(z)=F(z,u)$.
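As a minimal illustration of the Parameterized Sard Lemma (an example added here, not from the source), take $n=l=p=1$ and a hypothetical smooth function $F$, so that $\kappa>\max\{0,n-l\}=0$ is satisfied:

```latex
\begin{align*}
&F(z,u)=z^2-u,\qquad \nabla_{(z,u)}F(z,u)=(2z,\,-1)\ \text{ has rank } 1 \text{ at every } (z,u),\\
&\text{so } 0 \text{ is a regular value of } F .\\
&F_u(z)=z^2-u:\qquad F_u(z)=0 \ \text{ and }\ \nabla_zF_u(z)=2z=0 \iff z=0,\ u=0 .\\
&\text{Hence } 0 \text{ is a regular value of } F_u \text{ for every } u\neq 0,\ \text{i.e., for almost every } u\in\mathbb{R}.
\end{align*}
```

The exceptional parameter set $\{0\}$ has Lebesgue measure zero, which is exactly the sense of "almost every" used in the lemma.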
Finally, we consider the smoothing approach for solving the MPCC problem, where instead of (2.2), we solve the perturbed program $P_\tau$, where $\tau>0$ is a (small) perturbation parameter. We refer to [4] for convergence results for $P_\tau$, $\tau\to 0$.

Genericity analysis of the KKT approach
We now consider the KKT formulation $P_{FJBL}$ (cf. (1.4)) of the bilevel problem $P_{BL}$. Since it is a complementarity constrained program with a special structure, the genericity results of Theorem 2.1, valid for generic MPCC programs, have to be adjusted to the special structure of $P_{FJBL}$.
Remark 3.1 Note that in [18] a genericity analysis wrt. MPCC-LICQ has been done for a program (MPCC's with stationary constraints) similar to the KKT formulation. However, our FJ approach leads to a program with a (slightly) different structure so that (unfortunately) a separate genericity analysis is needed. Besides, our approach does not lead to an (artificial) condition (φ, v, g) ∈ [C ], > max{1, n} as in [18].
It appears that also for $P_{FJBL}$ generically MPCC-LICQ is satisfied at all feasible points. But for the local minimizers $(\bar x,\bar y,\bar\lambda)$, the situation is more complicated than for general MPCC problems. We will show that in the generic situation the conditions MPCC-SC and MPCC-SOC may fail at local minimizers $(\bar x,\bar y,\bar\lambda)$ of $P_{FJBL}$.
With respect to a feasible point $(\bar x,\bar y,\bar\lambda)$ of problem $P_{FJBL}$, we introduce the active index sets:

We begin by showing that MPCC-LICQ is generically fulfilled for $P_{FJBL}$. The density part is proven by applying the Sard Lemma 2.1 to an appropriately chosen perturbation of (fixed) problem functions $\hat\varphi$, $\hat v_i$, $\hat g_j$ of a given BL program. We define perturbations of these functions: in ( ) define the perturbations. So, ( ) defines perturbed problem functions depending on the parameters

Moreover, generically in the set

wrt. the strong ($C_S$-)topology, MPCC-LICQ holds at all points of the feasible set $M_{FJBL}$.
Proof In [3] this result has been proven for $M_{KKTBL}$. However, since additional (non-trivial) technical difficulties appear in the proof for $M_{FJBL}$, we give the first part of the proof in detail. This first part is shown by using the Parameterized Sard Lemma. To do so, we consider a feasible point $(\bar x,\bar y,\bar\lambda)$ of the problem $P_{FJBL}$ defined by the problem . . , q}, such that $(\bar x,\bar y,\bar\lambda)$ solves the feasibility conditions: such that $(x,y,\lambda,\alpha,\beta,\mu,\gamma,\rho)$ solves the equations: To apply the Sard Lemma we have to show that for all solutions of (3.3) its Jacobian (wrt. the variables and appropriate parameters) has full row rank. In order to simplify the analysis we consider different cases.
Sub-case μ = 0, β = 0: Denoting by C y0 φ and C y0 i the first column of C y φ and C y i , respectively, the Jacobian matrix wrt. the given variables and parameters reads: (6) in (3.3) corresponding to the partial derivatives wrt. the variables x, y, respectively We now show that the rows of this matrix are linearly independent (l.i.). Obviously, the rows corresponding to row-blocks 5, 2 are l.i. with respect to the other rows. As β = 0, also matrix (I n ) in the first block has full row rank. To show the linear independence of the row blocks 6 y , 7, 1 we show that the sub-matrix formed by row blocks 6 y , 7, 1 and columns corresponding to the derivatives with respect to ρ, Let us suppose that this does not hold. Then there is a vector $(a,b,c)\neq 0$ such that for the corresponding combination of the rows we find: and (see column $\partial_\rho$) $\sum_i b_i=0$. Taking the first equation of each system in (3.4) yields Multiplying the second row by $y_1$ and subtracting from the first, we obtain $\lambda_i a_1=0$, $\forall i$ and l Summing up, using $\sum_{i=0}^{l}\lambda_i=1$ and $\sum_i b_i=0$, we find $c_1=0$ and then $b_i=0$ for all $i=1,\dots,l$. So, the system (3.4) reduces to for $j=1,\dots,m$. By repeating the same trick, we multiply the second row by $y_1$ and subtract it from the first and obtain: $\lambda_0a_j=0$, $-\lambda_ia_j=0$. Finally, using $\sum_{i=0}^{l}\lambda_i=1$ again, we conclude $a_j=0$ for all $j$ and analogously $c=0$, contradicting $(a,b,c)\neq 0$.
So, we have shown that row blocks 6 y , 7, 1 are l.i. with respect to the other blocks. Now the independence of blocks 3, 4 is a consequence of part ∂ λ .
The perturbation arguments hold for any choice of active index sets J 0 , J 0 , 0 , J λ 0 , J 0g . By considering the (finite) intersection of all corresponding parameter sets, the density part follows (for details we refer to [17], where such a genericity result has been proven for another class of programs). The openness property is shown by using stability arguments. If we finally consider the intersection of the open and dense sets for $N=1,2,\dots$, we obtain the generic set of functions where MPCC-LICQ holds at all feasible points.
The next result describes the generic properties of the critical points of P FJBL . , at all solutions ( x, y, λ, α, β, μ, γ , ρ) of the corresponding system  that ( x, y, λ) is an isolated non-degenerate critical point of P FJBL (in the MPCC-sense). BL-2: If α = 0, then the multipliers μ j , β i associated with g j ( x, y ), v i ( x, y ), j ∈ J 0g ( x, y ), i ∈ J 0v ( x, y ), are not equal to zero and the inequality |J 0 ( x, y, λ )| ≥ m + |J λ 0 ( x, y, λ )| holds. If λ is such that  point ( x, y, λ )) there always exists a vertex solution λ * (of (3.14) below) such that ( x, y, λ * ) is a critical point of P FJBL satisfying rank
Case $\alpha\neq 0$: For fixed $N\in\mathbb{N}$ we consider solutions of (3.12) with $\alpha>\frac1N$ and apply the Parameterized Sard Lemma as in the proof of Theorem 3.1 as follows. We compute the Jacobian of the system (3.12) wrt. the variables $(x,y,\lambda,\alpha,\beta,\mu,\gamma,\rho)$ and the parameters $(b,c_\varphi,c_v,d_v,d_g)$. This gives a matrix similar to the Jacobian matrices in the proof of Theorem 3.1. It can be checked that this Jacobian has full row rank. The Parameterized Sard Lemma then implies that for almost every $(b,c_\varphi,c_v,d_v,d_g)$, the Jacobian matrix of the system (3.12), with respect to the variables $(x,y,\lambda,\alpha,\beta,\gamma,\mu,\rho)$, has full row rank E := n + m + l + 1 + m + |J 0v | + |J 0 | + | 0 | + 1 + |J λ 0 | + |J 0g | + |J * 0β | + |J * 0γ | + |J * 0g |. But this rank cannot exceed the number V := n + m + l + 1 + m + |J 0 | + |J 0 | + |J 0 | + | 0 | + |J λ 0 | + |J 0g | + 1 of involved variables. So, in view of J 0 ∪ J 0 = J 0v we must have i.e., MPCC-SC holds. With similar arguments, using the full rank condition for the Jacobian, one shows that for almost all parameters $(b,c_\varphi,c_v,d_v,d_g)$ MPCC-SOC and MPCC-LICQ hold (the latter also follows more generally from Theorem 3.1). In particular, this implies that the critical point $(\bar x,\bar y,\bar\lambda)$ is an isolated non-degenerate critical point of $P_{FJBL}$. By taking all finitely many possible combinations of active index sets into account, we conclude that for almost every linear perturbation of $(\hat f,\hat\varphi,\hat v,\hat g)$, the solutions of the system (3.12) with $\alpha>\frac1N$ are non-degenerate critical points of $P_{FJBL}$. Taking the intersection $\cap_{N\in\mathbb{N}}$ of all these function sets, we conclude that for almost every linear perturbation the non-degeneracy condition holds at all critical points with $\alpha\neq 0$.
Case $\alpha=0$: For a solution of (3.12) this assumption implies $\rho=0$ and $\gamma_i=0$, i ∈ J 0 ∪ 0 (see (3.8)). As the set J 0v ( x, y ) = J 0 ( x, y, λ ) ∪ J 0 ( x, y, λ ) does not depend on the particular choice of $\lambda$, the critical point condition for $(\bar x,\bar y,\bar\lambda)$ decomposes into a system in $(x,y,\beta,\mu)$, (3.13) and for fixed $(\bar x,\bar y)$ a system in $\lambda$, Note that any solution $(\bar x,\bar y,\bar\lambda,\beta,\mu)$ of the system (3.13), (3.14) yields a critical point $(\bar x,\bar y)$ of the standard program, with corresponding multipliers $\beta$ and $\mu$. So, for almost all $d_{v_i}$, $d_{g_j}$ the point $(\bar x,\bar y)$ is a non-degenerate critical point of (3.15), i.e., $\beta_i,\mu_j\neq 0$ for all $i,j$ and the Hessian of (3.13) ($\nabla$ denotes $\nabla_{(x,y)}$ and we again skip the arguments $(x,y)$), is nonsingular. This follows by the genericity results for standard programs (see [10]). We now define The application of the Sard Lemma to the system (3.13), (3.14) implies that, for almost has full row rank which in particular yields (by comparing the number of rows and columns) m + |J λ 0 | ≤ |J 0 |. (3.18) Note that by Carathéodory's Lemma, for any given critical point $(\bar x,\bar y,\bar\lambda)$ we can choose a solution $\lambda^*$ of (3.14) such that $(\bar x,\bar y,\lambda^*)$ is a critical point satisfying (3.11) (cf. (3.18)). Now we wish to prove that MPCC-SOC is fulfilled if Recall that $v_0=\varphi$. Using Remark 2.1, as MPCC-LICQ holds, we only need to prove the regularity of . . .
Here we assume that the vectors ∇ y v 0 , ∇ y v 1 , . . . , ∇ y v l are ordered according to the index sets J 0 ∪ {0}\J λ 0 , J 0 ∪ J λ 0 , 0 . Now if ( ) holds, then the columns of Using ( ) we see that the number of these columns is: Deleting these l + 1 columns ([c 1 , . . . , c l+1 ]) in M we obtain a matrixM (with N rows and N − l − 1 columns) that contains the matrix (3.17) as submatrix and has additional l + 1 zero-rows. Since (3.17) (as shown above) has full row rank the matrix M has row rank N − l − 1 and the same column rank. By adding again the l + 1 l.i. columns [c 1 , . . . , c l+1 ] it follows that the matrix M in (3.19), has full rank N .

Remark 3.2
Note that in the case where ( ) (see proof above) is satisfied the vector λ is a vertex of the polyhedron (3.14).

Remark 3.3 In the special case that $Q(x)$ does not contain any constraints, i.e., $Q(x)$ is an unconstrained problem, $P_{FJBL}=P_{KKTBL}$ reduces to a standard finite program (with constraint $\nabla_y\varphi(x,y)=0$) and the genericity results for standard finite programs in [10] can directly be applied to find that $P_{FJBL}$ generically satisfies LICQ, SOC and SC.

Remark 3.4 An important subclass of critical points are the so-called C-stationary points, see [11] (i.e., critical points such that $\beta_i\gamma_i\ge 0$ $\forall i$ ∈ J 0 ). Cases BL-1 and BL-2 will appear (generically) even for this class and corresponding genericity results can be similarly obtained.
We combine the generic properties of Theorems 3.1, 3.2 in a definition. Obviously, this definition directly yields

Corollary 3.1 For almost all perturbations of
, the corresponding problems $P_{BL}$ are KKT-regular.
From Theorem 3.2 we conclude that generically $P_{FJBL}$ (see (1.4)) may have singular critical points only for solutions with $\alpha=0$. We now describe more precisely the possible singular behavior at critical points $(\bar x,\bar y,\bar\lambda)$ in this case $\alpha=0$ (Theorem 3.2, case BL-2). In this case (generically) the lower level problem partially vanishes. By (the proof of) Theorem 3.2, $(\bar x,\bar y)$ is a critical point of the nonlinear program (3.15) (with the upper and lower level constraints). Moreover, $\bar\lambda$ must be a solution of system (3.14) (for $(\bar x,\bar y)$). Recall (see the proof of Theorem 3.2) that for $(\bar x,\bar y,\bar\lambda)$ we can construct a critical point $(\bar x,\bar y,\lambda^*)$ of $P_{FJBL}$ which also satisfies (3.11). For this particular critical point, (generically) one of the following sub-cases will hold in BL-2:

Interpretation of the results in terms of P BL
In this section we analyze the relation between the original program $P_{BL}$ and the corresponding relaxation $P_{FJBL}$ (or $P_{KKTBL}$) in the generic case, assuming that $P_{BL}$ is KKT-regular (see Definition 3.1).
We begin with the case that $(\bar x,\bar y,\bar\lambda)$ is a local minimizer of $P_{FJBL}$ which satisfies the conditions BL-1 or BL-2 in Case 1(a), Case 2(a) above (see also Theorem 3.2). Then, according to Theorem 3.2, the point $(\bar x,\bar y,\bar\lambda)$ is an isolated non-degenerate local minimizer satisfying MPCC-LICQ, MPCC-SC, and MPCC-SOC. By the results in [4, Theorems 3.3, 3.4], generically $(\bar x,\bar y,\bar\lambda)$ is an (isolated) local minimizer of $P_{FJBL}$ either of order $p=1$ or of order $p=2$. This means that with constants $\varepsilon>0$, $\kappa>0$ the inequality

$$f(x,y)-f(\bar x,\bar y)\ \ge\ \kappa\,\|(x,y,\lambda)-(\bar x,\bar y,\bar\lambda)\|^p \tag{4.1}$$

holds for all $(x,y,\lambda)\in M_{FJBL}$ satisfying $\|(x,y,\lambda)-(\bar x,\bar y,\bar\lambda)\|<\varepsilon$. Note that the point $(\bar x,\bar y)$ need not be feasible for the original problem $P_{BL}$, i.e., $\bar y$ need not be a local minimizer of $Q(\bar x)$. However, if $(\bar x,\bar y)$ is feasible for $P_{BL}$, then it is also an isolated local minimizer of $P_{BL}$. This is stated in

Corollary 4.1 Let $P_{BL}$ be a KKT-regular problem and let $(\bar x,\bar y,\bar\lambda)$ be an (isolated, non-degenerate) local minimizer of order $p=1$ or $p=2$ of the corresponding program $P_{FJBL}$ in (1.4). Then, the solution $\bar\lambda$ of (3.14) is uniquely determined. Moreover, under these conditions, if $(\bar x,\bar y)\in M_{BL}$ (feasible), then it is also a local minimizer of $P_{BL}$ of (the same) order $p=1$ or $p=2$.
Proof Assume that (3.14) has two solutions $\bar\lambda\neq\hat\lambda$. Then for $\delta\in[0,1]$ the points $(\bar x,\bar y,(1-\delta)\bar\lambda+\delta\hat\lambda)$ are also feasible for problem (1.4), with the same minimal objective value $f(\bar x,\bar y)$. So, for small $\delta>0$, $(\bar x,\bar y,(1-\delta)\bar\lambda+\delta\hat\lambda)$ is a local minimizer of $P_{FJBL}$, contradicting the fact that $(\bar x,\bar y,\bar\lambda)$ is an isolated critical point of $P_{FJBL}$. Now, let $(\bar x,\bar y)$ be feasible for $P_{BL}$, i.e., $\bar y$ solves $Q(\bar x)$, and consider any $(x,y,\lambda)\in M_{FJBL}$ with $(x,y)\approx(\bar x,\bar y)$. In view of the fact that $\bar\lambda$ is the unique solution of (3.14), a continuity argument shows that also $\lambda\approx\bar\lambda$ must hold. Hence, in view of the inclusion (1.5), from (4.1) we can conclude that the point $(\bar x,\bar y)$ is a (locally unique) minimizer of $P_{BL}$ of the same order $p=1$ or $p=2$.
Next, we consider the situation where $(\bar x,\bar y,\bar\lambda)$ is a local minimizer of $P_{FJBL}$ such that the condition BL-2 of Theorem 3.2 holds and MPCC-SC is not fulfilled (BL-2, Case 1(b), Case 2(b)). In this case $\bar\lambda$ is not the unique solution of (3.14). Let furthermore $(\bar x,\bar y)$ be feasible for $P_{BL}$, i.e., $\bar y$ is a solution of $Q(\bar x)$. In this situation we cannot expect that around $\bar x$ the solution of $Q(x)$ can be described by a smooth function $y(x)$, so that in this case the original program $P_{BL}$ cannot be solved by a reduction approach. We give an illustrative (generic) example for the case BL-2, case 1(b).
Let us discuss the structure for the case BL-2, cases 1, 2 (b) further. We again analyze only the first subcase, where at a critical point $(\bar x,\bar y,\bar\lambda)$ of $P_{FJBL}$ we consider the solution set of (3.14) with $|J_{0v}(\bar x,\bar y)|>m$. This solution set is a polyhedron of dimension $d$, $d\le|J_{0v}(\bar x,\bar y)|-m$. We denote this polyhedron by $R(\bar x,\bar y)$. For each $\lambda^*\in R(\bar x,\bar y)$, the point $(\bar x,\bar y,\lambda^*)$ is a critical point of $P_{KKTBL}$. The vertices of $R(\bar x,\bar y)$ are given by those solutions $\lambda^*$ such that rank(∇ y v J 0 ( x,y,λ * ) ( x, y)) = |J 0 ( x, y, λ * )| = m. In the present situation the following bad behavior may occur: The points $(\bar x,\bar y,\lambda)$ with $\lambda\in R(\bar x,\bar y)$ and J 0 ( x, y, λ) = ∅ (i.e., $\lambda$ is in the relative interior of $R(\bar x,\bar y)$) may be local minimizers of $P_{FJBL}$, but for a vertex $\lambda$ of $R(\bar x,\bar y)$, the point $(\bar x,\bar y,\lambda)$ is no longer a local minimizer. This means, in particular, that the set of local minimizers may not be closed. We give an example: The corresponding KKT relaxation $P_{KKTBL}$ is Obviously the points $(x,y,\lambda_1,\lambda_2)=(0,0,1,\lambda_2)$, with $\lambda_2>0$, are feasible with $|J_{0v}(x,y)|=2>1=m$ and have the same objective value $f(x,y)=0$. It is not difficult to see that these points are local minimizers of $P_{KKTBL}$. However, for the vertex solution $(\lambda_1,\lambda_2)=(1,0)$ of (3.14) (with $\lambda_0=1$) the corresponding point $(0,0,1,0)$ is no longer a local minimizer. Indeed, the feasible points $(x,0,1,0)$, $x>0$, have a smaller objective value $f(x,0)=-x$.
The preceding example also shows that, in contrast to the other cases (see Corollary 4.1), in these cases BL-2, subcases 1, 2(b), the fact that $(\bar x,\bar y,\bar\lambda)$ is a local minimizer of $P_{FJBL}$ and that $(\bar x,\bar y)$ is feasible for BL does not imply that $(\bar x,\bar y)$ is a local minimizer of the original BL program. In these cases only a weaker statement than in Corollary 4.1 can be proven (see [3] for details).  that $(\bar x,\bar y,\bar\lambda)$ is a critical point of $P_{FJBL}$ satisfying BL-2, subcases 1, 2(b). Assume that for all possible lower level multiplier vertices $\lambda^*$ of (3.14), the condition β J 0 ( x,y,λ * ) > 0 and A| T x M 0 holds (with $A$ as in (3.16), $M$ the feasible set of (3.15)). Then $(\bar x,\bar y)$ is a local minimizer of the bilevel problem.
Proof The proof is similar to that of Corollary 5.2.5 in [3] for the case of P KKTBL .

Remark 4.1
For semi-infinite programs it is known that generically for a solution ( x, y ) of the BL formulation the condition LICQ is satisfied at y wrt. Q( x ). So for SIP we can restrict the KKT approach to P KKTBL and in Theorem 3.2 only the cases BL-1 and BL-2 subcase 1(a) can occur.

A numerical approach for solving BL
This section deals with the numerical aspects of the KKT approach for solving the original BL problem $P_{BL}$. In particular, we discuss the consequences of the preceding genericity results for this approach. The results suggest that we have to distinguish between the cases BL-1, BL-2 Case 1, 2(a) and the cases BL-2 Case 1, 2(b). At a minimizer $(\bar x,\bar y,\bar\lambda)$ of $P_{FJBL}$ satisfying BL-1 (or BL-2,(a)) the regularity conditions MPCC-LICQ, -SC, -SOC are fulfilled, so that this minimizer can be computed numerically with methods from MPCC, e.g., with the smoothing approach, where the problem $P_{FJBL}$ is replaced by the perturbed version (see (2.5)): where $\tau>0$ is a (small) perturbation parameter. The program $P(\tau)$ represents an ordinary finite program and can be solved numerically by using software for standard programs. From [4, Theorem 5] we obtain

At minimizers $(\bar x,\bar y,\bar\lambda)$ of $P_{FJBL}$ in cases BL-2,1(b), 2(b), a degenerate structure occurs, so that for the computation of these minimizers the KKT approach may not work. We give a generic example for the case BL-2,1(b) where the minimizer of $P_{KKTBL}$ cannot be approximated by minimizers of $P(\tau)$.
Fortunately, however, our analysis in Sect. 3 has revealed that in case BL-2, generically, the corresponding minimizer $(\bar x,\bar y)$ of $P_{BL}$ can directly be found by computing a minimizer $(\bar x,\bar y)$ of the reduced (standard) problem (3.15) and then checking whether $(\bar x,\bar y)$ is feasible for $P_{BL}$ (i.e., $\bar y$ solves $Q(\bar x)$). Recall that for minimizers of $P_{FJBL}$ in the case BL-2, 1, 2(a) both approaches are possible. The results obtained so far suggest the following

Conceptual method for solving $P_{BL}$ (in the generic case): According to our analysis a solution of a (KKT-regular) program $P_{BL}$ can be obtained as a solution of $P_{FJBL}$. The latter is either a (nondegenerate) solution of (3.15) (case BL-2) or a nondegenerate solution of $P_{FJBL}$ (case BL-1). So we try both alternatives:

1. Try to compute the minimizer $(\bar x,\bar y,\bar\lambda)$ of $P_{FJBL}$ which satisfies BL-2 as a solution $(\bar x,\bar y)$ of the relaxation (3.15) (such that also (3.14) holds) and check whether $\bar y$ solves $Q(\bar x)$. If so, $(\bar x,\bar y)$ is a minimizer of $P_{BL}$.
2. Try to compute a (nondegenerate) solution $(\bar x,\bar y,\bar\lambda)$ of $P_{FJBL}$ by applying the smoothing approach $P(\tau)$ in (5.1) (or some other method) for solving the MPCC program (1.4). In case the procedure converges to a nondegenerate solution $(\bar x,\bar y,\bar\lambda)$ (case $\alpha\neq 0$), check whether $\bar y$ solves $Q(\bar x)$. If so, $(\bar x,\bar y)$ is a minimizer of $P_{BL}$. If the case $\alpha=0$ is detected, i.e., the method generates $(x_k,y_k)\to(\bar x,\bar y)$, $\alpha_k\to 0$, we switch to step 1 (with the last iterate $(x_k,y_k)$ as starting point).
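The control flow of the two steps above can be sketched as a small driver routine. This is only an illustration of the conceptual method, not the authors' implementation: the names `solve_reduced`, `solve_smoothed`, `solves_lower_level` and the tolerance `alpha_tol` are hypothetical placeholders that a user would have to supply, and $x$, $y$ are treated as scalars purely for brevity.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]


def conceptual_bl_method(
    solve_reduced: Callable[[Optional[Point]], Optional[Point]],
    solve_smoothed: Callable[[], Optional[Tuple[float, float, float]]],
    solves_lower_level: Callable[[float, float], bool],
    alpha_tol: float = 1e-6,
) -> Optional[Tuple[str, Point]]:
    """Try step 1 (reduced problem), then step 2 (smoothing), as in the text.

    solve_reduced(start)  -> candidate (x, y) of the reduced problem, or None
    solve_smoothed()      -> limit (x, y, alpha) of the smoothing method, or None
    solves_lower_level    -> True if y solves Q(x); this check must be user-supplied
    """
    # Step 1: candidate from the reduced standard program (case BL-2).
    cand = solve_reduced(None)
    if cand is not None and solves_lower_level(*cand):
        return ("step 1", cand)

    # Step 2: smoothing approach applied to the MPCC relaxation (case BL-1).
    res = solve_smoothed()
    if res is None:
        return None
    x, y, alpha = res
    if abs(alpha) > alpha_tol:
        # Nondegenerate limit (alpha != 0): accept only if y solves Q(x).
        return ("step 2", (x, y)) if solves_lower_level(x, y) else None

    # alpha -> 0 detected: switch back to step 1, warm-started at the last iterate.
    cand = solve_reduced((x, y))
    if cand is not None and solves_lower_level(*cand):
        return ("step 1 (warm start)", cand)
    return None
```

The routine returns `None` whenever neither alternative produces a point that passes the lower level check, which mirrors the fact that the relaxation may deliver points that are not feasible for the original bilevel problem.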
Remark 5.1 Note that not every solution $(\bar x,\bar y,\bar\lambda)$ of $P_{FJBL}$ computed by the above method needs to lead to a solution $(\bar x,\bar y)$ of $P_{BL}$. We additionally have to verify sufficient conditions (second order conditions wrt. $Q(\bar x)$) (see Corollary 4.1 for step 2, Corollary 4.2 for step 1).
To illustrate our solution method we consider three (simple) bilevel problems (all taken from http://www-unix.mcs.anl.gov/~leyffer/MacMPEC). These problems are solved numerically with the help of the corresponding program $P_{KKTBL}$ using the smoothing approach $P(\tau)$. The (standard) finite programs $P(\tau)$ have been solved with the MATLAB procedure fmincon. In the following, the numerical results are given to 2 decimal places, i.e., 12.00009 is written as 12.00.
To study the dependence of the procedure on the starting point $(x_0,y_0)$ we tried to solve the same problem using 50 random starting points, first in $[-1,5]^2$ and then in $[-20,20]^2$. Our smoothing approach succeeded for 33 starting points in the first and for 16 in the second case.
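A starting-point experiment of this kind can be organized with a small multi-start harness. The sketch below is an assumption-laden illustration, not the code used for the reported numbers: `count_successes` and `toy_solver` are hypothetical names, and `toy_solver` is only a stand-in for a real run of the smoothing method from a given starting point.

```python
import random
from typing import Callable, Tuple


def count_successes(
    local_solver: Callable[[Tuple[float, float]], bool],
    lower: float,
    upper: float,
    n_starts: int = 50,
    seed: int = 0,
) -> int:
    """Run local_solver from n_starts random points in [lower, upper]^2.

    local_solver(start) must return True if the run from `start` is
    considered successful (e.g., converged to the known minimizer).
    """
    rng = random.Random(seed)  # fixed seed makes the experiment reproducible
    successes = 0
    for _ in range(n_starts):
        start = (rng.uniform(lower, upper), rng.uniform(lower, upper))
        if local_solver(start):
            successes += 1
    return successes


def toy_solver(start: Tuple[float, float]) -> bool:
    """Hypothetical stand-in: 'succeeds' only from starts close to the origin."""
    x0, y0 = start
    return abs(x0) + abs(y0) < 6.0


if __name__ == "__main__":
    print("box [-1, 5]^2  :", count_successes(toy_solver, -1.0, 5.0))
    print("box [-20, 20]^2:", count_successes(toy_solver, -20.0, 20.0))
```

Replacing `toy_solver` by a wrapper around an actual NLP solve of $P(\tau)$ would reproduce the structure of the experiment; the success counts then depend on the solver, the problem and the random seed.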