A Second-Order Sufficient Optimality Condition for Risk-Neutral Bi-level Stochastic Linear Programs

The expectation functionals arising in risk-neutral bi-level stochastic linear models with random lower-level right-hand side are known to be continuously differentiable if the underlying probability measure has a Lebesgue density. We show that the gradient may fail to be locally Lipschitz continuous under this assumption. Our main result provides sufficient conditions for Lipschitz continuity of the gradient of the expectation functional and paves the way for a second-order optimality condition in terms of generalized Hessians. Moreover, we study geometric properties of regions of strong stability and derive representation results, which may facilitate the computation of gradients.


Introduction
We study bi-level stochastic linear programs with random right-hand side in the lower-level constraint system. The sequential nature of bi-level programming motivates a setting where the leader decides nonanticipatorily, while the follower can observe the realization of the randomness. A discussion of the related literature is provided in the recent work [1]. A central result of [1] states that evaluating the leader's random outcome by taking the expectation leads to a continuously differentiable functional if the underlying probability measure is absolutely continuous w.r.t. the Lebesgue measure. This makes it possible to formulate first-order necessary optimality conditions for the risk-neutral model. The main result of the present work provides sufficient conditions, namely boundedness of the support and uniform boundedness of the Lebesgue density of the underlying probability measure, that ensure Lipschitz continuity of the gradient of the expectation functional. Moreover, we show that the assumptions of [1] are too weak to guarantee even local Lipschitz continuity of the gradient. By the main result, second-order necessary and sufficient optimality conditions can be formulated in terms of generalized Hessians. As part of the preparatory work for the proof of the main result, we show in particular that any region of strong stability in the sense of [1, Definition 4.1] is a finite union of polyhedral cones. This representation is of independent interest, as it may facilitate the calculation or estimation of gradients of the expectation functional and thus enhance gradient descent-based approaches.
The paper is organized as follows: The model and related results of [1] are discussed in Sect. 2, while the main result and a variation with weaker assumptions are formulated in Sect. 3. Sections 4 and 5 are dedicated to geometric properties of regions of strong stability and related projections that appear in the representation of the gradient. Results of these sections play an important role in the proof of the main result that is given in Sect. 6. A second-order sufficient optimality condition is formulated in Sect. 7. The paper concludes with a brief discussion of the results and an outlook in Sect. 8.
Communicated by René Henrion. Matthias Claus (matthias.claus@uni-due.de), University Duisburg-Essen, Essen, Germany.

Model and Notation
Consider the optimistic formulation of a parametric bi-level linear program where z ∈ R^s is a parameter and the data comprise a nonempty polyhedron X ⊆ R^n, vectors c ∈ R^n, q ∈ R^m and the lower-level optimal solution set mapping Ψ : is real valued and Lipschitz continuous on the polyhedron if dom f is nonempty. Let Z : Ω → R^s be a random vector on some probability space (Ω, F, P) and denote the induced Borel probability measure by μ_Z = P ∘ Z^{-1} ∈ P(R^s). Furthermore, we introduce the set If dom f is nonempty and we impose the moment condition is well defined and Lipschitz continuous by [1, Lemma 2.4]. In a situation where the parameter z in (1) is given by a realization of the random vector Z that the follower can observe while the leader has to decide x nonanticipatorily, the upper-level outcome can be modeled by F(x). If we assume X ⊆ F_Z and the leader's decision is based on the expectation, we obtain the risk-neutral stochastic program The following is shown in [ is well defined, Lipschitz continuous and continuously differentiable at any x_0 ∈ int F_Z.
We shall discuss some key ideas of the proof and introduce the relevant notation: Set q̂ and then, f admits the representation

Remark 2.1
The subsequent analysis does not depend on the specific structure of q̂, d̂ and Â and applies whenever (3) holds with some matrix Â satisfying rank Â = s.
As the rows of Â are linearly independent, the set A := {Â_B ∈ R^{s×s} : Â_B is a regular submatrix of Â} of lower-level base matrices is nonempty. A base matrix Â_B ∈ A is optimal for the lower-level problem for a given (x, z) if it is feasible, i.e., BÂ N is nonnegative. Furthermore, for any optimal base matrix Â_B ∈ A, there exists a feasible base matrix Â_B ∈ A satisfying and assume dom f ≠ ∅, and then, A key concept is the region of strong stability associated with a base matrix Â_B ∈ A^* given by the set on which f coincides with the affine linear mapping Under the assumptions of Theorem 2.1, we have and the gradient of Q_E admits the representation where D := {q̂_B^⊤ Â_B^{-1} T : Â_B ∈ A^*}, and the set-valued aggregation mappings W, W : R^n × D ⇒ R^s are given by imply continuity of the weight functional M_Δ : R^n → R, for any Δ ∈ D.

Main Result
We shall first show that the assumptions of Theorem 2.1 are too weak to guarantee Lipschitz continuity of ∇Q_E.

Example 3.1 Consider the case where
The feasible set of the lower-level problem is compact for any parameters in the polyhedral cone F = {(x, z) ∈ R × R^2 : z_1 ≥ 0, x + z_2 ≥ 0}, which implies that dom f coincides with F for any q̂ ∈ R^4. As the objective function is constant, any feasible base matrix is optimal for the lower-level problem. Denote the elements of A = A^* by Â_1, …, Â_6, and let be the set of parameters for which Â_i is feasible for the lower-level problem. A straightforward calculation shows that we have , and let q̂_i denote the part of the upper-level objective function that is associated with Â_i. We have and a straightforward calculation yields Let the density δ_Z : R^2 → R of Z be given by , and it is easy to see that hold true whenever x ∈ ]0, 1] (Fig. 1). In Fig. 1, the darker square depicts the intersection of supp μ_Z and W( 1 holds for any x ∈ [0, 1] and a simple calculation shows that is not locally Lipschitz continuous at x = 0.
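The mechanism behind the example can be seen in a simpler, purely illustrative one-dimensional setting. The density δ and the function g below are hypothetical and are not the data of Example 3.1; they only sketch how an unbounded density can make the probability of a set that grows linearly in x fail to be Lipschitz in x:

```latex
% Hypothetical one-dimensional analogue (not the density of Example 3.1).
% \delta is a probability density on (0,1] that is unbounded near 0.
\[
  \delta(t) = \frac{1}{2\sqrt{t}}\,\mathbf{1}_{(0,1]}(t),
  \qquad
  \int_0^1 \delta(t)\,dt = 1 .
\]
% Probability of the interval [0,x], whose length depends linearly on x.
\[
  g(x) := \int_0^x \delta(t)\,dt = \sqrt{x}, \qquad x \in [0,1].
\]
% The difference quotient at 0 is unbounded, so g is not locally Lipschitz at 0.
\[
  \frac{g(x) - g(0)}{x} = \frac{1}{\sqrt{x}} \longrightarrow \infty
  \quad \text{as } x \downarrow 0 .
\]
```

If the density were bounded by some constant α, the same computation would give |g(x) − g(y)| ≤ α|x − y|, which is the kind of control a uniformly bounded density provides.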
Our main result provides the following sufficient conditions for Lipschitz continuity of ∇Q_E: Note that the density in Example 3.1 is not bounded. The proof of Theorem 3.1 requires some preliminary work and will be given in Sect. 6. If the support of μ_Z is unbounded, we still obtain a weaker estimate for the gradients:

On the Geometry of Regions of Strong Stability
In view of (4) and (5), the gradient ∇Q_E(x) is given by a weighted sum of the probabilities of the sets W(x, Δ) or W(x, Δ) for Δ ∈ D. As these sets are defined using regions of strong stability, we shall first study properties of the sets S(Â_B) with Â_B ∈ A^*.
Proof Let (x, z) be an arbitrary point of some region of strong stability S(Â_B). The n-dimensional kernel of (T, I_s) contains some nonzero element (x_0, z_0), and we have
Our main result on the structure of S(Â_B) is the following:
Theorem 4.1 Assume dom f ≠ ∅. Then any region of strong stability is a union of at most (s + 1)^{|A^*|} polyhedral cones, and at most (s + 1)^{|A^*|−1} of these cones have a nonempty interior. Moreover, the multifunction S : A^* ⇒ R^n × R^s is polyhedral, i.e., gph S is a finite union of polyhedra.
Before we get to the proof of Theorem 4.1, we will establish the following auxiliary result:
Proof The inclusion cl int W ⊆ W is trivial. Moreover, for any ξ_0 ∈ int W and ξ ∈ W, the line segment principle (cf. [6, Lemma 2.1.6]) implies [ξ_0, ξ) ⊆ int W and thus ξ ∈ cl int W.
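The line segment principle invoked in this proof can be made explicit. The following sketch assumes that W ⊆ R^k is convex and that ξ_0 is an interior point with B_ε(ξ_0) ⊆ W; the symbols ε, λ, u and ξ_λ are introduced here for illustration only:

```latex
% Sketch of the line segment principle for a convex set W with
% B_\varepsilon(\xi_0) \subseteq W and an arbitrary point \xi \in W.
\[
  \xi_\lambda := (1-\lambda)\,\xi_0 + \lambda\,\xi , \qquad \lambda \in [0,1).
\]
% Any perturbation u with \|u\| < (1-\lambda)\varepsilon is absorbed by the ball around \xi_0:
\[
  \xi_\lambda + u
  = (1-\lambda)\Big(\xi_0 + \tfrac{u}{1-\lambda}\Big) + \lambda\,\xi \in W ,
  \qquad\text{since } \xi_0 + \tfrac{u}{1-\lambda} \in B_\varepsilon(\xi_0) \subseteq W .
\]
% Hence every point of the half-open segment is interior, and \xi is a limit of interior points:
\[
  B_{(1-\lambda)\varepsilon}(\xi_\lambda) \subseteq W
  \;\Longrightarrow\;
  \xi_\lambda \in \operatorname{int} W ,
  \qquad
  \xi = \lim_{\lambda \uparrow 1} \xi_\lambda \in \operatorname{cl}\operatorname{int} W .
\]
```

The argument breaks down when int W is empty: for a line segment W in R^2, cl int W is empty although W is closed and convex.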
We are now ready to prove Theorem 4.1.

Corollary 4.2 Assume dom f ≠ ∅. Then any region of strong stability is star shaped and contains the n-dimensional kernel of (T, I_s).
Proof Radial convexity is an immediate consequence of Theorem 4.1, as any region of strong stability contains the line segments from the origin to any feasible point. The second statement directly follows from Proposition 4.1.
Two-stage stochastic programming can be understood as the special case of bi-level stochastic programming where the objectives of leader and follower coincide. In this case, any region of strong stability is a polyhedral cone and thus convex:
Proposition 4.2 Assume dom f ≠ ∅ and q̂ = αd̂ for some α > 0. Then, any region of strong stability is a polyhedral cone.
Proof We shall use the notation of the proof of Theorem 4.1 and denote the part of d̂ associated with Â_i by d̂_i. Fix any (x, z) ∈ F and consider any base matrices Â_i, Â_j ∈ A^* that are feasible and thus optimal for the lower-level problem. As q̂ = αd̂, both base matrices yield the same upper-level objective value and are thus also optimal with respect to the upper-level objective function. Thus, S(Â_i) coincides with the polyhedral cone Θ_i.
As d̂ = (0, 0, 0, 0) holds in Example 3.1, we see that the assumption q̂ = αd̂ for some α > 0 in Proposition 4.2 cannot be replaced with the weaker condition that {q̂, d̂} is linearly dependent.

Properties of the Aggregation Mappings
We shall now study the aggregation mappings W and W defined in Sect. 2. The following result is the counterpart of Theorem 4.1: Δ) is a finite union of polyhedra for any (x, Δ) ∈ R^n × D.
The proof of Theorem 5.1 will be based on the following auxiliary result:
Lemma 5.1 Let C_1, …, C_l ⊆ R^k be closed and convex. Then,
Proof As the sets C_1, …, C_l are closed and the interior of a union contains the union of the interiors, we have where the first equality is due to the fact that the closure of a union equals the union of the closures and the second equality is a direct consequence of the line segment principle. Thus, For the reverse inclusion, suppose that there is some By definition, there are sequences {x_n}_{n∈N} ⊂ R^k and {ε_n}_{n∈N} ⊂ R_{>0} satisfying x_n → x and B_{ε_n}(x_n) ⊆ ∪_{i=1,…,l} C_i for all n ∈ N. As ∪_{i=1,…,l: int C_i ≠ ∅} C_i is closed, there exists some N ∈ N such that x_n ∉ ∪_{i=1,…,l: int C_i ≠ ∅} C_i for all n ≥ N. Together with the previous considerations, the strong separation theorem (cf. [9, Theorem 11.4]) yields the existence of some δ_N ∈ (0, ε_N] such that As any C_i with int C_i = ∅ is contained in an affine subspace of dimension strictly smaller than k (cf. [2, Section 2.5.2]), we obtain the contradiction which completes the proof.

Corollary 5.1 Let C ⊆ R k be a finite union of polyhedra (polyhedral cones). Then, cl int C is a finite union of polyhedra (polyhedral cones).
Proof The above statement is an immediate consequence of Lemma 5.1.
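A small planar example may help to see what taking cl int does to a finite union of closed convex sets. The sets C_1 and C_2 below are hypothetical and chosen only for illustration; the last line reflects how the proof of Lemma 5.1 reads, namely that only the pieces with nonempty interior survive:

```latex
% Illustrative sets in R^2: a square and a line segment attached to it.
\[
  C_1 = [0,1]^2 , \qquad C_2 = [1,2] \times \{0\} ,
  \qquad \operatorname{int} C_1 = (0,1)^2 , \quad \operatorname{int} C_2 = \emptyset .
\]
% No point of the segment with first coordinate > 1 has a neighbourhood inside the union,
% so the segment contributes nothing to the interior of the union.
\[
  \operatorname{int}\,(C_1 \cup C_2) = (0,1)^2 ,
  \qquad
  \operatorname{cl}\operatorname{int}\,(C_1 \cup C_2) = [0,1]^2 = C_1
  = \bigcup_{i \,:\, \operatorname{int} C_i \neq \emptyset} C_i .
\]
```

Since C_1 is a polyhedron, the result is again a finite union of polyhedra, in line with Corollary 5.1.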
Proof of Theorem 5.1 As D is finite, it is sufficient to consider the multifunctions W(·, Δ) : R^n ⇒ R^s for fixed Δ ∈ D. We have which is a finite union of polyhedra by Corollary 5.1. Similarly, W(x, Δ) admits the representation By Theorem 4.1 and Corollary 5.1, the set is the intersection of a finite union of polyhedral cones and the affine subspace {(x', z') ∈ R^n × R^s : x' = x} and thus a finite union of polyhedra for any x ∈ R^n and any Â_B ∈ A^*.
The following result on W is a simple consequence of the fact that the constraint system describing a region of strong stability only imposes conditions on (T x + z).
Proof Fix any x, x' ∈ R^n, z ∈ R^s and set z' = z + T(x − x'); then T x' + z' = T x + z and thus Similarly, for any Â_B ∈ A^*, (x, z) ∈ S(Â_B) holds if and only if 1. there exists some y ∈ R^m such that holds for any Δ ∈ D, which completes the proof.
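The substitution used in this proof amounts to a translation of the aggregation sets. The following sketch assumes, as stated before the proof, that membership of z in W(x, Δ) depends on (x, z) only through T x + z; the numbers in the last line are a hypothetical instance with n = s = 1 and T = 1:

```latex
% With z' := z + T(x - x') the combined right-hand side is unchanged:
\[
  T x' + z' = T x' + z + T x - T x' = T x + z .
\]
% Under the assumption above, this yields a pure translation of the set:
\[
  z \in W(x,\Delta) \;\Longleftrightarrow\; z + T(x - x') \in W(x',\Delta),
  \qquad\text{i.e.}\qquad
  W(x',\Delta) = W(x,\Delta) + T(x - x') .
\]
% Hypothetical numbers: T = 1, x = 2, x' = 0.5, z = 3.
\[
  z' = 3 + (2 - 0.5) = 4.5 , \qquad T x + z = 5 = 0.5 + 4.5 = T x' + z' .
\]
```

In particular, moving x only shifts these sets by T(x − x') without changing their shape, which is what makes a comparison of the probabilities at x and x' tractable.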

Proof of the Main Result
We are finally ready to prove Theorem 3.1 based on the results of Sects. 4 and 5 as well as the following two auxiliary results:
Lemma 6.1 Assume dom f ≠ ∅, and let μ_Z ∈ P(R^s) be absolutely continuous w.r.t. the Lebesgue measure. Then holds for any x ∈ R^n, Δ ∈ D and t ∈ R^s.
Proof By the arguments used in the proof of [1, Lemma 4.2], we have where N_x ⊂ R^s is contained in a finite union of hyperplanes. Consequently, holds for any fixed Δ ∈ D. As both and there exists a finite upper bound α ∈ R for the Lebesgue density of μ_Z, we have where λ^s denotes the s-dimensional Lebesgue measure. By Theorem 5.
by Cavalieri's principle, which completes the proof.
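The role of the bounded density can be illustrated in one dimension. The following is only a sketch under simplifying assumptions (s = 1, W an interval [a, b], density bounded by α); it is not the estimate of the lemma itself, and the symbol △ for the symmetric difference is introduced here:

```latex
% Shifting an interval W = [a,b] by t changes its probability by at most
% the probability of the symmetric difference.
\[
  \big|\, \mu_Z[W + t] - \mu_Z[W] \,\big|
  \;\le\; \mu_Z\big[(W + t) \,\triangle\, W\big]
  \;\le\; \alpha \, \lambda^1\big((W + t) \,\triangle\, W\big) .
\]
% The symmetric difference consists of two intervals of length min{|t|, b-a}.
\[
  \lambda^1\big((W + t) \,\triangle\, W\big) = 2 \min\{|t|,\, b - a\} \le 2\,|t| ,
  \qquad\text{hence}\qquad
  \big|\, \mu_Z[W + t] - \mu_Z[W] \,\big| \le 2\,\alpha\,|t| .
\]
```

In higher dimensions the Lebesgue measure of the symmetric difference also depends on the extent of the set transversal to the shift, which is where a bounded support and a slicing argument of Cavalieri type become relevant.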
Proof of Theorem 3.1 Continuous differentiability on int F_Z is a direct consequence of [1, Corollary 4.7]. Fix any x, x' ∈ int F_Z; then, (4) and Lemma 6.2 yield and thus the desired Lipschitz continuity.
Proof of Theorem 3.2 Fix any κ > 0. As μ_Z is tight by [3, Theorem 1.3], there exists a compact set C(κ) ⊂ R^s such that μ_Z[R^s \ C(κ)] < κ. Combining this with the estimate from the first part of the proof of Lemma 6.2 and using the notation established therein, we see that holds for any Δ ∈ D. Thus, We therefore have and choosing κ = 2|D| yields the desired estimate.
denote the set of points at which ∇Q_E is differentiable, then Clarke's generalized Hessian of Q_E at some x ∈ int F_Z is the nonempty, convex and compact set Let the feasible set of (2) be given by X = {x ∈ R^n : Bx ≤ b} with some B ∈ R^{k×n} and b ∈ R^k. The following second-order sufficient condition is based on [7]:
Theorem 7.1 Assume dom f ≠ ∅, X ⊆ int F_Z and let μ_Z be absolutely continuous w.r.t. the Lebesgue measure and have a bounded support as well as a uniformly bounded density. Moreover, let (x̄, ū) be a KKT point of (2), i.e., and assume that any H ∈ ∂^2 Q_E(x̄) is positive definite on {h ∈ R^n : e_i^⊤ B h = 0 ∀i : ū_i > 0, e_j^⊤ B h ≤ 0 ∀j : ū_j = e_j^⊤ B x̄ = 0}.
Then, x̄ is a strict local minimizer of order 2 of (2), i.e., there exist a neighborhood U of x̄ and a constant L > 0 such that Q_E(x) ≥ Q_E(x̄) + L‖x − x̄‖^2 holds for any x ∈ X ∩ U.
Proof This is a straightforward conclusion from [7, Theorem 1].
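A one-dimensional toy example may clarify how a generalized Hessian enters such a condition. The function Q below is hypothetical and unrelated to Q_E; it is merely a function with Lipschitz continuous gradient that is not twice differentiable at the candidate point, considered without constraints:

```latex
% Toy function with Lipschitz continuous derivative (not Q_E; for illustration only).
\[
  Q(x) = \tfrac12 x^2 + \tfrac12 \max\{x, 0\}^2 ,
  \qquad
  Q'(x) = x + \max\{x, 0\} ,
  \qquad
  Q''(x) =
  \begin{cases}
    1, & x < 0,\\
    2, & x > 0 .
  \end{cases}
\]
% Q' is not differentiable at 0; the generalized Hessian collects the limiting second derivatives.
\[
  Q'(0) = 0 , \qquad
  \partial^2 Q(0) = \operatorname{conv}\{1, 2\} = [1,2] .
\]
% Every element of \partial^2 Q(0) is positive, and quadratic growth holds with L = 1/2:
\[
  Q(x) - Q(0) = \tfrac12 x^2 + \tfrac12 \max\{x,0\}^2 \;\ge\; \tfrac12\, |x - 0|^2
  \qquad \text{for all } x \in \mathbb{R} .
\]
```

By contrast, for the function x ↦ ½ max{x, 0}^2 the generalized Hessian at 0 is [0, 1], which contains 0, and the point 0 is a minimizer but not a strict one.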

Remark 7.1
There are various other approaches for optimization problems with data in the class C^{1,1}, which consists of differentiable functions with locally Lipschitzian gradients. For instance, second-order optimality conditions can also be formulated based on Dini (cf. [5, Section 4.4]) or Riemann (cf. [8]) derivatives.

Conclusions
We have derived sufficient conditions for Lipschitz continuity of the gradient of the expectation functional arising from a bi-level stochastic linear program with random right-hand side in the lower-level constraint system. Invoking the structure of the upper-level constraints, we used this result to formulate a second-order sufficient optimality condition for the risk-neutral bi-level stochastic program in terms of the generalized Hessian of Q_E. Moreover, the main result on the geometry of regions of strong stability and its counterpart for the aggregation mapping W may facilitate the computation or sample-based estimation of gradients of the expectation functional and may thus enhance gradient descent-based methods. As any region of strong stability is a finite union of polyhedral cones, a promising approach is to employ spherical radial decomposition techniques to calculate ∇Q_E (cf. [4, Chapter 4]). The details are beyond the scope of this paper but shall be addressed in future research.