Estimates of Generalized Hessians for Optimal Value Functions in Mathematical Programming

We consider the optimal value function of a parametric optimization problem. A large number of publications have been dedicated to the study of continuity and differentiability properties of the function. However, the differentiability aspect of works in the current literature has mostly been limited to first order analysis, with focus on estimates of its directional derivatives and subdifferentials, given that the function is typically nonsmooth. With the progress made in the last two to three decades in major subfields of optimization such as robust, minmax, semi-infinite and bilevel optimization, and their connection to the optimal value function, there is a need for a second order analysis of the generalized differentiability properties of this function. This could enable the development of robust solution algorithms, such as the Newton method. The main goal of this paper is to provide estimates of the generalized Hessian for the optimal value function. Our results are based on two handy tools from parametric optimization, namely the optimal solution and Lagrange multiplier mappings, for which completely detailed estimates of their generalized derivatives are either well-known or can easily be obtained.

We consider the parametric optimization problem

min_y { f(x, y) | g_i(x, y) ≤ 0, i = 1, …, p }. (1.1)

We only consider inequality constraints, in order to focus our attention on the main ideas. Note, however, that all the results in this paper remain valid, with the corresponding adjustments, if we add equality constraints to problem (1.1). Our focus will be on the optimal value function

ϕ(x) := min_y { f(x, y) | g(x, y) ≤ 0 }, (1.2)

together with the corresponding optimal solution set-valued mapping

S(x) := arg min_y { f(x, y) | g(x, y) ≤ 0 }. (1.3)

We assume throughout the paper that S(x) ≠ ∅ for all x ∈ Rⁿ. As a consequence, ϕ will be finite-valued at all x ∈ Rⁿ. For most results obtained in this paper, we can easily accommodate the case where ϕ (1.2) is an extended real-valued function. But to concentrate on the main points, we leave this specific case for future analysis.
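To see why ϕ is typically nonsmooth even when the problem data are smooth, a minimal numerical sketch (our own illustration, not an example from the paper; it assumes only numpy) is the bilinear case f(x, y) = x·y over the fixed feasible set [−1, 1], for which ϕ(x) = −|x|:

```python
import numpy as np

# Illustration (hypothetical instance): phi(x) = min_{y in [-1, 1]} x*y.
# Here f(x, y) = x*y and the feasible set [-1, 1] is unperturbed.
def phi(x, grid=np.linspace(-1.0, 1.0, 2001)):
    # Brute-force minimization over a fine grid of the feasible set.
    return float(np.min(x * grid))

# phi(x) = -|x|: finite-valued everywhere but nonsmooth at x = 0,
# where the solution set S(0) = [-1, 1] fails to be a singleton.
assert abs(phi(2.0) - (-2.0)) < 1e-9
assert abs(phi(-3.0) - (-3.0)) < 1e-9
assert abs(phi(0.0)) < 1e-9
```

At x = 0 the whole interval [−1, 1] is optimal, and the subdifferential of ϕ there is the full interval [−1, 1]; this is exactly the kind of point where first order analysis alone is insufficient.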
The function ϕ (1.2) has played a major role in the development and understanding of the structure of the underlying parametric optimization problem (1.1), and has been substantially analyzed in the literature. Initial work on stability/sensitivity analysis of optimization problems is almost as old as the field of optimization itself, given that early works on linear programming and the simplex method already provided interesting insights on the behavior of optimal values under perturbations; see, e.g., Chapter 12 of the 1963 book by Dantzig [9].
The study of continuity and differentiability properties of ϕ, in the context of nonlinear optimization, which is our main focus here, grew dramatically following works by Fiacco [16], Gauvin and Dubeau [17], amongst many others. Recent publications on the topic include the papers [23,24], where the tools of Mordukhovich are used to provide different types of subdifferential estimates for ϕ.
It should be emphasized that most of the aforementioned works focus on the derivation of continuity properties and the estimation of directional derivatives and subdifferentials. As far as second order differentiation properties for ϕ are concerned, a few publications have been devoted to the estimation of second order-type directional derivatives; see, e.g., [33,34]. We are however not aware of any work pursuing generalized Hessian evaluations for ϕ, in the sense considered in this paper; cf. (1.4). Fiacco [16, Chapter 3] provides Hessian formulas for ϕ in the case where S (1.3) is single-valued and continuously differentiable. This assumption is very restrictive and cannot hold for most applications. However, inducing second order sufficient conditions for approximating problems can give rise to efficient smoothing methods for many problems involving nonsmooth functions [5,6].
We consider various scenarios in this paper, including the case where S (1.3) is single-valued and continuously differentiable, leading to results that coincide with those by Fiacco just mentioned above; see details in the next subsection and in Section 4. In [7], generalized Taylor expansions and other second order generalized differentiation tools are applied to a function closely related to ϕ, but their results are completely different from the ones obtained here.
It is also important to recall that the optimal value function naturally appears either in the constraints or objective functions of many mainstream optimization problems, including robust [1], minmax [8], generalized semi-infinite [31], and bilevel optimization [10,37]. Considering the recent developments in the construction of Newton-type methods using the Mordukhovich generalized second order differentiation tools (see [19,26,27]), the results of this paper could play an important role in solving the aforementioned problems.
Our main goal in this paper is to provide estimates of the generalized Hessian of the optimal value function ϕ, in the sense of Mordukhovich. Note that for x̄ ∈ Rⁿ and x̄* ∈ ∂ϕ(x̄), the generalized Hessian of ϕ in the sense of Mordukhovich is defined by

∂²ϕ(x̄|x̄*)(x*) := D*(∂ϕ)(x̄|x̄*)(x*)  (1.4)

for all x* ∈ Rⁿ. Here, ∂ϕ denotes the subdifferential of ϕ in any standard sense (convex analysis, Clarke, or Mordukhovich) and D*(∂ϕ) stands for the coderivative of Mordukhovich [22]. In the next subsection, we give a general flavour of what the estimates developed in this paper look like. Further details on the mathematical tools and the proofs of the main results are provided in Sections 2-5.
1.2. Summary of the main results and outline of the paper. We assume throughout the paper that the functions f and g, cf. (1.1), are continuously differentiable. Whenever the optimal solution mapping S (1.3) is single-valued, we will denote it by s.
Similarly, λ will be used to represent Λ when it is single-valued.
If the constraint function g is independent of the parameter x and the corresponding optimal solution set-valued map S is single-valued and locally Lipschitz continuous around x̄, then we show in Subsections 4.1 and 5.1 that, under further appropriate conditions, the generalized Hessian of ϕ (1.2) can be estimated in terms of ∂s(x̄)ᵀ, where ȳ = s(x̄) and ∂s(x̄)ᵀ stands for the set of transposed generalized Jacobians of s in the sense of Clarke, cf. (2.5). If we further impose continuous differentiability of s at the point x̄, then the corresponding estimate (1.7) holds with equality. If we drop the assumption that the feasible set of the parametric optimization problem (1.1) is unperturbed, we can still estimate the generalized Hessian of ϕ, provided that S (1.3) and Λ (1.5) are both single-valued and Lipschitz continuous around x̄ and (x̄, ȳ), respectively, with ȳ = s(x̄) and ū = λ(x̄, ȳ). Here, the symbol ∂ refers to the subdifferential in the sense of Mordukhovich, cf.
(2.4). Obviously, as in the case of (1.7), supposing that λ and s are both single-valued and differentiable functions, we obtain equality (1.10). Before moving to that, we first provide some background results, which are useful in their own right, in particular for the understanding of the subdifferential of ϕ (1.2) and of the Lipschitz-likeness and generalized differentiability properties of the related mappings S (1.3) and Λ (1.5); cf. Section 3. Some final comments on limitations and potential applications of the results from this paper, as well as topics for future research, are discussed in Section 6.

2. Notation and mathematical tools needed
We start this section with some notation and basic concepts used throughout the paper. We will use v_j, j = 1, …, n, to denote the jth component of a vector v ∈ Rⁿ, while v_i ∈ Rⁿ, i = 1, …, m, will represent the ith vector component of a vector of vectors v ∈ ∏_{i=1}^m Rⁿ. Furthermore, for v ∈ Rⁿ and I ⊆ {1, …, n}, writing v_I will refer to all components v_i of v for which i ∈ I. To avoid confusion at some points, we will use {0_n} or 0_n for an n-dimensional zero vector. When a distinction is also necessary, we will use I_n for the n × n identity matrix. We use the notation Ψ : Rⁿ ⇒ Rᵐ for a set-valued mapping and the lower case form ψ : Rⁿ → Rᵐ to symbolize a single-valued mapping. Most notably, as already mentioned in the previous section, such transitions from upper to lower case will be used for the optimal solution and Lagrange multiplier set-valued mappings S (1.3) and Λ (1.5), in the forms s and λ, respectively. Additionally, a set-valued mapping Ψ : Rⁿ ⇒ Rᵐ will be said to be closed if its graph, gph Ψ := {(x, y) ∈ Rⁿ × Rᵐ | y ∈ Ψ(x)}, is a closed set. Also recall that for a set C ⊆ Rⁿ, co C will be used to denote the convex hull of C.
For a closed subset C of Rⁿ, the Mordukhovich (also known as basic or limiting) normal cone to C at one of its points x̄ is the set (see, e.g., [22])

N_C(x̄) := { v ∈ Rⁿ | ∃ x_k → x̄ with x_k ∈ C and v_k → v with v_k ∈ N̂_C(x_k) },  (2.1)

where N̂_C denotes the dual of the contingent/Bouligand tangent cone T_C to C:

N̂_C(x) := { v ∈ Rⁿ | ⟨v, d⟩ ≤ 0 for all d ∈ T_C(x) }.

We have the following well-known result, which can be found in [21,30].
Theorem 2.1. If C := { x ∈ Rⁿ | ψ(x) ∈ Ξ }, where Ξ ⊆ Rᵐ is a closed set and ψ : Rⁿ → Rᵐ a Lipschitz continuous function around x̄ ∈ C, then we have

N_C(x̄) ⊆ ⋃ { ∂⟨v, ψ⟩(x̄) | v ∈ N_Ξ(ψ(x̄)) },  (2.2)

provided the following basic-type qualification condition is satisfied at x̄:

[ v ∈ N_Ξ(ψ(x̄)), 0 ∈ ∂⟨v, ψ⟩(x̄) ] ⟹ v = 0.  (2.3)

Equality holds in (2.2), provided that the set Ξ is normally regular at ψ(x̄), i.e., N̂_Ξ(ψ(x̄)) = N_Ξ(ψ(x̄)). This is obviously the case if Ξ is a convex set.
In (2.2) and (2.3), the term ∂⟨v, ψ⟩(x̄) refers to the Mordukhovich subdifferential of the function x ↦ Σ_{i=1}^m v_i ψ_i(x) at x̄. If ψ : Rⁿ → R, then the Mordukhovich (also known as basic or limiting) subdifferential of ψ at x̄ can be defined by

∂ψ(x̄) := { v ∈ Rⁿ | (v, −1) ∈ N_epi ψ(x̄, ψ(x̄)) },  (2.4)

where epi ψ stands for the epigraph of ψ. If ψ is Lipschitz continuous around x̄, then we can also define the Clarke (or convexified) subdifferential of ψ at x̄:

∂̄ψ(x̄) := co ∂ψ(x̄).  (2.5)

In the case where ψ is convex, ∂ψ(x̄) and ∂̄ψ(x̄) coincide with the subdifferential in the sense of convex analysis.
Using the above concept of basic normal cone, we now introduce the notion of coderivative of a given set-valued map Ψ : Rⁿ ⇒ Rᵐ at some point (x̄, ȳ) ∈ gph Ψ, which corresponds to the positively homogeneous mapping D*Ψ(x̄|ȳ) : Rᵐ ⇒ Rⁿ defined by

D*Ψ(x̄|ȳ)(y*) := { x* ∈ Rⁿ | (x*, −y*) ∈ N_gph Ψ(x̄, ȳ) }  (2.6)

for all y* ∈ Rᵐ. Here, N_gph Ψ represents the basic normal cone (2.1) to gph Ψ. The following chain rule from [21, Theorem 5.1] will be pivotal in this work.
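For intuition, it may help to record the standard smooth special case (well known, see, e.g., [22]; stated here purely for illustration): when Ψ reduces to a continuously differentiable single-valued mapping ψ, the coderivative (2.6) is simply the adjoint Jacobian:

```latex
\[
  D^{*}\psi(\bar{x})(y^{*}) \;=\; \{\nabla \psi(\bar{x})^{\top} y^{*}\}
  \qquad \text{for all } y^{*} \in \mathbb{R}^{m},
\]
```

so the coderivative generalizes the transposed Jacobian to set-valued mappings, which is what makes the construction (1.4) a genuine generalized Hessian.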
Theorem 2.2. Let the set-valued mappings Φ : Rⁿ ⇒ Rᵐ and F : Rᵐ ⇒ R^q have closed graphs. Furthermore, let z̄ ∈ (F ∘ Φ)(x̄) and assume that the set-valued map

M(x, z) := Φ(x) ∩ F⁻¹(z)

is locally bounded around (x̄, z̄) and the qualification condition

[ v ∈ D*F(y|z̄)(0), 0 ∈ D*Φ(x̄|y)(v) ] ⟹ v = 0 for all y ∈ M(x̄, z̄)  (2.8)

is fulfilled. Then for all z* ∈ R^q, we have

D*(F ∘ Φ)(x̄|z̄)(z*) ⊆ ⋃ { D*Φ(x̄|y) ∘ D*F(y|z̄)(z*) | y ∈ M(x̄, z̄) }.

The following result from [13, Proposition 3.3], providing a coderivative estimate for a Cartesian product of finitely many set-valued mappings, will also be useful in the development of the main results of this paper.
Next, we provide an estimate of the coderivative of a set-valued mapping defined by the convex hull of another set-valued mapping, which is not necessarily convex-valued. To proceed, consider Ψ : Rⁿ ⇒ Rᵐ and define Φ : Rⁿ ⇒ Rᵐ by

Φ(x) := co Ψ(x).  (2.11)

Here, Ψ is allowed to be nonconvex-valued at some points of Rⁿ, as "co" can obviously be dropped at points where the map is convex-valued. An upper estimate of the coderivative of Φ in terms of the coderivative of Ψ can then be obtained as follows. To make the presentation of the result easier, we introduce the set

Γ(x̄, ȳ) := { (a, b) | a_s ≥ 0, Σ_{s=1}^{m+1} a_s = 1, b_s ∈ Ψ(x̄) for s = 1, …, m+1, ȳ = Σ_{s=1}^{m+1} a_s b_s }.  (2.12)

Proposition 2.4. Consider (x̄, ȳ) ∈ gph Φ and suppose that the set-valued mapping Ψ (2.11) is closed and locally bounded around x̄. Furthermore, assume that (2.9), with Ψ_s := Ψ for s = 1, …, m+1, holds for all (a, b) ∈ Γ(x̄, ȳ). Then for all y* ∈ Rᵐ, the coderivative D*Φ(x̄|ȳ)(y*) admits an upper estimate in terms of that of Ψ.

Proof. Start by recalling that, as Φ(x) ⊆ Rᵐ for all x ∈ Rⁿ, it follows from the well-known theorem of Carathéodory that Φ(x) can be rewritten as the set of convex combinations of at most m+1 points of Ψ(x). Based on this expression, we can easily check that Φ can take the form Φ(x) = (ℓ ∘ Q)(x), where ℓ(a, b) := Σ_{s=1}^{m+1} a_s b_s and Q(x) := Ξ × ∏_{s=1}^{m+1} Ψ(x), with Ξ := { a ∈ R^{m+1} | a_s ≥ 0, Σ_{s=1}^{m+1} a_s = 1 }. Considering the continuous differentiability of ℓ, the closedness of the set Ξ and of the set-valued mapping Ψ, it follows from the chain rule above, cf. Theorem 2.2, that the desired estimate holds for x̄ ∈ Rⁿ and ȳ ∈ Φ(x̄), provided the corresponding intermediate mapping M is locally bounded around (x̄, ȳ). Obviously, from the definition of this mapping, M(x, y) ⊆ Ξ × ∏_{s=1}^{m+1} Ψ(x) for all (x, y). Hence, M is locally bounded around (x̄, ȳ), given that Ξ is a bounded set and Ψ is assumed to be locally bounded around x̄.
Now observe that any b := (b_s)_{s=1}^{m+1} ∈ ∏_{s=1}^{m+1} Rᵐ is an m × (m+1) matrix. We rearrange it as an (m² + m)-dimensional column vector and proceed with this notation for the rest of the proof. The application of Theorem 2.3 to the set-valued mapping Q at the vector ∇ℓ(a, b)ᵀy* then leads to the desired inclusion, given that the coderivative of the constant mapping defined by Ξ is {0} and condition (2.9) is assumed to hold for all (a, b) ∈ Γ(x̄, ȳ).

To close this section, we introduce the Lipschitz-likeness property that will be used in the next section. A set-valued mapping Ψ : Rⁿ ⇒ Rᵐ is Lipschitz-like at (x̄, ȳ) ∈ gph Ψ if there are neighborhoods U of x̄, V of ȳ, and a constant κ > 0 such that d(y, Ψ(x)) ≤ κ‖x − u‖ for all x, u ∈ U and y ∈ Ψ(u) ∩ V, where d stands for the usual distance function. We can add that a weaker Lipschitz property, known as calmness, is obtained if we fix one of the arguments to x̄; precisely, Ψ is calm at (x̄, ȳ) ∈ gph Ψ if there are neighborhoods U of x̄, V of ȳ, and a constant κ > 0 such that d(y, Ψ(x̄)) ≤ κ‖x − x̄‖ for all x ∈ U and y ∈ Ψ(x) ∩ V. A closed set-valued mapping Ψ is Lipschitz-like around (x̄, ȳ) if and only if the condition

D*Ψ(x̄|ȳ)(0) = {0},  (2.16)

known as the coderivative/Mordukhovich criterion, is satisfied at (x̄, ȳ); cf. [22, Theorem 5.7] and [30, Theorem 9.40]. Observe for instance that if this criterion holds for F in (2.8) and for Ψ_i (i = 1, …, p) in (2.9), then the corresponding qualification conditions are automatically satisfied. This is therefore the case if these set-valued maps are closed and Lipschitz-like around the corresponding points.
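The Carathéodory representation used in the proof of Proposition 2.4 can be checked numerically. The sketch below (an illustration of ours, assuming numpy and scipy; the point set is hypothetical) represents a point of co Ψ(x̄) ⊆ R² as a convex combination of at most m + 1 = 3 points of Ψ(x̄) by solving a small linear program whose basic optimal solutions have at most m + 1 nonzero weights:

```python
import numpy as np
from scipy.optimize import linprog

# Points of Psi(xbar) in R^2 (so m = 2); by Caratheodory, any point of
# their convex hull is a convex combination of at most m + 1 = 3 of them.
P = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 0.5]])
target = np.array([1.0, 1.0])  # a point lying in co(P)

# LP: find weights a >= 0 with sum(a) = 1 and P^T a = target; the cost
# vector only breaks ties so the solver returns a vertex (basic) solution.
A_eq = np.vstack([P.T, np.ones(len(P))])
b_eq = np.append(target, 1.0)
c = np.array([0.0, 0.1, 0.2, 0.7, 1.0])
res = linprog(c=c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
assert res.success

a = res.x
assert np.sum(a > 1e-9) <= 3        # at most m + 1 active points
assert np.allclose(P.T @ a, target)  # the combination reproduces the target
```

Here the optimal weights put mass 1/2 on (2, 0) and (0, 2), i.e., only two of the five points are needed.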

3. On the subdifferential of the optimal value function
To start this section, we recall that the fundamental goal of this paper is to develop generalized Hessians (also known as second order subdifferentials) of the optimal value function ϕ (1.2). Hence, it is natural to first clarify the expressions of the subdifferentials (first order subdifferentials, to be precise) of this function. These quantities and further properties have been extensively studied in the literature; see, e.g., [4,8,16,17,23,24,35] and references therein. Below, we recall the relevant aspects of these properties while adding some crucial elements based on the concave-convexity, which we define below. First, note that from here on, the feasible set of the parametric problem (1.1) will be described by the following set-valued mapping:

K(x) := { y ∈ Rᵐ | g_i(x, y) ≤ 0, i = 1, …, p }.  (3.1)

A function ψ defined from Rⁿ⁺ᵐ to R by (x, y) ↦ ψ(x, y) will be said to be concave-convex if the function ψ(·, y) is concave for all y ∈ Rᵐ, while the function ψ(x, ·) is convex for all x ∈ Rⁿ. Subsequently, problem (1.1) will be said to be concave-convex if the functions f and g_i, i = 1, …, p, are concave-convex. Similarly, problem (1.1) will just be said to be convex if the latter functions are convex w.r.t. y. We will also use the Mangasarian-Fromovitz constraint qualification (MFCQ) and the linear independence constraint qualification (LICQ): the MFCQ holds at (x̄, ȳ) ∈ gph K if there exists d ∈ Rᵐ such that ∇_y g_i(x̄, ȳ)d < 0 for all i ∈ I(x̄, ȳ), while the LICQ holds at (x̄, ȳ) if the family { ∇_y g_i(x̄, ȳ) | i ∈ I(x̄, ȳ) } is linearly independent, where I(x̄, ȳ) := { i | g_i(x̄, ȳ) = 0 } denotes the corresponding set of active indices. It is well-known that if the LICQ holds at (x̄, ȳ), then the MFCQ automatically holds at the same point.
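As a quick numerical illustration (our own toy instance, assuming only numpy), consider the hypothetical constraints g₁(y) = y₁² + y₂² − 1 ≤ 0 and g₂(y) = −y₁ ≤ 0 at ȳ = (0, 1), where only g₁ is active; both LICQ and MFCQ can then be checked directly:

```python
import numpy as np

# Hypothetical constraints g1(y) = y1^2 + y2^2 - 1 and g2(y) = -y1 at
# ybar = (0, 1): only g1 is active there, with gradient (0, 2).
ybar = np.array([0.0, 1.0])
grad_active = np.array([[2 * ybar[0], 2 * ybar[1]]])  # rows: active gradients

# LICQ: gradients of the active constraints are linearly independent.
assert np.linalg.matrix_rank(grad_active) == grad_active.shape[0]

# MFCQ: there is a direction d with grad_i * d < 0 for every active i;
# d = -(0, 2) works, illustrating that LICQ implies MFCQ here.
d = -grad_active.sum(axis=0)
assert np.all(grad_active @ d < 0)
```

The same two checks (a rank computation and a feasibility test for a strictly decreasing direction) apply verbatim to any active gradient family.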
(ii) If gph K (3.1) is compact and the MFCQ holds at (x̄, ȳ) with ȳ = s(x̄), then for all x near x̄, with s(x) := y, it holds that

∂ϕ(x) = ⋃_{u ∈ Λ(x,y)} { ∇_x L(x, y, u) }.  (3.6)

(iii) If gph K is compact and the LICQ holds at (x, y), for all y ∈ S(x), then the following equality holds around x̄, with u = λ(x, y):

∂ϕ(x) = co ⋃_{y ∈ S(x)} { ∇_x L(x, y, u) }.  (3.7)

Proof. (i) Equality (3.4) is a well-known result by Danskin [8]. As for (3.5), the maximization case proven in [3] can easily be adapted to our minimization case in (1.2).
(ii) As the MFCQ holds at (x̄, ȳ) with ȳ = s(x̄), and remains valid in some neighborhood of this point, it is well-known that formula (3.6) holds in some neighborhood of x̄, given that gph K is compact; see, e.g., [35]. (iii) Similarly, as the LICQ persists near (x̄, y) for all y ∈ S(x̄), we obtain (3.7) from Gauvin and Dubeau [17], given that gph K is compact.
It is important to recall that the compactness assumption on gph K can be relaxed by instead imposing some set-valued-type continuity properties on S (1.3); see, e.g., [4,17,23,24,35]. But for the purpose of simplifying the framework used in this paper, we do not consider such relaxations here; most of the developed results will nevertheless remain valid under such assumptions. It is also important to mention that the generalized Hessian constructions developed in this paper can rely on any exact formula for the subdifferential of ϕ; exact formulas based on intersection operators can be found in [2, Theorem 5.1d] or [35, Chapter 6], for example. But, of course, depending on the expression of ∂ϕ(x) used, we might need different requirements, and the resulting estimates might also be different.
It is clear from Theorem 3.1 that the subdifferential of ϕ is a "function" of the Lagrange multiplier and optimal solution set-valued mappings. It is therefore natural to imagine that second order subdifferentials of ϕ can primarily be constructed based on the generalized differentiation tools for these mappings. Hence, to prepare for our main results in the next sections, we first provide some useful properties of these mappings here. We start with a coderivative estimate for Λ and deduce a condition ensuring that this mapping is Lipschitz-like. From here on, we will also assume that the graph of the set-valued mapping Λ (1.5) is nonempty. Given that S(x) ≠ ∅ for all x ∈ Rⁿ, the latter is automatically satisfied if there exists a point (x, y) ∈ gph S where a constraint qualification, e.g., the MFCQ or LICQ, holds. Also recall that for a point (x̄, ȳ, ū) ∈ gph Λ, we can define the standard partition of the indices of the constraints of the feasible set of problem (1.1):

ν := { i | ū_i > 0, g_i(x̄, ȳ) = 0 },  η := { i | ū_i = 0, g_i(x̄, ȳ) < 0 },  θ := { i | ū_i = 0, g_i(x̄, ȳ) = 0 }.  (3.8)

This allows us to introduce the special class of multipliers ℧(x̄, ȳ, ū, u*), cf. (3.9), defined for (x̄, ȳ, ū) ∈ gph Λ and u* ∈ Rᵖ, which permits an elegant presentation of the remaining results of this section.
Proposition 3.2. Consider a point (x̄, ȳ, ū) ∈ gph Λ and suppose that the qualification condition (3.10) holds. Then for all u* ∈ Rᵖ, we have the upper estimate (3.15) for D*Λ((x̄, ȳ)|ū)(u*). Furthermore, Λ is Lipschitz-like around ((x̄, ȳ), ū), provided we also have (3.12).

Proof. Start by quickly recalling that the set-valued mapping Λ is closed, given that all the functions involved in problem (1.1) are assumed to be continuously differentiable. Furthermore, its graph can be written as the inverse image of a closed set under a suitable function; cf. (3.13). Now, let (x*, y*) ∈ D*Λ((x̄, ȳ)|ū)(u*). Then, by the definition of the coderivative (2.6), (x*, y*, −u*) ∈ N_gph Λ(x̄, ȳ, ū). Hence, it follows from Theorem 2.1 that there exists a vector (a, b, c) generating the estimate (3.15), provided that the counterpart of (2.3) holds at (x̄, ȳ, ū). For the latter requirement and the finalization of the estimate in (3.15), the relevant subdifferential can be computed explicitly; combining this computation with (3.14), one can easily check that the counterpart of (2.3) is satisfied under assumption (3.10). As for the Lipschitz-likeness of Λ around ((x̄, ȳ), ū), observe from the discussion above that, under (3.12), D*Λ((x̄, ȳ)|ū)(0) = {0}. This ensures that Λ is Lipschitz-like around ((x̄, ȳ), ū), based on the Mordukhovich coderivative criterion (2.16).
Remark 3.3. The conclusions of Proposition 3.2 remain valid if the multiplier set ℧(x̄, ȳ, ū, u*) in (3.9) is replaced by its counterpart in (3.16). The implications in (3.10) and (3.12) correspond to M- (or Mordukhovich-)type conditions, while (3.15) can be labeled as an M-type estimate of the coderivative of Λ. Similarly, with (3.16), we respectively have C- (or Clarke-)type conditions and a C-type estimate for the coderivative of Λ. More details on constructions and vocabulary in this vein can be found, for example, in [11,12]. It is also important to recall that the result in Proposition 3.2 can be obtained if we replace condition (3.10) by the weaker calmness of the perturbation mapping p ⇒ { (x, y, u) | ψ(x, y, u) + p ∈ Ξ }, where the function ψ and the set Ξ are defined in (3.13); see, e.g., [2,18].
Next, we provide a simple yet powerful relationship between the coderivatives of S and Λ that allows the derivation of a complete estimate of the former based on Proposition 3.2.

Proposition 3.4. Suppose that the functions f(x, ·) and g_i(x, ·), i = 1, …, p, are convex for all x ∈ Rⁿ, and that the MFCQ holds at (x̄, ȳ) ∈ gph S. Then for all y* ∈ Rᵐ, inclusion (3.17) holds. If, additionally, condition (3.10) holds at (x̄, ȳ, u) for all u ∈ Λ(x̄, ȳ), then estimate (3.18) holds for all y* ∈ Rᵐ.

Proof. Under the assumption that the functions f(x, ·) and g_i(x, ·), i = 1, …, p, are convex for all x ∈ Rⁿ and the MFCQ holds at (x̄, ȳ) ∈ gph S, it follows that, near this point, the optimal solution set-valued mapping can be written as a composition involving the KKT conditions. The set-valued mapping Λ is closed, given that the functions f and g are assumed to be continuously differentiable throughout the paper. Then, applying the chain rule from Theorem 2.2 to this expression of S leads to (3.19), provided that the corresponding intermediate set-valued mapping M(x, y) is locally bounded; the latter is indeed true and can easily be shown. On the other hand, the coderivative of the second mapping in the composition can be expressed through that of Λ, considering the fact that the graph of Q is the same as that of Λ. Combining the last equality in (3.20) with inclusion (3.19), we obtain (3.17). As for the Lipschitz-like property of S at (x̄, ȳ), this follows from inclusion (3.17) by applying the coderivative criterion (2.16), given that S is a closed set-valued mapping.
Remark 3.5. Suppose that the functions f(x, ·) and g_i(x, ·), i = 1, …, p, are convex for all x ∈ Rⁿ, and the MFCQ holds at (x̄, ȳ) ∈ gph S. If, in addition, the set-valued mapping S is closed, then S is Lipschitz-like around (x̄, ȳ) provided that Λ is Lipschitz-like around ((x̄, ȳ), u) for all u ∈ Λ(x̄, ȳ). It is clear from Proposition 3.2 that the latter holds if the qualification condition (3.12) is satisfied at (x̄, ȳ, u) for all u ∈ Λ(x̄, ȳ). The estimate of the coderivative of S in (3.18) was obtained in [25, Theorem 4.3] using a different approach, which relies on the computation of the coderivative of a certain normal cone mapping. Also note that the mapping in the latter reference is slightly more general and the assumption framework there is based on calmness constructions. Our results are amenable to more general settings, and corresponding calmness-based constructions are possible; see some related discussions in the next sections.
Coming towards the end of this section, it is important to emphasize that the Lipschitz-like property of Λ (1.5) or S (1.3), in the sense of set-valued mappings, is not necessary to achieve the principal goal of this paper. The main condition in the process is (3.10), which allows us to estimate the coderivative of the relevant map.
Finally, to close this section, we would like to mention that we will occasionally require the set-valued mapping Λ or S to be locally single-valued and Lipschitz continuous. We do not specifically address this topic here, as it is out of the scope of our work. But it is important to recall that, in the case of S, many publications have addressed this question; see Subsections 4.2 and 5.1, where more details and references are provided. As for Λ, it is closely related to the solution mapping of the KKT system of problem (1.1), which has been widely studied in the literature; see, e.g., [29,15]. Obviously, the coderivative calculations easily reveal close interplays between the two mappings, similar to, though much simpler than, those between S and Λ provided in Proposition 3.4. More precisely, note that Λ can be written as

Λ(x, y) = { u ∈ Rᵖ | ψ(x, y, u) ∈ Φ(u) },

where the function ψ and the set-valued mapping Φ are respectively defined by ψ(x, y, u) := (−∇_y L(x, y, u)ᵀ, g(x, y)ᵀ)ᵀ and Φ(u) := {0_m} × N_{R^p_+}(u). Many papers have been devoted to conditions ensuring that general mappings of the above form are locally single-valued and Lipschitz continuous; see, e.g., [14,29].

4. Generalized Hessian estimates in the presence of convexity

4.1. Case where the feasible set is unperturbed. In this subsection, we assume that the feasible set of problem (1.1) is independent of x. More precisely, for g : Rᵐ → Rᵖ, our attention here will be on the following function, assumed to be finite-valued:

ϕ(x) := min_y { f(x, y) | y ∈ Y },  (4.1)

where Y := { y ∈ Rᵐ | g(y) ≤ 0 }. The first result of this section provides an upper estimate of the generalized Hessian of ϕ (4.1) in which the sets Λ(x̄, y) and ℧(x̄, y, u, u*), defined in (1.5) and (3.9), respectively, are involved.
Proof. Note that under the assumptions of the theorem, we have equality (3.5) from Theorem 3.1. This equality can equivalently be written as ∂ϕ(x) = (∇_x f ∘ Ψ)(x), where Ψ(x) := {x} × S(x). Further observe that the set-valued map Ψ is closed, given that the counterpart of S (1.3) for (4.1) can take the form S(x) = { y ∈ Rᵐ | g(y) ≤ 0, f(x, y) ≤ ϕ(x) } and is thus closed, as ϕ is locally Lipschitz continuous under the imposed continuous differentiability of the function f and compactness of the set Y. Also note that the corresponding intermediate set-valued mapping M(x, z) is locally bounded for some neighborhoods X and Z of x̄ and x̄*, respectively, with X assumed to be bounded. Hence, we have (4.3) from the chain rule in Theorem 2.2. Next, for the right-hand side of (4.3), since S is closed as shown above, applying Theorem 2.3 to Ψ leads to (4.4), given that the corresponding counterpart of qualification condition (2.9) is automatically satisfied, as 0 ∈ D*S(x̄|ȳ)(0) by the positive homogeneity of the coderivative mapping. Finally, inclusion (1.8) is obtained by combining (4.3) and (4.4). As for inclusion (4.2), it obviously follows from (3.18). One can easily check that if the matrix ∇²_yx f(x̄, ȳ) has full rank, then the qualification condition (3.10) holds.
The tools involved in the calculations in this result are standard and can easily be checked; note in particular that requiring the matrix ∇²_yx f(x̄, ȳ) to have full rank helps to ensure that condition (3.10) is satisfied. To illustrate this result, we consider the following example of a linear program with left-hand-side perturbation:

ϕ(x) := min_y { xᵀy | y ∈ Y },

where Y := { y ∈ Rᵐ | Ay ≤ b } is compact and A is a full rank matrix. For a couple (x̄, x̄*), we have x̄* ∈ ∂ϕ(x̄) if and only if x̄* ∈ S(x̄). We also have ū = λ(x̄, x̄*), as A is full rank. For (x̄, x̄*, ū), consider the definitions of ν, η, and θ given in (3.8) for the constraint set Y. Further note that the qualification condition (3.10) holds at the point (x̄, x̄*, ū), as A has full rank. It therefore follows from inclusion (4.2) that the corresponding generalized Hessian estimate can be made fully explicit in terms of the problem data. To conclude this subsection, recall that different structures are possible for the set ℧(x̄, y, u, u*), which depend on the reformulation of the complementarity conditions defining Λ(x̄, y), as discussed in Remark 3.3.
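A numerical sketch of this linear-programming setting (our own instance with Y = [−1, 1]², assuming numpy and scipy) confirms that, at a point x̄ where the solution is unique, the gradient of ϕ equals s(x̄), in line with the identification of ∂ϕ(x̄) with S(x̄):

```python
import numpy as np
from scipy.optimize import linprog

# LP value function with objective perturbation:
# phi(x) = min { <x, y> : y in Y }, with Y = [-1, 1]^2 written as Ay <= b.
A = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
b = np.ones(4)

def solve(x):
    res = linprog(c=x, A_ub=A, b_ub=b, bounds=(None, None))
    return res.fun, res.x

xbar = np.array([1.0, 2.0])
val, s = solve(xbar)
assert np.allclose(s, [-1.0, -1.0]) and abs(val - (-3.0)) < 1e-8

# Since S(xbar) is a singleton here, grad phi(xbar) = s(xbar) (Danskin):
h = 1e-6
grad = np.array([(solve(xbar + h * e)[0] - val) / h for e in np.eye(2)])
assert np.allclose(grad, s, atol=1e-4)
```

At parameters where the LP solution set is not a singleton (e.g., x̄ with a zero component), the finite-difference slopes from the left and right differ, which is exactly where the second order objects studied here become set-valued.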

4.2.
Single-valued optimal solution map. We assume here that the optimal solution mapping S (1.3) is single-valued, i.e., S := s, while Λ (1.5) is allowed to be set-valued. We can then estimate the generalized Hessian of ϕ (1.2) as follows.
Theorem 4.3. Suppose that gph K (3.1) is compact and S is single-valued (i.e., S := s) and Lipschitz continuous around x̄. Furthermore, assume that problem (1.1) is convex, the MFCQ holds at (x̄, ȳ) with ȳ = s(x̄), and the qualification condition (4.5) holds at (x̄, ȳ, u) for any u ∈ Λ(x̄, ȳ). Then, for x̄* ∈ ∂ϕ(x̄) and any x* ∈ Rⁿ, the estimate (4.6) for the second order subdifferential of ϕ holds. If, additionally, we suppose that conditions (3.10) and (3.12) are satisfied at (x̄, ȳ, u) for all u ∈ Λ(x̄, ȳ) with ȳ = s(x̄), then the qualification condition (4.5) holds at the relevant points and, for x̄* ∈ ∂ϕ(x̄) and x* ∈ Rⁿ, estimate (4.7) holds.

Proof. We first check that the intermediate mapping needed for the chain rule is locally bounded. Suppose this is not the case; then, by definition, we can find sequences violating local boundedness, with the corresponding multiplier components c_k satisfying ‖c_k‖ ≥ k, given that s is assumed to be locally Lipschitz continuous around x̄. We can find a subsequence of {c_k}, denoted in the same way provided there is no confusion, such that c_k/‖c_k‖ converges to some c with ‖c‖ = 1. Inserting this subsequence in the second line of (4.8) and dividing the terms containing c_k by its norm, we arrive, as k → ∞, at the corresponding limit system at the point (x̄, ȳ, c). Given that the MFCQ holds at (x̄, ȳ), it follows that we must have c = 0. This contradicts ‖c‖ = 1 and hence the assumption that ‖c_k‖ ≥ k for all k ∈ N.
Hence, M is locally bounded around (x̄, x̄*). Therefore, the chain rule in Theorem 2.2 applies, given that Φ ∘ ψ is closed, as s is locally Lipschitz continuous around x̄ and the functions f and g are continuously differentiable. Next, note that the corresponding intermediate set-valued mapping is locally bounded around all the points (x̄, z) with z ∈ (Φ ∘ ψ)(x̄) and ∇_x L(z) = x̄*, given that s is Lipschitz continuous around x̄ and the relevant images remain bounded for some neighborhoods X, A, B, C of x̄, ā, b̄, and c̄, respectively, with X a bounded set and A a compact set. From the product rule in Theorem 2.3, it follows that for any z* = (x*, y*, u*) and some ū ∈ Λ(a, b) such that z = (a, b, ū), the corresponding coderivative estimate holds, given that the counterpart of (2.9) is automatically satisfied. This obviously leads to D*Φ((a, b)|z)(0) ⊆ D*Λ((a, b)|ū)(0). Then, considering the fact that (a, b, u) ∈ (Φ ∘ ψ)(x̄) is equivalent to a = x̄, b = ȳ, and u ∈ Λ(x̄, ȳ), and since ∂⟨(a*, b*), ψ⟩(x̄) = a* + ∂⟨b*, s⟩(x̄), we obtain (4.6) from a combination of inclusions (4.9), (4.10), and (4.11). As for inclusion (4.7), it follows from the insertion of (3.15) into (4.6). Also note that (4.5) is automatically satisfied if condition (3.12) holds at the relevant points, i.e., at (x̄, ȳ, u) with ȳ = s(x̄) for any u ∈ Λ(x̄, ȳ). The latter follows from the fact that the fulfilment of condition (3.12) at (x̄, ȳ, u) ensures that D*Λ((x̄, ȳ)|u)(0) = {0}, by the coderivative criterion (2.16).
A few remarks are in order to clarify certain aspects of this result. First, to guarantee that S is single-valued and locally Lipschitz continuous around x̄, one just needs to additionally impose that the strong second order sufficient condition (SSOSC) and the constant rank constraint qualification (CRCQ) are satisfied; cf. [28]. Recall that the SSOSC is said to hold at a point (x̄, ȳ) if for all u ∈ Λ(x̄, ȳ),

dᵀ∇²_yy L(x̄, ȳ, u)d > 0 for all d ≠ 0 such that ∇_y g_i(x̄, ȳ)d = 0 for all i with u_i > 0.

As for the CRCQ, it is said to hold at (x̄, ȳ) if there is a neighborhood W of this point such that, for any subset I of { i = 1, …, p | g_i(x̄, ȳ) = 0 }, the family of gradients { ∇_y g_i(x, y) | i ∈ I } has the same rank for all (x, y) ∈ W. To be precise, according to the latter reference, under the MFCQ, CRCQ, and SSOSC, s is in fact a piecewise continuously differentiable (PC¹) function. Furthermore, its Clarke generalized Jacobian can be obtained as in (4.12), where cl int denotes the closure of the interior of the support set Supp(s, sⁱ) := { x | s(x) = sⁱ(x) } of the selection sⁱ of s, cf.
[10, Chapter 4]. A more detailed expression of (4.12) in terms of the corresponding problem data can be found in the latter reference. Further details on PC¹ functions can also be found in [32]. Second, note that a completely detailed upper estimate of ∂²ϕ(x̄|x̄*)(x*) based on (4.7) is possible. One way to do this is simply to observe that the last term in the right-hand side of formula (4.7) is included in a set expressed in terms of ∂̄s(x̄), the Clarke generalized Jacobian of s at x̄. There are various results on the computation of ∂̄s(x̄) in the literature; see [10] and references therein.
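As a toy illustration of such PC¹ solution maps (our own example, not one from the references), the projection s(x) = min(max(x, 0), 1) solves the hypothetical problem min_y {(y − x)² : 0 ≤ y ≤ 1}; its selections are s¹(x) = 0, s²(x) = x, s³(x) = 1, and at the kink x = 0 the Clarke generalized Jacobian is the interval co{0, 1} = [0, 1]:

```python
# PC^1 solution map: projection of x onto [0, 1], with smooth selections
# s1(x) = 0, s2(x) = x, s3(x) = 1 active on (-inf, 0], [0, 1], [1, inf).
def s(x):
    return min(max(x, 0.0), 1.0)

h = 1e-7
# Away from the kinks, s is differentiable and its derivative is that of
# the active selection (here s2, with derivative 1).
assert abs((s(0.5 + h) - s(0.5)) / h - 1.0) < 1e-6
# At the kink x = 0, the one-sided slopes recover the derivatives 0 and 1
# of the two adjacent selections, whose convex hull [0, 1] is the Clarke
# generalized Jacobian of s at 0.
left = (s(0.0) - s(-h)) / h   # slope of the selection s1
right = (s(h) - s(0.0)) / h   # slope of the selection s2
assert abs(left - 0.0) < 1e-6 and abs(right - 1.0) < 1e-6
```

This is the one-dimensional picture behind formula (4.12): the generalized Jacobian collects the Jacobians of the selections that are active on sets with nonempty interior around x̄.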
It is important to highlight that condition (4.5) automatically holds if the mapping Λ is Lipschitz-like at all the relevant points; cf. details in Section 3. Next, we provide an example showing that the estimates in Theorem 4.3 are still possible in some cases where condition (4.5) fails.
Consider problem (4.13), taken from [28]. We can easily check that the optimal solution map S reduces to a single-valued, continuous function s. The compactness of gph K, required in Theorem 3.1(ii), does not hold. But (3.6) remains valid thanks to the continuity of the solution function s and the fact that the MFCQ holds at any feasible point. Problem (4.13) is obviously convex in y, and direct calculations yield the corresponding multiplier mapping Λ. We can easily show that condition (4.5) fails. However, thanks to the fact that Λ is polyhedral, the detailed estimates (see (4.6)) for ∂²ϕ(x̄|x̄*)(x*) can still be obtained via the calmness property; cf. the discussion following Proposition 3.4.

5. Generalized Hessian estimates in the absence of convexity
We assume here that the functions involved in (1.1) are not necessarily convex.

5.1. Single-valued optimal solution and multiplier maps. We assume throughout this subsection that the optimal solution mapping S (1.3) and the Lagrange multiplier mapping Λ (1.5) are both single-valued. Before we move to the general case, note that if s is single-valued in (4.1), we can get the following result, where the concave-convexity of f and the qualification condition (5.10) are not necessary.

Theorem 5.1. Consider the optimal value function ϕ (4.1) and let the corresponding optimal solution map S (1.3) be single-valued (i.e., S := s) and Lipschitz continuous around x̄, where s(x̄) = ȳ. Then inclusion (5.1) holds.

Proof. Consider the function ψ : Rⁿ → Rⁿ⁺ᵐ defined by ψ(x) := (xᵀ, s(x)ᵀ)ᵀ. Then ∂ϕ(x) = (∇_x f ∘ ψ)(x), and (5.1) follows by respectively using the chain and product rules in Theorems 2.2 and 2.3.

Obviously, if s is single-valued and continuously differentiable at x̄, we get the equality in (1.7). Also, it can be useful to observe that (1.6) provides an upper bound for the generalized Hessian of ϕ which is looser than the one in (5.1); cf. (2.5). Next, we show that, under additional assumptions, inclusion (1.9) is valid in the case where the constraint function g effectively depends on both x and y.

Theorem 5.2. Consider ϕ (1.2) and suppose that gph K is compact and the MFCQ holds at (x̄, y) for all y ∈ S(x̄). Further assume that the mappings S and Λ are single-valued (i.e., S := s and Λ := λ) and Lipschitz continuous around x̄ and (x̄, ȳ), respectively, with ȳ = s(x̄) and ū = λ(x̄, ȳ). Then, we have inclusion (1.9).
Proof. It is clear that, with gph K compact and the MFCQ satisfied at (x̄, y) for all y ∈ S(x̄), we have from [17] that the inclusion ∂ϕ(x) ⊆ co ⋃_{y∈S(x)} ⋃_{u∈Λ(x,y)} {∇_x L(x, y, u)} holds near x̄. Since S and Λ are single-valued around x̄ and (x̄, ȳ), respectively, it follows from inclusion (5.2) that ∂ϕ = ∇_x L ∘ φ ∘ ψ holds near x̄. Hence, we obtain (5.4)-(5.5), since the function ∇_x L is differentiable and φ and ψ are Lipschitz continuous around ψ(x̄) and x̄, respectively. One can easily observe that (5.4)-(5.5) result from the chain rule in Theorem 2.2. It now remains to evaluate the subdifferentials involved in (5.5).
In fact, we can easily check that equalities (5.6)-(5.8) hold. It then follows from (5.6) and (5.7) that, substituting the resulting expression, together with (5.8), into (5.5), we arrive at inclusion (1.9).

This result is clearly a special case of Theorem 4.3, as both estimates coincide if Λ is locally single-valued and Lipschitz continuous. Note that the latter property ensures that the qualification condition (4.5) automatically holds. If, in addition to the assumptions made in Theorem 5.2, we suppose that the functions s and λ are differentiable at x̄ and (x̄, ȳ), respectively, then we have equality (1.10). To ensure that the latter holds, recall that we have already discussed in Subsection 4.2 how to guarantee that s is locally single-valued and Lipschitz continuous. If stronger assumptions are made, namely the SSOSC, the LICQ, and the strict complementarity condition (i.e., θ = ∅) at (x̄, ȳ), then the optimal solution mapping S (1.3) and the Lagrange multiplier mapping u (as a function of x only) are locally unique and, based on the implicit function theorem, their derivatives can be obtained from system (5.9); cf. [16, Chapter 3]. Under additional invertibility assumptions, complete expressions for ∇s(x̄) can be written in terms of the problem data; see, e.g., [35, Chapter 7].
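To make the role of system (5.9) concrete, the following Python sketch applies the implicit function theorem to the KKT system of a small quadratic lower-level problem. The data Q, a, b are hypothetical (not from the paper), chosen so that the SSOSC, LICQ, and strict complementarity all hold at the reference point; differentiating the active KKT equalities with respect to x then yields ∇s(x̄).

```python
# Hedged sketch with hypothetical data: sensitivity of the optimal solution
# via the implicit function theorem applied to the KKT system (cf. (5.9)).
# Lower-level problem: min_y 0.5 y'Qy - x'y  s.t.  a'y <= b (active at x_bar).
import numpy as np

Q = np.array([[2.0, 0.0], [0.0, 2.0]])  # SSOSC holds: Q positive definite
a = np.array([1.0, 1.0])                # LICQ holds: single active gradient
b = 0.5

def kkt_sensitivity(x_bar):
    # KKT equalities at x_bar (constraint assumed active):
    #   Q y - x + u a = 0,   a'y = b
    K = np.block([[Q, a.reshape(-1, 1)],
                  [a.reshape(1, -1), np.zeros((1, 1))]])
    sol = np.linalg.solve(K, np.concatenate([x_bar, [b]]))
    y_bar, u_bar = sol[:2], sol[2]
    # Implicit function theorem: differentiate the KKT system w.r.t. x,
    #   [Q  a; a' 0] [dy/dx; du/dx] = [I; 0]
    rhs_jac = np.vstack([np.eye(2), np.zeros((1, 2))])
    J = np.linalg.solve(K, rhs_jac)
    return y_bar, u_bar, J[:2, :]       # J[:2, :] is ds/dx = grad s(x_bar)

y_bar, u_bar, ds_dx = kkt_sensitivity(np.array([1.0, 1.0]))
```

At x̄ = (1, 1) one obtains ȳ = (0.25, 0.25) with multiplier ū = 0.5 > 0, so strict complementarity indeed holds and the computed ds/dx is the exact Jacobian of the solution function for this instance.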
5.2. Set-valued optimal solution map. We start here by considering the case of ϕ (4.1), while further assuming that the involved functions are not necessarily convex. Unlike in the previous subsection, we let S (1.3) be set-valued throughout this subsection.

Theorem 5.3. Consider the optimal value function ϕ (4.1) and let Y be a compact set. Furthermore, consider a point (x̄, x̄*) such that x̄* ∈ ∂ϕ(x̄), and suppose that implication (5.10) is satisfied at all (a, z) ∈ Γ°(x̄, x̄*). Then, for all x* ∈ R^n, the announced estimate holds.

Proof. Let us first recall that, under the compactness of Y and the continuous differentiability of f, we have from (3.4) that ∂ϕ(x) = co ∇_x f ∘ Ψ(x) with Ψ(x) := {x} × S(x).
Next, recall that Ψ is closed, following the discussion in the proof of Theorem 4.1. Furthermore, one can easily check that Ψ is locally bounded around any point in R^n. Now, consider sequences a_k → ā and c_k → c̄ with c_k ∈ ∇_x f ∘ Ψ(a_k) for all k. Obviously, we can find a sequence {b_k} such that b_k ∈ Ψ(a_k) and ∇_x f(b_k) = c_k. By the local boundedness of Ψ, {b_k} admits a convergent subsequence, which we denote in the same way, provided there is no confusion. By the closedness of Ψ, we have b_k → b̄ ∈ Ψ(ā) for some b̄ ∈ R^{n+m}. Additionally, note that ∇_x f(b̄) = c̄, as f is assumed to be continuously differentiable throughout this paper. Thus c̄ ∈ ∇_x f ∘ Ψ(ā), confirming that ∇_x f ∘ Ψ is a closed set-valued mapping. Next, we consider a sequence x_k → x̄ and any sequence z_k ∈ ∇_x f ∘ Ψ(x_k). Then, we can find a sequence {y_k} such that y_k ∈ S(x_k) and z_k = ∇_x f(x_k, y_k). By definition of the counterpart of S (1.3) for (4.1), it follows that y_k ∈ Y for all k. Hence, as Y is compact, it follows from the well-known Bolzano-Weierstrass theorem that {y_k} has a convergent subsequence, which we again denote in the same way, and we let ȳ be its limit. Then, we have ȳ ∈ S(x̄), given that S is closed, as shown in Theorem 4.1. Subsequently, as f is continuously differentiable, the sequence {z_k} converges to ∇_x f(x̄, ȳ). This shows that ∇_x f ∘ Ψ is locally bounded around x̄. It then follows from Proposition 2.4 that, if the counterpart of (2.9) for the mapping in (3.4) holds at all (a, b) ∈ Γ(x̄, x̄*), then the desired estimate holds. Finally, observe that (5.10) is a sufficient condition for the counterpart of the qualification condition (2.9) for (3.4) to hold.
Observe that, based on the coderivative criterion (2.16), the qualification condition (5.10) is automatically satisfied if S is Lipschitz-like around (x̄, y) for all y ∈ ∆(x̄, z_s) and (a, z) ∈ Γ°(x̄, x̄*). Also, it is clear that the estimate of the generalized Hessian of ϕ obtained in Theorem 4.1 is much tighter than the one derived in Theorem 5.3.
Similarly to Theorem 5.3, we now consider ϕ in the general case (1.2), but without the convexity assumption. Then we have the following result.

Theorem 5.4. Consider (1.2) and suppose that gph K is compact and the LICQ holds at all (x, y) ∈ gph S. Further assume that Λ is single-valued and Lipschitz continuous around (x̄, y), with u = λ(x̄, y), for all y ∈ S(x̄) such that ∇_x L(x̄, y, u) = x̄*. Furthermore, let S (1.3) be closed and locally bounded around x̄, and let the qualification condition (5.12) hold at all (a, z, w) ∈ Γ_λ(x̄, x̄*), where x̄* ∈ ∂ϕ(x̄). Then, for all x* ∈ R^n, the announced estimate holds.

Proof. Based on the compactness of gph K, the closedness of S, and the Bolzano-Weierstrass theorem, we can easily show that ∇_x L ∘ φ ∘ Ψ is closed and locally bounded around x̄, thanks to the local Lipschitz continuity of φ. The rest of the proof follows the steps of that of Theorem 5.3.
Similarly to the previous result, the qualification condition (5.12) is automatically satisfied if S is Lipschitz-like around (x̄, y) for all y ∈ ∆(x̄, z_s) and (a, z, w) ∈ Γ_λ(x̄, x̄*). The only remaining point to clarify is how to estimate the coderivative of S, as well as how to ensure that S is Lipschitz-like, in the absence of convexity. To proceed, we simply restate the following result from [11, Theorem 5.9].

Theorem 5.5. Let the mapping S (1.3) be inner semicontinuous at (x̄, ȳ) ∈ gph S, and let the set-valued map Ξ(v) := {(x, y) | g(x, y) ≤ 0, f(x, y) − ϕ(x) ≤ v} be calm at (0, x̄, ȳ). If, furthermore, the MFCQ holds at (x̄, ȳ), then estimate (5.13) holds for all y* ∈ R^m. Moreover, S is Lipschitz-like around (x̄, ȳ) if, in addition, the corresponding condition from [11, Theorem 5.9] holds.

Recall that, to compute ∂ϕ(x̄) in (5.13), we can use (3.4) and (3.7) in the settings of Theorems 5.3 and 5.4, respectively. More details on this result, as well as examples where the assumptions hold, can be found in [11].

6. Final discussion and future work
Our focus in this paper was to provide upper estimates for the generalized Hessian of ϕ under different scenarios. In some cases, these upper estimates may be substantially larger than the generalized Hessians being estimated. This is, for example, the case for the problem in Example 4.4. Based on partitions of the sets A, B, and C, we can introduce suitable index sets and, by basic calculations (see, e.g., [13, 37]), check that the Fréchet normal cone to the graph of ∂ϕ can be computed explicitly. From this equality and the definition of the basic normal cone given in (2.1), and then from the definition of the coderivative, we arrive at formula (6.1). We can easily conclude from formula (6.1) that, for x̄ < 0 and x = x̄, the generalized Hessian can be computed exactly. This confirms that, in the case of this example, the upper estimate of ∂²ϕ(x̄|x̄*)(x*) obtained from (4.6) is larger than the actual generalized Hessian of ϕ. Another important observation from (6.1) is that, for a specific structure of the functions f and g defining ϕ, exact formulas for ∂²ϕ(x̄|x̄*)(x*) can be obtained. The generalization of such calculations to broader classes of quadratic or linear versions of f and g will be carefully studied in a separate work.
To conclude this section, we give a flavor of how the results in this paper can be used in practice, by illustrating how they can be applied in an approximation scheme to solve the bilevel optimization problem (6.2), where F, f : R^n × R^m → R, G : R^n × R^m → R^p, and g : R^m → R^q are all continuously differentiable functions. Under the well-known partial calmness concept (see, e.g., [20, 36]), a point that is locally optimal for problem (6.2) is also a locally optimal solution of the partially penalized problem

min_{x,y} F(x, y) + η(f(x, y) − ϕ(x)) s.t. G(x, y) ≤ 0, g(y) ≤ 0, (6.3)

for some penalization parameter η > 0. Hence, for a given value of η, a point (x, y) will be said to be Karush-Kuhn-Tucker (KKT) stationary for problem (6.3) if it satisfies inclusion (6.4). Considering recent developments in the construction of Newton-type methods for nonsmooth inclusions, based on the Mordukhovich coderivative (see, e.g., [19, 26, 27]), we can solve equation (6.5) in d at iteration k, with the data defined in (6.6). As it is well known how to calculate elements of ∂φ_G(ζ_k) and ∂φ_g(ζ_k), the first main question here is how to calculate an element of ∂²ϕ(x_k|x_k*)(d_x). In the case where an exact formula for this generalized Hessian is not available, unlike in the example above (see (6.1)), elements from the upper estimates obtained in the fourth and fifth sections of this paper could be used in a heuristic scheme to solve inclusion (6.4). A separate work is ongoing to study the theoretical properties of the generalized Newton scheme (6.5)-(6.6) in the case where an exact formula for ∂²ϕ(x_k|x_k*)(d_x) can be obtained, and also to evaluate its practical efficiency in the case where only upper bounds of this generalized Hessian can be estimated.
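As a rough numerical illustration of the penalization step (6.3), the following Python sketch uses hypothetical data F and f, drops the constraints G and g, and fixes η; it nests the lower-level solve that defines ϕ inside the upper-level objective. This is only a heuristic sketch of the value-function reformulation, not the generalized Newton scheme (6.5)-(6.6).

```python
# Hedged sketch with hypothetical data: solving the partially penalized
# problem (6.3) for a tiny unconstrained bilevel instance by nesting the
# lower-level solve that defines phi(x) in the upper-level objective.
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def f(x, y):              # lower-level objective (hypothetical)
    return (y - x) ** 2

def phi(x):               # optimal value function: lower level solved in y
    return minimize_scalar(lambda y: f(x, y)).fun

def F(x, y):              # upper-level objective (hypothetical)
    return (x - 1.0) ** 2 + (y - 1.0) ** 2

eta = 10.0                # fixed penalization parameter eta > 0

def penalized(z):
    # objective of (6.3): F(x, y) + eta * (f(x, y) - phi(x))
    x, y = z
    return F(x, y) + eta * (f(x, y) - phi(x))

res = minimize(penalized, x0=np.zeros(2))
x_opt, y_opt = res.x
```

For this instance the lower-level solution is y = x with ϕ(x) = 0, so the true bilevel solution is (1, 1), and the penalized problem recovers it exactly for any η > 0; in general, the choice of η matters and the penalized objective is nonsmooth through ϕ.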

1. Introduction
1.1. Aim of the work. Considering the functions f and g, defined from R^{n+m} to R and R^p, respectively, we are interested in the parametric optimization problem min_y {f(x, y) | g(x, y) ≤ 0}. (1.1)
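For intuition, the optimal value function ϕ attached to problem (1.1) can be evaluated numerically by solving the lower-level problem for each fixed x. The following Python sketch uses hypothetical data f and g (not taken from the paper), chosen so that the behavior of ϕ is easy to verify by hand.

```python
# Illustrative sketch only: numerically evaluating the optimal value
# function phi(x) = min_y { f(x, y) : g(x, y) <= 0 } for hypothetical data.
import numpy as np
from scipy.optimize import minimize

def f(x, y):
    # f(x, y) = (y - x)^2: the unconstrained minimizer in y is y = x
    return (y[0] - x) ** 2

def g(x, y):
    # single inequality constraint g(x, y) = y - 1 <= 0
    return y[0] - 1.0

def phi(x):
    """Optimal value function: solve the lower-level problem in y."""
    res = minimize(lambda y: f(x, y), x0=np.zeros(1),
                   constraints=[{"type": "ineq", "fun": lambda y: -g(x, y)}])
    return res.fun

# For x <= 1 the constraint is inactive and phi(x) = 0; for x > 1 the
# solution is clamped at y = 1, so phi(x) = (1 - x)^2. Here phi happens to
# be smooth, but in general it is only locally Lipschitz and nonsmooth.
```

For instance, phi(2.0) returns (1 − 2)² = 1 up to solver tolerance, since the constraint is active there.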

Example 4.4. Consider the optimal value function