On the Strong Subregularity of the Optimality Mapping in an Optimal Control Problem with Pointwise Inequality Control Constraints

This paper presents sufficient conditions for strong metric subregularity (SMsR) of the optimality mapping associated with the local Pontryagin maximum principle for Mayer-type optimal control problems with pointwise control constraints given by a finite number of inequalities G_j(u) ≤ 0. It is assumed that all data are twice smooth and that at each feasible point the gradients G_j′(u) of the active constraints are linearly independent. The main result is that the second-order sufficient optimality condition for a weak local minimum is also sufficient for a version of the SMsR property, which involves two norms in the control space in order to deal with the so-called two-norm discrepancy.

We consider the optimal control problem

minimize J(x, u) := F(x(0), x(1)),  (1)
subject to ẋ(t) = f(x(t), u(t)) a.e. in [0, 1],  (2)
G(u(t)) ≤ 0 a.e. in [0, 1],  (3)

where F : R^{2n} → R, f : R^{n+m} → R^n, and G : R^m → R^k are of class C², u ∈ L^∞, x ∈ W^{1,1}. More precisely, we investigate the property of strong metric subregularity (SMsR) of the so-called optimality mapping associated with the system of first-order necessary optimality conditions (Pontryagin's conditions in local form) for problem (1)-(3). These optimality conditions may have various forms. In this paper we deal with the representation using the augmented Hamiltonian, where the control constraints are included with corresponding Lagrange multipliers (see the next section for a detailed formulation).
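To make the setting (1)-(3) concrete, here is a minimal direct-discretization sketch for a toy instance; the dynamics, constraint, and solver parameters below are illustrative choices, not data from the paper.

```python
# Toy instance of (1)-(3): minimize J = x(1), dynamics x' = u,
# constraint G(u) = (u - 1, -u - 1) <= 0, x(0) = 0.
# Exact solution: u = -1, optimal value -1.  (Illustrative data only.)

N = 50            # Euler grid on [0, 1]
h = 1.0 / N

def objective(u):
    """Mayer cost F(x(0), x(1)) = x(1) under explicit Euler for x' = u."""
    x = 0.0
    for ui in u:
        x += h * ui
    return x

def project(u):
    """Pointwise projection onto {v : G(v) <= 0} = [-1, 1]."""
    return [min(1.0, max(-1.0, ui)) for ui in u]

def solve(iters=200, step=0.5):
    """Projected gradient descent; the gradient of x(1) w.r.t. each u_i is h."""
    u = [0.0] * N
    for _ in range(iters):
        u = project([ui - step * h for ui in u])
    return u

print(round(objective(solve()), 6))   # -1.0
```

The constraint G(u) ≤ 0 is handled here by projection, which is available because the toy admissible set is a box; the paper itself works with general smooth inequalities.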
In general, the local Pontryagin principle can be written in the form of an inclusion (also called the optimality system) for a set-valued mapping Φ, where y incorporates the state, control, and adjoint variables, and possibly the Lagrange multipliers associated with the control constraints. In this general setting, y belongs to a metric space (Y, d_Y) and the image of Φ is contained in another metric space (Z, d_Z). Each of these spaces is endowed with an additional metric, d_Y• and d_Z•, respectively. The definition of strong metric subregularity of the mapping Φ that we use is a slight (however substantial) extension of the standard one, introduced under this name in [9]; see also [10, Chapter 3.9] and the recent paper [6]. The difference is that the definition below involves four metrics, d_Y, d_Y• in Y and d_Z, d_Z• in Z, instead of a single metric in each of the two spaces.

Definition 1.1
The set-valued mapping Φ : Y ⇒ Z is strongly metrically subregular (SMsR) at (ŷ, ẑ) ∈ Y × Z if ẑ ∈ Φ(ŷ) and there exist a number κ ≥ 0 and neighborhoods B_Y of ŷ in the metric d_Y• and B_Z of ẑ in the metric d_Z•, such that for any z ∈ B_Z and any solution y ∈ B_Y of the inclusion z ∈ Φ(y), it holds that d_Y(y, ŷ) ≤ κ d_Z(z, ẑ).
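Definition 1.1 can be sanity-checked in the simplest finite-dimensional situation, where Φ is single-valued and all four metrics coincide with the absolute value. The following sketch (an illustration, not part of the paper's framework) contrasts a mapping that is SMsR at (0, 0) with one that is not:

```python
# SMsR at (0, 0) with constant kappa asks that every nearby solution y of
# z = Phi(y) satisfies |y| <= kappa * |z|.  Here Y = Z = R, metrics = |.|.

def solve_scalar(phi, z, lo=-1.0, hi=1.0, iters=80):
    """Bisection for the root of phi(y) = z; phi is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phi_good(y):
    return 2.0 * y + y**3       # derivative 2 at 0: SMsR holds (kappa ~ 1/2)

def phi_bad(y):
    return y**3                 # derivative 0 at 0: SMsR fails at (0, 0)

kappa = 1.0
zs = [10.0**(-k) for k in range(1, 8)]

good_ok = all(abs(solve_scalar(phi_good, z)) <= kappa * abs(z) for z in zs)
bad_ok = all(abs(solve_scalar(phi_bad, z)) <= kappa * abs(z) for z in zs)
print(good_ok, bad_ok)   # True False
```

For phi_bad the solutions behave like y = z^{1/3}, so no linear bound |y| ≤ κ|z| can hold near 0, which is exactly what the numerical check detects.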
Versions of the SMsR property have also been introduced and utilized in [3,5,11]. Metric regularity properties with two norms in the space Z (a Banach space) were first introduced in [22], while the utilization of two metrics in Y, in relation to the SMsR property, is essential in [2]. It is well recognized that the SMsR of the optimality mapping in optimal control is a key property for ensuring convergence, with error estimates, of numerous methods for solving optimal control problems: discretization methods, gradient methods, Newton-type methods, etc. (see e.g. [3,6,21], in addition to a large number of papers where the SMsR property is implicitly used).
We mention that there exists a substantial amount of literature on Lipschitz continuity (related to the property of strong metric regularity) and differentiability of the optimal solution with respect to parameters; see e.g. [8] and [13], respectively, as well as the bibliography therein. These properties are stronger than SMsR; therefore, the corresponding sufficient conditions for their validity are also stronger. On the other hand, the SMsR property suffices for the applications mentioned in the previous paragraph.
The SMsR property of the optimality mapping associated with optimal control problems has been investigated and used in several papers, e.g. [1,7,20,21]. However, the sufficient conditions obtained in these papers require various kinds of coercivity conditions for a quadratic form defined by the second derivatives of the (augmented) Hamiltonian. These conditions have to be satisfied for all (sufficiently small) admissible variations of the reference solution of the optimality system. In the present paper, we require coercivity of this quadratic form only on an extended critical cone, which is a subset of the set of all admissible variations. Namely, we establish that the known second-order sufficient optimality conditions for problem (1)-(3) (in terms of the extended critical cone) are also sufficient for SMsR. This makes the conditions for SMsR close to those in mathematical programming. A remarkable additional result is that in the second-order sufficient optimality conditions, the extended critical cone can be replaced with the usual critical cone, provided that a pointwise Legendre-type condition is satisfied. Moreover, we show that the converse is also true: the latter condition together with coercivity of the quadratic form on the critical cone implies coercivity on the extended critical cone.

The paper is organized as follows. In Sect. 2 we introduce some basic notations and assumptions. In Sect. 3 we define the extended critical cone and recall a second-order sufficient optimality condition ensuring local quadratic growth of the objective function (1). This condition involves coercivity of the quadratic form associated with the Hamiltonian along the directions of the extended critical cone. In Sect. 4 we prove that for local quadratic growth it suffices to require coercivity on the usual (not extended) critical cone, together with a Legendre-type condition. The main result, the sufficient conditions for SMsR, is formulated in Sect. 5, while the long Sect. 6 contains its proof.

Notations and Assumptions
First we recall some standard notations. The scalar product and the norm in the Euclidean space R^n are defined in the usual way: ⟨x, y⟩ := x_1 y_1 + … + x_n y_n and |x| := √⟨x, x⟩ for any x = (x_1, …, x_n) ∈ R^n and y = (y_1, …, y_n) ∈ R^n. The elements of R^n are regarded as column vectors, with the exception of the adjoint variables p and λ (to appear later), which are row vectors. For a function ψ : R^k → R^r of the variable z we denote by ψ′(z) its derivative (Jacobian), represented by an (r × k)-matrix. For r = 1, ψ″(z) denotes the second derivative (Hessian), represented by a (k × k)-matrix. For a function ψ : R^k × R^q → R of the variables (z, v), ψ′(z, v) and ψ″(z, v) still denote the first and second derivatives with respect to (z, v), while the partial derivatives are denoted by ψ_z, ψ_v, ψ_zz, ψ_zv and ψ_vv.
The space L^k = L^k([0, 1], R^r), with k = 1, 2 or k = ∞, consists of all (classes of equivalent) Lebesgue measurable r-dimensional vector-functions defined on the interval [0, 1] for which the standard norm ‖·‖_k is finite. As usual, W^{1,1} = W^{1,1}([0, 1], R^r) denotes the space of absolutely continuous functions x : [0, 1] → R^r whose first derivative belongs to L^1. For convenience, the norm in W^{1,1} is defined as ‖x‖_{1,1} := |x(0)| + ‖ẋ‖_1, so that ‖x‖_∞ ≤ ‖x‖_{1,1}. The specification ([0, 1], R^r) will be omitted if clear from the context. According to (3), the set of admissible control values is U := {v ∈ R^m : G(v) ≤ 0}. Let G_i denote the i-th component of the vector G. For any v ∈ U define the set of active indices I(v) := {i ∈ {1, …, k} : G_i(v) = 0}.
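The inequality ‖x‖_∞ ≤ ‖x‖_{1,1} follows from x(t) = x(0) + ∫₀ᵗ ẋ(s) ds. A small discrete illustration of this (the grid size and data below are arbitrary choices, not from the paper):

```python
# Discrete analog: |x(t_n)| = |x(0) + h*sum(d_i)| <= |x(0)| + h*sum(|d_i|),
# which is exactly the bound ||x||_inf <= |x(0)| + ||x'||_1 =: ||x||_{1,1}.
import math

N = 1000
h = 1.0 / N

def norms(x0, xdot):
    """Return (sup norm, 1,1-norm) of x(t) = x0 + int_0^t xdot on an Euler grid."""
    x, sup, l1 = x0, abs(x0), 0.0
    for i in range(N):
        d = xdot(i * h)
        x += h * d          # x(t_{i+1}) by explicit Euler
        l1 += h * abs(d)    # accumulates ||x'||_1
        sup = max(sup, abs(x))
    return sup, abs(x0) + l1

sup_norm, w11_norm = norms(0.5, lambda t: math.cos(8.0 * t))
print(sup_norm <= w11_norm)   # True on any grid, by the triangle inequality
```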

Assumption 2.1 (regularity of the control constraints)
The set U is nonempty and at each point v ∈ U the gradients G_i′(v), i ∈ I(v), are linearly independent.
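Assumption 2.1 can be tested numerically at a given point by stacking the active gradients G_i′(v), i ∈ I(v), into a matrix and checking full row rank. A sketch with a hypothetical constraint system (not from the paper):

```python
# Hypothetical example: G(v) = (v1 - 1, v2 - 1, v1 + v2 - 2) at v = (1, 1).
# All three constraints are active, but G_3' = G_1' + G_2', so the
# regularity assumption fails there.

def row_rank(rows, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting (pure Python)."""
    m = [r[:] for r in rows]
    rank, col = 0, 0
    nrow, ncol = len(m), len(m[0]) if m else 0
    while rank < nrow and col < ncol:
        piv = max(range(rank, nrow), key=lambda i: abs(m[i][col]))
        if abs(m[piv][col]) < tol:
            col += 1
            continue
        m[rank], m[piv] = m[piv], m[rank]
        for i in range(rank + 1, nrow):
            f = m[i][col] / m[rank][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
        col += 1
    return rank

active_grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # G_i'(v) for i in I(v)
print(row_rank(active_grads) == len(active_grads))    # False: not independent
```

In practice one would use an SVD-based rank test (e.g. numpy.linalg.matrix_rank) rather than elimination; the pure-Python version is used here only to keep the sketch self-contained.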

Assumption 2.2
The triplet (ŵ, p̂, λ̂) ∈ W × W^{1,1} × L^∞ satisfies the following system of equations and inequalities. Observe that this system represents the first-order necessary optimality condition for a weak local minimum of the pair ŵ = (x̂, û) (see e.g. [14, Part 1, Section 18]); later on we refer to it as the optimality system. Namely, if ŵ is a point of weak local minimum in problem (1)-(3), then there exist p̂ ∈ W^{1,1} and λ̂ ∈ L^∞ such that the optimality system is fulfilled. Note that for a given ŵ the pair (p̂, λ̂) is uniquely determined by these conditions. Indeed, the adjoint variable p̂ is uniquely determined by the adjoint equation (6) and the transversality conditions (5), and then λ̂ is uniquely determined by equation (7) and the complementary slackness condition in (4), due to Assumption 2.1. Introduce the Hamiltonian and the augmented Hamiltonian

H(x, u, p) := p f(x, u),   H̄(x, u, p, λ) := p f(x, u) + λ G(u).

Then equations (6) and (7) take the form

−ṗ(t) = H_x(ŵ(t), p̂(t)),   H̄_u(ŵ(t), p̂(t), λ̂(t)) = 0 a.e. in [0, 1].
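For orientation, the optimality system referenced below as conditions (4)-(7) has, for Mayer problems with pure control constraints, the following standard form; this is a sketch written in the paper's notation, and the sign and transversality conventions in (5) are assumptions that should be checked against the cited references:

```latex
% Reconstruction sketch of the first-order optimality system (4)-(7);
% p is a row vector, hats mark the reference solution.
\begin{align*}
  &\hat\lambda(t) \ge 0, \qquad \hat\lambda(t)\,G(\hat u(t)) = 0
      \quad\text{a.e. in } [0,1], \tag{4}\\
  &\hat p(0) = -F'_{x(0)}(\hat x(0),\hat x(1)), \qquad
   \hat p(1) = F'_{x(1)}(\hat x(0),\hat x(1)), \tag{5}\\
  &-\dot{\hat p}(t) = \hat p(t)\, f_x(\hat x(t),\hat u(t))
      \quad\text{a.e. in } [0,1], \tag{6}\\
  &\hat p(t)\, f_u(\hat x(t),\hat u(t)) + \hat\lambda(t)\, G'(\hat u(t)) = 0
      \quad\text{a.e. in } [0,1]. \tag{7}
\end{align*}
```

In this form, (6) and (7) are exactly the relations −ṗ = H_x and H̄_u = 0 for the Hamiltonian and augmented Hamiltonian introduced in the text.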
Notice that here and below, the dual variables p and λ are treated as row vectors, while x, u, w, f , and G are treated as column vectors.

Second-Order Sufficient Conditions for a Weak Local Minimum
Now we discuss the second-order sufficient conditions for a weak local minimum (references will be given at the end of Sect. 4). Define the critical cone. It can be easily verified that F′(q̂)q = 0 for any element w of the critical cone.
In many cases (in "smooth problems" of mathematical programming and the calculus of variations) it is sufficient for local minimality that the critical cone consists only of the zero element. However, this is not the case for optimal control problems with a control constraint of the type u(t) ∈ U .
An equivalent definition of the critical cone is the following. Set Then, due to (7).
We introduce an extension of the critical cone: for any ε > 0 and j = 1, …, k we relax the conditions defining K, and for any ε > 0 we define the extended critical cone K_ε. Notice that the cones K_ε form a non-increasing family as ε → 0+. In particular, K ⊂ K_ε for any ε > 0.
Define the quadratic form in (13).

Assumption 3.1 There exist ε > 0 and c > 0 such that (14) holds.

Remark 3.1 Assumption 3.1 is equivalent to the following: there exist ε > 0 and c > 0 such that the corresponding inequality holds with some c > 0; the required equivalence follows.

Remark 3.2 Notice that if (14) is true for some ε > 0 and c > 0, then it is true for any positive ε′ < ε and the same c.
In the sequel we use the notations c, c′, c″, c_1, c_2, etc. for constants which may take different values in different estimates.
We recall the following theorem, first published in [15,16] in a slightly different formulation.
In the next section, we discuss an equivalent formulation of this theorem and then provide references to the literature where proofs can be found.
Further, for any ε > 0 and any t ∈ [0, 1] denote by I_C^ε(t) the cone of all vectors v ∈ R^m satisfying, for all j = 1, …, k, the corresponding conditions. For any ε > 0 and any j ∈ {1, …, k} we define the set M_ε. Clearly, meas M_ε → 0 as ε → 0+.

Assumption 4.2 (strengthened Legendre condition on M_ε)
There exist ε > 0 and c_L > 0 such that (18) holds.

Remark 4.1 Similarly as in Remark 3.2, if (18) is true for some ε > 0 and c_L > 0, then it is true for any positive ε′ < ε and the same c_L.
In the sequel, we often omit the argument t of x, u, x̂, û, etc.
The following lemma follows from the definition in (13). Moreover, there exists a constant c, independent of w and w′, such that the estimate (20) holds. Henceforth, for w = (x, u) ∈ W we set the quantities γ_0(w) and γ(w). It is clear that γ_0(w) ≤ γ(w), and, as shown in Remark 3.1, if ẋ = f_w(ŵ)w, then there exists c > 0, independent of w, such that the reverse estimate holds with the constant c. Note that α(ε) → 0+ as ε → 0+. We may assume that ε is so small that α(ε) ≤ 1.
where χ_{M_ε} is the characteristic function of the set M_ε. Obviously, u(t) ∈ I_C^ε(t) a.e. on [0, 1] and, therefore, the corresponding inclusion holds. Let x be the solution to the associated linearized equation; hence the relations in (23) follow. Using the estimate (20) in Lemma 4.1, Assumptions 4.1 and 4.2, and the third relation in (23), we obtain the inequality (24). We consecutively estimate the terms, where c′ and c″ are appropriate constants. Using these relations and (22) in (24), we obtain the desired bound with some constant c. Take ε > 0 so small that the strengthened Legendre condition holds with the same constant c_L (see Remark 4.1). Then the claimed coercivity follows, which completes the proof, since c is independent of w̄ ∈ K.
The converse is also true.

Proof Let Assumption 3.1 be fulfilled, i.e., there exist ε > 0 and c > 0 such that (14) holds. According to Remark 3.2, one may fix ε > 0 arbitrarily small without changing c, which will be done below.
Since K ⊂ K_ε, this inequality holds also on K; therefore, Assumption 4.1 is fulfilled.
Let us prove that Assumption 4.2 is also fulfilled. Take any u ∈ L^∞ satisfying the stated conditions, where χ_{M_ε} is the characteristic function of the set M_ε. Define x by the corresponding conditions and set w = (x, u). Then, obviously, w ∈ K_ε, whence the coercivity inequality applies to w. Moreover, the remainder is controlled by α(ε), where α(ε) is defined in (21). The latter implies the desired bound with some c > 0. Using these estimates and (13), we obtain the required inequality for any sufficiently small ε > 0, which yields the strengthened Legendre condition.

The connection between the strengthened Legendre condition and the so-called "local quadratic growth of the Hamiltonian" (defined below) was studied in [4]. Let us formulate the corresponding result from [4], which may be useful for the problem under consideration.

Proposition 4.3 [4] Assumption 4.2 implies the local quadratic growth condition of the Hamiltonian.
The converse is not true. As shown in [4], the condition of the local quadratic growth of the Hamiltonian is somewhat finer than Assumption 4.2.
There is the following more subtle second-order sufficient condition for a weak local minimum at the point ŵ in problem (1)-(3). A sufficient second-order condition of this type for a much more general optimal control problem (together with the corresponding second-order necessary condition) was first published by the first author back in 1978 in [12]. A relatively simple proof of Theorem 4.1 in the case k = 1 was recently published in [19]. Proofs of much more general results of this type can be found, for example, in [17] and [18].
where B is a fixed closed ball in R^m. Then, according to Assumption 2.1, the corresponding rank condition holds. For any ε > 0, we define the associated sets. Since G is uniformly continuous on the compact set B, there exists δ̄ > 0 such that the required implication holds. Decreasing δ̄ if necessary, we can assume that δ̄ ≤ ε̄.
Further, note that ‖ṗ‖_1 is bounded due to conditions (28) and (29), and also because ‖w‖_∞, |ν| and ‖π‖_1 are bounded. Therefore, ‖p‖_{1,1} is also bounded. Moreover, the following is true.
Here G_{I(t)} and Q_{I(t),δ̄} are defined similarly to G_I and Q_{I,δ̄} in Part 1 of the proof.

Obviously, λ(t) G′(u(t)) = λ_{I(t)}(t) G′_{I(t)}(u(t)) for a.a. t ∈ M(λ), and the corresponding equation follows. (Note that the dimensions of the vector λ_{I(t)}(t) and the matrices G′_{I(t)}(u(t)) and A_{I(t)}(u(t)) depend on t.) Multiplying this equation on the right by the transposed matrix (G′_{I(t)}(u(t)))*, we get

Further, subtracting (8) from (31) we obtain that
It follows that the corresponding estimate holds with some L > 0. Using the Grönwall inequality, we get the resulting bound. In what follows we use a rougher estimate. Namely, since ‖Δu‖_1 ≤ ‖Δu‖_2 and ‖Δξ‖_1 ≤ ω, we have the simplified bound and, consequently, relation (36). Clearly, relation (36) implies the estimate used below.

As usual, for ε ∈ R_+, the symbol O(ε) means that there exists a constant C > 0, independent of ε, such that |O(ε)| ≤ C|ε| as ε → 0+, and the symbol o(ε) means that o(ε)/ε → 0 as ε → 0+. We use these symbols for O(ε) and o(ε) taking values in R or in R^n. Moreover, throughout the paper, the functions O and o may depend directly on Δw, not only on the norms appearing as arguments in place of ε. However, the "smallness" with respect to the arguments of O and o will be uniform in Δw satisfying ‖Δw‖_∞ ≤ δ. For example, O(|Δw|²) in (40), which is a shortening of O(|Δw(t)|²), means that there exists a constant C such that O(|Δw(t)|²) ≤ C|Δw(t)|² for all Δw satisfying ‖Δw‖_∞ ≤ δ and for a.e. t ∈ [0, 1]. Similarly, o(γ(Δw)), appearing later, means that o(γ(Δw))/γ(Δw) → 0 as γ(Δw) → 0, uniformly with respect to Δw satisfying ‖Δw‖_∞ ≤ δ.

4. Subtracting (5) from (28), we obtain the corresponding relation, which implies the desired estimate.

5. Subtracting (6) from (29), we obtain the corresponding equation. Using the Grönwall inequality and the inequality ‖Δu‖_1 ≤ ‖Δu‖_2, we get an estimate with some c > 0. Using (38), (39), (42) in this inequality, and also taking into account the definition of ω, we obtain a bound with some C > 0. Moreover, since ‖Δw‖_∞ ≤ δ and ω ≤ δ, we also get a corresponding estimate. Further, relation (44) then implies the next estimate.

6. Next we analyze condition (30). Subtracting (7) from (30), we obtain the corresponding relation and, consequently, an expression for Δλ. Using this equality and the boundedness of ‖λ‖_∞ and ‖Δw‖_∞, we obtain an estimate with some C > 0.
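The Grönwall step used repeatedly in this proof can be illustrated discretely: if x(0) = x₀ ≥ 0 and ẋ ≤ L x + r(t) with r ≥ 0, then x(t) ≤ e^{Lt}(x₀ + ∫₀ᵗ r(s) ds). A numerical sketch with arbitrary L and r (illustrative data, not the paper's):

```python
# Integrate the extremal case x' = L*x + r(t) by explicit Euler and check,
# step by step, that the Gronwall bound e^{L t} * (x0 + int_0^t r) dominates.
import math

N = 10000
h = 1.0 / N
L = 2.0

def r(t):
    return 0.1 * (1.0 + t)    # any nonnegative forcing term works here

x0 = 0.3
x, int_r, ok = x0, 0.0, True
for i in range(N):
    t = i * h
    x += h * (L * x + r(t))                    # Euler step on x' = L*x + r
    int_r += h * r(t)                          # left Riemann sum of int r
    bound = math.exp(L * (t + h)) * (x0 + int_r)
    ok = ok and (x <= bound + 1e-9)            # Gronwall bound dominates
print(ok)   # True
```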
In the next paragraphs, we shall utilize Assumption 2.1 and Lemma 6.1 to estimate Δλ(t) for a.e. t ∈ [0, 1] with some C > 0. Set the set M(Δλ). If meas M(Δλ) = 0, the estimate is trivial; therefore, we assume that meas M(Δλ) > 0. For any t ∈ M(Δλ), we define the index set J(t). Let Δλ_{J(t)}(t) be the row vector composed of all nonzero components of Δλ(t), and let G_{J(t)} be the column vector with components G_j for all j ∈ J(t). Then, obviously, the corresponding identity holds. Let t ∈ M(Δλ) and j ∈ J(t). If λ_j(t) > 0, then, by the complementary slackness condition in (27), we have G_j(u(t)) = η_j(t), and hence |G_j(u(t))| ≤ ε̄, since ‖η‖_∞ ≤ δ ≤ δ̄ ≤ ε̄.
Thus, for all j ∈ J(t) we have |G_j(u(t))| ≤ ε̄. This implies the corresponding inclusion, where the set Q_{J(t),ε̄} is defined similarly to the set Q_{I,ε̄} and the ball B is defined as at the beginning of the proof of Proposition 6.1. By Lemma 6.1, the corresponding estimate follows. According to (50) and the second equality in (52), we have the resulting relation for a.a. t ∈ M(Δλ), and consequently the desired bound. Moreover, a further identity holds. Integrating this equality over the segment [0, 1], we obtain the integrated relation. Integrating by parts the first integral on the left-hand side of this equality and applying (43), we get the corresponding expression. Substituting this expression into the previous equality and taking into account definition (13), we get the key identity. Using this equality and equality (40) in equality (58), we obtain the next relation. According to (47), we have ‖Δp‖_∞ ≤ 2Cδ; therefore, an estimate with some c > 0 holds, and similarly for the remaining terms. In addition, in view of (54), a further estimate holds with some c > 0. Hence, (59) gives a bound with some C > 0.

8. Now we estimate the first term on the right-hand side of inequality (63). Let us fix j ∈ {1, …, k} and consider the corresponding term. We use conditions (4), (9), (27), and (32). If Δλ_j = 0, then this term is equal to zero. Therefore, we assume that the corresponding set has positive Lebesgue measure.

8.1. Consider the first of the two subsets. A.e. on this set we have λ̂_j > 0; then, by the complementary slackness condition in (4), G_j(û) = 0. In this case, the

Consider the set
Then, by the complementary slackness condition in (27), a.e. on this set the following holds.

(a) Let also G_j(û) = 0. Then the corresponding equality holds; multiplying it by −Δλ_j, we get the required inequality.

(b) Let now G_j(û) < 0. Then, by the complementary slackness condition in (4), we have λ̂_j = 0, and then Δλ_j = λ_j > 0.
Again, by the complementary slackness condition (but now in (27)), we have G_j(u) = η_j, which implies the corresponding equality. Multiplying this equality by −Δλ_j < 0, we get the required inequality. Consequently, inequality (64) holds a.e. on the set M(Δλ_j), and then it holds a.e. on [0, 1]. This implies the integrated version of (64). Recall that, according to (54), ‖Δλ‖_∞ ≤ Cδ. Therefore, an estimate with some C > 0 holds, and this together with (65) implies the corresponding equality. If Δλ_j = 0, then this equality also holds; thus, it is true for all j = 1, …, k. Consequently, the summed relation follows. This and inequality (63) imply a bound with some c > 0. Using now the inequality ‖η‖_2 ≤ ω, we obtain from this the final estimate.
Funding Open access funding provided by Austrian Science Fund (FWF). The authors have not disclosed any funding.

Conflict of interest The authors have not disclosed any competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.