Metric Regularity Properties in Bang-Bang Type Linear-Quadratic Optimal Control Problems

The paper investigates Lipschitz/Hölder stability with respect to perturbations of optimal control problems with linear dynamics and a cost functional that is quadratic in the state and linear in the control variable. The optimal control is assumed to be of bang-bang type, and the problem is assumed to enjoy certain convexity properties. Conditions for bi-metric regularity and (Hölder) metric sub-regularity are established, involving only the order of the zeros of the associated switching function and the smoothness of the data. These results provide a basis for the investigation of various approximation methods. They are utilized in this paper for the convergence analysis of a Newton-type method applied to optimal control problems which are affine with respect to the control.


Introduction
Stability analysis of solutions is a crucial topic in optimization theory due, in particular, to its applications for obtaining error estimates of numerical approximations. Although related investigations in optimal control theory accompany its development from its early stages, the systematic analysis of (Lipschitz) stability in the area started with the works of Dontchev, Hager and Malanowski (see [6,7]). In these papers, the authors prove Lipschitz dependence of the solutions with respect to perturbations, under a strict coercivity condition which also implies Lipschitz continuity of the optimal control.
In contrast, in the present paper we investigate a class of problems in which the control appears linearly, so strict coercivity fails. Moreover, when the control set is the m-dimensional hypercube [−1, 1]^m, each component of the optimal control generally switches from ±1 to ∓1, possibly concatenated with arcs taking values in the interior of [−1, 1]. That is, the optimal control is typically discontinuous.
Problems which are affine with respect to the control variable arise in many applications, such as engineering, biology and medicine (see e.g. [17-19, 21]). Nevertheless, only a few papers address the stability analysis of non-coercive problems and of problems with discontinuous optimal controls; in fact, many relevant questions still remain unanswered. Recent progress was made in [11,12,14,20] for control-affine problems and in [23] for problems with linear dynamics, and we build on these papers. We mention also the paper [25] and the references therein for problems with group sparsity. Applications to error estimates for time-discretization schemes are discussed in [1,2,15,22,25,28] for linear systems or problems of the type (P) below. We mention also the paper [3], where stability analysis is discussed for control-affine systems with bang-singular optimal controls.
In the present paper we focus our attention on the following class of optimal control problems:

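$$
\text{minimize } J(x, u) \quad \text{subject to} \quad
\dot x(t) = A(t)x(t) + B(t)u(t) + d(t), \quad x(0) = x_0, \quad u(t) \in U := [-1, 1]^m, \quad t \in [0, T], \tag{P}
$$

where

$$
J(x, u) := g(x(T)) + \int_0^T \Bigl( \tfrac{1}{2}\, x(t)^\top W(t)x(t) + x(t)^\top S(t)u(t) \Bigr)\, dt.
$$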
Here, u(t) ∈ U and x(t) ∈ R^n denote the control and the state of the system at time t ∈ [0, T], the function g : R^n → R is given, as well as A(t), W(t) ∈ R^{n×n} and B(t), S(t) ∈ R^{n×m}. Linear terms in u or x are not included in the integrand; this is no loss of generality, since such terms can be shifted in a standard way into the differential equation.
The stability properties of the solution(s) of (P) will be analyzed through the Pontryagin minimum principle, which states that for any optimal pair (x̂, û) there exists an absolutely continuous function p̂ : [0, T] → R^n such that the triple (x̂, p̂, û) solves the following system a.e. on [0, T]:

$$
\begin{aligned}
0 &= \dot x(t) - A(t)x(t) - B(t)u(t) - d(t), \qquad x(0) = x_0,\\
0 &= \dot p(t) + A(t)^\top p(t) + W(t)x(t) + S(t)u(t),\\
0 &\in B(t)^\top p(t) + S(t)^\top x(t) + N_U(u(t)),\\
0 &= p(T) - \nabla g(x(T)).
\end{aligned}
\tag{PMP}
$$

Here N_U(u) is the normal cone to U at u defined in the usual way: N_U(u) = {l ∈ R^m : ⟨l, v − u⟩ ≤ 0 for all v ∈ U} if u ∈ U, and N_U(u) = ∅ otherwise.

It will be assumed (see the next sections for precise formulations) that the data are smooth enough, that Problem (P) satisfies some convexity-like assumptions, and that the (reference) optimal control is piecewise constant with each component taking only the values −1 and 1. Moreover, it will be assumed that each component of the associated "switching function", t ↦ B(t)^⊤ p̂(t) + S(t)^⊤ x̂(t), satisfies at its zeros a certain growth condition, characterized by a number κ ≥ 1 (κ can be regarded as the multiplicity of the zeros if the switching function is smooth).

We recast the system (PMP) as the generalized equation

$$
0 \in F(x, p, u), \tag{1.1}
$$

where F is the set-valued mapping

$$
F(x, p, u) :=
\begin{pmatrix}
\dot x - Ax - Bu - d\\
\dot p + A^\top p + Wx + Su\\
B^\top p + S^\top x + N_U(u)\\
p(T) - \nabla g(x(T))
\end{pmatrix}
\tag{1.2}
$$

acting in a suitable Banach space X ∋ (x, p, u) with values in a linear normed space Y. The set N_U(u) in (1.2) is a functional replacement for the point-wise cones N_U(u(t)) in (PMP) and will be strictly defined in the next section together with the spaces X and Y.
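The relation between the switching function and a bang-bang control in the inclusion 0 ∈ B(t)^⊤p(t) + S(t)^⊤x(t) + N_U(u(t)) can be illustrated numerically. The following is a minimal sketch, not part of the paper's analysis: the scalar switching function `sigma`, the grid and the helper names are hypothetical choices made only for illustration. For U = [−1, 1], the inclusion forces u_j = −1 where σ_j > 0 and u_j = +1 where σ_j < 0.

```python
import numpy as np

def bang_bang_from_switching(sigma):
    """Componentwise solution of 0 in sigma + N_U(u) for U = [-1, 1]^m.

    sigma_j > 0 forces u_j = -1 and sigma_j < 0 forces u_j = +1.
    Where sigma_j == 0 every value in [-1, 1] is admissible; 0 is returned
    there as a placeholder (an assumption of this sketch).
    """
    return -np.sign(np.asarray(sigma, dtype=float))

def switching_times(t, sigma_j):
    """Approximate the zeros of one component by linear interpolation
    between consecutive grid points where the sign changes."""
    s = np.sign(sigma_j)
    idx = np.where(s[:-1] * s[1:] < 0)[0]
    slope = (sigma_j[idx + 1] - sigma_j[idx]) / (t[idx + 1] - t[idx])
    return t[idx] - sigma_j[idx] / slope

# Hypothetical scalar switching function with two simple zeros (kappa = 1).
t = np.linspace(0.0, 1.0, 1001)
sigma = (t - 1.0 / 3.0) * (t - 0.7705)
u = bang_bang_from_switching(sigma)
taus = switching_times(t, sigma)
```

In this example the control takes the values −1, +1, −1 on the three intervals separated by the two switching times stored in `taus`.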
As usual, we investigate the stability of the solution of problem (P) by introducing a perturbation y ∈ Y in the system of necessary optimality conditions, that is, by considering the perturbed inclusion y ∈ F(x, p, u). Under the assumptions briefly mentioned above, the unperturbed system 0 ∈ F(x, p, u), that is, the system of necessary optimality conditions (PMP), has a unique solution (x̂, p̂, û).
Two main concepts of stability are investigated in the paper.
The first concept is a stronger version of the Hölder strong metric sub-regularity (see the recent paper [5]). Roughly speaking, we prove that for all sufficiently small perturbations y, the inclusion y ∈ F(x, p, u), associated with problem (P), has a solution and all the solutions are at distance (in the space X) at most proportional to ‖y‖^{1/κ} from the unique solution (x̂, p̂, û) of the inclusion 0 ∈ F(x, p, u). We mention that a similar result was proved in [2, Theorem 9], but with different functional spaces and under slightly stronger assumptions. Moreover, the claim in our result is somewhat stronger, which is rather essential for the analysis of the strong bi-metric regularity and the convergence of Newton's method which will be discussed below.
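The exponent 1/κ can be seen in a toy computation. The sketch below is only an illustration under stated assumptions: the reference switching function is taken to be the hypothetical σ̂(t) = (t − 1/2)^κ on [0, 1] with odd κ, the perturbation is a constant shift ρ > 0 of the switching function, and the control is recovered as u = −sign(σ). The zero then moves by ρ^{1/κ}, so the L^1 distance between the two bang-bang controls is 2ρ^{1/κ}.

```python
import numpy as np

def l1_gap(kappa, rho, n=200001):
    """L1 distance between the bang-bang controls generated by
    sigma_hat and sigma_hat - rho on [0, 1] (kappa must be an odd integer)."""
    t = np.linspace(0.0, 1.0, n)
    sigma_hat = (t - 0.5) ** kappa        # hypothetical zero of order kappa at t = 0.5
    u_hat = -np.sign(sigma_hat)           # reference control
    u = -np.sign(sigma_hat - rho)         # control for the shifted switching function
    # The mean over a uniform grid approximates the integral over [0, 1].
    return float(np.mean(np.abs(u - u_hat)))

gap_k1 = l1_gap(1, 1e-3)   # close to 2 * 1e-3
gap_k3 = l1_gap(3, 1e-3)   # close to 2 * (1e-3) ** (1/3) = 0.2
```

A perturbation of size 10^{-3} thus moves the control by about 2·10^{-3} in L^1 when κ = 1, but by about 0.2 when κ = 3, which is the Hölder behaviour of order 1/κ.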
The second concept extends the standard strong metric regularity introduced in the seminal paper [24] by Robinson (see also [8, Chapter 3.7]). The new feature is that a second metric space Ỹ ⊂ Y is involved (presumably with a non-equivalent and larger metric than that in Y) and only disturbances from this space are considered. Roughly, strong bi-metric regularity relative to Ỹ ⊂ Y of F at ẑ := (x̂, p̂, û) means that the inverse mapping Ỹ ∋ y ↦ F^{−1}(y) = {z ∈ X : y ∈ F(z)} is locally (around ẑ) single-valued when restricted to a sufficiently small ball in Ỹ centered at y = 0. Moreover, this single-valued mapping is Lipschitz continuous with respect to the metric of Y. In the terminology of [8], this means that F has a single-valued localization in X × Ỹ which is Lipschitz continuous, where the Lipschitz property holds with respect to the metric of Y.
The general notion of strong bi-metric regularity was introduced in a somewhat more restrictive form in [23], where applications to Mayer's type problems for linear control systems were in the focus. Similarly to strong metric regularity, it has the important property of being invariant with respect to small (in an appropriate sense) functional perturbations of F. This property is often referred to as a Lyusternik-Graves type theorem, see e.g. [8, Chapter 5.5]. In the present paper we prove a general Lyusternik-Graves type theorem for strongly bi-metrically regular inclusions, which is a substantial improvement of the one in [23], since most of the assumptions are now formulated in terms of the (smaller) metric of Y rather than in the metric of Ỹ, as in [23].
We prove strong bi-metric regularity of the mapping F associated with Problem (P), which extends the result in [23] concerning Mayer's problems. This extension is nontrivial, since, technically speaking, the integral cost introduces the state variable in the switching function, making this function nonsmooth. This forces us, among other things, to consider the present slightly more general notion of bi-metric regularity compared with the one in [23]. As an application we give a Lipschitz stability result with respect to small non-linear perturbations in the differential equation.
In the last section of the paper, we investigate the convergence of a Newton-type method (as interpreted in the context of generalized equations, see e.g. [8, Chapter 6.3]) applied to a class of control-affine problems for which (P) can be regarded as a linearization. Notice that the known convergence results (cf. [6]) are inapplicable to non-coercive problems, where strong metric regularity in the usual space settings fails. We will give sufficient conditions under which the considered Newton's method converges, and does so quadratically. The proof is based on a strengthened version of the metric sub-regularity proved in the present paper for Problem (P). We mention that the stability analysis and the convergence properties of Newton methods are still not fully understood when singular arcs occur. Some advances have been made recently in [12] for the first issue, and in [3,13] for the latter. However, these issues remain interesting topics for future research.
The paper is organized as follows. In Section 2, we recall some basic facts and introduce the main assumptions on Problem (P) together with some notations. Section 3 is devoted to the proof of the Hölder sub-regularity of Problem (P) (actually, of the associated mapping F ). In Section 4, we introduce the definition of strong bi-metric regularity, and prove an extension of the Lyusternik-Graves theorem suitable to this new notion. After that, we prove the strong bi-metric regularity of the mapping F resulting from problem (P) and give a result about the invariance of this property under a class of non-linear perturbations. In Section 5, we investigate the convergence of a Newton-type method applied to some control-affine problems with bang-bang solutions.

Preliminaries
Throughout the paper we use the following common notations. The standard n-dimensional Euclidean space is denoted by R^n, with the scalar product and norm denoted by ⟨·, ·⟩ and | · |, respectively. The superscript ⊤ denotes transposition. Further, L^1([0, T], R^n) and L^∞([0, T], R^n) are the spaces of all measurable and absolutely integrable, respectively essentially bounded, functions with the corresponding norms ‖·‖_1 and ‖·‖_∞; these spaces will sometimes be abbreviated as L^1 and L^∞, respectively. Moreover, W^{1,k}([0, T], R^n) is the space of all absolutely continuous functions from [0, T] to R^n whose first derivatives belong to L^k, k ∈ {1, ∞}. The corresponding norms are denoted by ‖·‖_{1,1} and ‖·‖_{1,∞}, respectively. We also denote W^{1,1}
We introduce the following assumptions, some of which will be strengthened in the next sections.
Assumption (A1) The matrix-functions B and S are continuous, A, W and d are measurable and bounded. The matrix W (t) is symmetric for every t ∈ [0, T ]. The function g is differentiable with globally Lipschitz continuous gradient ∇g.
We stress that the assumption about global Lipschitz continuity of ∇g is made for technical convenience only and is not a real restriction. Since the reachable set in Problem (P) is compact, any modification of g outside a neighborhood of the reachable set does not affect the problem.
For every u ∈ U the differential equation in problem (P) with the given initial condition has a unique (absolutely continuous) solution x on [0, T ]. Every such pair (x, u) is called "admissible", and the set of all admissible pairs is denoted by F.
Thanks to Assumption (A1), a standard compactness argument implies the existence of an optimal solution of Problem (P). In what follows we consider a fixed optimal solution (x̂, û).
In the next assumption we postulate that the optimal control û is strictly bang-bang, with a finite number of switching times on [0, T], and that the switching function exhibits a certain growth in a neighborhood of any zero.
We denote by d_Y the distance induced by ‖·‖. As in the introduction, we recast the first order optimality conditions (Pontryagin system) (PMP) for Problem (P) as the generalized equation
Notice that this definition is consistent with the general definition of a normal cone if U is considered as a subset of the space L^1 (although U is also contained in L^∞; but then N_U(u) should be a cone in the dual space to L^∞). In the following sections, given a perturbation y = (ξ, π, ρ, ν) ∈ Y, we will study the inclusion y ∈ F(x, p, u), (2.3) which, written in detail, looks as follows: for a.e. t ∈ [0, T],

Strong Metric Sub-regularity
In this section we prove an important regularity property of the mapping F defined in (1.2), related to, but stronger than, strong Hölder metric sub-regularity, see [5].
We begin with some important properties of switching functions that fulfill Assumption (A3). First we fix some notation. Given any continuous function σ : [0, T] → R^m (σ_j will denote its j-th component) satisfying Assumption (A3) with constants κ, α and τ, and a real number δ > 0, we define
Note that this minimum always exists and is indeed positive since σ is continuous and [0, T] \ I_j(σ, τ) is compact for any j ∈ {1, . . . , m}. Now we state an auxiliary result which presents an inverse integral inequality for functions satisfying Assumption (A3), of the type of those developed in Theorem 2.1 and Corollaries 2.1 and 2.2 in [27]. It extends [26, Lemma 1.3], which in turn originates from [11, Lemma 3.3].
Proof If v = 0, then the inequality in Lemma 3.1 is fulfilled. If v ≠ 0, then due to the homogeneity of order κ + 1 of the two sides of (3.2) with respect to v, it is enough to prove the lemma in the case ‖v‖_∞ = 1, which will be assumed in the remaining part of the proof. Now we choose δ̄ ∈ (0, τ) such that α δ̄^κ < l_min(σ, τ). Then for all δ ∈ (0, δ̄] and j ∈ {1, . . . , m} we have
where λ is the sum of the numbers of zeros of σ_j over all j ∈ {1, . . . , m}. (Notice that
Hence, by defining c_0 := min{ α δ̄^κ / (2T^κ), α / (2^{2κ+1} λ^κ) } we obtain that
Since we can choose δ̄ to depend only on κ, α, τ and m_0, and there is an upper bound for λ which depends only on m, T and τ, the constant c_0 also depends only on m, T, κ, α, τ and m_0. This proves Remark 3.2.
The following theorem establishes a stability property of the mapping F associated with system (PMP) which is a somewhat stronger form of the well-known property of metric sub-regularity, [8, Section 3I]. It extends [2, Theorem 8] in that Assumption (A3) is weaker than the corresponding assumption there (since we allow 0 and T to be feasible zeros of some components of the switching function), the norm in the space Y is somewhat weaker, and the function g is not necessarily quadratic.
Most importantly, the size of the disturbance y for which the claim of the theorem holds is not a priori restricted (as in the definition of metric sub-regularity, [8, Section 3H] and in [2, Theorem 8]).

Theorem 3.3 Let (x̂, p̂, û) be a solution of (PMP) such that Assumptions (A1)-(A3) are fulfilled. Then for any
, and any such triple satisfies

Remark 3.4 Due to further needs, in the proof of the above theorem we will keep track of how the constant c depends on the data of the problem and on the associated switching function σ̂. More precisely, the following statement will be proved. Let the natural numbers n, m and the real number T > 0 be fixed. Given constants κ ≥ 1, α > 0, τ > 0, m_0 > 0, b > 0 and K, there exists a number c > 0 with the following property (see Footnote 1). Let the (n × n)-matrix functions A(t) and W(t) and the (n × m)-matrix functions B(t) and S(t) be defined on [0, T], and g : R^n → R be such that Assumption (A1) is fulfilled, and in addition,
Let (x̂, p̂, û) be a solution of (PMP) (i.e. of (1.1)) such that Assumption (A2) holds, the corresponding switching function σ̂ fulfills Assumption (A3) with constants κ, α and τ, and
.2)) has a solution and for every solution (x, p, u) the estimation (3.4) holds.
Proof First of all, we note that the inclusion y ∈ F(x, p, u), for any y = (ξ, π, ρ, ν) ∈ Y, represents the system of necessary optimality conditions of the following problem:
subject to ẋ
Due to the linearity in u and the convexity and compactness of the constraining set U, this problem has a solution, hence so does the inclusion y ∈ F(x, p, u). Now, let b > 0 be arbitrarily chosen and let (x, p, u) be a solution of y ∈ F(x, p, u), where y = (ξ, π, ρ, ν) ∈ Y and ‖y‖ ≤ b. The following notations will be used. As before,
and skip the argument t whenever this does not lead to ambiguity.
Integrating by parts, we have
Substituting here the expressions for x and p resulting from the inclusions y ∈ F(x, p, u) and 0 ∈ F(x̂, p̂, û) in view of (1.2), we obtain that
Rearranging the terms in this equality and using Assumption (A2) we get
Using this inequality and the definitions of the functions σ and σ̂ we obtain
The third component of the inclusion
where the constant c_0 only depends on κ, α, τ and m_0 (see Remark 3.4). Then using (3.7) and the Hölder inequality we obtain
Using Assumption (A1) and the solution formula of the Cauchy problem for Δx and Δp we get
for some constants c_1 and c_2 that only depend on K (see (3.5) in Remark 3.4). (We mention that for the estimation of ‖Δp‖_∞ we use the estimation for |Δx(T)| and the Lipschitz continuity of the gradient ∇g appearing in the end-point conditions for p and p̂ in (1.1).) Therefore, by (3.8)-(3.9) we obtain that
for some constant c_3, only depending on c_0, c_1 and c_2. Now, we distinguish two cases. First,
Inequalities (3.11) and (3.12) imply that for any b > 0 there exists c_4 > 0, depending on c_3 and b, such that
Then the claim of the theorem follows with a suitable constant c (depending only on c_1, c_2 and c_4) from the above estimation together with (3.9). Notice that c_4, hence also c, depends on b only due to the term b^{(κ−1)/κ} in estimation (3.12), which equals 1 in the case κ = 1. This justifies Footnote 1.
Remark 3.5 Clearly, the property established in Theorem 3.3 implies that (x̂, p̂, û) is the unique solution of (PMP), thus (x̂, û) is the unique solution of problem (P). Therefore, (PMP), together with Assumptions (A1)-(A3), is a sufficient optimality condition.

Bi-metric Regularity
The notion of strong bi-metric regularity was introduced in [23] in order to grasp in a relevant way the dependence on perturbations of the solutions of Mayer's type optimal control problems for linear systems. Its extension to the Bolza problem considered in this paper is more complicated due to the missing smoothness of the switching function associated with the optimal control. In this section we present such an extension, starting from the abstract definition of strong bi-metric regularity and a new, substantially strengthened version of the Lyusternik-Graves type theorem proved in [23].

The Abstract Setting
First, we give the definition of strong bi-metric regularity, which is a more convenient extension of the one introduced in [23].
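Let (X, d_X), (Y, d_Y) and (Ỹ, d̃_Y) be metric spaces with Ỹ ⊂ Y, and denote by B_X(x̄; a) and B_Ỹ(ȳ; b) the balls in X and Ỹ with radius a > 0 and b > 0 centered at x̄ and ȳ, respectively. We will suppose that the metrics d_Y and d̃_Y are shift-invariant, which means, in terms of the metric d_Y, that d_Y(y + z, y′ + z) = d_Y(y, y′) for all y, y′, z ∈ Y.

Definition 4.1 The mapping Φ : X ⇒ Y is strongly bi-metrically regular (relative to Ỹ ⊂ Y) at x̄ ∈ X for ȳ ∈ Ỹ with constants ς ≥ 0, a > 0 and b > 0 if (x̄, ȳ) ∈ graph(Φ) and the following properties are fulfilled:
1. the mapping B_Ỹ(ȳ; b) ∋ y ↦ Φ^{−1}(y) ∩ B_X(x̄; a) is single-valued, and
2. for all y, y′ ∈ B_Ỹ(ȳ; b),

$$
d_X\bigl(\Phi^{-1}(y) \cap B_X(\bar x; a),\ \Phi^{-1}(y') \cap B_X(\bar x; a)\bigr) \le \varsigma\, d_Y(y, y'). \tag{4.1}
$$

It is important to notice that in this definition the "disturbances" y, y′ are taken from the smaller space Ỹ (and are sufficiently small in the metric of this space), but the Lipschitz property (4.1) holds with the (smaller) metric d_Y. This is the crucial difference with the standard definition of strong metric regularity (see e.g. [8, Section 3G] and [16]), where the spaces Y and Ỹ coincide.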
The next result resembles the main features of the Lyusternik-Graves-type theorem proved in [23, Theorem 2.1], but under substantially weakened requirements, as explained in the comments after the proof.

2)
and for every function ϕ : X → Y such that and (4.5) Thus s(y − ϕ(x)) is defined for all such pairs (x, y).
For an arbitrarily fixed y ∈ B_Ỹ(ȳ + ϕ(x̄); b′) we consider the mapping B_X(x̄; a′) ∋ x ↦ Z_y(x) := s(y − ϕ(x)). We shall prove that the mapping Z_y has a unique fixed point by using the contraction mapping theorem in the form of [8, Theorem 1A.2]. For this we denote λ = ςμ < 1 and estimate
Then, according to [8, Theorem 1A.2], there exists a unique x = x(y) ∈ B_X(x̄; a′) such that x = s(y − ϕ(x)). The latter implies that
a′) is single-valued. Now, take two arbitrary elements y, y′ ∈ B_Ỹ(ȳ + ϕ(x̄); b′) and let x = s(y − ϕ(x)) and x′ = s(y′ − ϕ(x′)) be the unique solutions of y ∈ ϕ(x) + Φ(x) in B_X(x̄; a′) corresponding to y and y′, respectively. Then
Hence,
which completes the proof.
The main improvement in the above theorem, compared with [23, Theorem 2.1], is that the Lipschitz property (4.4) is required in [23, Theorem 2.1] to be fulfilled in the stronger metric d̃_Y, which makes the theorem unusable in several applications, including that presented in Section 4.3.
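The fixed-point construction used in this proof can be tried out in a scalar toy setting. The sketch below is only an illustration under stated assumptions: the mappings `s` (the single-valued localization of the inverse, with Lipschitz constant ς = 0.5) and `phi` (the perturbation, with Lipschitz constant μ = 0.5) are hypothetical choices, and the spaces are simply the real line. Since λ = ςμ = 0.25 < 1, the iteration x ← s(y − ϕ(x)) is a contraction.

```python
import math

def solve_perturbed(y, s, phi, x0=0.0, tol=1e-12, max_iter=200):
    """Fixed-point iteration x <- s(y - phi(x)) for the inclusion y in phi(x) + Phi(x),
    where s is a single-valued (localized) inverse of Phi."""
    x = x0
    for _ in range(max_iter):
        x_new = s(y - phi(x))
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical data: Phi(x) = 2x, hence s(w) = w / 2 (Lipschitz constant 0.5),
# and phi(x) = 0.5 * sin(x) (Lipschitz constant 0.5).
s = lambda w: w / 2.0
phi = lambda x: 0.5 * math.sin(x)

y1, y2 = 0.3, 0.35
x1 = solve_perturbed(y1, s, phi)
x2 = solve_perturbed(y2, s, phi)
residual = phi(x1) + 2.0 * x1 - y1
```

In this toy case the two solutions satisfy the usual contraction estimate |x1 − x2| ≤ ς/(1 − ςμ)·|y1 − y2|, here with the factor 2/3; the constants in the theorem itself are those of its (not reproduced) statement.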

Assumption (A1') The functions
(4.6) We denote by d̃_Y the distance induced by ‖·‖_∼. Observe that d_Y ≤ d̃_Y on Ỹ.

Assumption (A2') For every couple of admissible pairs (x, u), (x′, u′) ∈ F it holds that ⟨∇g(x(T)) − ∇g(x′(T)), x(T) − x′(T)⟩
where "meas" stands for the Lebesgue measure in [0, T]. This metric is shift-invariant and we shall shorten d_#(u_1, u_2) = d_#(u_1 − u_2, 0) =: d_#(u_1 − u_2). Moreover, U is a complete metric space with respect to d_# (see [10, Lemma 7.2]). Then the triple (x, p, u) is considered as an element of the space
Clearly X is a complete metric space.

Theorem 4.5 (Bi-metric regularity) Let Assumptions (A1') and (A2') be fulfilled. Let (x̂, p̂, û) be a solution of (PMP) such that Assumption (A3) is fulfilled with κ = 1. Then the mapping F :
, y), (4.13) for all y, y′ ∈ B_Ỹ(0; b), where b and c are as in Proposition 4.3. Thus the conditions in Definition 4.1 will be fulfilled even with a = +∞.
Let us start by giving a reformulation of the perturbed version of (P), which will turn out to be useful in the sequel. Let us take an arbitrary y = (ξ, π, ρ, ν) ∈ Ỹ. Then the perturbed system y ∈ F(x, p, u) is the set of necessary conditions for the problem (3.6) introduced in the proof of Theorem 3.3. Notice that (3.6) is exactly of the same form as (P) with the state and co-state variables augmented by one dimension, and the data A, B, d, W, S and g replaced with
is a solution of the system (4.15) The above system can be recast as a generalized inclusion
where F y is defined as in (1.2) replacing A by A, and similarly for the other data. F y maps the space X
where x 0 := (x 0 , 0) . In short, the dimension of the state and co-state variables is augmented to n + 1 and the additional initial condition x_{n+1}(0) = 0 is added. Note that by construction, for any y ∈ Ỹ, Assumption (A1) and Assumption (A2') are fulfilled for (4.16).
Choose b, α, τ and m_0 as in Proposition 4.3. Then there exists a constant K such that for any y with ‖y‖_∼ ≤ b we have
Then by Proposition 4.3, for any y = (ξ, π, ρ, ν) ∈ B_Ỹ(0; b) and any solution (x, p, u) of the perturbed problem y ∈ F(x, p, u), Assumption (A3) is satisfied by σ := B^⊤p + S^⊤x − ρ with constants α, τ and l_min(σ, τ) ≥ m_0. An easy calculation shows that the switching function of the solution ( x, p, u) (given by (4.14)) of (4.16) is given by B p + S x = B p + S x − ρ = σ. Then Theorem 3.3 in the detailed form in Remark 3.4 is applicable to (4.16) with the constant c independent of the particular y ∈ B_Ỹ(0; b). In particular, this implies that ( x, p, u) is the unique solution of (4.16). Therefore, u = u is bang-bang and F^{−1} is single-valued on B_Ỹ(0; b). For any y′ = (ξ′, π′, ρ′, ν′) ∈ B_Ỹ(0; b) and its solution (x′, p′, u′) of y′ ∈ F(x′, p′, u′) we define
, 0)).
An easy calculation shows the inclusion y ∈ F y ( x , p , u ). Then Theorem 3.3 (in the form in Remark 3.4) implies
where ‖·‖_Ŷ denotes the norm of Ŷ. Hence by (4.17) we have
Since u, u′ are bang-bang, similarly to [23, p. 4130] we have ‖u − u′‖_1 ≥ 2 d_#(u − u′), which proves (4.13).
We mention that the strong bi-metric regularity for Mayer's problems is proved in [23] for a general polyhedral set U and also in the case κ > 1. Extension of Theorem 4.5 to a general compact polyhedral set U is a matter of modification of Assumption (A3) and of technicalities that we avoid in this paper, while the case κ > 1 is still open and challenging for the Bolza problem.
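The inequality between the L^1 distance and the metric d_# for bang-bang controls can be checked on a discretized example. The sketch below rests on assumptions that are not spelled out in the text above: d_#(u_1, u_2) is taken to be the Lebesgue measure of the set where the two controls differ, the L^1 norm uses the Euclidean norm in R^m, and the controls are hypothetical piecewise constant functions on a uniform grid.

```python
import numpy as np

def d_sharp(u1, u2, dt):
    """Measure of the set of cells where the vector-valued controls differ.
    u1, u2 have shape (m, N): m components on N cells of length dt."""
    differ = np.any(u1 != u2, axis=0)
    return dt * np.count_nonzero(differ)

def l1_norm(v, dt):
    """Discrete L1 norm, using the Euclidean norm of v(t) in R^m."""
    return dt * float(np.sum(np.linalg.norm(v, axis=0)))

def bang_bang(cell_midpoints, switch_times):
    """One row per component: +1 before its switching time, -1 after it."""
    return np.array([np.where(cell_midpoints < s, 1.0, -1.0) for s in switch_times])

N = 1000
dt = 1.0 / N
tc = (np.arange(N) + 0.5) * dt            # cell midpoints on [0, 1]
u1 = bang_bang(tc, [0.30, 0.60])          # hypothetical switching times
u2 = bang_bang(tc, [0.35, 0.62])

dist_sharp = d_sharp(u1, u2, dt)          # measure of the set where u1 != u2
dist_l1 = l1_norm(u1 - u2, dt)
```

Wherever the two controls differ, at least one component jumps between −1 and 1, so the pointwise distance is at least 2; this is why the L^1 distance dominates twice the measure of the set where the controls differ.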

Stability of Bi-metric Regularity Under Perturbations
In this subsection, we will apply Theorem 4.2 to prove that the strong bi-metric regularity property is stable under some class of nonlinear perturbations.

x(t)^⊤ W(t)x(t) + w(x(t), t) + x(t)^⊤ S(t)u(t) + ⟨s(x(t), t), u(t)⟩ dt.
Here a : T ] → R m are continuously differentiable functions. All these functions will be assumed "small" in a sense clarified in the theorem below.
The system of necessary optimality conditions for problem (4.18) is given by (4.19) where the subscript x (as in a x ) means differentiation with respect to x.

0 = ṗ(t) + (A(t) + a_x(x(t), t) + (B(x(t), t)u(t))_x)^⊤ p(t) + W(t)x(t) + w_x(x(t), t) + S(t)u(t),
The system (4.19) can be recast as
where F (corresponding to the non-perturbed system) is given by (1.2) and f is defined by
As before we consider F as a set-valued mapping X ⇒ Y, where the spaces X and Y are defined in (4.11) and (2.1), respectively. We fix a solution ẑ := (x̂, p̂, û) of the inclusion 0 ∈ F(x, p, u).

Assumption (B)
The mapping F : X ⇒ Y is strongly bi-metrically regular relative to Ỹ ⊂ Y at ẑ ∈ X for 0 ∈ Ỹ.
We recall that sufficient conditions for strong bi-metric regularity of F are given in Theorem 4.5.
Our purpose will be to prove that the strong bi-metric regularity of F is not destroyed by the disturbance f, provided that the disturbances in (4.18) are sufficiently "small". Notice that the space X contains elements (x, p, u) for which some of the norms ‖x‖_∞, ‖p‖_∞, ‖ẋ‖_∞, ‖ṗ‖_∞ may be arbitrarily large or even infinite (the latter applies to the derivatives), that is, elements which are irrelevant to the linear-quadratic problem to which F is associated. Moreover, the image f(X) is not necessarily contained in Ỹ, which is important from a technical point of view. Therefore, for a given compact set D ⊂ R^n we introduce the complete metric space (with the metric d_X)
Also, denote by F_D := F|_{X_D} : X_D ⇒ Y and f_D := f|_{X_D} : X_D → Y the restrictions of F and f to X_D.

Lemma 4.6 Let Assumption (A1) be fulfilled, let ẑ = (x̂, p̂, û) be a solution of the nonperturbed system (PMP), and let Assumption (B) be fulfilled. Then there exists a compact set D_0 ⊂ R^n such that for every compact set D ⊂ R^n containing D_0 the restriction f_D maps X_D into Ỹ and the mapping F_D : X_D ⇒ Y is strongly bi-metrically regular relative to Ỹ ⊂ Y at ẑ ∈ X for 0 ∈ Ỹ.
Proof First note that, because of the continuity of a, B, s, a_x, B_x, w_x and s_x, for every compact set D_0 the first three components of f_{D_0} are in L^∞. Moreover, the third component is differentiable in t, and since (B(x, t)^⊤ p)_x is continuous as a function of x, p and t, and s_x is continuous, this derivative lies in L^∞. Hence f_{D_0} maps into Ỹ.
Further, let ς ≥ 0, a > 0 and b > 0 be the constants corresponding to the strong bi-metric regularity of F. Let y = (ξ, π, ρ, ν) ∈ B_Ỹ(0; b) and let (x, p, u) ∈ X be a solution of the generalized equation y ∈ F(x, p, u) (i.e. of (2.4)). Moreover, we denote Δx(t) := x(t) − x̂(t), Δp(t) := p(t) − p̂(t) and Δu(t) := u(t) − û(t). Then by the solution formula of the Cauchy problems for Δx and Δp we get
Below we prove a stability result in the same spirit as [23, Theorem 4.1], which concerns Mayer's problems. We mention that there is a gap in the proof of [23, Theorem 4.1], but it can be easily corrected by using Theorem 4.2 instead of [23, Theorem 2.1]. This is done in the next theorem which, in addition, extends [23, Theorem 4.1] to Bolza problems.

Theorem 4.7
Let Assumption (A1') be fulfilled, let ẑ = (x̂, p̂, û) be a solution of the nonperturbed system (PMP), and let Assumption (B) be fulfilled. Let D ⊂ R^n be a compact set such that f(X_D) ⊂ Ỹ and the mapping F_D is strongly bi-metrically regular relative to Ỹ ⊂ Y at ẑ ∈ X_D for 0 ∈ Ỹ (see Lemma 4.6). Then there exist positive real numbers ε_0, δ and c with the following property.
For any positive number ε ≤ ε_0 let a, B, g, w, s be any functions satisfying the assumptions given above in this section and such that
(ii) the mapping f + F : X_D ⇒ Y is strongly bi-metrically regular at z* for 0 relative to Ỹ ⊂ Y.
Proof We want to apply Theorem 4.2 for the mappings Φ = F and ϕ = f at the point (ẑ, ŷ), where ŷ := f(x̂, p̂, û). Let ς, a, b be the numbers in the definition of strong bi-metric regularity of F at ẑ for 0, and let μ, ς′, a′, b′, γ be arbitrary numbers such that the conditions (4.2) are fulfilled.
Since a, B, s, a_x, B_x, w_x, s_x, B_t, s_t, ∇g are all bounded by ε and ẋ, p, ṗ are bounded by |D| := sup_{x∈D} |x| and |û| ≤ √m, we have that
for some constant C_1 only depending on |D|, m and T. Similarly, for z ∈ B_{X_D}(ẑ; a′) we have
(4.23) for some constant C_2 only depending on |D|.
Hence, if we choose ε_0, δ and c such that
then we can apply Theorem 4.2 to see that f + F is strongly bi-metrically regular at ẑ for ŷ with constants ς′, a′ and b′. Therefore, there is a unique z* ∈ B_{X_D}(ẑ; a′) such that
and we have
which proves (i). Moreover, since (z*, 0) ∈ int(B_{X_D}(ẑ; a′) × B_Ỹ(ŷ; b′)), the map f + F is also strongly bi-metrically regular at z* for 0. This proves (ii).
We mention that the issue of stability with respect to linearization of the strong bi-metric regularity property (in the spirit of Robinson's theorem [24]) is more complicated and will be a subject of a separate investigation, together with further applications of this property.

A Newton-Type Method for Bang-Bang Optimal Control Problems
In this section we investigate the convergence of a Newton-type method for solving affine optimal control problems under conditions which guarantee that the (strengthened) subregularity property in Theorem 3.3 holds for the linearized problem along the optimal solution. For this, we first present an abstract result which is similar to, but stronger than [5, Theorem 6.1], since it is based on the stronger version of sub-regularity in Theorem 3.3.
Theorem 5.1 Let (X, ‖·‖_X) and (Y, ‖·‖_Y) be Banach spaces. Let the mapping ϕ : X → Y be Fréchet differentiable (Dϕ denotes the derivative) and let Φ : X ⇒ Y be a set-valued mapping. Let x̄ be a solution of the inclusion
Assume that there are positive constants R, L and c such that
Then for x ∈ B_X(x̄, r), where r = min{R, 2/(5cL)}, and for every solution z ∈ X of the Newton inclusion

Before proving the theorem we mention that condition (5.2) is a strengthened form of the metric sub-regularity of the partial linearization x ↦ ϕ(x̄) + Dϕ(x̄)(x − x̄) + Φ(x) of the mapping ϕ + Φ. The inclusion z ∈ B_X(x̄, r) implies that any finite or infinite sequence generated by the Newton inclusion (5.3) and starting from B_X(x̄, r) (if such a sequence exists) stays in B_X(x̄, r). Inequality (5.4) claims quadratic convergence of any such sequence which starts in the interior of B_X(x̄, r).
Proof For any x ∈ B_X(x̄, r), let z ∈ X be an arbitrary solution of (5.3) (if any). Then,
This means that z solves (5.3) with perturbation y given by the right-hand side of the inclusion above. Therefore, (5.2) yields that
Hence,
Since 1 − cL‖x − x̄‖_X ≥ 1 − cLr ≥ 3/5, we obtain (5.4), which implies that z ∈ B_X(x̄, r).
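The computation behind this proof can be sketched as follows. The sketch rests on assumptions about formulas that are not visible in the text above: x̄ denotes the reference solution of Theorem 5.1, Φ denotes the set-valued mapping, L is taken to be a Lipschitz constant of Dϕ on B_X(x̄, R), condition (5.2) is taken to state ‖z − x̄‖_X ≤ c‖y‖_Y for every solution z of y ∈ ϕ(x̄) + Dϕ(x̄)(z − x̄) + Φ(z), and the Newton inclusion (5.3) is taken to read 0 ∈ ϕ(x) + Dϕ(x)(z − x) + Φ(z). The constant obtained in the last line need not coincide with the one stated in (5.4).

```latex
% Sketch under the assumptions listed in the lead-in (not the authors' exact formulas).
\begin{align*}
y &:= \varphi(\bar x) + D\varphi(\bar x)(z-\bar x) - \varphi(x) - D\varphi(x)(z-x)
   \;\in\; \varphi(\bar x) + D\varphi(\bar x)(z-\bar x) + \Phi(z),\\
y &= \bigl[\varphi(\bar x) - \varphi(x) - D\varphi(x)(\bar x - x)\bigr]
   + \bigl[D\varphi(\bar x) - D\varphi(x)\bigr](z-\bar x),\\
\|y\|_Y &\le \tfrac{L}{2}\,\|x-\bar x\|_X^2 + L\,\|x-\bar x\|_X\,\|z-\bar x\|_X,\\
\|z-\bar x\|_X \le c\,\|y\|_Y
  &\;\Longrightarrow\;
  \bigl(1 - cL\|x-\bar x\|_X\bigr)\,\|z-\bar x\|_X \le \tfrac{cL}{2}\,\|x-\bar x\|_X^2,\\
\|z-\bar x\|_X &\le \tfrac{5cL}{6}\,\|x-\bar x\|_X^2 \le \tfrac{1}{3}\,\|x-\bar x\|_X
  \qquad\text{since } cL\|x-\bar x\|_X \le cLr \le \tfrac{2}{5}.
\end{align*}
```

Under these assumptions the last line shows both the quadratic estimate and the fact that z remains in B_X(x̄, r).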

Remark 5.2
A similar convergence result of the Newton's method can be found in [4] for variational inequalities and nonlinear programming. In that paper, the author introduces the conditions of hemi-stability and hemi-regularity in order to ensure the convergence of the Newton's method. The assumptions in Theorem 5.1 are weaker, but existence of a Newton sequence is not claimed, similarly as to [5, Theorem 6.1]. Existence will follow in the analysis of optimal control problems that follow. Now, we shall use Theorem 5.1 to investigate the convergence of the Newton method for the following affine optimal control problem: Here the functions a : R n × R → R n , B : R n × R → R n×m , w : R n × R → R, s : R n × R → R m and g : R n → R are given. Further, we use the following assumptions.
Assumption (A1") The functions a, B, w, s are twice differentiable in x, and all these functions and their derivatives of first and second order are continuous in t and locally Lipschitz in x, uniformly in t. The function g is twice continuously differentiable with Lipschitz derivative. Problem (5.5) has a solution $(\hat x, \hat u)$.

Remark 5.3
The optimality can be understood as local, since it is only important that the Pontryagin minimum principle is fulfilled for $(\hat x, \hat u)$. Due to the linearity of the problem with respect to the control and the compactness and convexity of the control constraints, existence of an optimal solution is guaranteed if the differential equation in (5.5) has a solution on [0, T] for every u ∈ U.
By the Pontryagin minimum principle, there exists an absolutely continuous function $\hat p$ such that the triple $(\hat x, \hat p, \hat u)$ solves for a.e. $t \in [0, T]$ the system

$$
\begin{aligned}
0 &= \dot x(t) - a(x(t), t) - B(x(t), t)\,u(t), \\
0 &= \dot p(t) + \bigl(a(x(t), t) + B(x(t), t)\,u(t)\bigr)_x^{\top} p(t) + w_x(x(t), t) + s_x(x(t), t)^{\top} u(t), \\
0 &\in B(x(t), t)^{\top} p(t) + s(x(t), t) + N_U(u(t)), \\
0 &= p(T) - \nabla g(x(T)),
\end{aligned} \qquad (5.6)
$$

where the subscript x (as in $a_x$) means differentiation with respect to x. We rewrite system (5.6) as the following generalized equation
$$0 \in f(x, p, u) + G(x, p, u), \qquad (5.7)$$
$G : \mathcal{X} \rightrightarrows \mathcal{Y}$ is given by and $\mathcal{X}$ and $\mathcal{Y}$ are the spaces defined in Section 2, namely $\mathcal{X} = W^{1,1}$

Following [8, Chapter 6.3], we define the Newton-type method for solving problem (5.7) as follows, where $z_k := (x_k, p_k, u_k)$ denotes the iterate obtained at step k = 0, 1, ....

Newton's method:
1. Choose $z_0 \in \mathcal{X}$.
2. Given $z_k$, obtain $z_{k+1}$ as a solution of the generalized equation
$$0 \in f(z_k) + Df(z_k)(z_{k+1} - z_k) + G(z_{k+1}). \qquad (5.10)$$

Here, $Df(z)$ is the Jacobian of f at z. We mention that if $z_k$ satisfies (5.10) then $u_k$ is an admissible control, because $N_U(u) = \emptyset$ whenever $u \notin U$.
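The two steps of the method can be illustrated on a scalar toy problem. The sketch below is not the control problem (5.5): it assumes a smooth scalar function f with positive derivative and the interval U = [-1, 1], in which case the linearized inclusion 0 ∈ f(u_k) + f'(u_k)(u - u_k) + N_U(u) is solved by projecting the unconstrained Newton point onto U. The names `newton_step` and `newton`, and the test function exp(u) - 1.5, are hypothetical choices made only for illustration.

```python
import math


def newton_step(f, df, u, lo=-1.0, hi=1.0):
    """One Newton step for the inclusion 0 in f(u) + N_[lo, hi](u).

    Assumption of this sketch: df(u) > 0, so the linearized inclusion
    has a unique solution, namely the projection onto [lo, hi] of the
    root of the linearization v -> f(u) + df(u) * (v - u).
    """
    slope = df(u)
    if slope <= 0.0:
        raise ValueError("this sketch assumes a positive derivative")
    root = u - f(u) / slope          # zero of the linearization
    return min(hi, max(lo, root))    # projection onto the interval


def newton(f, df, u0, steps=6):
    """Return the list of iterates u_0, u_1, ..., u_steps."""
    iterates = [u0]
    for _ in range(steps):
        iterates.append(newton_step(f, df, iterates[-1]))
    return iterates


# Toy data (hypothetical): the zero of f lies inside (-1, 1), so the
# solution of the inclusion coincides with the zero log(1.5) of f.
f = lambda u: math.exp(u) - 1.5
df = lambda u: math.exp(u)

iterates = newton(f, df, u0=1.0)
u_hat = math.log(1.5)
errors = [abs(u - u_hat) for u in iterates]

for k, err in enumerate(errors):
    print(f"step {k}: error = {err:.3e}")
```

On this toy problem each of the first few errors is bounded by the square of the previous one, in line with the quadratic rate described in Theorem 5.1. The coercive scalar setting is of course much simpler than the bang-bang case treated here, where the linearized inclusion is itself an optimal control problem.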
For any $\bar z \in \mathcal{X}$ the inclusion $f(\bar z) + Df(\bar z)(z - \bar z) + G(z) \ni 0$ represents the Pontryagin system of necessary optimality conditions for a linear-quadratic problem which can be recast as (P) by introducing an additional state variable, similarly to the proof of Theorem 4.5. We denote this problem by $LP(\bar z)$ (we skip its explicit formulation, which can be found for instance in [9, Section 5]). For the next theorem it is important to ensure that the claim in Theorem 3.3 holds for the particular problem $LP(\hat z)$ corresponding to $\bar z = \hat z$, which obviously has the solution $\hat z$, the solution of the non-linearized problem (5.5). Therefore, we make the following assumptions, related to Assumptions (A2) and (A3) in Section 2.
Assumption (A2") The objective functional in problem LP (ẑ) is convex on the set of all admissible pairs F (see Remark 4.4).
The next theorem claims that, under the assumptions made, Newton's method generates a sequence converging quadratically to the optimal solution of (5.5).
Proof Since problem $LP(z_k)$ has a solution and the generalized equation (5.10) represents the Pontryagin necessary optimality conditions for this problem, the iterate $z_k$ exists for every k. We will apply Theorem 5.1 with the spaces $\mathcal{X}$ and $\mathcal{Y}$ (for X and Y) and the mappings f and G (for $\phi$ and $\Phi$).
An easy but cumbersome calculation (which we skip) shows that Assumption (A1") implies that the mapping $f : \mathcal{X} \to \mathcal{Y}$ is Fréchet differentiable with locally Lipschitz derivative. Thus condition (5.1) in Theorem 5.1 is satisfied with $\phi = f$ and some constants R and L. Moreover, thanks to Assumptions (A1")-(A3"), problem $LP(\hat z)$ fulfills Assumptions (A1)-(A3) in Theorem 3.3. This implies (see Remark 3.4 and Footnote 1) that condition (5.2) in Theorem 5.1 is also fulfilled with some constant c. Then the convergence claimed in the present theorem follows from Theorem 5.1 with the neighborhood O defined as the open ball in $\mathcal{X}$ centered at $\hat z$ and with radius r, where r is defined in Theorem 5.1.

Conclusion
This paper contributes to the regularity theory for Bolza-type optimal control problems with linear dynamics, an objective integrand that is quadratic in the state and linear in the control, and a non-linear terminal term. Conditions for Lipschitz/Hölder sub-regularity and bi-metric regularity are obtained, and the results are utilized for obtaining a convergence result for the Newton method applied to non-linear problems that are affine with respect to the control. One of these conditions, which is particularly restrictive, requires that the optimal control is of pure bang-bang type. Extending the regularity results and the Newton method to control-affine optimal control problems with singular arcs is an important open area.