Higher-order numerical scheme for linear quadratic problems with bang–bang controls

This paper considers a linear-quadratic optimal control problem in which the control function appears linearly and takes values in a hypercube. It is assumed that the optimal controls are of purely bang–bang type and that the switching function associated with the problem exhibits a suitable growth around its zeros. The authors introduce a scheme for the discretization of the problem that doubles the rate of convergence of the Euler scheme. The proof of the accuracy estimate employs recently obtained results concerning the stability of the optimal solutions with respect to disturbances.


Introduction
Discretization schemes for optimal control problems have been extensively investigated in the last 60 years (see, e.g., [6-9,17,21], the more recent paper [5], and the references therein). In the aforementioned papers and in most of the literature, the optimal controls are typically assumed to be sufficiently smooth (at least Lipschitz continuous) and results are usually based on second-order optimality conditions. On the other hand, whenever the control appears linearly in the system, the lack of coercivity typically leads to discontinuities of the optimal controls.
Recently, new second-order optimality conditions for systems that are linear with respect to the control have been developed. We refer to [3,22] for analysis of second-order necessary conditions for bang-bang and singular-bang controls, respectively. Results on the stability of solutions with respect to disturbances were also recently obtained; see [4,12,14,25] and the bibliography therein. Based on these results, error estimates for the accuracy of the Euler discretization scheme applied to various classes of affine optimal control problems were obtained in [1,2,13,18,26,27]. The error estimates are at most of first order with respect to the discretization step, which is natural in view of the discontinuity of the optimal control. For the same reason, using higher-order Runge-Kutta discretization schemes on a fixed grid does not help to improve the order of accuracy. Seemingly, the first paper that addresses the issue of accuracy of discrete approximations for affine problems is [30], where a higher-order Runge-Kutta scheme is applied to a linear system, but the error estimate is of first order or less.
A new type of discretization scheme was recently presented in [23] for Mayer's problems for linear systems. The idea behind this scheme goes back to [15,20,28] and is based on a truncated Volterra-Fliess-type expansion of the state and adjoint equations. The analysis of the convergence and of the error estimate makes use of the strong Hölder metric sub-regularity of the map associated with the Pontryagin maximum principle, proved in [25].
The goal of the present paper is to extend this discretization scheme and the pertaining error analysis to affine linear-quadratic problems. This extension is not a routine task, due to the appearance of the state function in the switching function associated with the problem, and of both the state and the control in the adjoint equation.
More precisely, we consider the problem

minimize  J(x, u) := (1/2) x(T)^⊤ Q x(T) + q^⊤ x(T) + ∫_0^T [ (1/2) x(t)^⊤ W(t) x(t) + x(t)^⊤ S(t) u(t) ] dt,   (1.1)

subject to

ẋ(t) = A(t) x(t) + B(t) u(t),  x(0) = x_0,   (1.2)

u(t) ∈ U := [−1, 1]^m  for a.e. t ∈ [0, T].   (1.3)

Here [0, T] is a fixed time horizon, the state x is n-dimensional, the initial state x_0 is given, the control u is m-dimensional, Q ∈ R^{n×n}, q ∈ R^n, and the matrix functions A, W : [0, T] → R^{n×n} and S, B : [0, T] → R^{n×m} are given data; the superscript "⊤" denotes transposition. Admissible controls are all measurable functions with values in the set U for a.e. t ∈ [0, T]. Linear terms are not included in the integrand in (1.1), since they can be shifted in a standard way into the differential equation (1.2). The optimal controls in the problem (1.1)-(1.3) are typically concatenations of bang-bang and singular arcs. In this paper, we assume that the optimal controls are of strictly bang-bang type with a finite number of switches, and that the components of the switching function have a certain growth rate at their zeros, characterized by a number κ ≥ 1. This number appears in the error estimate obtained in this paper for the proposed discretization scheme. "Generically", κ = 1, and in this case the error estimate is of second order.
The paper is organized as follows. In Sect. 2 we recall some notations and formulate the assumptions. In Sect. 3 we introduce our discretization scheme and present the main result, the error estimate. Section 4 contains the proof. Section 5 presents an error estimate in the case of inexact solution of the discretized problem. A numerical experiment confirming the theoretical findings is given in Sect. 6. Concluding remarks and further perspectives are discussed in Sect. 7.

Preliminaries
First of all we pose some assumptions, which are standard in the context of problem (1.1)-(1.3).
The set of all admissible controls will be denoted by U ⊂ L^∞. A pair (x, u) formed by an admissible control u and the corresponding solution x of (1.2) is referred to as an admissible process, and the set of all admissible processes is denoted by F. The set F will be considered as a subset of the space W^{1,1} × L^1, where W^{1,1} has the usual meaning. Due to the compactness and the convexity of U, the set F is compact with respect to the L^2-weak topology for u and the uniform norm for x. Thus a minimizer (x̂, û) does exist in this space. The following assumption requires a sort of "directional convexity" of the objective functional J at (x̂, û).

Assumption (A2)
Let (x̂, û) be an optimal process in problem (1.1)-(1.3). According to the Pontryagin (minimum) principle, there exists p̂ ∈ W^{1,∞} such that (x̂, û, p̂) satisfies, for a.e. t ∈ [0, T], a system of generalized equations, (2.1)-(2.4), where N_U(u) is the normal cone to U at u. System (2.1)-(2.4) can be shortly rewritten as

0 ∈ F(x, p, u),   (2.5)

where F is the set-valued map defined by the left-hand sides of (2.1)-(2.4). The mapping F is considered as acting from a space X to a space Y. The norms in these spaces are defined as usual: for (x, p, u) ∈ X and (ξ, π, ρ, ν) ∈ Y,

‖(x, p, u)‖ := ‖x‖_{1,1} + ‖p‖_{1,1} + ‖u‖_1  and  ‖(ξ, π, ρ, ν)‖ := ‖ξ‖_1 + ‖π‖_1 + ‖ρ‖_∞ + |ν|,

where ‖x‖_{1,1} abbreviates ‖x‖_{W^{1,1}} and ‖·‖_s is the norm in L^s, s ∈ {1, ∞}. Notice that the normal cone N_U(u) to the set U ⊂ L^1 at u(·) ∈ U has the point-wise representation {ξ ∈ L^∞ : ξ(t) ∈ N_U(u(t)) a.e. on [0, T]}. We also define the distance d_#(u, û) between admissible controls. Thus, for j = 1, ..., m, the value û_j(t) is determined by the sign of σ_j(t) whenever the latter is nonzero, where σ_j and û_j stand for the j-th components of the switching function σ and of û, respectively.

The following theorem plays a crucial role in the error analysis of the discretization scheme presented below. It is a modification (under weaker conditions and in a different space setting) of [4, Theorem 8], and is proved in [24].

Theorem 2.2  Let Assumptions (A1) and (A2) be fulfilled. Let (x̂, p̂, û) be a solution of the generalized equation (2.5) and let Assumption (B) be fulfilled with some real number κ ≥ 1. Then for any b > 0 there exists c > 0 such that for every y := (ξ, π, ρ, ν) ∈ Y with ‖y‖ ≤ b there exists a triple (x, p, u) ∈ X solving y ∈ F(x, p, u), and every such triple satisfies the inequality

‖x − x̂‖_{1,1} + ‖p − p̂‖_{1,1} + ‖u − û‖_1 ≤ c ‖y‖^{1/κ}.

We mention that the above property of the mapping F at the reference points (x̂, p̂, û) ∈ X and 0 ∈ Y is a stronger version (non-local with respect to (x, p, u)) of the so-called metric sub-regularity [10].

Discretization scheme
In this section we propose a discretization scheme for problem (1.1)-(1.3) which has a higher accuracy than the Euler scheme without a substantial increase of the numerical complexity of the discretized problem. We recall that the Euler method has already been profoundly investigated in the case where bang-bang controls appear (e.g. [1,2,13,18,26]). As mentioned in the introduction, in doing this we use an idea that originates from [15,28] and was implemented in [23] in the case of Mayer's problems. The approach uses second order truncated Volterra-Fliess series, as described in the next subsection.

Truncated Volterra-Fliess series
On each interval [t_i, t_{i+1}] of a uniform mesh t_i = ih, i = 0, ..., N, with step h = T/N, the solution of (1.2) admits a second-order truncated expansion, (3.1), where for shortness we skip the fixed argument t_i in the appearing functions. As usual, we denote by O(ε), ε > 0, any function that satisfies |O(ε)|/ε ≤ c, where c is a "generic" constant, that is, a constant depending only on the data of the problem (thus, independent of i and t, although O(h³) may depend on i and t in the above context). The second-order expansion of the solution of the adjoint equation (2.2) differs from that in [23, Section 3] due to the presence of an integral term in the objective functional (1.1); therefore we derive it below. Applying the first-order Taylor expansions of A, W, S at t_i, the representations (3.1) of x(s) and (3.2) of p(s), and skipping all third-order terms, we obtain a truncated Volterra-Fliess expansion for the adjoint function. Now we shall derive a second-order approximation of the integral term of the objective functional J(x, u) on [t_i, t_{i+1}] in terms of the first- and second-order moments of the control.
Concerning the quadratic term in the x-variable, we make use of the first-order part of the representation (3.1) and of the Taylor expansion of W at t_i. Remembering that W is a symmetric-matrix-valued function, we obtain a second-order approximation of this term. Now we consider the mixed term in the integral in (1.1). A calculation of the same fashion as the previous one yields an analogous approximation; we focus on its last term, which we transform by integrating by parts. Following [28], in order to obtain a second-order expansion expressed in terms of the first- and second-order moments of u, we assume the following.
Indeed, using Assumption (I), we obtain from the last displayed equality an expression which can be substituted in (3.5).
Notice that Assumption (I) is always fulfilled if m = 1, since then the matrix B^⊤S is a scalar. The second-order approximations obtained above will be used in the next subsection to define an appropriate discrete-time approximation of problem (1.1)-(1.3).
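As an illustration of the idea of representing the control on each mesh interval only through its first two moments, the following self-contained sketch (our own example: the matrices A, B, the initial state and the switching point θ below are arbitrary choices, not data from the paper) builds the analogous second-order truncated expansion for a time-invariant system ẋ = Ax + Bu, namely x(h) ≈ x_0 + hAx_0 + (h²/2)A²x_0 + hBz_1 + h²AB(z_1 − z_2), and checks that the one-step error remains O(h³) even when the bang-bang control switches inside the step:

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([0.0, 1.0])
x0 = np.array([1.0, 0.0])
theta = 0.3  # the control switches from -1 to +1 at theta*h

def phi_psi(delta, terms=30):
    # Phi = exp(A*delta), Psi = integral_0^delta exp(A*s) ds, via Taylor series
    Phi, Psi, T = np.eye(2), delta * np.eye(2), np.eye(2)
    for k in range(1, terms):
        T = T @ (A * delta) / k          # (A*delta)^k / k!
        Phi = Phi + T
        Psi = Psi + T * delta / (k + 1)  # A^k delta^(k+1) / (k+1)!
    return Phi, Psi

def exact_step(h):
    # exact solution over one step with u = -1 on [0, theta*h), u = +1 afterwards
    x = x0
    for delta, u in [(theta * h, -1.0), ((1 - theta) * h, 1.0)]:
        Phi, Psi = phi_psi(delta)
        x = Phi @ x + Psi @ (B * u)
    return x

def expansion_step(h):
    # second-order expansion driven only by the control moments z1, z2
    z1 = 1 - 2 * theta      # integral_0^1 u(h*s) ds
    z2 = 0.5 - theta**2     # integral_0^1 s*u(h*s) ds
    return (x0 + h * A @ x0 + 0.5 * h**2 * A @ A @ x0
            + h * z1 * B + h**2 * (z1 - z2) * (A @ B))

e1 = np.linalg.norm(exact_step(0.1) - expansion_step(0.1))
e2 = np.linalg.norm(exact_step(0.05) - expansion_step(0.05))
assert e1 < 0.05        # small one-step error
assert e2 < e1 / 3      # halving h divides an O(h^3) error by about 8
```

Halving h divides the one-step error by roughly 8, consistent with a third-order local error; accumulated over the whole horizon this corresponds to the second-order global accuracy discussed above.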

The numerical scheme
First of all, observe that the representation (3.1) of x(t) for t ∈ [t_i, t_{i+1}] depends on the control u only through the integrals ∫_{t_i}^t u(s) ds and ∫_{t_i}^t (s − t_i) u(s) ds. The same applies to the approximations of the integral terms of the objective functional obtained in the last subsection. By changing the variable t = t_i + hs, this pair of integrals (over the whole interval [t_i, t_{i+1}]) can be represented in the form h z_1 and h² z_2, respectively, where

z_1 = ∫_0^1 u(t_i + hs) ds,   z_2 = ∫_0^1 s u(t_i + hs) ds.

The set of pairs (z_1, z_2) generated in this way by the admissible controls is a strictly convex and compact set Z_m ⊂ R^{2m}. Note that Z_m can be expressed as the Cartesian product ∏_1^m Z, where Z is the Aumann integral

Z = ∫_0^1 {(v, sv) : v ∈ [−1, 1]} ds.

As pointed out in [23], it is a matter of standard calculation to represent the set Z in the more convenient way

Z = {(α, β) : α ∈ [−1, 1], φ_1(α) ≤ β ≤ φ_2(α)},

where φ_1(α) := (1/4)(−1 + 2α + α²) and φ_2(α) := (1/4)(1 + 2α − α²). Following the hint provided by the representation (3.1), we introduce appropriate notations and replace the differential equation (1.2) with a discrete-time controlled dynamics, (3.7), with discrete controls constrained by (3.8). Taking into account the approximations of the objective functional in the previous subsection, we introduce its discrete-time counterpart, (3.9). We denote by (P_h) the discrete problem of minimizing (3.9) subject to (3.7)-(3.8). The Karush-Kuhn-Tucker theorem gives necessary conditions, (3.10)-(3.13), for optimality in problem (P_h). In order to obtain (3.12) we use again Assumption (I).
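The parabolas φ_1 and φ_2 are easy to test numerically. The following sketch (helper names are ours; it uses only the formulas for φ_1 and φ_2 given above) verifies that a control with a single switch from −1 to +1 attains the upper parabola, a switch from +1 to −1 attains the lower one, and arbitrary admissible controls stay between the two:

```python
import numpy as np

# boundary parabolas of the moment set Z, as in the text
phi1 = lambda a: 0.25 * (-1 + 2 * a + a * a)
phi2 = lambda a: 0.25 * (1 + 2 * a - a * a)

def moments(u, n=200_000):
    # z1 = integral_0^1 u(s) ds and z2 = integral_0^1 s*u(s) ds, midpoint rule
    s = (np.arange(n) + 0.5) / n
    vals = u(s)
    return vals.mean(), (s * vals).mean()

theta = 0.3

# a single switch -1 -> +1 at theta attains the upper parabola
z1, z2 = moments(lambda s: np.where(s < theta, -1.0, 1.0))
assert abs(z1 - (1 - 2 * theta)) < 1e-3
assert abs(z2 - phi2(z1)) < 1e-3

# a single switch +1 -> -1 attains the lower parabola
z1, z2 = moments(lambda s: np.where(s < theta, 1.0, -1.0))
assert abs(z2 - phi1(z1)) < 1e-3

# any admissible control has moments between the two parabolas
rng = np.random.default_rng(0)
for _ in range(100):
    c = rng.uniform(-1, 1, size=8)  # random piecewise-constant control
    z1, z2 = moments(lambda s: c[np.minimum((s * 8).astype(int), 7)])
    assert phi1(z1) - 1e-6 <= z2 <= phi2(z1) + 1e-6
```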

Construction of continuous-time controls and order of convergence
The idea of the construction is similar to that in [23], with the essential difference that now u takes values only in the set {−1, 1}, so the construction is simpler.
We mention that in our framework the pairs (u_i, v_i) may correspond to a control that is (i) constant on [t_i, t_{i+1}], or (ii) has a single switch in this interval. A third possibility, (iii), is that (u_i, v_i) corresponds to a control with two switches. In fact, formula (3.15) gives a unified description of all the above cases, where some of the three subintervals in (3.15) degenerate in the (typical) cases (i) and (ii).
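In the single-switch case the correspondence between a boundary pair (z_1, z_2) and the switching point can be inverted explicitly: a control on [0, 1] with values in {−1, 1} and a single switch from −1 to +1 at θ has first moment z_1 = 1 − 2θ, so θ = (1 − z_1)/2. A minimal round-trip check (the helper names are ours, not from the paper):

```python
import numpy as np

phi2 = lambda a: 0.25 * (1 + 2 * a - a * a)  # upper parabola from Sect. 3.2

def switch_time_from_moments(z1):
    # single -1 -> +1 switch on a unit interval: z1 = 1 - 2*theta
    return (1.0 - z1) / 2.0

def control(theta):
    return lambda s: np.where(s < theta, -1.0, 1.0)

# round trip: switching time -> moments -> switching time
for theta in [0.1, 0.25, 0.5, 0.9]:
    s = (np.arange(100_000) + 0.5) / 100_000
    u = control(theta)(s)
    z1, z2 = u.mean(), (s * u).mean()
    assert abs(switch_time_from_moments(z1) - theta) < 1e-3
    assert abs(z2 - phi2(z1)) < 1e-3  # boundary pair of the moment set
```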
We mention that, for time-invariant problems without singular arcs, Assumption (B) is typically fulfilled with a number κ ∈ {1, 2, 3, ...}, corresponding to the multiplicity of the zeros of the switching function. As argued in [23], the case κ = 1 is in a certain sense "generic", and the error estimate (3.17) is of second order in this case. Also in the case κ > 1 the order of accuracy is doubled in comparison with that proved in [26] for the Euler scheme. Utilization of higher-order Runge-Kutta schemes on a fixed mesh cannot improve the accuracy of the Euler scheme, due to the discontinuity of the optimal control.

We embed the sequences {x_i} and {p_i} into the spaces W^{1,1}_{x_0} and W^{1,1}, using the hint provided by the expansions developed in Sect. 3.1. Namely, for t ∈ [t_i, t_{i+1}), we define x(t) and p(t) by formulas (4.1)-(4.3). We show below that p ∈ W^{1,1}. From (3.4) and (3.14) it follows that the right-hand side of (4.3) is equal to the expression for p_i given by (3.11). Thus p is continuous at t_i, and hence p ∈ W^{1,1}. The proof that x ∈ W^{1,1}_{x_0} is analogous and can be found in [23, Section 5].
By Theorem 2.2, for every b > 0 there exists a number c such that if ‖y‖ ≤ b then the estimate of Theorem 2.2 holds, where y = (ξ, π, ρ, ν) is the residual that (x, p, u) gives in (2.1)-(2.4), that is, y ∈ F(x, p, u). Thus we have to estimate the norm of this residual. The estimate ‖ξ‖_1 ≤ O(h²) of the first residual is obtained in [23, Section 4], where the primal differential equation is the same as in the present paper. Below we analyze the residuals in the remaining equations (2.2)-(2.4).

Residual in (2.3).
First of all, we derive a second-order expansion of the term B^⊤p + S^⊤x appearing in (2.3). By (4.1), (4.2) and the Taylor expansions of B and r, using the definitions of B_i and C_i and the fact that Assumption (I) means B^⊤S = S^⊤B, we obtain such an expansion. We also mention that the set Z is the region enclosed by the two parabolas β = φ_1(α) and β = φ_2(α), where φ_1(α) ≤ φ_2(α). Thus the normal cone to Z is easy to calculate; an explicit expression, involving ζ = sgn(α − 2β), is provided in [23, Section 4]. We consider separately each of the cases (i), (ii) and (iii) for the construction of the continuous-time control that appear in Sect. 3.3.

Case (i)
Then, due to (3.12), for every j = 1, ..., m there exist μ ≥ 0 and λ ≥ 0 such that (4.7) and (4.8) hold; in particular, the quantity in (4.8) is non-negative for all t ∈ [t_i, t_i + h). Adding up (4.7) and (4.8), the latter multiplied by (t − t_i)/h, we obtain an expression which, by (3.14), is identical to the j-th component of the right-hand side of (4.5), modulo O(t; h²). Since u_j(t) = −1 in case (i), we thus obtain (4.10).

Case (ii)

By (3.12), there exists μ ≥ 0 such that (4.11) and (4.12) hold. Observe that, from the definition of θ_j^i, it follows that the corresponding quantity is non-positive whenever t ∈ [t_i, t_i + hθ_j^i), and non-negative whenever t ∈ [t_i + hθ_j^i, t_{i+1}]. Thus, adding up (4.11) and (4.12), the latter multiplied by (t − t_i)/h, and using the definition of u_j, we obtain a relation whose left-hand side, by (3.14) and (4.5), is equal to (B^⊤p(t) + S(t)^⊤x(t) + r(t))_j + O(h²). This proves (4.10) in the case (ii).

Case (iii)

By (3.12) and the definition of u_j in this case, we obtain a relation which, together with (4.5), yields (4.10) in the case (iii). We can finally conclude that ‖ρ‖_∞ = O(h²). Summarizing, we have obtained that ‖y‖ ≤ c_1 h², where c_1 is independent of N. Since c_1 h² ≤ c_1 T² =: b, Theorem 2.2 implies the existence of a constant c such that the claimed estimate holds for every natural number N. Now we focus on the last term in its left-hand side. Since û and u take only the values ±1, as already pointed out, for instance, in [25, Section 4], we have d_#(u, û) ≤ c_3 ‖u − û‖_1 for some c_3 ≥ 0, and this concludes the proof.

Error estimate in case of inexact solution of the discrete problem
The estimate (3.17) in Theorem 3.1 is valid under the assumption that the discrete-time problem (3.7)-(3.9) is solved exactly. In the present section we incorporate into the error estimate a possible inaccuracy in solving the discrete-time problem. The basic argument is identical to the one for Mayer's problems, presented in [23, Section 5]; therefore we only sketch it.
We assume that, as a result of a numerical procedure for solving the mathematical programming problem (3.7)-(3.9), an approximate solution {w_i} is obtained. Using the approximate solution {w_i}, one can define an approximation ũ(·) of the optimal control û in the same way as described in Sect. 3.3. Then the estimate (3.17) in Theorem 3.1 takes the form (5.1). The proof of this statement is not straightforward, but the argument is identical to that in [23, Section 5]; therefore we do not repeat it here.
Clearly, in order to make the error arising in the proposed discretization scheme and the error in solving the discretized problem consistent, one has to solve the latter with accuracy ε proportional to h².

A numerical experiment

For the test problem considered in this section (depending on two parameters, a and b), and for appropriate values of a and b, there is a unique optimal solution with a switch from u = −1 to u = 1 at a time τ which solves a polynomial equation. Moreover, τ is a simple zero of the switching function, thus κ = 1. Taking, for example, a = 1 and b = 0.1, the equation becomes

−5τ⁴ + 24τ³ − 48τ² + 44τ − 12.6 = 0,

and its single real solution in [0, 1] is τ = 0.492487520, with all digits being correct. We solved the system (3.10)-(3.13) for various values of the discretization step h = T/N (N a natural number) by using a version of the gradient projection method. The computation of the approximate solutions ({x_i}, {p_i}, {w_i}) is done with accuracy (measured by the residual ε; see Sect. 5) higher than h², so that the theoretical error estimate (5.1) is ch². As before, (x̂, p̂, û) is the exact solution of the considered problem, and ũ is the continuous-time control constructed as described in Sect. 3.3 for the computed {w_i} = {(ũ_i, ṽ_i)} (note that ũ depends on the size of the mesh N; therefore, in what follows we use the notation ũ_N instead of ũ). In Table 1 we report the numerical results, focusing on the most critical error, e_N = d_#(û, ũ_N). We also calculate the value e_N/h², which according to (5.1) should be bounded. This is confirmed by the results.
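The reported value of τ is easy to reproduce. A minimal bisection check of the quartic equation −5τ⁴ + 24τ³ − 48τ² + 44τ − 12.6 = 0 from the text (a sketch, independent of the optimization code):

```python
def f(t):
    # switching-time equation for a = 1, b = 0.1, as given in the text
    return -5 * t**4 + 24 * t**3 - 48 * t**2 + 44 * t - 12.6

lo, hi = 0.0, 1.0          # f(0) = -12.6 < 0 < f(1) = 2.4
for _ in range(60):        # bisection: interval width ~ 2^-60
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
tau = (lo + hi) / 2
assert abs(tau - 0.492487520) < 1e-7   # matches the value in the text
assert abs(f(tau)) < 1e-12
```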

Concluding remarks
In this paper we extend the analysis of the discretization scheme introduced in [23] for Mayer's problems to convex linear-quadratic problems that are affine with respect to the control. The optimal controls are assumed to be purely bang-bang, with an additional assumption involving the associated "switching function". Our discretization approach requires solving a discrete-time optimization problem involving an additional control variable. This yields an order of convergence which doubles that of the Euler method. The price is that the discrete problem involves quadratic control constraints (instead of the box-type constraints of the original problem).
It is worth noting that the components of the optimal controls could be, in general, concatenations of bang-bang and singular arcs. This challenging case will be a subject of further investigation. It requires, among other things, a deeper analysis of the metric sub-regularity of the system of first-order optimality conditions under perturbations (a step in this direction is made in [13]). Another challenging issue is to avoid Assumption (I). An assumption of this kind is present also in [29], as well as in [16] in a nonlinear context, where it requires the Lie brackets of the involved controlled vector fields to vanish. An idea in this direction is presented in [19], but it does not seem to be efficient for numerical implementation.