On the convergence of the gradient projection method for convex optimal control problems with bang–bang solutions

We revisit the gradient projection method in the framework of nonlinear optimal control problems with bang–bang solutions. We obtain the strong convergence of the iterative sequence of controls and the corresponding trajectories. Moreover, we establish a convergence rate, depending on a constant appearing in the corresponding switching function and prove that this convergence rate estimate is sharp. Some numerical illustrations are reported confirming the theoretical results.


Introduction
Numerical solution methods for various optimal control problems have been investigated during the last decades [6,8-11]. However, in most of the literature, the optimal controls are assumed to be at least Lipschitz continuous. This assumption is rather strong: whenever the control appears linearly in the problem, the lack of coercivity typically leads to discontinuities of the optimal controls. Recently, optimal control problems with bang-bang solutions have attracted more attention. Stability and error analysis of bang-bang controls can be found in [14,26,32]. Euler discretizations for linear-quadratic optimal control problems with bang-bang solutions were studied in [1,2,5,29]. Higher order schemes for linear and linear-quadratic optimal control problems with bang-bang solutions were developed in [24,27].
On the other hand, among many traditional solution methods in optimization, projection-type methods are widely applied because of their simplicity and efficiency [13,15,31].
Recently, the gradient projection method has been reconsidered for solving general optimal control problems [22,28]. Under some suitable conditions, it was proved that the control sequence converges weakly to an optimal control and the corresponding trajectory sequence converges strongly to an optimal trajectory. However, no convergence rate result has been established.
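In this paper, we study the gradient projection method for optimal control problems with bang-bang solutions. In particular, we consider the following problem:

$$\text{minimize } \psi(x,u) := g(x(T)) + \int_0^T h(t,x(t),u(t))\,dt \qquad (1.1)$$
$$\text{subject to } \dot x(t) = f(t,x(t),u(t)), \quad x(0) = x_0, \qquad (1.2)$$
$$u(t) \in U \ \text{ for a.e. } t \in [0,T], \qquad (1.3)$$

where the initial state $x_0 \in \mathbb{R}^n$ and the final time $T > 0$ are fixed, $U \subset \mathbb{R}^m$ is the set of admissible control values, and the functions $f : \mathbb{R} \times \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$, $g : \mathbb{R}^n \to \mathbb{R}$ and $h : \mathbb{R} \times \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ are given. Further we assume (see the next section for precise formulations) that the data are smooth enough, that the problem (1.1)-(1.3) is convex and that, for the (unique) optimal control $u^*$, the objective function fulfills a certain growth condition. In particular, we show that this condition is satisfied in the bang-bang case if each component of the associated switching function satisfies a growth condition as given in [25,29].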
Under these assumptions, we prove that the control sequence actually converges strongly to the solution. Moreover, the convergence rates for both controls and states are provided, depending on the constant appearing in the growth condition for the switching function. An example is analysed showing that the estimation for these convergence rates is sharp.
The paper is organized as follows: In Sect. 2, we specify the assumptions we use and recall some facts which will be useful in the sequel. Section 3 discusses the convergence properties of the gradient projection method. Some numerical examples of linear-quadratic type are reported in Sect. 4 illustrating the results in the previous section. Some final remarks are given in the last section.

Preliminaries
In this section, we will clarify the assumptions used and recall some important facts which are necessary to establish our result.
By $\mathcal{U} := L^2([0,T],U)$ we denote the set of all admissible controls and, if not stated otherwise, $\|\cdot\|$ denotes the $L^2$-norm. The first two assumptions guarantee that the problem (1.1)-(1.3) is meaningful.
Now recall the Hamiltonian of (1.1)-(1.3) as Then by the Pontryagin maximum principle there is an absolutely continuous function p * such that (x * , u * , p * ) solves the adjoint equation and for every u ∈ U Then we have the following useful formula for the gradient of J (see, e.g. [22,31]).
where x and p are the unique solution of (1.2) and (2.1) depending on u ∈ U.
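For orientation, the objects referred to above can be written out under the usual conventions for a minimization problem of the form (1.1)-(1.3). The sign conventions and the terminal condition below are the standard ones and are an assumption here; this is a sketch and not a verbatim restatement of (2.1) or of the gradient formula.

```latex
\begin{align*}
H(t,x,u,p) &:= h(t,x,u) + p^\top f(t,x,u), \\
\dot p(t) &= -H_x(t,x(t),u(t),p(t))^\top, \qquad p(T) = \nabla g(x(T)), \\
0 &\le \langle H_u(t,x^*(t),u^*(t),p^*(t)),\, v - u^*(t) \rangle
   \quad \text{for all } v \in U \text{ and a.e. } t \in [0,T], \\
\nabla J(u)(t) &= H_u(t,x(t),u(t),p(t))^\top \quad \text{for a.e. } t \in [0,T].
\end{align*}
```

With this convention the gradient of $J$ at the optimal control coincides with the function $H_u(\cdot, x^*, u^*, p^*)$, which is the switching function used later in the bang-bang setting.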

Assumption A3
The objective function J is continuously differentiable on U with Lipschitz derivative.
We denote by L the Lipschitz modulus of the gradient ∇ J of J and write J * := J (u * ) for its optimal value. The following result is well known (see e.g. [ Assumptions A1-A3 are common in optimal control. For example the following two Assumptions B1-B2 imply A1-A3 (cf. [22]) Assumption B2 There exists c ≥ 0 such that for every x ∈ R n and u ∈ U : Additionally we assume the following.

Assumption A4
The objective function J is convex.
Note that if the set F of admissible pairs is convex this assumption is equivalent to the statement that the function ψ is convex on F. In particular this is the case if f is [25,29].
Further we will assume a growth condition for J that is similar to (4.7) in [3].
Assumption A5 For a solution u * of (1.1)-(1.3) there are constants β > 0 and θ ≥ 0 such that for every u ∈ U we have Note that in particular A5 implies that the solution u * is unique.

Remark 2.2
For coercive optimal control problems (in the sense of [12]) Assumptions A1-A4 are fulfilled as well as A5 for θ = 0. In these problems the objective function J however is even strongly convex and therefore one can apply known results (e.g. [21, Theorem 2.1.15]) directly to show linear convergence of the gradient projection method in this case.
In the following we will show that Assumption A5 is fulfilled for bang-bang controls with no singular arcs. We recall that in the case of bang-bang controls the function σ * := H u (·, x * , u * , p * ) is called switching function corresponding to the triple (x * , u * , p * ). For every j ∈ {1, . . . , m} denote by σ * j its j-th component. The following assumption says that the switching function σ * satisfies a growth condition around the switching points, which implies that u * is strictly bang-bang.
Assumption B3 There exist real numbers θ, α, τ > 0 such that for all j ∈ {1, . . . , m}
Assumption B3 plays the main role in the study of regularity, stability and error analysis of discretization techniques for optimal control problems with bang-bang solutions. Many variations of this assumption are used in the literature on bang-bang controls. To our knowledge the first assumption of this type was introduced by Felgenhauer [14] for continuously differentiable switching functions with θ = 1 to study the stability of bang-bang controls. Alt et al. [1,2,4] used a slightly stronger version of B3 with θ = 1, which additionally excludes the endpoints 0 and T as zeros of the switching function, to investigate error bounds for the Euler approximation of linear-quadratic optimal control problems with bang-bang solutions. Quincampoix and Veliov [26] used a rank condition which implies B3 (including cases where θ = 1) to obtain the metric regularity and stability of Mayer problems for linear systems. Seydenschwanz [29], Preininger et al. [25], and Pietrus, Scarinci and Veliov [24,27] used this assumption in the study of metric (sub)-regularity, stability and error estimates for discretization schemes of linear-quadratic optimal control problems with bang-bang solutions.
To prove that B3 implies A5 we need the following lemma, which is a simplified version of [29, Lemma 1.3] (see also, [1, Lemma 4.1]).

Lemma 2.3
Let Assumptions A1-A2 be fulfilled and let u * be a solution of (1.1)- where $\|\cdot\|_1$ is the $L^1$-norm.
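To define the gradient projection method in the next section we will need the following notion of a projection. For each $u \in L^2([0,T],\mathbb{R}^m)$, there exists a unique point in $\mathcal{U}$ (see [17, p. 8]), denoted by $P_{\mathcal{U}}(u)$, such that

$$\|u - P_{\mathcal{U}}(u)\| \le \|u - v\| \quad \text{for all } v \in \mathcal{U}.$$

It is well known [17, Theorem 2.3] that the projection operator can be characterized by

$$\langle u - P_{\mathcal{U}}(u),\, v - P_{\mathcal{U}}(u) \rangle \le 0 \quad \text{for all } v \in \mathcal{U}. \qquad (2.5)$$

Further, to establish the convergence rate of the gradient projection method, we will need the following lemmas.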
Then there is a number γ > 0 such that the sequence {α k } is non-summable and the sequence {s k } is decreasing. Then where the o-notation means that s k = o(1/t k ) if and only if lim k→∞ s k t k = 0.

Convergence analysis
We consider the following Gradient Projection Method (GPM):

Algorithm GPM
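Step 0: Choose a sequence $\{\lambda_k\}$ of positive real numbers and an initial control $u_0 \in \mathcal{U}$. Set $k = 0$.
Step 1: Compute the state $x_k$ and the adjoint $p_k$ corresponding to $u_k$ by solving the differential equations (1.2) and (2.1) with $u = u_k$.
Step 2: Compute $u_{k+1} := P_{\mathcal{U}}\big(u_k - \lambda_k \nabla J(u_k)\big)$.
Step 3: If $u_{k+1} = u_k$ then stop. Otherwise replace $k$ by $k + 1$ and go to Step 1.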
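The iteration can be sketched in a few lines of Python on a time grid. This is a minimal illustration and not the authors' implementation: the function name `gpm`, the grid, and the toy gradient are assumptions. The toy instance uses the linear problem of Example 3.4 below, where the gradient equals a fixed function σ(t) = t^θ and the control values lie in [-1, 1], so that the projection is a pointwise clip and no state or adjoint equation has to be solved.

```python
import numpy as np

def gpm(grad, project, u0, lam, max_iter):
    """Gradient projection iteration u_{k+1} = P(u_k - lam * grad(u_k))."""
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        u_next = project(u - lam * grad(u))
        if np.array_equal(u_next, u):  # Step 3: stop when the iterate is fixed
            break
        u = u_next
    return u

# Toy instance (assumption): J(u) = integral of sigma(t) * u(t) dt on [0, 1]
# with sigma(t) = t**theta, so grad J(u) = sigma for every u, and U = [-1, 1].
theta, lam, k = 2.0, 0.5, 50
t = np.linspace(0.0, 1.0, 201)
sigma = t ** theta

u_k = gpm(grad=lambda u: sigma,
          project=lambda v: np.clip(v, -1.0, 1.0),
          u0=np.zeros_like(t), lam=lam, max_iter=k)

# Closed form stated in the text for this special case.
closed_form = np.maximum(-1.0, -k * lam * sigma)
```

For a genuine control problem, `grad` would solve the state equation forward and the adjoint equation backward before evaluating the gradient formula; only the projection step stays as simple as here when the control set is a box.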
It is known (see e.g. [21, Theorem 2.1.14]) that for J continuously differentiable with Lipschitz derivative the gradient (projection) method has the convergence rate $O(1/k)$ in terms of the objective value. For a strongly convex objective function, it is known that the iterative sequence $\{u_k\}$ converges linearly to the unique solution. However, it is not possible to show convergence of the iterative sequence $\{u_k\}$ in the general convex case. Here, thanks to Assumptions A1-A5, we are able to prove that the iterative sequence $\{u_k\}$ generated by the GPM converges strongly to an optimal control. Moreover, the convergence rate is established, depending on the constant θ appearing in Assumption A5.
The following estimate will be used repeatedly in our convergence analysis.
Substituting u = u * ∈ U into the latter inequality yields This implies that Since J has Lipschitz derivative, we have from Lemma 2.1 that Substituting u = u k and v = u k+1 into the last inequality yields Moreover, since J is convex, we obtain Combining (3.6), (3.7) and (3.8) gives Using Assumption A5 we obtain We are now in the position to establish the strong convergence and the convergence rate of {u k } to a solution.
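The chain of estimates described above can be summarized as follows. This is a sketch of the standard argument and is not matched to the numbering (3.4)-(3.9). The form of the growth condition used in the last step, $J(u) - J^* \ge \beta \|u - u^*\|^{2\theta+2}$, is an assumption inferred from the later choice $s_k = \|u_k - u^*\|^2$, $\alpha = \theta$, $\delta_k = 2\lambda_{\min}\beta$.

```latex
\begin{align*}
&\langle u_k - \lambda_k \nabla J(u_k) - u_{k+1},\; u^* - u_{k+1} \rangle \le 0
  && \text{(projection characterization with } u = u^*\text{)} \\
&J(u_{k+1}) \le J(u_k) + \langle \nabla J(u_k),\, u_{k+1} - u_k \rangle
  + \tfrac{L}{2}\,\|u_{k+1} - u_k\|^2
  && \text{(Lipschitz gradient)} \\
&J(u_k) + \langle \nabla J(u_k),\, u^* - u_k \rangle \le J^*
  && \text{(convexity)} \\[4pt]
&\Longrightarrow\quad
 \|u_{k+1} - u^*\|^2 \le \|u_k - u^*\|^2
   - (1 - \lambda_k L)\,\|u_{k+1} - u_k\|^2
   - 2\lambda_k \big( J(u_{k+1}) - J^* \big), \\
&\Longrightarrow\quad
 \|u_{k+1} - u^*\|^2 \le \|u_k - u^*\|^2
   - 2\lambda_k \beta\, \|u_{k+1} - u^*\|^{2\theta + 2}
  && \text{(growth condition, } \lambda_k \le 1/L\text{)}.
\end{align*}
```

The first implication uses the identity $\|u_{k+1} - u^*\|^2 = \|u_k - u^*\|^2 - \|u_{k+1} - u_k\|^2 + 2\langle u_{k+1} - u_k, u_{k+1} - u^* \rangle$ together with the three inequalities above it.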

Theorem 3.2
Let Assumptions A1-A4 be satisfied, and let $u^*$ be a solution of (1.1)-(1.3) such that Assumption A5 is fulfilled with some θ > 0. Let the sequence $\{\lambda_k\}$ be chosen such that $0 < \lambda_{\min} \le \lambda_k \le 1/L$. Then we have
Proof We first prove that $\{u_k\}$ converges strongly to $u^*$. From (3.4) and $0 < \lambda_{\min} \le \lambda_k \le 1/L$, the sequence $\{\|u_k - u^*\|\}$ is decreasing and bounded from below by 0, and therefore it converges. Moreover, since we conclude that $\{\|u_k - u^*\|\}$ converges to 0, which means $\{u_k\}$ converges strongly to $u^*$. Now we can apply Lemma 2.5 for $s_k = \|u_k - u^*\|^2$, $\alpha = \theta$ and $\delta_k = 2\lambda_{\min}\beta$ to obtain the convergence rate (i) for $\{\|u_k - u^*\|\}$.
Substituting $u = u_k$ in (3.5) implies Combining (3.7) and (3.11) we get Hence the sequence $\{J(u_k)\}$ is monotonically decreasing. Now from (3.9) and $0 < \lambda_{\min} \le \lambda_k \le 1/L$ we have Summing this inequality from 0 to i − 1 we obtain Finally, taking the limit as i → ∞, we obtain (ii).

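Remark 3.3
From (ii) in Theorem 3.2, we can conclude that

The following example illustrates that the estimate (i) in Theorem 3.2 cannot be improved when $\lambda_k$ is bounded from below by a constant $\lambda_{\min}$.

Example 3.4
Consider the following optimal control problem

$$\text{minimize } J(u) := \int_0^T \sigma(t)\,u(t)\,dt \quad \text{subject to } u(t) \in [-1,1], \qquad (3.13)$$

where σ is any continuous function fulfilling Assumption B3. Then $\nabla J(u)(t) = \sigma(t)$ is independent of u and the optimal control is given by $u^*(t) = -\operatorname{sgn}(\sigma(t))$. Starting the GPM with $u_0 \equiv 0$ and $\lambda_k = \lambda$ for some $\lambda \in \mathbb{R}_+$ we get

$$u_k(t) = \max\{-1, \min\{1, -k\lambda\sigma(t)\}\}.$$

In the special case $\sigma(t) = t^\theta$, we therefore have $u_k(t) = \max\{-1, -k\lambda t^\theta\}$. This implies that for $k > 1/(\lambda T^\theta)$, we have

$$\|u_k - u^*\|^2 = \int_0^{(k\lambda)^{-1/\theta}} (1 - k\lambda t^\theta)^2\,dt = \frac{2\theta^2}{(\theta+1)(2\theta+1)}\,(k\lambda)^{-1/\theta}.$$

For the objective value we get

$$J(u_k) - J^* = \int_0^{(k\lambda)^{-1/\theta}} t^\theta (1 - k\lambda t^\theta)\,dt = \frac{\theta}{(\theta+1)(2\theta+1)}\,(k\lambda)^{-(1+1/\theta)}, \qquad (3.14)$$

which is stronger than (ii). It remains unknown whether in the general case the estimate (ii) can be improved to an estimate similar to (3.14).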
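The rates in this example can be checked numerically. For σ(t) = t^θ and kλ > T^(-θ), a direct computation from u_k(t) = max{-1, -kλ t^θ} gives the squared distance 2θ²/((θ+1)(2θ+1)) · (kλ)^(-1/θ) and the objective gap θ/((θ+1)(2θ+1)) · (kλ)^(-(1+1/θ)). The sketch below compares these closed forms with a quadrature on a fine grid. The helper names, the grid size and the choice T = 1 are assumptions made for illustration.

```python
import numpy as np

def trapezoid(y, h):
    """Composite trapezoidal rule on a uniform grid with spacing h."""
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))

def example_errors(k, lam, theta, T=1.0, n=200001):
    """Quadrature values of ||u_k - u*||^2 and J(u_k) - J* for sigma = t**theta."""
    t = np.linspace(0.0, T, n)
    h = t[1] - t[0]
    sigma = t ** theta
    u_k = np.maximum(-1.0, -k * lam * sigma)   # GPM iterate from the text
    u_star = -np.ones_like(t)                  # -sgn(sigma); t = 0 is a null set
    dist_sq = trapezoid((u_k - u_star) ** 2, h)
    gap = trapezoid(sigma * (u_k - u_star), h)
    return dist_sq, gap

def predicted(k, lam, theta):
    """Closed forms, valid once k * lam > T**(-theta)."""
    c = (theta + 1.0) * (2.0 * theta + 1.0)
    dist_sq = 2.0 * theta ** 2 / c * (k * lam) ** (-1.0 / theta)
    gap = theta / c * (k * lam) ** (-(1.0 + 1.0 / theta))
    return dist_sq, gap
```

Calling `example_errors(100, 0.5, 2.0)` and `predicted(100, 0.5, 2.0)` should give matching pairs up to quadrature error, and quadrupling k with θ = 2 should halve the squared distance, in line with the exponent -1/θ.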
Using the stronger Assumptions B1-B2, the convergence rate of the corresponding trajectories can be obtained as a corollary of Theorem 3.2.
When the Lipschitz modulus L is difficult to estimate, one can consider nonsummable diminishing stepsizes as follows. Then the sequence $\{u_k\}$ converges strongly to $u^*$. Moreover, there exists N > 0 such that for all k ≥ N, it holds where $\mu_k := \sum_{i=N}^{k-1} \lambda_i$ and C is a constant.
Proof Let β > 0 be as in Proposition 3.1. Since $\lim_{k\to\infty} \lambda_k = 0$, there exists N > 0 such that for all k ≥ N we have $1 - \lambda_k L > 0$ and $2\lambda_k\beta < 1$. From (3.4) we have that $\{\|u_k - u^*\|\}$ is decreasing, therefore it converges. Moreover Using Lemma 2.5 with $s_k = \|u_{k+N} - u^*\|^2$, $\alpha = \theta$ and $\delta_k := 2\lambda_{k+N}\beta$ we get that there exists γ > 0 such that which shows (i).
Using the same example as above we can again show that the estimate (i) cannot be improved.
Example 3.7 Consider the problem (3.13) with $\sigma(t) := t^\theta$ again. As before we use the GPM with $u_0 \equiv 0$ but now with non-constant $\lambda_k$. Denoting $\mu_k := \sum_{i=0}^{k-1} \lambda_i$ we get $u_k(t) = \max\{-1, -\mu_k t^\theta\}$. Hence for k large enough such that $\mu_k > 1/T^\theta$ we have Similar to Corollary 3.5 we obtain

Numerical illustrations
In this section, we present some numerical experiments for a class of optimal control problems with bang-bang solutions, namely linear-quadratic problems, described as follows.
Here we use the classical Euler discretization, for which error estimates can be found in [1,2,5]. We choose a natural number N and define the mesh size h := T/N. Since the optimal control is assumed to be bang-bang, we identify the discretized control $u^N := (u_0, u_1, \ldots, u_{N-1})$ with its piecewise constant extension: Moreover, we identify the discretized state $x^N := (x_0, x_1, \ldots, x_N)$ and costate $p^N := (p_0, p_1, \ldots, p_N)$ with their piecewise linear interpolations The Euler discretization of (1.1) is given by where $\psi_N$ is the cost function defined by
Observe that $(P_N)$ is a quadratic optimization problem over a polyhedral convex set, for which the gradient projection method converges linearly, see e.g. [30]. This means that for each N, there exists $\rho_N \in (0, 1)$ such that In the following examples, we will consider various values of N which suggest that This will confirm the sublinear rate obtained in Theorem 3.2.
The codes are implemented in Matlab. We perform all computations on a Windows desktop with an Intel(R) Core(TM) i7-2600 CPU at 3.4 GHz and 8.00 GB of memory. Since ∇J is linear in u, one can roughly estimate its Lipschitz constant by $L = \|\nabla J(u_0)\| / \|u_0\|$. We choose the starting control $u_0(t) = 1$ for all $t \in [0, T]$ and the stepsize $\lambda_k = \lambda < 1/L$. The stopping condition is $\|u^N_k - u^N_{k-1}\| \le \varepsilon$, where $\varepsilon = 10^{-10}$.
The following example is taken from [27]. Here, with appropriate values of a and b, there is a unique optimal solution $u^*$ with a switch from −1 to 1 at time τ, which is a solution of the equation As in [27], we choose a = 1, b = 0.1, then τ = 0.492487520 is a simple zero of the switching function. Therefore, θ = 1 and the exact optimal control is The convergence results for Example 4.1 with some different values of N are reported in Table 1. We can see that when N increases, $\rho_N$ also increases and approaches 1. This means that we can only guarantee the sublinear convergence for the continuous problem.
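The discretize-then-iterate procedure described in this section can be sketched as follows. The sketch is in Python, whereas the experiments in the paper were run in Matlab, and it does not reproduce Example 4.1: the double-integrator dynamics, the terminal cost ½|x(T)|², the initial state and the step size are all hypothetical choices made only to obtain a small runnable instance. For this toy problem a direct bound gives L ≤ 4/3, so the step λ = 0.5 satisfies λ < 1/L.

```python
import numpy as np

def euler_gpm(N=50, T=1.0, lam=0.5, iters=200, eps=1e-10):
    """GPM on an Euler-discretized toy problem (hypothetical data):
    minimize 0.5 * |x(T)|^2  s.t.  x1' = x2, x2' = u, |u| <= 1."""
    h = T / N
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([0.0, 1.0])
    x0 = np.array([1.0, 0.0])
    u = np.ones(N)                      # starting control u_0(t) = 1
    costs = []
    for _ in range(iters):
        # Forward Euler for the state.
        x = np.empty((N + 1, 2))
        x[0] = x0
        for i in range(N):
            x[i + 1] = x[i] + h * (A @ x[i] + B * u[i])
        costs.append(0.5 * float(x[N] @ x[N]))
        # Discrete adjoint, solved backward from p_N = x_N.
        p = np.empty((N + 1, 2))
        p[N] = x[N]
        for i in range(N - 1, -1, -1):
            p[i] = p[i + 1] + h * (A.T @ p[i + 1])
        # Gradient with respect to the h-weighted (discrete L2) inner product.
        grad = p[1:] @ B
        u_new = np.clip(u - lam * grad, -1.0, 1.0)
        # Stopping rule in the discrete L2 norm.
        if np.sqrt(h) * np.linalg.norm(u_new - u) <= eps:
            u = u_new
            break
        u = u_new
    return u, np.array(costs)
```

Running `euler_gpm()` returns the final piecewise constant control and the recorded objective values, which should be nonincreasing along the iteration. The contraction factor ρ_N reported in the tables could be estimated from the ratios of successive iterate differences, which is not done here.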
Figure 1 displays the optimal control and the optimal states when the discretized size N = 50.
The following second example is taken from [1, Example 6.1] The exact optimal control is given by The convergence results for Example 4.2 with some different values of N are reported in Table 2. Again, we see that when N increases, ρ N also increases and approaches 1. Figure 2 displays the optimal control and the optimal states for N = 50.
In the next example, we consider a problem in which Assumption A5 is satisfied for θ = 1 (see also [27,29]).

Example 4.3
Here we present experiments with a family of problems satisfying Assumption A5 with various values of θ, given in [29]. Below, the time interval is [0, 1], the dimension of the state is n = θ + 1 and the dynamical system depends on parameters $s_j$: Then Assumption A5 is satisfied with the constant θ [29] and the exact optimal control is given by if θ is odd, and $u^*(t) = -1$ if θ is even. The convergence results for Example 4.3 when θ = 2, 3 with some different values of N are reported in Table 3. Figure 3 displays the approximate optimal controls after 1000 iterations for N = 500. It seems that the optimal control has θ switching points. This is to be expected since $\sigma^*$ has a zero of order θ at 1/2.

Concluding remarks
Note that the main results in Theorems 3.2 and 3.6 use Assumption A5 which is more general than just the bang-bang case. For example Assumption A5 is also satisfied in the strongly convex case, where even better convergence results are known. Further it would be interesting to see under what assumptions our results still apply in the case of singular arcs. This is challenging due to the fact that currently there is no condition similar to the bang-bang Assumption B3 that ensures Assumption A5 and therefore remains as a topic for future research.