Error estimates for Runge–Kutta schemes of optimal control problems with index 1 DAEs

In this paper we derive error estimates for Runge–Kutta schemes applied to optimal control problems subject to index one differential–algebraic equations (DAEs). Usually, Runge–Kutta methods applied to DAEs approximate the differential and the algebraic state in an analogous manner. Such schemes can be regarded as discretizations of the index reduced system in which the algebraic equation is solved for the algebraic variable to obtain an explicit ordinary differential equation. In optimal control, however, this approach yields discrete necessary conditions that are not consistent with the continuous necessary conditions, which are essential for deriving error estimates. Therefore, we suggest treating the algebraic variable like a control, which leads to a new type of Runge–Kutta scheme. For this method we derive consistent necessary conditions and compare the discrete and continuous systems to obtain error estimates of up to order three for the states, the control, and the multipliers.


Introduction
Direct discretization methods are often used to solve optimal control problems numerically because they are robust and able to handle difficult problems with state and control constraints (cf. Betts [7], Kraft [23], and von Stryk [37]). In order to justify the application of approximation schemes, an investigation of error estimates between the solutions of the continuous and the discrete problem is crucial. In addition, the analysis reveals conditions under which discretization schemes yield a solution and at which rate it converges. In process engineering, mechanical engineering, and path planning, optimal control problems subject to differential-algebraic equations (DAEs) frequently occur (cf. Kunkel and Mehrmann [24]). These can be solved efficiently with Runge–Kutta schemes as proposed in Gerdts [16]. However, a theoretical analysis of these discretizations applied to problems with DAEs is missing in the literature. Runge–Kutta schemes are especially important for DAEs of index 3 and higher, since the Euler method does not converge in that case (cf. Brenan et al. [9]). As a first step towards closing the gap between applications and theory, we analyze Runge–Kutta schemes applied to optimal control problems with index 1 DAEs. The knowledge gained from these investigations will then be utilized for problems with higher index DAEs.
The study of discretized optimal control problems is still an active field of research, particularly in the context of DAEs. The Euler scheme applied to nonlinear problems with ordinary differential equations (ODEs) and smooth controls has been analyzed in [8, 12, 14, 25]. Herein, Malanowski et al. [25] consider problems with mixed control-state constraints, Dontchev et al. [12] also include pure state constraints, whereas Bonnans and Festa [8] and Dontchev and Hager [14] consider problems solely with pure state constraints. Gerdts and Kunkel [17] analyze nonlinear problems with control-state constraints and controls of bounded variation. They derive error estimates of order 1/p with respect to the L^p-norm. Runge–Kutta schemes for problems with convex control constraints are examined by Dontchev et al. [13] for methods of order 2 and by Hager [18] for methods up to order 4.
First order discretization methods applied to problems with bang-bang optimal controls have been discussed in [1-6, 32, 35, 36]. Alt et al. [1-3, 5, 36] examine linear and linear-quadratic problems. They assume that the switching function does not have singular subarcs. Linear-quadratic problems with additional L^1-sparsity terms in the cost functional are analyzed by Alt and Schneider [4] and Schneider and Wachsmuth [35]. Alt et al. [6] and Osmolovskii and Veliov [32] study affine problems. In terms of higher order discretization schemes, these types of problems have been examined by Veliov [38] for Runge–Kutta methods, by Haunschmied et al. [21] using the stability concept of strong bi-metric regularity, and by Pietrus et al. [33] based on second order Volterra-Fliess approximations (compare also Scarinci and Veliov [34]).
Recently, using the implicit Euler scheme, Martens and Gerdts [26-30] and Martens and Schneider [31] derived error estimates for different types of optimal control problems with DAEs. The nonlinear index 1 case was discussed in [26]. Convergence for the index 2 case was analyzed for problems with mixed control-state constraints in [28] and with pure state constraints in [30]. Linear quadratic problems and affine problems with bang-bang controls have been discussed in [29] and [31], respectively.
In this paper we consider the following type of problem:

Minimize ϕ(x(1)) (OCP)
subject to
ẋ(t) = f(x(t), y(t), u(t)) a.e. in [0, 1], (1)
0 = g(x(t), y(t), u(t)) a.e. in [0, 1], (2)
x(0) = x_0, (3)

with the functional ϕ : R^{n_x} → R and the functions f : R^{n_x} × R^{n_y} × R^{n_u} → R^{n_x} and g : R^{n_x} × R^{n_y} × R^{n_u} → R^{n_y}. The algebraic state y is implicitly determined by the algebraic constraint (2). DAEs are characterized by a quantity called the index, for which various concepts exist, e.g., the differentiation and the perturbation index (cf. [24]). By differentiating the algebraic equation (2) with respect to t we get

0 = g_x(x, y, u) ẋ + g_y(x, y, u) ẏ + g_u(x, y, u) u̇. (4)

If we assume that the matrix g_y is non-singular along a trajectory (compare Sect. 3), then we are able to solve this equation for ẏ to obtain an explicit ODE for the algebraic state. Therefore, the DAE (1), (2) has differentiation index 1, since differentiating once with respect to t was sufficient to derive an explicit ODE.
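The index 1 structure can be made concrete with a small sketch. The DAE below is an illustrative choice of ours, not the paper's example; the point is that non-singularity of g_y makes the algebraic equation g = 0 a well-posed (Newton-solvable) equation for y.

```python
# Toy semi-explicit index 1 DAE (an illustrative choice, not from the paper):
#   x'(t) = f(x, y, u) = -x + y,   0 = g(x, y, u) = y - x**2 - u.
# Here g_y = 1 is non-singular, so g = 0 determines y; the DAE has index 1,
# and one differentiation of g yields the explicit ODE y' = 2*x*x' + u'.

def g(x, y, u):
    return y - x**2 - u

def g_y(x, y, u):
    return 1.0          # constant for this g; must be non-zero (index 1)

def solve_algebraic(x, u, y=0.0, tol=1e-12, max_iter=50):
    """Newton's method on g(x, ., u) = 0; well-posed since g_y != 0."""
    for _ in range(max_iter):
        r = g(x, y, u)
        if abs(r) < tol:
            break
        y -= r / g_y(x, y, u)
    return y

y = solve_algebraic(0.5, 0.3)
print(abs(y - (0.5**2 + 0.3)) < 1e-10)   # True: agrees with y = x^2 + u
```

For this particular g, Newton's method converges in one step; for a general index 1 constraint the same iteration applies with the Jacobian g_y evaluated at the current iterate.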

Remark 1
The non-singularity of the matrix g_y along a trajectory implies that Eq. (2) is (implicitly) solvable for y in terms of x and u. In theory, it would then be possible to reduce (1), (2) to an ODE and to apply a numerical scheme afterwards. However, in the context of DAEs, this has the drawback that the algebraic constraint is no longer enforced in the discrete system. Thus, depending on the dynamics, the discrete solution may suffer from the drift-off effect (cf. [10, 20]). Therefore, we suggest discretizing the system (OCP) directly and then solving the discrete optimization problem.
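The drift-off effect can be observed numerically. The following sketch uses a toy index 1 DAE of our own choosing (not from the paper): integrating the index reduced ODE system lets the residual of the algebraic constraint drift away from zero, while re-solving the constraint in every step keeps it satisfied exactly.

```python
# Drift-off illustration on a toy index 1 DAE (our own choice):
#   x' = -x + y,   0 = g(x, y) = y - x**2.
# The index reduced ODE is  y' = 2*x*x' = 2*x*(-x + y).  Integrating the
# reduced system with the explicit Euler method lets the residual g drift
# away from zero; enforcing g = 0 in every step keeps the constraint exact.

def step_reduced(x, y, h):
    """Euler step of the index reduced ODE system for (x, y)."""
    return x + h * (-x + y), y + h * 2 * x * (-x + y)

def step_enforced(x, h):
    """Euler step for x, then re-solve the algebraic constraint for y."""
    x_new = x + h * (-x + x**2)     # y = x**2 inserted from g = 0
    return x_new, x_new**2

h, n = 0.1, 50
x1, y1 = 0.5, 0.25                  # consistent initial values: g = 0
x2, y2 = 0.5, 0.25
for _ in range(n):
    x1, y1 = step_reduced(x1, y1, h)
    x2, y2 = step_enforced(x2, h)

print(abs(y1 - x1**2) > 1e-6)       # True: the reduced system drifts off
print(abs(y2 - x2**2) < 1e-12)      # True: the constraint stays satisfied
```

A short calculation shows that each Euler step of the reduced system changes the residual by -h^2 (y - x)^2, so the drift accumulates monotonically, exactly the behavior the remark warns about.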
Runge–Kutta schemes are often employed to obtain accurate numerical solutions of DAEs. Hairer et al. [19] proved convergence of Runge–Kutta methods for Hessenberg DAEs up to index 3. Usually, in order to approximate a DAE with these schemes, we proceed as follows: for N ∈ N, N ≥ 2, we define the mesh size h := 1/N and choose coefficients b_j, a_{jk} for j, k = 1, ..., s. Then, we approximate the differential and the algebraic state by

x_{i+1} = x_i + h Σ_{j=1}^{s} b_j f(z_i^j, w_i^j, u_i^j),
y_{i+1} = y_i + h Σ_{j=1}^{s} b_j q_i^j,
z_i^j = x_i + h Σ_{k=1}^{s} a_{jk} f(z_i^k, w_i^k, u_i^k),
w_i^j = y_i + h Σ_{k=1}^{s} a_{jk} q_i^k.

Herein, z_i^j and w_i^j approximate the differential and the algebraic state at t = t_i + c_j h with t_i := ih and c_j := Σ_{k=1}^{s} a_{jk}. Moreover, q_i^j is an approximation of ẏ at t = t_i + c_j h. Thus, we have a discretization of the index reduced system, which we get by solving (4) for ẏ, i.e.,

ẋ(t) = f(x(t), y(t), u(t)),
ẏ(t) = -g_y(x(t), y(t), u(t))^{-1} (g_x(x(t), y(t), u(t)) ẋ(t) + g_u(x(t), y(t), u(t)) u̇(t)).

Note that the second equation also depends on the derivative of u. Furthermore, if we derive discrete necessary conditions with the standard Runge–Kutta scheme, we do not obtain an approximation of the continuous necessary conditions associated with (OCP) (compare (9)-(12)). Instead, we get a discretization of the necessary conditions of the index reduced ODE system. To generate an approximation of the necessary conditions of (OCP) and to avoid the dependency on u̇, we suggest treating the algebraic state analogously to the control (cf. [13, 18]). This yields the following approximation:

x_{i+1} = x_i + h Σ_{j=1}^{s} b_j f(z_i^j, y_i^j, u_i^j), i = 0, ..., N-1, (5)
z_i^j = x_i + h Σ_{k=1}^{s} a_{jk} f(z_i^k, y_i^k, u_i^k), j = 1, ..., s, (6)
0 = g(z_i^j, y_i^j, u_i^j), j = 1, ..., s, (7)
x_0 = x(0), (8)

with stage approximations z_i^j ≈ x(t_i + c_j h) as well as intermediate algebraic variables y_i^j ≈ y(t_i + c_j h) and controls u_i^j ≈ u(t_i + c_j h). Then, with x_i denoting the approximation of x(t_i), the discrete problem (DOCP) consists of minimizing ϕ(x_N) subject to (5)-(8). The objective of this paper is to prove that this system has a local solution, which satisfies certain error estimates with respect to the continuous solution. To that end, we proceed as follows: In Sect. 2 we derive discrete necessary conditions for (DOCP), which are consistent with the continuous necessary conditions of (OCP). In Sect. 3 we introduce the main result (Theorem 1) of this paper and the assumptions required to prove it. We transform the discrete problem into an abstract setting for which we can apply a convergence theorem in Sect. 4. In Sect. 5 we estimate the consistency error and show that the discretization scheme is stable with respect to small perturbations. This allows us to apply Proposition 2 and prove the main result, Theorem 1. Numerical experiments confirming the theoretical deductions are provided in Sect. 6 for different schemes. In Sect. 7 we summarize the results of this paper and give an outlook on the index 2 case. We moved some technical details that would disturb the reading flow to Appendix 1.
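The idea of treating the algebraic state like a control can be sketched for an explicit scheme: at every stage the algebraic equation g = 0 is solved for y instead of integrating a stage derivative q for y. The test problem below is a hypothetical one of our own (not the paper's example), chosen so that the exact solution is known.

```python
import math

# Sketch of the proposed scheme for a hypothetical index 1 test problem:
#   x'(t) = f(x, y) = -x + y,   0 = g(x, y, u) = y - u(t),   u(t) = t,
# with exact solution x(t) = t - 1 + exp(-t) for x(0) = 0.
# Heun's scheme: b = (1/2, 1/2), c = (0, 1); the algebraic variable is
# obtained at every stage by solving g = 0.

def f(x, y):
    return -x + y

def solve_g(z, u):
    # 0 = g(z, y, u) = y - u  =>  y = u (trivially; Newton in general)
    return u

def heun_dae(N):
    h, x = 1.0 / N, 0.0
    for i in range(N):
        t = i * h
        y1 = solve_g(x, t)                    # stage 1, enforce g = 0
        z2 = x + h * f(x, y1)                 # stage 2 approximation
        y2 = solve_g(z2, t + h)               # stage 2, enforce g = 0
        x += h / 2.0 * (f(x, y1) + f(z2, y2))
    return x

exact = math.exp(-1.0)                        # x(1) = 1 - 1 + e^{-1}
e1 = abs(heun_dae(100) - exact)
e2 = abs(heun_dae(200) - exact)
print(e1 / e2)                                # ≈ 4, i.e., second order
```

Halving the mesh size reduces the error by a factor of about four, consistent with the second order estimates for the Heun scheme reported in Sect. 6.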

Notation
Moreover, we associate discrete sequences (w_i)_{i=0,...,N} ⊂ R^n with the spaces

L^n_{p,h} ⊂ L^n_p: space of functions that are piecewise constant on [t_i, t_{i+1}),
W^n_{1,p,h} ⊂ W^n_{1,p}: space of continuous functions that are piecewise linear on [t_i, t_{i+1}).

Throughout the paper we use i as the index for the discrete time t_i and j, k as indices for the coefficients. Finally, to simplify notation, we often use the abbreviation F[t] for functions of the type F(ŵ(t)), where ŵ is a local minimizer or Karush-Kuhn-Tucker (KKT) point.

Necessary conditions
The general procedure for deriving error estimates for solutions of optimal control problems is to compare the associated necessary conditions and the dynamics with their discrete counterparts. Therefore, in this section we derive necessary conditions associated with (DOCP), which are consistent with the continuous necessary conditions. The latter hold if (OCP) has a (local) solution. A tuple (x̂, ŷ, û) satisfying (1)-(3) is a local minimizer of (OCP) if there exists ε > 0 such that ϕ(x̂(1)) ≤ ϕ(x(1)) for all (x, y, u) satisfying (1)-(3) in an ε-neighborhood of (x̂, ŷ, û). Consequently, if (OCP) has a solution (x̂, ŷ, û) and the index 1 condition in Sect. 3 holds, then there exist Lagrange multipliers (λ, μ) such that the normalized necessary conditions for (OCP) (cf. [16, Theorem 3.4]),

λ̇(t) = -∇_x H(x̂(t), ŷ(t), û(t), λ(t), μ(t)) a.e. in [0, 1], (9)
0 = ∇_y H(x̂(t), ŷ(t), û(t), λ(t), μ(t)) a.e. in [0, 1], (10)
λ(1) = ∇ϕ(x̂(1)), (11)
0 = ∇_u H(x̂(t), ŷ(t), û(t), λ(t), μ(t)) a.e. in [0, 1], (12)

are satisfied with the Hamilton function H(x, y, u, λ, μ) := λ^T f(x, y, u) + μ^T g(x, y, u). Herein, we have the adjoint DAE (9), (10) with adjoint differential state λ and adjoint algebraic variable μ as well as the endpoint condition (11). Next, deriving necessary conditions associated with the discrete system (DOCP) yields adjoint equations for multipliers λ_i and stage multipliers η_i^j. Assuming b_j > 0 for all j = 1, ..., s and introducing new multipliers ν_i^j via the relation (13), as in Hager [18], we can eliminate λ_{i+1} from the stage equations and obtain new necessary conditions (14)-(18) with the transformed coefficients

ã_{jk} := b_k - (b_k a_{kj})/b_j.

The Eqs. (14)-(18) can be transformed back to the original conditions with the multipliers η_i^j using the relation (13), i.e., both systems are equivalent (cf. [18, Proposition 3.1]). In (14)-(18) the adjoint variables ν_i^j and μ_i^j can be viewed as stage approximations of λ and intermediate adjoint algebraic states for μ, respectively. However, the Eqs. (5)-(8) and (14)-(18) are not a Runge–Kutta scheme applied to the KKT-system (1)-(3), (9)-(12), since the coefficients a_{jk} and ã_{jk} are not equal in general. Hence, further analysis is required to obtain error estimates.
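The mismatch between the forward coefficients and the transformed adjoint coefficients is easy to compute. The sketch below uses the transformation formula from Hager [18], ã_{jk} = b_k - b_k a_{kj}/b_j (assuming b_j > 0), and evaluates it for Heun's scheme.

```python
# Transformed adjoint coefficients following Hager [18] (assuming b_j > 0):
#   a~_{jk} = b_k - b_k * a_{kj} / b_j.
# For Heun's scheme this gives a~ != a, i.e., the discrete adjoint is not
# the same Runge-Kutta scheme as the state discretization.

def transformed(a, b):
    s = len(b)
    return [[b[k] - b[k] * a[k][j] / b[j] for k in range(s)]
            for j in range(s)]

# Heun's scheme: a = [[0, 0], [1, 0]], b = (1/2, 1/2)
a = [[0.0, 0.0], [1.0, 0.0]]
b = [0.5, 0.5]
a_tilde = transformed(a, b)
print(a_tilde)          # [[0.5, -0.5], [0.5, 0.5]], which differs from a
```

This is exactly the situation described above: since a_{jk} and ã_{jk} differ, the pair of discrete systems is not a single Runge–Kutta scheme applied to the continuous KKT-system.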

Assumptions and main theorem
Before formulating the main result of this paper, we introduce the assumptions required to prove it. To conduct a convergence analysis in optimal control, it is typically presumed that the problem has a sufficiently smooth solution and that the system satisfies certain regularity properties. Moreover, in the nonlinear case, second order sufficient conditions are exploited since they imply stability of the problem. For the rest of the paper we assume the following:

(Smoothness) The problem (OCP) has a local solution (x̂, ŷ, û). There exist κ ∈ {1, 2, 3}, an open set M, and ρ > 0 such that B_ρ(x̂(t), ŷ(t), û(t)) ⊂ M for all t ∈ [0, 1], and the first κ derivatives of f and g exist and are Lipschitz continuous on M. Furthermore, the first κ derivatives of ϕ exist and are Lipschitz continuous on B_ρ(x̂(1)).

(Index 1) The matrix g_y(x̂(t), ŷ(t), û(t)) is non-singular for all t ∈ [0, 1].

(Coercivity) There exists γ > 0 such that the quadratic form associated with the second variation of the problem is bounded below by γ times the squared norm of the control for all (x, y, u) satisfying the linearized dynamics with x(0) = 0.
Remark 2 (i) If smoothness for κ ∈ {2, 3} and the index 1 condition are satisfied, then the Lagrange multipliers (λ̂, μ̂) associated with the local solution (x̂, ŷ, û) enjoy corresponding smoothness. This can be seen by solving the adjoint algebraic equation (10) for μ, which yields μ ∈ W^{n_y}_{1,∞}. Then, the adjoint differential equation (9) implies λ ∈ W^{n_x}_{2,∞}. The process is repeated until the suggested smoothness is reached.

(ii) If the assumptions are satisfied, then there exists some β > 0 such that the Legendre-Clebsch condition (w, v)^T ∇²_{(y,u)} H[t] (w, v) ≥ β |(w, v)|² holds for all (w, v) ∈ ker((g_y[t], g_u[t])) (cf. [15], [27, Lemma 5.3.3]). In addition, this implies that the matrix

( H_yy[t]  H_yu[t]  g_y[t]^T )
( H_uy[t]  H_uu[t]  g_u[t]^T )   (20)
( g_y[t]   g_u[t]   0        )

is non-singular for all t ∈ [0, 1].

(iii) Smoothness and the index 1 condition imply that g_y[t] and its inverse are continuous and uniformly bounded. Thus, we can solve the linear algebraic equation in the coercivity assumption for y and insert the result into the differential equation, which yields a reduced linearized dynamic in (x, u). The quadratic form then reduces to a form in (x, u) with the matrix functions A, ..., Q. For this reduced form we also have the coercivity condition (21), (22). Furthermore, the Legendre-Clebsch condition holds for the reduced form (cf. [13, Lemma 2], [15]).

Table 1 Conditions for Runge–Kutta schemes of order one to three for optimal control (columns: Order, Conditions)
Next, with the abbreviations c_j := Σ_{k=1}^{s} a_{jk} and d_j := Σ_{k=1}^{s} b_k a_{kj}, we introduce in Table 1 the conditions required to obtain Runge–Kutta methods of order one to three. Table 1 contains an additional condition for third order, which is not needed for Runge–Kutta methods applied to DAEs; in the context of optimal control, such extra conditions for the coefficients arise since a_{jk} and ã_{jk} are not equal in general.
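Since the entries of Table 1 are not reproduced here, the sketch below assumes that Table 1 collects the order conditions for optimal control as given by Hager [18]: Σ b_j = 1 (order 1), Σ b_j c_j = 1/2 (order 2), and Σ b_j c_j² = 1/3 together with Σ d_j²/b_j = 1/3 (order 3), with b_j > 0 throughout. It checks them for the two schemes used in the numerical example.

```python
# Order conditions for optimal control up to order three, assumed to follow
# Hager [18] (the paper's Table 1 is not reproduced here):
#   order 1: sum b_j = 1;   order 2: sum b_j c_j = 1/2;
#   order 3: sum b_j c_j**2 = 1/3  and  sum d_j**2 / b_j = 1/3,
# with c_j = sum_k a_{jk}, d_j = sum_k b_k a_{kj}, and b_j > 0.

def order_conditions(a, b, tol=1e-12):
    s = len(b)
    if min(b) <= 0.0:
        return 0
    c = [sum(a[j]) for j in range(s)]
    d = [sum(b[k] * a[k][j] for k in range(s)) for j in range(s)]
    order = 0
    if abs(sum(b) - 1.0) < tol:
        order = 1
    if order == 1 and abs(sum(b[j] * c[j] for j in range(s)) - 0.5) < tol:
        order = 2
    if order == 2 and \
       abs(sum(b[j] * c[j] ** 2 for j in range(s)) - 1.0 / 3.0) < tol and \
       abs(sum(d[j] ** 2 / b[j] for j in range(s)) - 1.0 / 3.0) < tol:
        order = 3
    return order

# Radau IA: c = (0, 2/3), b = (1/4, 3/4), a = [[1/4, -1/4], [1/4, 5/12]]
radau_ia = ([[0.25, -0.25], [0.25, 5.0 / 12.0]], [0.25, 0.75])
# Heun: c = (0, 1), b = (1/2, 1/2), a = [[0, 0], [1, 0]]
heun = ([[0.0, 0.0], [1.0, 0.0]], [0.5, 0.5])
print(order_conditions(*radau_ia), order_conditions(*heun))   # 3 2
```

This matches the observed rates in Sect. 6: Radau IA satisfies the conditions up to order three, while the Heun scheme stops at order two.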
Note that the order κ of the error estimates in Theorem 1 is closely related to the assumed smoothness of the functions in (OCP).To derive error estimates, we exploit appropriate Taylor expansions, which contain higher order derivatives of the functions, i.e., they have to be sufficiently smooth (see Appendix 1 and Lemma 1).

Convergence theorem and abstract setting
Before we prove Theorem 1 in Sect. 5, we first transform the discrete KKT-system (5)-(8), (14)-(18) into an equation 0 = T(ω) and show that this system has a solution which satisfies certain error estimates. To that end, we require a convergence result for such abstract discretizations (Proposition 2). Our goal now is to transform the discrete KKT-system (5)-(8), (14)-(18) into a discretization of a boundary value problem such that we can apply Proposition 2. For that purpose, we introduce the abbreviations X := (x, λ) and Y := (y, u, μ), together with functions F and G collecting the right hand sides of the differential and the algebraic equations, respectively. Then, we can write the discrete system (5)-(8), (14)-(18) as a one-step scheme with stage approximations for X and Y, which is an approximation of the DAE boundary value problem

Ẋ(t) = F(X(t), Y(t)), 0 = G(X(t), Y(t)), x(0) = x_0, λ(1) = ∇ϕ(x(1)).

According to (20), the matrix G_Y is non-singular along the trajectory (X̄, Ŷ) = (x̂, λ̂, ŷ, û, μ̂), i.e., the DAE has index 1.
Since the coefficients Υ_{jk} contain a_{jk} and ã_{jk}, which are not equal in general, the discretization (23)-(26) with stage approximations Z_i^j is not a classic Runge–Kutta scheme, and existing convergence results cannot be applied. Hence, further examination is required. We proceed by reducing the system (23)-(26) such that we obtain a discretization of an ODE boundary value problem. We first consider the algebraic equations 0 = G(Z^j, Y^j), j = 1, ..., s, which are satisfied for Z̄^j = X̄(t_i), Ȳ^j = Ŷ(t_i), j = 1, ..., s, for any i = 0, ..., N-1. Moreover, the matrix G_Y[t_i]^{-1} exists for i = 0, ..., N-1 (compare (20)). Thus, by the implicit function theorem (cf. [22, p. 29]), there exist some ε, δ > 0 such that the mappings Y^j : B_ε(Z̄) → B_δ(Ȳ^j), j = 1, ..., s, are continuously differentiable and satisfy 0 = G(Z^j, Y^j(Z)) for Z ∈ B_ε(Z̄). Now, we insert the functions Y^j(·), j = 1, ..., s, into the stage approximation (24) and consider the equations with some parameters P = (P^1, ..., P^s) replacing X_i. These equations are satisfied for Z̄^j = X̄(t_i) and P̄^j = X̄(t_i) - hΥ^j F[t_i], j = 1, ..., s, for any i = 0, ..., N-1. Differentiating the right hand side with respect to Z yields a matrix of the form I_{2sn_x} + O(h). Hence, this matrix is non-singular for sufficiently small h. Applying the implicit function theorem once more gives us continuously differentiable mappings Z^j : B_ε̃(P̄) → B_δ̄(Z̄^j) for j = 1, ..., s and some ε̃, δ̄ > 0. Then, for all P ∈ B_ε̃(P̄) these mappings satisfy the stage equations. Furthermore, since P̄^j = X̄(t_i) + O(h) for sufficiently small h, the function Z^j(·) is also continuously differentiable on (B_ε̂(X̄(t_i)))^s for some 0 < ε̂ < ε̃ and satisfies the stage approximations (29) for P^j = X(t_i). Finally, we insert Z^j(·) and Y^j(·) into the difference equation (23) to obtain a reduced scheme for X_i ∈ B_ε̂(X̄(t_i)), i = 0, ..., N. This system only depends on X = (x, λ) and is an approximation of the ODE boundary value problem obtained by solving (2), (10), (12) for (y, u, μ) in terms of (x, λ) and inserting the result into the differential equations (1) and (9). For this discrete equation we intend to apply Proposition 2 to obtain error estimates for x and λ. Therefore, we introduce the Banach space 𝒳 := W^{2n_x}_{1,∞,h} and the linear normed space 𝒴 := L^{2n_x}_{1,h} × R^{2n_x}. Finally, with X̄ ∈ 𝒳 defined by X̄_i := X̄(t_i) for i = 0, ..., N and some sufficiently small r ∈ (0, ε̂), we have the continuously Fréchet differentiable function T : B_r(X̄) ⊂ 𝒳 → 𝒴 and the linear operator L : 𝒳 → 𝒴.

Proof of the main theorem
Before proving Theorem 1, we derive an upper bound for the consistency error T(X̄) with respect to h, which, after applying Proposition 2, gives us error estimates for the solution of the discrete equation 0 = T(X) with respect to X̄.
Lemma 1 If smoothness for κ ∈ {2, 3}, the index 1 condition, b_j > 0 for j = 1, ..., s, and the conditions in Table 1 up to order κ are satisfied, then we have ‖T(X̄)‖ ≤ c h^κ for some constant c ≥ 0 and sufficiently small h.

Proof
The second component of T(X̄) is zero. Thus, it remains to estimate the first component. In Appendix 1 we derive Taylor expansions for (X̄_{i+1} - X̄_i)/h and F(Z^j(X̄_i), Y^j(Z(X̄_i))) for i = 0, ..., N-1 such that the remainder terms are of order O(h³). Herein, we omit arguments, i.e., we write F = F[t_i] etc. for i = 0, ..., N-1. Now, we compare the terms of order O(1), O(h), and O(h²) of both expansions. Using the coefficients ã_{jk}, we can write the O(h) terms of both expansions in a common form. According to Table 1, for second order schemes we assume Σ_{j=1}^{s} b_j c_j = 1/2; thus, the terms are equal if the second order condition in Table 1 is satisfied. For third order we first consider the linear terms of order O(h²). These terms are equal if certain conditions on the coefficients a_{jk} and ã_{jk} hold, which is the case if the third order conditions in Table 1 are satisfied (cf. [18, Theorem 4.1]).
For the quadratic terms of order O(h²) an analogous comparison applies; according to [18, Theorem 4.1], the resulting conditions on the coefficients hold if the third order conditions in Table 1 are satisfied. Thus, we obtain ‖T(X̄)‖ ≤ c h³ for Runge–Kutta schemes satisfying all conditions in Table 1. In summary, we have ‖T(X̄)‖ ≤ c h^κ for some c ≥ 0 if the conditions in Table 1 up to order κ are satisfied.
Next, we verify that the conditions of Proposition 2 are satisfied for our abstract setting.

Lemma 2 Let the assumptions of Theorem 1 be satisfied. Then, for arbitrary θ > 0 there exist some sufficiently small r > 0 and h > 0 such that ‖T′(W) - L‖ ≤ θ for all W ∈ B_r(X̄). Furthermore, for π̄ = T(ω̄) - L(X̄) and some constant σ > 0 the mapping L^{-1} : B_σ(π̄) → 𝒳 is single valued and Lipschitz continuous with constant ϑ ≥ 0.
Proof First, we estimate the norm of the linear operator T′(W) - L. To this end, we abbreviate the reduced right hand side F(Z^j(·), Y^j(Z(·))), which is Lipschitz continuous with respect to (X, Y) close to (X̄(t), Ŷ(t)) for t ∈ [0, 1]. Using the sensitivities (27), (28), (30), (31) we obtain the linearization of T at some W ∈ B_r(X̄) and can write the linear operator L accordingly. For the first component of (T′(W) - L)(X) we get, for i = 0, ..., N-1, a bound in terms of r and h with some generic constants c_1, c_2, c_3 ≥ 0. For the second component we exploit Lipschitz continuity to obtain the bound c r ‖X‖_{1,∞,h}. Thus, we have ‖T′(W) - L‖ ≤ θ for arbitrary θ > 0 if r and h are sufficiently small.
Next, we need to verify that L^{-1} : B_σ(π̄) → 𝒳 exists and is bounded. To this end, we consider the linear, perturbed system (33) for i = 0, ..., N-1, j = 1, ..., s, and suitable perturbations. These are the KKT-conditions associated with the linear quadratic program (LQP): Minimize P_h(x, y, u) subject to the linearized constraints for i = 0, ..., N-1, j = 1, ..., s, with the discrete quadratic form P_h. Since the matrix g_y[t_i] is non-singular for all i = 0, ..., N by the index 1 assumption, we can solve (34) for y_i^j and obtain the reduced system (RLQP): Minimize the reduced quadratic form in (x, u) subject to the reduced constraints, with correspondingly transformed perturbations. The matrix functions A, ..., Q are defined in Remark 2 (iii). Furthermore, the matrix Q(t_i) is uniformly positive definite for all i = 0, ..., N-1 according to Remark 2 (iii). Hence, we can apply [13, Lemma 6.1], which yields a Lipschitz continuous solution of (RLQP) with respect to the perturbations (π_f^j, π_0, π_{H_x}^j, π_ϕ, π_{H_u}^j), j = 1, ..., s. This in turn implies that the linear quadratic program (LQP), and therefore the KKT-system (33), has a solution for any perturbation. Moreover, we can write (33) compactly in terms of (X, Y) and π := (π_0, π_ϕ). Since the matrix G_Y[t_i] is invertible by (20), we can solve the second equation for Y_i^j and insert the result into the first equation. This yields the perturbed equation L(X) = π, which has a unique, Lipschitz continuous solution with respect to π ∈ 𝒴 with some Lipschitz constant ϑ ≥ 0, as we have verified. Thus, the single valued mapping L^{-1} : B_σ(π̄) → 𝒳 exists and is Lipschitz continuous with constant ϑ.

Numerical example
In order to verify the results of Theorem 1, we consider the following optimal control problem with a parameter α ≠ 0. Evaluating the normalized necessary conditions associated with this problem yields the exact solution. For α ∈ R \ {-4, -1, 0} the smoothness assumptions are satisfied. Furthermore, we can (explicitly) solve the algebraic equation for y, i.e., the DAE has index 1. Moreover, we have the quadratic form and the linearized dynamic of the coercivity assumption. If α > 0, then the coercivity condition (19) is obviously satisfied. For negative α the linearized dynamic implies x_1(t) = 0 and y(t) = u(t)/2 for all t ∈ [0, 1]. Hence, we obtain a quadratic form which is positive for α ∈ (-4, 0). However, the coercivity assumption does not hold for α < -4. Therefore, we performed numerical experiments for the parameter values α = 1 and α = -4.2 as well as the Runge–Kutta schemes in Table 2.
The convergence order κ for ŵ = x̂, ŷ, ... can be approximated with the formula κ ≈ log₂(‖ŵ_h - ŵ‖_{∞,h} / ‖ŵ_{h/2} - ŵ‖_{∞,h/2}). Therefore, in Figs. 1 and 2 we plotted -log₂ ‖ŵ_h - ŵ‖_{∞,h} for the different errors and the values N = 20, 40, 80, 160, 320, 640. Then, the convergence order is indicated by the vertical distance between two consecutive points. The results for the Heun scheme are displayed in Fig. 1 for the aforementioned values of α. Solving the algebraic constraints for (y, μ, u) with respect to (x, λ) = (x̂_h, λ̂_h) improves the accuracy for the algebraic states and the control but not the convergence order of 2. Note that even if the coercivity condition is not satisfied (for α = -4.2), we still get second order error estimates. For the third order scheme Radau IA and α = 1 we get third order error estimates for the differential states but only second order error estimates for the algebraic states and the control (compare Fig. 2). The order of convergence is improved by solving the algebraic constraints for (y, μ, u) with respect to (x, λ) = (x̂_h, λ̂_h). For the parameter value α = -4.2 we do not get third order error estimates for the differential states, but still second order error estimates for the algebraic variables and the control (compare Fig. 2). Since the differential states do not satisfy third order estimates, solving the constraints will not improve the convergence order for the algebraic states and the control. Therefore, these errors were omitted in the α = -4.2 case.
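The order-estimation formula used for Figs. 1 and 2 can be sketched directly; the synthetic third order errors below merely stand in for measured ones.

```python
import math

# Experimentally observed order between consecutive mesh sizes:
#   kappa ≈ log2( ||w_h - w||_{∞,h} / ||w_{h/2} - w||_{∞,h/2} ),
# i.e., the vertical distance between consecutive points in Figs. 1 and 2.
# Synthetic errors e(h) = 2 * h**3 stand in for measured ones here.

def observed_orders(errors):
    return [math.log2(e_h / e_h2) for e_h, e_h2 in zip(errors, errors[1:])]

Ns = [20, 40, 80, 160, 320, 640]
errors = [2.0 * (1.0 / N) ** 3 for N in Ns]
print(observed_orders(errors))     # five entries, each ≈ 3.0
```

For real data the entries approach the asymptotic order only as N grows, which is why several consecutive mesh refinements are plotted in the figures.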

Conclusions
In this paper we proposed a new type of Runge–Kutta scheme for optimal control problems with DAEs. Instead of approximating the algebraic variable with stage derivatives, we treated the algebraic state like a control. For this discretization scheme we derived necessary conditions that are consistent with the continuous conditions, and we were able to establish error estimates for Runge–Kutta schemes applied to optimal control problems with index 1 DAEs. The next step is to apply this method to problems with index 2 DAEs and derive error estimates. However, for optimal control problems with index 2 DAEs some new difficulties occur. There exists a discrepancy between the continuous and the discrete necessary conditions, i.e., the adjoint continuous DAE has index 1 while the discrete system approximates an index 2 DAE. In [28] we were able to overcome this problem by performing an index reduction for the discrete system. In that case we used the implicit Euler discretization. To obtain higher order error estimates, the idea is to perform such a discrete index reduction in the context of Runge–Kutta schemes. Unfortunately, so far we have not been able to find a way to derive consistent necessary conditions.

Appendix 1

In this appendix we derive Taylor expansions for (X̄_{i+1} - X̄_i)/h and F(Z^j(X̄_i), Y^j(Z(X̄_i))) with X̄_i = X̄(t_i) for i = 0, ..., N-1, where the right hand side is evaluated at t = t_i, i.e., F = F[t_i] etc. We abbreviate Z̄^j = Z^j(X̄_i) and Ȳ^j = Y^j(Z(X̄_i)). Then, we have Ȳ^j = Ŷ(t_i) + O(h), since Y^j(·) is Lipschitz continuous. Furthermore, expanding F(Z̄^j, Ȳ^j) and G(Z̄^j, Ȳ^j) at (X̄(t_i), Ŷ(t_i)) =: (X̄, Ŷ) and collecting terms yields the desired expansions with Υ^j = Σ_{k=1}^{s} Υ_{jk}.

We then compare the terms of order O(1), O(h), and O(h²) of both expansions. The O(1) terms are equal, which corresponds to the order 1 condition Σ_{j=1}^{s} b_j = 1 in Table 1.

Fig. 1 Errors for the Heun scheme with the values α = 1 and α = -4.2

The system (1), (2) is a DAE in semi-explicit form with Lipschitz continuous differential state x ∈ W^{n_x}_{1,∞} and essentially bounded algebraic variable and control y, u. We denote by R^n the n-dimensional Euclidean space with the norm |·|. The space of n × m matrices A is endowed with the spectral norm ‖A‖, and the n-dimensional unit matrix is denoted by I_n. Let B_r(w) be the open ball with center w and radius r > 0. For generic, non-negative constants we use c, c_1, c_2, .... Furthermore, for vector-valued functions w : [0, 1] → R^n, p ∈ [1, ∞], and k ∈ N we introduce the Banach spaces L^n_p and W^n_{k,p}.

For the O(h) terms we obtain the order 2 condition Σ_{j=1}^{s} b_j c_j = 1/2 in Table 1.

Table 2 Examined Runge–Kutta schemes for the example: Heun (order 2) and Radau IA (order 3)