# Coupling MPC and DP Methods for an Efficient Solution of Optimal Control Problems


## Abstract

We study the approximation of optimal control problems via the solution of a Hamilton-Jacobi equation in a tube around a reference trajectory which is first obtained solving a Model Predictive Control problem. The coupling between the two methods is introduced to improve the initial local solution and to reduce the computational complexity of the Dynamic Programming algorithm. We present some features of the method and show some results obtained via this technique showing that it can produce an improvement with respect to the two uncoupled methods.

## Keywords

Optimal control · Dynamic Programming · Model Predictive Control · Semi-Lagrangian schemes

## 1 Introduction

The numerical solution of partial differential equations obtained by applying the Dynamic Programming Principle (DPP) to nonlinear optimal control problems is a challenging topic that can have a great impact in many areas, e.g. robotics, aeronautics, electrical and aerospace engineering. Indeed, by means of the DPP one can characterize the value function of a fully–nonlinear control problem (including also state/control constraints) as the unique viscosity solution of a nonlinear Hamilton–Jacobi equation, and, even more important, from the solution of this equation one can derive the approximation of a feedback control. This result is the main motivation for the PDE approach to control problems and represents the main advantage over other methods, such as those based on the Pontryagin minimum principle. It is worth mentioning that the characterization via the Pontryagin principle gives only necessary conditions for the optimal trajectory and optimal open-loop control. Although from the numerical point of view the control system can be solved via shooting methods for the associated two point boundary value problem, in real applications a good initial guess for the co-state is particularly difficult to find and often requires a long and tedious trial-and-error procedure. In any case, it can be interesting to obtain a local version of the DP method around a reference trajectory to improve a sub-optimal strategy. The reference trajectory can be obtained via the Pontryagin principle (with open-loop controls), via a Model Predictive Control (MPC) approach (using feedback sub-optimal controls) or simply via already known engineering experience. The application of DP in an appropriate neighborhood of the reference trajectory will not guarantee the global optimality of the new feedback controls but could improve the result within the given constraints.

In this paper we focus on the *infinite horizon optimal control* problem, which is associated with the following Hamilton–Jacobi–Bellman equation:

$$ \lambda v(x) + \max_{u\in U}\left\{ -f(x,u)\cdot Dv(x) - g(x,u) \right\} = 0, \qquad x\in \mathbb{R}^d. $$

## 2 A Local Version of DP via MPC Models

In this section we introduce the *infinite horizon problem*. Let the controlled dynamics be given by the solution of the following Cauchy problem:

$$ \begin{cases} \dot{y}(t) = f(y(t), u(t)), & t>0,\\ y(0) = x, \end{cases} \qquad\qquad (1) $$

where the control \(u(\cdot )\) belongs to the set of admissible controls \(\mathcal {U}:=\{u:[0,+\infty )\rightarrow U,\ \text{measurable}\}\), with \(U\) a compact set.

Assuming that *f* is Lipschitz continuous with respect to the state variable and continuous with respect to (*x*, *u*), the classical assumptions for the existence and uniqueness result for the Cauchy problem (1) are satisfied. To be more precise, the Carathéodory theorem (see [2]) implies that for any given control \(u(\cdot )\in \mathcal {U}\) there exists a unique trajectory \(y(\cdot ; u)\) satisfying (1) almost everywhere. Changing the control policy, the trajectory changes, and we obtain a family of infinitely many solutions of the controlled system (1), parametrized with respect to the control *u*.

To select the *optimal trajectory*, we introduce a *cost functional* \(J:\mathcal {U}\rightarrow \mathbb {R}\). For the infinite horizon problem the cost functional is

$$ J_x(u) := \int_0^{\infty} g\big(y(t;u), u(t)\big)\, e^{-\lambda t}\, dt, \qquad\qquad (2) $$

where \(\lambda >0\) is the discount factor and *g* is the running cost.
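The discounted cost functional above can be approximated numerically by truncating the integral at a finite time (the tail is of order \(e^{-\lambda T}\)) and integrating the trajectory. A minimal sketch, with a hypothetical scalar dynamics and running cost chosen only for illustration:

```python
import numpy as np

def discounted_cost(f, g, x0, u_func, lam=1.0, T=20.0, dt=1e-3):
    """Approximate J_x(u) = int_0^inf g(y(t), u(t)) e^{-lam t} dt by
    truncating at time T and using explicit Euler for the trajectory
    together with left-endpoint quadrature for the integral."""
    y = float(x0)
    J, t = 0.0, 0.0
    while t < T:
        u = u_func(t, y)
        J += g(y, u) * np.exp(-lam * t) * dt
        y = y + dt * f(y, u)          # Euler step for dy/dt = f(y, u)
        t += dt
    return float(J)

# Toy example (not from the paper): dy/dt = -y + u, g = y^2 + u^2, zero
# control; then y(t) = e^{-t} and J = int_0^inf e^{-(2+lam)t} dt = 1/(2+lam).
J0 = discounted_cost(lambda y, u: -y + u,
                     lambda y, u: y**2 + u**2,
                     x0=1.0, u_func=lambda t, y: 0.0, lam=1.0)
```

With \(\lambda = 1\) the exact value is \(1/3\), and the truncation and discretization errors are both small for the chosen parameters.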

### 2.1 Hamilton–Jacobi–Bellman Equations

The essential features will be briefly sketched, and more details in the framework of viscosity solutions can be found in [2, 4].

Several approximation schemes on a grid *G* have been proposed for (4). To simplify the presentation, let us consider a uniform structured grid with constant space step \(k:=\varDelta x\). We will use a semi-Lagrangian method based on a Discrete Time Dynamic Programming Principle. A first discretization in time of the original control problem [2] leads to a characterization of the corresponding value function \(v^h\) (for the time step \(h:=\varDelta t\)) as

$$ v^h(x) = \min _{u\in U}\left\{ e^{-\lambda h}\, v^h\big(x + h f(x,u)\big) + h\, g(x,u) \right\}. $$

A further discretization in space, based on interpolation of the values at the grid nodes, yields a fully discrete scheme whose solution converges to *v*(*x*) when \((\varDelta t, \varDelta x)\) goes to 0 (precise a priori estimates are available; see e.g. [3] for more details). This method is referred to in the literature as the *value iteration method* because, starting from an initial guess for the value function, it modifies the values on the grid according to the foot of the characteristics. It is well known that the convergence of the value iteration can be very slow, since the contraction constant \(e^{-\lambda \varDelta t}\) is close to 1 when \(\varDelta t\) is close to 0. This means that a higher accuracy also requires more iterations. There is thus a need for an acceleration technique that cuts the link between the accuracy and the complexity of the value iteration. One possible choice is the iteration in the policy space, or the coupling between value iteration and policy iteration in [1]. We refer the interested reader to the book [4] for a complete guide on the numerical approximation of the equation and the references therein. One of the strengths of this method is that it provides the feedback control once the value function is computed (the feedback is in fact computed at every node, even in the fixed point iteration). Indeed, we can characterize the optimal feedback control everywhere in \(\varOmega \) as

$$ u^*(x) = \arg \min _{u\in U}\left\{ f(x,u)\cdot Dv(x) + g(x,u) \right\}, $$

where *Dv* is an approximation of the gradient of the value function obtained from the values at the nodes.
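The value iteration described above can be sketched in a few lines. The following is an illustrative 1D example (not the authors' Matlab code), with the hypothetical dynamics \(\dot y = u\), controls in \(\{-1,0,1\}\) and running cost \(g(x,u)=x^2\), so the optimal strategy steers the state toward the origin:

```python
import numpy as np

lam, h, dx = 1.0, 0.05, 0.025            # discount, Delta t, Delta x
xs = np.arange(-2.0, 2.0 + dx / 2, dx)   # uniform grid on [-2, 2]
controls = np.array([-1.0, 0.0, 1.0])
v = np.zeros_like(xs)                    # initial guess for the value function

for _ in range(10000):
    # v(x) = min_u { e^{-lam h} I[v](x + h f(x,u)) + h g(x,u) }, where I[v]
    # linearly interpolates the node values at the foot of the characteristic
    candidates = np.array([
        np.exp(-lam * h) * np.interp(np.clip(xs + h * u, xs[0], xs[-1]), xs, v)
        + h * xs ** 2
        for u in controls
    ])
    v_new = candidates.min(axis=0)
    if np.abs(v_new - v).max() < 1e-10:  # fixed point of the contraction
        v = v_new
        break
    v = v_new

# The feedback control at each node is the argmin over the (discrete) control set.
feedback = controls[candidates.argmin(axis=0)]
```

Note how the contraction constant \(e^{-\lambda h}\approx 0.95\) forces several hundred iterations even for this tiny problem, which is exactly the accuracy/complexity link discussed above.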

### 2.2 Model Predictive Control

Nonlinear model predictive control (NMPC) is an optimization-based method for the feedback control of nonlinear systems. It consists in iteratively solving a finite horizon open-loop optimal control problem subject to the system dynamics and to constraints involving states and controls.

In the MPC framework, the infinite horizon problem is replaced by a sequence of finite horizon problems: at each step one solves

$$ \min _{u\in \mathcal {U}}\; \int_{t_0}^{t_0^N} g\big(y(t), u(t)\big)\, e^{-\lambda t}\, dt, \qquad\qquad (7) $$

where *N* is a natural number, \(t_0^N=t_0+N\varDelta t\) is the final time, \(N\varDelta t\) denotes the length of the prediction horizon for the chosen time step \(\varDelta t>0\), and the state *y* solves \(\dot{y}(t)=f(y(t),u(t)),\; y(t_0)=y_0,\, t\in [t_0,t_0^N)\) and is denoted by \(y(\cdot ,t_0;u(\cdot ))\). We also note that \(y_0=x\) at \(t=0\), as in Eq. (1). The basic idea of the NMPC algorithm is summarized below.

The method works as follows: we store the optimal control on the first subinterval \([t_0,t_0+\varDelta t]\) together with the associated optimal trajectory. Then, we initialize a new finite horizon optimal control problem whose initial condition is given by the optimal trajectory \( y(t)=y(t;t_0,u^N(t))\) at \(t=t_0+\varDelta t\), using the sub-optimal control \(u^N(t)\) for \(t\in (t_0,t_0+\varDelta t]\). We iterate this process by setting \(t_0=t_0+\varDelta t\). Note that (7) is an open-loop problem on a finite time horizon \([t_0,t_0+N\varDelta t]\) which can be treated by classical techniques, see e.g. [6]. The interested reader can find in [5] a detailed presentation of the method and a long list of references.
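The receding-horizon loop can be sketched compactly. The example below is a simplified illustration, not the routine from [5]: it uses discrete-time Euler predictions, a small finite control set, and exhaustive search over the \(N\)-step control sequences in place of a nonlinear programming solver, on a hypothetical scalar problem:

```python
import itertools
import numpy as np

def mpc_trajectory(f, g, x0, T_steps, N=5, dt=0.05,
                   controls=(-1.0, 0.0, 1.0), lam=0.1):
    """Receding-horizon loop: at each step solve the finite horizon
    open-loop problem, apply only the first control, shift the horizon."""
    y, traj, u_applied = float(x0), [float(x0)], []
    for k in range(T_steps):
        best_cost, best_u0 = np.inf, controls[0]
        # finite horizon open-loop problem from the current state
        for seq in itertools.product(controls, repeat=N):
            z, cost = y, 0.0
            for j, u in enumerate(seq):
                cost += g(z, u) * np.exp(-lam * (k + j) * dt) * dt
                z = z + dt * f(z, u)          # Euler prediction step
            if cost < best_cost:
                best_cost, best_u0 = cost, seq[0]
        u_applied.append(best_u0)             # store only the first control
        y = y + dt * f(y, best_u0)            # advance the real system
        traj.append(y)
    return np.array(traj), np.array(u_applied)

# Hypothetical toy problem: steer dy/dt = u toward the origin
# with running cost y^2 + 0.01 u^2.
traj, us = mpc_trajectory(lambda y, u: u, lambda y, u: y**2 + 0.01 * u**2,
                          x0=1.0, T_steps=40)
```

Even in this toy setting one can see the typical MPC behavior: the closed-loop trajectory is driven near the target, but its quality depends on the horizon length \(N\).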

It is well known that the asymptotic stability of the MPC closed loop depends on the length of the prediction horizon: if *N* is too short we will lose these properties (see [5], Example 6.26). Estimates on the minimum value for *N* which ensures asymptotic stability are based on the relaxed dynamic programming principle and can be found in [5] and the references therein. The computation of this minimal horizon is related to a relaxed dynamic programming principle expressed in terms of the value function of the finite horizon problem (7).

### 2.3 Coupling MPC with Bellman Equation

The idea behind the coupling is to combine the advantages of both methods. The Dynamic Programming approach is global and, once the Bellman equation is solved, gives information on the value function in a whole domain, together with the feedback synthesis everywhere in that domain. Model Predictive Control is local and gives an approximate feedback control only for a single initial condition. Clearly MPC is faster, but it does not provide the same amount of information.

In many real situations, we need a control to improve the solution around a reference trajectory starting at *x*, \(\overline{y}_x (\cdot )\), so we can reduce the domain to a neighborhood of \(\overline{y}_x(\cdot )\). Now let us assume that we are interested in the approximation of feedbacks for an optimal control problem given the initial condition *x*. First of all we have to select a (possibly small) domain where we are going to compute the approximate value function and to this end we need to compute a first guess that we will use as reference trajectory.

The first step is to apply MPC with a *short* prediction horizon in order to obtain a fast approximation of the initial guess. This will not give the final feedback synthesis; it is just used to build the domain \(\varOmega _\rho \) where we are going to apply the DP approach. It is clear that MPC may provide inaccurate solutions if *N* is too short, but its rough information about the trajectory \(y^{MPC}\) will later be compensated by the knowledge of the value function obtained by solving the Bellman equation. We construct \(\varOmega _\rho \) as a tube around \(y^{MPC}\), defining

$$ \varOmega _\rho := \left\{ x\in \varOmega \; :\; \mathrm{dist}\big(x,\, y^{MPC}(\cdot )\big) \le \rho \right\}. $$
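In a discrete setting, the tube around the reference trajectory can be built by keeping the grid nodes whose distance to the discrete MPC trajectory is at most \(\rho\). A minimal sketch on a 2D grid, with a hypothetical straight-line reference trajectory standing in for \(y^{MPC}\):

```python
import numpy as np

def tube_mask(grid_x, grid_y, traj, rho=0.2):
    """Boolean mask of the grid nodes inside the tube of radius rho
    around the discrete trajectory traj (array of shape (M, 2))."""
    X, Y = np.meshgrid(grid_x, grid_y, indexing="ij")
    pts = np.stack([X.ravel(), Y.ravel()], axis=1)     # all grid nodes
    # distance from each node to the closest trajectory point
    d = np.min(np.linalg.norm(pts[:, None, :] - traj[None, :, :], axis=2),
               axis=1)
    return (d <= rho).reshape(X.shape)                 # True inside Omega_rho

# Hypothetical reference trajectory: a segment from (0, 0) to (2, 2),
# on a coarse grid over the domain [-4, 6]^2 used in Test 1.
traj = np.linspace([0.0, 0.0], [2.0, 2.0], 100)
gx = gy = np.linspace(-4.0, 6.0, 101)
mask = tube_mask(gx, gy, traj, rho=0.2)
```

The Bellman equation is then solved only at the nodes where the mask is true, which in this example covers a small fraction of the full domain; this is the source of the memory and CPU savings reported in the tests.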

## 3 Numerical Tests

In this section we present two numerical tests for the infinite horizon problem to illustrate the performance of the proposed algorithm. However, the localization procedure can be applied to more general optimal control problems.

All the numerical simulations have been made on a MacBook Pro with 1 CPU Intel Core i5 2.4 GHz and 8 GB RAM. The codes used for the simulations are written in Matlab. The routine for the approximation of MPC is provided in [5].

*Test 1: 2D Linear Dynamics.*Let us consider the following controlled dynamics:

where *x* is the initial condition. The cost functional we want to minimize is:

where the running cost has a local minimum at *P* and a global minimum at *Q*. We set \(P:=(0,0)\) and \(Q:=(2,2)\) so that the value of the running cost is 0 at *P* and \(-2\) at *Q*. Note that we have included a discount factor \(\lambda \), which guarantees the integrability of the cost functional \(J_x(u)\) and the existence and uniqueness of the viscosity solution. The main task of the discount factor is to penalize long prediction horizons. Since we want to make a comparison, we introduce it also in the setting of MPC, although this is not a standard choice. As we mentioned, MPC will just provide a first guess, which is used to define the domain where we solve the HJB equation.

In this test the chosen parameters are: \(u\in [-1,1]^2,\) \(\rho =0.2\), \(\varOmega =[-4,6]^2\), \(\varDelta t_{MPC}=0.05=\varDelta t_{HJB}\), \(\varDelta x_{HJB}=0.025\), \(\varDelta \tau =0.01\) (the time step to integrate the trajectories). In particular, we focus on \(\lambda =0.1\) and \(\lambda =1.\) The number of controls is \(21^2\) for the value function and \(3^2\) for the trajectories. Note that the time step used in the HJB approach for the approximation of the trajectory (\(\varDelta \tau \)) is smaller than the one used for MPC: this is because with MPC we want a rough and quick approximation of the solution. In Fig. 1, we show the results of MPC with \(\lambda =0.1\) on the left and \(\lambda =1\) on the right. As one can see, neither of them is an accurate solution. In the first case, the solution goes to the local minimum (0, 0) and is trapped there, whereas when we increase \(\lambda \) the optimal solution does not stop at the global minimum \(y_2\). On the other hand, these two approximations help us to localize the behavior of the optimal solution in order to apply the Bellman equation in a reference domain \(\varOmega _\rho \).

*Test 2: Van der Pol Dynamics.* In this test we consider the two-dimensional nonlinear system dynamics given by the controlled Van der Pol oscillator:

$$ \begin{cases} \dot y_1(t) = y_2(t),\\ \dot y_2(t) = \omega \big(1-y_1(t)^2\big)\, y_2(t) - y_1(t) + u(t). \end{cases} $$

A comparison of CPU time (seconds) and values of the cost functional.

| \(\lambda =1\) | MPC N = 5 | HJB in \({\varOmega _\rho }\) | HJB in \({\varOmega }\) |
|---|---|---|---|
| CPU | 16 s | 239 s | 638 s |
| \(J_x(u)\) | 5.41 | 5.33 | 5.3 |

The cost functional we want to minimize with respect to the control *u* is:

Test 2: a comparison of CPU time (seconds) and values of the cost functional for \(\lambda \in \{0.1, 1\}\).

| \(\lambda =0.1\) | MPC N = 10 | HJB in \({\varOmega _\rho }\) | HJB in \({\varOmega }\) |
|---|---|---|---|
| CPU | 79 s | 155 s | 228 s |
| \(J_x(u)\) | 14.31 | 13.13 | 12.41 |

| \(\lambda =1\) | MPC N = 10 | HJB in \({\varOmega _\rho }\) | HJB in \({\varOmega }\) |
|---|---|---|---|
| CPU | 23 s | 49 s | 63 s |
| \(J_x(u)\) | 6.45 | 6.09 | 6.07 |

## 4 Conclusions

We have proposed a local version of the dynamic programming approach for the solution of the infinite horizon problem, showing that the coupling between MPC and DP methods can produce rather accurate results. The coupling improves the original guess obtained by the MPC method and allows us to save memory allocations and CPU time with respect to the global solution computed via Hamilton-Jacobi equations. An extension of this approach to other classical control problems and more technical details on the choice of the parameters \(\lambda \) and \(\rho \) will be given in a future paper.

## References

- 1. Alla, A., Falcone, M., Kalise, D.: An efficient policy iteration algorithm for dynamic programming equations. SIAM J. Sci. Comput. **37**(1), 181–200 (2015)
- 2. Bardi, M., Capuzzo Dolcetta, I.: Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations. Birkhäuser, Basel (1997)
- 3. Falcone, M.: Numerical solution of dynamic programming equations. Appendix A in: Bardi, M., Capuzzo Dolcetta, I.: Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, pp. 471–504. Birkhäuser, Boston (1997)
- 4. Falcone, M., Ferretti, R.: Semi-Lagrangian Approximation Schemes for Linear and Hamilton-Jacobi Equations. SIAM, Philadelphia (2014)
- 5. Grüne, L., Pannek, J.: Nonlinear Model Predictive Control. Springer, London (2011)
- 6. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
- 7. Rawlings, J.B., Mayne, D.Q.: Model Predictive Control: Theory and Design. Nob Hill Publishing, Madison (2009)
- 8. Sethian, J.A.: Level Set Methods and Fast Marching Methods. Cambridge University Press, Cambridge (1999)
- 9. Zhao, H.: A fast sweeping method for Eikonal equations. Math. Comput. **74**, 603–627 (2005)