Encyclopedia of Systems and Control

Living Edition
Editors: John Baillieul, Tariq Samad

Adaptive Horizon Model Predictive Control and Al’brekht’s Method

  • Arthur J. Krener
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4471-5102-9_100071-1

Abstract

A standard way of finding a feedback law that stabilizes a control system to an operating point is to recast the problem as an infinite horizon optimal control problem. If the optimal cost and the optimal feedback can be found on a large domain around the operating point, then a Lyapunov argument can be used to verify the asymptotic stability of the closed loop dynamics. The problem with this approach is that it is usually very difficult to find the optimal cost and the optimal feedback on a large domain for nonlinear problems with or even without constraints, hence the increasing interest in Model Predictive Control (MPC). In standard MPC a finite horizon optimal control problem is solved in real time, but just at the current state; the first control action is implemented, the system evolves one time step, and the process is repeated. A terminal cost and terminal feedback found by Al’brekht’s method and defined in a neighborhood of the operating point can be used to shorten the horizon and thereby make the nonlinear programs easier to solve because they have fewer decision variables. Adaptive Horizon Model Predictive Control (AHMPC) is a scheme for varying the horizon length of Model Predictive Control as needed. Its goal is to achieve stabilization with horizons as small as possible so that MPC methods can be used on faster and/or more complicated dynamic processes.

Keywords

Adaptive horizon · Model predictive control · Al’brekht’s method · Optimal stabilization

Introduction

Model Predictive Control (MPC) is a way to steer a discrete time control system to a desired operating point. We will present an extension of MPC that we call Adaptive Horizon Model Predictive Control (AHMPC), which adjusts the length of the horizon in MPC while nearly verifying in real time that stabilization is occurring for a nonlinear system.

We are not the first to consider adaptively changing the horizon length in MPC; see Michalska and Mayne (1993) and Polak and Yang (1993a, b, c). In these papers the horizon is changed so that a terminal constraint is satisfied by the predicted state at the end of the horizon. In Giselsson (2010) the horizon length is adaptively changed to ensure that the infinite horizon cost of using the finite horizon MPC scheme is not much more than the cost of the corresponding infinite horizon optimal control problem.

Adaptive horizon tracking is discussed in Page et al. (2006) and Droge and Egerstedt (2011). In Kim and Sugie (2008) an adaptive parameter estimation algorithm suitable for MPC was proposed, which uses the available input and output signals to estimate the unknown system parameters. In Gruene et al. (2010) a detailed analysis of the impact of the optimization horizon and the time varying control horizon on stability and performance of the closed loop is given.

Review of Model Predictive Control

We briefly describe MPC following the definitive treatise of Rawlings and Mayne (2009). We largely follow their notation.

We are given a controlled, nonlinear dynamics in discrete time
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} x^+&\displaystyle =&\displaystyle f(x,u) \end{array} \end{aligned} $$
(1)
where the state \(x\in \mathbb {R}^{n\times 1}\), the control \(u\in \mathbb {R}^{m\times 1}\), and x+(k) = x(k + 1). Typically this is a time discretization of a controlled, nonlinear dynamics in continuous time. The goal is to find a feedback law u(k) = κ(x(k)) that drives the state of the system to some desired operating point. A pair (xe, ue) is an operating point if f(xe, ue) = xe. We conveniently assume that, after state and control coordinate translations, the operating point of interest is (xe, ue) = (0, 0).
The controlled dynamics may be subject to constraints such as
$$\displaystyle \begin{aligned} x&\in \mathbb{X}\subset \mathbb{R}^{n\times 1} \end{aligned} $$
(2)
$$\displaystyle \begin{aligned} u&\in \mathbb{U}\subset \mathbb{R}^{m\times 1} {} \end{aligned} $$
(3)
and possibly constraints involving both the state and control
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} y=h(x,u)&\displaystyle \in&\displaystyle \mathbb{Y}\subset \mathbb{R}^{p\times 1} \end{array} \end{aligned} $$
(4)
A control u is said to be feasible at \(x\in \mathbb {X}\) if
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} u\in \mathbb{U},&\displaystyle f(x,u) \in\mathbb{X},&\displaystyle h(x,u)\in\mathbb{Y}{} \end{array} \end{aligned} $$
(5)
Of course the stabilizing feedback κ(x) that we seek needs to be feasible, that is, for every \(x\in \mathbb {X}\),
$$\displaystyle \begin{aligned} \begin{array}{rcl} \kappa(x)\in \mathbb{U},&\displaystyle f(x,\kappa(x)) \in \mathbb{X},&\displaystyle h(x,\kappa(x)) \in \mathbb{Y} \end{array} \end{aligned} $$
An ideal way to find a stabilizing feedback is to choose a Lagrangian l(x, u) (aka running cost), that is, nonnegative definite in x, u and positive definite in u, and then to solve the infinite horizon optimal control problem of minimizing the quantity
$$\displaystyle \begin{aligned}\sum_{k=0}^\infty l(x(k),u(k)) \end{aligned}$$
over all choices of infinite control sequences u = (u(0), u(1), …) subject to the dynamics (1), the constraints (2, 3, 4), and the initial condition x(0) = x0. Assuming the minimum exists for each \(x^0\in \mathbb {X}\), we define the optimal cost function
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} V (x^0)=\min_{\mathbf{u}} \sum_{k=0}^\infty l(x(k),u(k)) \end{array} \end{aligned} $$
(6)
Let u = (u(0), u(1), …) be a minimizing control sequence with corresponding state sequence x = (x(0) = x0, x(1), …). Minimizing control and state sequences need not be unique, but we shall generally ignore this problem because we are using optimization as a path to stabilization. The key question is whether the possibly nonunique solution is stabilizing to the desired operating point. As we shall see, AHMPC nearly verifies stabilization in real time.
If a pair of functions \(V (x)\in \mathbb {R}, \kappa (x)\in \mathbb {R}^{m\times 1}\) satisfies the infinite horizon Bellman Dynamic Programming equations (BDP)
$$\displaystyle \begin{aligned} {} V (x)&=\mbox{min}_u \left\{V (f(x,u))+ l(x,u)\right\}\\ \kappa (x)&= \mbox{argmin}_u \left\{ V (f(x,u))+l(x,u)\right\} \end{aligned} $$
(7)
and the constraints
$$\displaystyle \begin{aligned} \begin{array}{rcl} \kappa (x){\in} \mathbb{U},&\displaystyle f(x,\kappa (x)){\in}\mathbb{X},&\displaystyle h(x,\kappa (x)){\in} \mathbb{Y} {} \end{array} \end{aligned} $$
(8)
for all \(x\in \mathbb {X}\), then it is not hard to show that V (x) is the optimal cost and κ(x) is an optimal feedback on \( \mathbb {X}\). Under suitable conditions a Lyapunov argument can be used to show that the feedback κ(x) is stabilizing.

The difficulty with this approach is that it is generally impossible to solve the BDP equations on a large domain \(\mathbb {X}\) if the state dimension n is greater than 2 or 3. So both theorists and practitioners have turned to Model Predictive Control (MPC).
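To make this concrete, the following is a minimal Python sketch of solving (7) by value iteration on a grid, with made-up scalar dynamics and Lagrangian that are not taken from this entry. Tabulating V(x) and κ(x) this way requires a grid over all of \(\mathbb {X}\); the number of grid points grows exponentially with n, which is why the approach is limited to state dimensions of two or three.

import numpy as np

f = lambda x, u: 0.9*x + 0.2*u + 0.05*x**2          # toy scalar dynamics x+ = f(x,u)
l = lambda x, u: 0.5*(x**2 + u**2)                  # toy Lagrangian

xs = np.linspace(-1.0, 1.0, 201)                    # grid over a one-dimensional state space
us = np.linspace(-1.0, 1.0, 101)                    # candidate controls
V = np.zeros_like(xs)

for _ in range(500):                                # value iteration for the BDP equations (7)
    Xn = np.clip(f(xs[:, None], us[None, :]), xs[0], xs[-1])
    J = np.interp(Xn.ravel(), xs, V).reshape(Xn.shape) + l(xs[:, None], us[None, :])
    V_new = J.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

kappa = us[J.argmin(axis=1)]                        # greedy feedback on the grid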

In MPC one chooses a Lagrangian l(x, u), a horizon length N, a terminal domain \(\mathbb {X}_f\subset \mathbb {X}\) containing x = 0, and a terminal cost Vf(x) defined and positive definite on \(\mathbb {X}_f\). Then one considers the problem of minimizing
$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{k=0}^{N-1} l(x(k),u(k)) + V_f(x(N)) \end{array} \end{aligned} $$
by choice of feasible
$$\displaystyle \begin{aligned} \begin{array}{rcl}{\mathbf{u}}_N=(u_N(0),u_N(1),\ldots,u_N(N-1)) \end{array} \end{aligned} $$
subject to the dynamics (1), the constraints (2, 3, 4), the final condition \(x(N)\in \mathbb {X}_f\), and the initial condition x(0) = x0. Assuming this problem is solvable, let VN(x0) denote the optimal cost
$$\displaystyle \begin{aligned} V_N(x^0)=\min_{{\mathbf{u}}_N} \sum_{k=0}^{N-1} l(x(k),u(k)) + V_f(x(N)) \end{aligned} $$
(9)
and let
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{u}}_N^*&\displaystyle =&\displaystyle (u_N^*(0),u_N^*(1),\ldots,u_N^*(N-1))\\ {\mathbf{x}}_N^*&\displaystyle =&\displaystyle (x_N^*(0)=x^0,x_N^*(1),\ldots,x_N^*(N)) \end{array} \end{aligned} $$
denote optimal control and state sequences starting from x0 when the horizon length is N. We then define the MPC feedback law \( \kappa _N(x^0)= u_N^*(0) \).

The terminal set \(\mathbb {X}_f\) is controlled invariant (aka viable) if for each \(x\in \mathbb {X}_f\) there exists a \(u\in \mathbb {U}\) such that \(f(x,u)\in \mathbb {X}_f\) and \(h(x,u) \in \mathbb {Y}\) are satisfied. If Vf(x) is a control Lyapunov function on \(\mathbb {X}_f\), then, under suitable conditions, a Lyapunov argument can be used to show that the feedback κN(x) is stabilizing on \(\mathbb {X}_f\). See Rawlings and Mayne (2009) for more details.

AHMPC requires a little more: the existence of a terminal feedback u = κf(x), defined on the terminal set \(\mathbb {X}_f\), that leaves it positively invariant (if \(x\in \mathbb {X}_f\) then \(f(x,\kappa _f(x)) \in \mathbb {X}_f\)) and that makes Vf(x) a strict Lyapunov function on \(\mathbb {X}_f\) for the closed loop dynamics: if \(x \in \mathbb {X}_f\) and x≠0 then
$$\displaystyle \begin{aligned} \begin{array}{rcl} V_f(x)>V_f(f(x,\kappa_f(x)))\ge 0 \end{array} \end{aligned} $$

If \(x\in \mathbb {X}_f\), then AHMPC does not need to solve (9) to get u; it just takes u = κf(x). A similar scheme has been called dual mode control in Michalska and Mayne (1993).

The advantage of solving the finite horizon optimal control problem (9) over solving the infinite horizon problem (6) is that it may be possible to solve the former online as the process evolves. If it is known that the terminal set \(\mathbb {X}_f\) can be reached from the current state x in N or fewer steps, then the finite horizon N optimal control problem is a feasible nonlinear program with finite dimensional decision variable \({\mathbf {u}}_N\in \mathbb {R}^{m\times N}\). If the time step is long enough, if f, h, l are reasonably simple, and if N is small enough, then this nonlinear program possibly can be solved in a fraction of one time step for \({\mathbf {u}}^*_N\). Then the first element of this sequence \(u_N^*(0)\) is used as the control at the current time. The system evolves one time step, and the process is repeated at the next time. Conceptually MPC computes a feedback law \(\kappa _N(x) =u^*_N(0)\) but only at values of x when and where it is needed.
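The following Python sketch illustrates this receding horizon loop under simplifying assumptions: the dynamics, Lagrangian, terminal cost, horizon, and initial state are illustrative placeholders, the constraints (2, 3, 4) are omitted, and a generic solver (scipy.optimize.minimize) stands in for whatever nonlinear program solver would be used in practice.

import numpy as np
from scipy.optimize import minimize

n, m, N = 2, 1, 15                                  # state dimension, control dimension, horizon
f  = lambda x, u: np.array([x[0] + 0.1*x[1],        # toy discrete time dynamics
                            x[1] + 0.1*(np.sin(x[0]) + u[0])])
l  = lambda x, u: 0.05*(x @ x + u @ u)              # toy Lagrangian
Vf = lambda x: 0.5*(x @ x)                          # stand-in terminal cost

def cost(U_flat, x0):
    # Finite horizon cost (9) of the control sequence U applied from x0.
    U, x, J = U_flat.reshape(N, m), x0, 0.0
    for k in range(N):
        J += l(x, U[k])
        x = f(x, U[k])
    return J + Vf(x)

x = np.array([2.0, 0.0])
U_guess = np.zeros(N*m)
for t in range(100):                                # closed loop simulation
    res = minimize(cost, U_guess, args=(x,), method='SLSQP')
    u0 = res.x[:m]                                  # kappa_N(x) = u_N*(0)
    x = f(x, u0)                                    # the plant advances one time step
    U_guess = np.roll(res.x, -m)                    # warm start: shift the previous sequence

Warm starting the solver by shifting the previous optimal sequence is a common heuristic; it is not part of the method itself.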

Some authors like Grimm et al. (2005) and Gruene (2012) do away with the terminal cost Vf(x), but there is a theoretical reason and a practical reason to use one. The theoretical reason is that a control Lyapunov terminal cost facilitates a proof of asymptotic stability via a simple Lyapunov argument; see Rawlings and Mayne (2009). But this is not a binding reason because, under suitable assumptions, asymptotic stability can be shown even when there is no terminal cost, provided the horizon is sufficiently long. The practical reason is more important; when there is a terminal cost one can usually use a shorter horizon N. A shorter horizon reduces the dimension mN of the decision variable in the nonlinear programs that need to be solved online. Therefore MPC with a suitable terminal cost can be used for faster and more complicated systems.

An ideal terminal cost Vf(x) is the optimal cost V (x) of the corresponding infinite horizon problem, provided that the latter can be accurately computed off-line on a reasonably large terminal set \(\mathbb {X}_f\), for then the finite horizon cost (9) coincides with the infinite horizon cost (6). One should not make too much of this fact as stabilization is our goal; the optimal control problems are just a means to accomplish it. This is in contrast to Economic MPC, where the cost and the associated Lagrangian are chosen to model real-world costs.

Adaptive Horizon Model Predictive Control

For AHMPC we assume that we have the following:
  • A discrete time dynamics f(x, u) with operating point x = 0, u = 0.

  • A Lagrangian l(x, u), nonnegative definite in (x, u) and positive definite in u.

  • State constraints \(x\in \mathbb {X}\) where \(\mathbb {X}\) is a neighborhood of x = 0.

  • Control constraints \(u\in \mathbb {U}\) where \(\mathbb {U}\) is a neighborhood of u = 0.

  • Mixed constraints \( h(x,u)\in \mathbb {Y}\) which are not active at the operating point x = 0, u = 0.

  • The dynamics is recursively feasible on \( \mathbb {X}\), that is, for every \(x\in \mathbb {X}\) there is a \(u\in \mathbb {U}\) satisfying \( h(x,u)\in \mathbb {Y}\) and \(f(x,u)\in \mathbb {X}\).

  • A terminal cost Vf(x) defined and nonnegative definite on some neighborhood \(\mathbb {X}_f\) of the operating point x = 0. The neighborhood \(\mathbb {X}_f\) need not be known explicitly.

  • A terminal feedback u = κf(x) defined on \(\mathbb {X}_f\) such that the terminal cost is a valid Lyapunov function on \(\mathbb {X}_f\) for the closed loop dynamics using the terminal feedback u = κf(x).

One way of obtaining a terminal pair Vf(x), κf(x) is to approximately solve the infinite horizon dynamic program equations (BDP) on some neighborhood of the origin. For example, if the linear part of the dynamics and the quadratic part of the Lagrangian constitute a linear quadratic regulator (LQR) problem satisfying the standard conditions, then one can let Vf(x) be the quadratic optimal cost and κf(x) be the linear optimal feedback of this LQR problem. Of course the problem with such terminal pairs Vf(x), κf(x) is that generally there is no way to estimate the terminal set \(\mathbb {X}_f\) on which the feasibility and Lyapunov conditions are satisfied. It is reasonable to expect that they are satisfied on some terminal set but the extent of this terminal set is difficult to estimate.

In the next section we show how higher degree Taylor polynomials for the optimal cost and optimal feedback can be computed by the discrete time extension (Aguilar and Krener 2012) of Al’brekht’s method (Al’brecht 1961) because this can lead to a larger terminal set \(\mathbb {X}_f\) on which the feasibility and the Lyapunov conditions are satisfied. It would be very difficult to determine what this terminal set is, but fortunately in AHMPC we do not need to do this.

AHMPC mitigates this last difficulty just as MPC mitigates the problem of solving the infinite horizon Bellman Dynamic Programming equations BDP (7). MPC does not try to compute the optimal cost and optimal feedback everywhere; instead it computes them just when and where they are needed. AHMPC does not try to compute the set \(\mathbb {X}_f\) on which κf(x) is feasible and stabilizing; it just tries to determine if the end state \(x_N^*(N)\) of the currently computed optimal trajectory is in a terminal set \(\mathbb {X}_f\) where the feasibility and Lyapunov conditions are satisfied.

Suppose the current state is x and we have solved the horizon N optimal control problem for \( {\mathbf {u}}^*_N=(u^*_N(0),\ldots , u^*_N(N-1))\), \({\mathbf {x}}^*_N=(x^*_N(0)=x,\ldots , x^*_N(N))\). The terminal feedback u = κf(x) is used to compute M additional steps of this state trajectory
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} x^*_N( k+1)&\displaystyle =&\displaystyle f(x^*_N(k),\kappa_f(x^*_N(k))) \end{array} \end{aligned} $$
(10)
for k = N, …, N + M − 1.
Then one checks that the feasibility and Lyapunov conditions hold for the extended part of the state sequence,
$$\displaystyle \begin{aligned} &\kappa_f(x^*_N(k))\in \mathbb{U} \end{aligned} $$
(11)
$$\displaystyle \begin{aligned} &f(x^*_N(k),\kappa_f(x^*_N(k)))\in \mathbb{X} {} \end{aligned} $$
(12)
$$\displaystyle \begin{aligned} &h(x^*_N(k),\kappa_f(x^*_N(k)))\in \mathbb{Y} {} \end{aligned} $$
(13)
$$\displaystyle \begin{aligned} &V_f (x^*_N(k))\ge \alpha(|x^*_N(k)|)\qquad \end{aligned} $$
(14)
$$\displaystyle \begin{aligned} &V_f (x^*_N(k))-V_f (x^*_N(k+1)) \ge \alpha(|x^*_N(k)|) {} \end{aligned} $$
(15)
for k = N, …, N + M − 1 and some Class K function α(s). For more on Class K functions, we refer the reader to Khalil (1996).

If (11)–(15) hold for all k = N, …, N + M − 1, then we presume that \(x^*_N(N)\in \mathbb {X}_f\), a set where κf(x) is stabilizing, and we use the control \(u_N^*(0)\) to move one time step forward to \(x^+=f(x, u^*_N(0))\). At this next state x+, we solve the horizon N − 1 optimal control problem and check that the extension of the new optimal trajectory satisfies (11)–(15).

If (11)–(15) do not hold for all k = N, …, N + M − 1, then we presume that \(x^*_N(N)\notin \mathbb {X}_f\). We extend the current horizon N to N + L where L ≥ 1, and if time permits, we solve the horizon N + L optimal control problem at the current state x and then check the feasibility and Lyapunov conditions again. We keep increasing N by L until these conditions are satisfied on the extension of the trajectory. If we run out of time before they are satisfied, then we use the last computed \(u_N^*(0)\) and move one time step forward to \(x^+=f(x, u^*_N(0))\). At x+ we solve the horizon N + L optimal control problem.
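A sketch of this adaptation logic is given below, assuming that the model f, the mixed constraint function h, the terminal pair Vf, κf, the class K function α, and membership tests for \(\mathbb {U}\), \(\mathbb {X}\), and \(\mathbb {Y}\) are available as Python callables; the names are illustrative and not taken from any particular implementation.

import numpy as np

def terminal_test(x_end, M, f, h, kappa_f, Vf, alpha, in_U, in_X, in_Y):
    # Extend the optimal trajectory M steps past x_end = x_N*(N) under the terminal
    # feedback and check the feasibility conditions (11)-(13) and Lyapunov conditions (14)-(15).
    x = x_end
    for _ in range(M):
        u = kappa_f(x)
        x_next = f(x, u)
        feasible = in_U(u) and in_X(x_next) and in_Y(h(x, u))            # (11)-(13)
        decreasing = (Vf(x) >= alpha(np.linalg.norm(x)) and              # (14)
                      Vf(x) - Vf(x_next) >= alpha(np.linalg.norm(x)))    # (15)
        if not (feasible and decreasing):
            return False
        x = x_next
    return True

def adapt_horizon(N, passed, L=5, N_min=1):
    # Shrink the horizon by one on success, grow it by L on failure.
    return max(N - 1, N_min) if passed else N + L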

How does one choose the extended horizon M and the class K function α(⋅)? If the extended part of the state sequence is actually in the region where the terminal cost Vf(x) and the terminal feedback κf(x) well approximate the solution to the infinite horizon optimal control problem, then the dynamic programming equations (7) should approximately hold. In other words
$$\displaystyle \begin{aligned} &V_f (x^*_N(k))-V_f (x^*_N(k+1))\\ &\quad \approx l(x^*_N(k), \kappa_f(x^*_N(k)))\ \ge \ 0 \end{aligned} $$
If this does not hold throughout the extended trajectory, we should increase the horizon N. We can also increase the extended horizon M, but this may not be necessary. If the Lyapunov and feasibility conditions are going to fail somewhere on the extension, it is most likely this will happen at the beginning of the extension. Also we should choose α(⋅) so that α(|x|) < |l(x, κf(x))|∕2.

The nonlinear programming problems generated by employing MPC on a nonlinear system are generally nonconvex, so the solver might return local rather than global minimizers, in which case there is no guarantee that an MPC approach is actually stabilizing. AHMPC mitigates this difficulty by nearly checking that stabilization is occurring. If (11)–(15) do not hold even after the horizon N has been increased substantially, then this is a strong indication that the solver is returning locally rather than globally minimizing solutions, and these local solutions are not stabilizing. To change this behavior one needs to start the solver at a substantially different initial guess. Just how one does this is an open research question. It is essentially the same question as which initial guess one should pass to the solver at the first step of MPC.

The actual computation of the M additional steps (10) can be done very quickly because the closed loop dynamics function f(x, κf(x)) can be computed and compiled beforehand. Similarly the feasibility and Lyapunov conditions (11)–(15) can be computed and compiled beforehand. The number M of additional time steps is a design parameter. One choice is to take M to be a fixed positive integer; another is a positive integer plus a fraction of the current N.

Choosing a Terminal Cost and a Terminal Feedback

A standard way of obtaining a terminal cost Vf(x) and a terminal feedback κf(x) is to solve the linear quadratic regulator (LQR) using the quadratic part of the Lagrangian and the linear part of dynamics around the operating point (xe, ue) = (0, 0). Suppose
$$\displaystyle \begin{aligned} \begin{array}{rcl} l(x,u)&\displaystyle =&\displaystyle {1\over 2} \left(x'Qx{+}2x'Su{+}u'Ru\right) {+}O(x,u)^3\\ f(x,u)&\displaystyle =&\displaystyle Fx+Gu+O(x,u)^2 \end{array} \end{aligned} $$
Then the LQR problem is to find P, K such that
$$\displaystyle \begin{aligned} P &=F^{\prime}P F-\left(F'P G+S\right){}\\ &\quad \left(R+G'P G\right)^{-1}\left(G'P F+S'\right)+Q{} \end{aligned} $$
(16)
$$\displaystyle \begin{aligned} K &= -\left(R+G'P G\right)^{-1}\left(G'P F+S'\right) {}\qquad \end{aligned} $$
(17)
Under mild assumptions, the stabilizability of (F, G), the detectability of (Q1∕2, F), the nonnegative definiteness of [Q, S;S′, R], and the positive definiteness of R, there exists a unique nonnegative definite P satisfying the first equation (16), which is called the discrete time algebraic Riccati equation (DARE). Then K given by (17) puts all the poles of the closed loop linear dynamics
$$\displaystyle \begin{aligned} \begin{array}{rcl} x^+&\displaystyle =&\displaystyle \left(F+GK\right)x \end{array} \end{aligned} $$
inside of the open unit disk. See Antsaklis and Michel (1997) for details. But l(x, u) is a design parameter so we choose [Q, S;S′, R] to be positive definite and then P will be positive definite.
If we define the terminal cost to be \(V_f (x)={1\over 2} x'P x \), then we know that it is positive definite for all \(x\in \mathbb {R}^{n\times 1}\). If we define the terminal feedback to be κf(x) = Kx, then we know by a Lyapunov argument that the nonlinear closed loop dynamics
$$\displaystyle \begin{aligned} \begin{array}{rcl} x^+&\displaystyle =&\displaystyle f(x,\kappa_f(x)) \end{array} \end{aligned} $$
is locally asymptotically stable around xe = 0. The problem is that we do not know the neighborhood \(\mathbb {X}_f\) of asymptotic stability, and computing it off-line can be very difficult in state dimensions higher than two or three.
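The following sketch carries out this construction in Python, assuming only that the linearization (F, G) and the weights (Q, S, R) are available; it uses the SciPy DARE solver, and the double integrator data at the end are illustrative, not the pendulum model of the later example.

import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_terminal_pair(F, G, Q, R, S=None):
    # Solve the DARE (16), form the gain (17), and return Vf(x) = 0.5 x'Px, kappa_f(x) = Kx.
    n, m = G.shape
    S = np.zeros((n, m)) if S is None else S
    P = solve_discrete_are(F, G, Q, R, s=S)
    K = -np.linalg.solve(R + G.T @ P @ G, G.T @ P @ F + S.T)
    Vf = lambda x: 0.5 * x @ P @ x
    kappa_f = lambda x: K @ x
    return P, K, Vf, kappa_f

# Illustrative data: a time-discretized double integrator.
F = np.array([[1.0, 0.1], [0.0, 1.0]])
G = np.array([[0.0], [0.1]])
P, K, Vf, kappa_f = lqr_terminal_pair(F, G, np.eye(2), np.eye(1))
assert np.max(np.abs(np.linalg.eigvals(F + G @ K))) < 1    # closed loop poles inside the unit disk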

There are other possible choices for the terminal cost and terminal feedback. Al’brecht (1961) showed how the Taylor polynomials of the optimal cost and the optimal feedback could be computed for some smooth, infinite horizon optimal control problems in continuous time. Aguilar and Krener (2012) extended this to some smooth, infinite horizon optimal control problems in discrete time. The discrete time Taylor polynomials of the optimal cost and the optimal feedback may be used as the terminal cost and terminal feedback in an AHMPC scheme, so we briefly review (Aguilar and Krener 2012).

Since we assumed that f(x, u) and l(x, u) are smooth and the constraints are not active at the origin, we can simplify the BDP equations. The simplified Bellman Dynamic Programming equations (sBDP) are obtained by setting the derivative with respect to u of the quantity to be minimized in (7) to zero. The result is
$$\displaystyle \begin{aligned} V (x)&=V (f(x,\kappa (x)))+ l(x,\kappa (x)) \end{aligned} $$
(18)
$$\displaystyle \begin{aligned} 0&= \frac{\partial V }{ \partial x}(f(x,\kappa (x)))\frac{\partial f}{\partial u}(x, \kappa (x) ) +\frac{\partial l}{\partial u}(x,\kappa (x)) {} \end{aligned} $$
(19)
If the quantity to be minimized is strictly convex in u, then the BDP equations and the sBDP equations are equivalent. But if not, then BDP implies sBDP but not vice versa.
Suppose the discrete time dynamics and Lagrangian have Taylor polynomials around the operating point x = 0, u = 0 of the form
$$\displaystyle \begin{aligned} f(x,u)&= Fx+Gu+f^{[2]}(x,u){}\\ &\quad +\cdots+f^{[d]}(x,u)+O(x,u)^{d+1}\\ l(x,u)&={1\over 2} \left(x'Qx{+}2x'Su+u'Ru\right){+}l^{[3]}(x,u)\\ &\quad +\cdots+ l^{[d+1]}(x,u)+O(x,u)^{d+2} \end{aligned} $$
for some integer d ≥ 1 where [j] indicates the homogeneous polynomial terms of degree j.
Also suppose the infinite horizon optimal cost and optimal feedback have similar Taylor polynomials
$$\displaystyle \begin{aligned} V (x)&={1\over 2} x'P x+V ^{[3]}(x)+\cdots\\ &\quad + V ^{[d+1]}(x)+O(x)^{d+2}\\ \kappa (x)&= K x+\kappa ^{[2]}(x)+\cdots\\ &\quad +\kappa ^{[d]}(x)+O(x)^{d+1} \end{aligned} $$

We plug these polynomials into sBDP and collect terms of lowest degree. The lowest degree in (18) is two while in (19) it is one. The result is the discrete time Riccati equations (16, 17).

At the next degrees we obtain the equations
$$\displaystyle \begin{aligned} &V^{[3]}(x) - V^{[3]}((F+GK)x)\\ &\qquad = ((F+GK)x)'Pf^{[2]}(x,Kx)\\ &\qquad \quad +l^{[3]}(x,Kx){} \end{aligned} $$
(20)
$$\displaystyle \begin{aligned} &\left(\kappa^{[2]}(x)\right)'\left(R+G'PG\right)\\ &\qquad = -\frac{\partial V^{[3]}}{\partial x}((F+GK)x)G\\ &\qquad \quad -((F+GK)x)'P\frac{\partial f^{[2]}}{\partial u}(x,Kx)\\ &\qquad \quad -\frac{\partial l^{[3]}}{\partial u}(x,Kx) {} \end{aligned} $$
(21)
Notice these are linear equations in the unknowns V[3](x) and κ[2](x) and the right sides of these equations involve only known quantities. Moreover κ[2](x) does not appear in the first equation. The mapping
$$\displaystyle \begin{aligned} V^{[3]}(x) \mapsto V^{[3]}(x) - V^{[3]}((F+GK)x) \end{aligned} $$
(22)
is a linear operator on the space of polynomials of degree three in x. Its eigenvalues are of the form 1 minus a product of three eigenvalues of F + GK. Since the eigenvalues of F + GK are all inside the open unit disk, zero is not an eigenvalue of the operator (22), so it is invertible. Having solved (20) for V[3](x), we can readily solve (21) for κ[2](x) since we have assumed that R is positive definite.
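For orientation, here is a small symbolic sketch of this degree-three/degree-two step for a scalar (n = m = 1) problem, written in Python with sympy. The linear-quadratic data and the higher degree model terms below are invented for illustration; the sketch simply carries out (16), (17), (20), and (21) by matching coefficients.

import sympy as sp

x, u = sp.symbols('x u')
F, G, Q, R, S = sp.Rational(9, 10), sp.Rational(1, 10), 1, 1, 0

# Degree one/two step: the scalar DARE (16) and gain (17).
p = sp.symbols('p', positive=True)
P = max(sp.solve(sp.Eq(p, F*p*F - (F*p*G + S)**2/(R + G*p*G) + Q), p))
K = -(G*P*F + S)/(R + G*P*G)
Fc = F + G*K                                          # closed loop linear term, |Fc| < 1

# Assumed higher degree model terms (illustrative values only).
f2 = sp.Rational(1, 5)*x**2 + sp.Rational(1, 10)*x*u  # f^[2](x,u)
l3 = sp.Rational(1, 20)*x**3                          # l^[3](x,u)

# Eq. (20) with V^[3](x) = c x^3:  c (1 - Fc^3) x^3 = (Fc x) P f2(x, Kx) + l3(x, Kx).
rhs20 = sp.expand((Fc*x)*P*f2.subs(u, K*x) + l3.subs(u, K*x))
c = sp.Poly(rhs20, x).coeff_monomial(x**3) / (1 - Fc**3)
V3 = c*x**3

# Eq. (21) with kappa^[2](x) = k2 x^2.
rhs21 = sp.expand(-sp.diff(V3, x).subs(x, Fc*x)*G
                  - (Fc*x)*P*sp.diff(f2, u).subs(u, K*x)
                  - sp.diff(l3, u).subs(u, K*x))
k2 = sp.Poly(rhs21, x).coeff_monomial(x**2) / (R + G*P*G)

print('P =', P, '  K =', sp.simplify(K))
print('V^[3](x) =', sp.simplify(c), '* x^3')
print('kappa^[2](x) =', sp.simplify(k2), '* x^2')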
The higher degree equations are similar; at degrees j + 1 and j they take the form
$$\displaystyle \begin{aligned} & V^{[j+1]}(x) - V^{[j+1]}((F+GK)x)\\ &\quad = \mbox{ Known Quantities}\\ &\left(\kappa^{[j]}(x)\right)'\left(R+G'PG\right)\\ &\quad =\mbox{ Known Quantities} \end{aligned} $$
The “Known Quantities” involve the terms of the Taylor polynomials of f, l and the previously computed V[i+1], κ[i] for 1 ≤ i < j. Again the equations are linear in the unknowns V[j+1], κ[j], and the first equation does not involve κ[j]. The eigenvalues of the linear operator
$$\displaystyle \begin{aligned} \begin{array}{rcl} V^{[j+1]}(x) &\displaystyle \mapsto&\displaystyle V^{[j+1]}(x) - V^{[j+1]}((F+GK)x) \end{array} \end{aligned} $$
are of the form 1 minus a product of j + 1 eigenvalues of F + GK, so this operator is also invertible.

We have written MATLAB code to solve these equations to any degree and in any dimensions. The code is quite fast. Later we shall present an example where n = 4 and m = 2. The code found the Taylor polynomials of the optimal cost to degree 6 and the optimal feedback to degree 5 in 0.12 s on a laptop with a 3.1 GHz Intel Core i5.

Completing the Squares

The infinite horizon optimal cost is certainly nonnegative definite, and if we choose Q > 0 then it is positive definite. That implies that its quadratic part \(V^{[2]}(x)={1\over 2}x'Px\) is positive definite.

But its Taylor polynomial of degree d + 1,
$$\displaystyle \begin{aligned} \begin{array}{rcl} V^{[2]}(x)+V^{[3]}(x)+\cdots+V^{[d+1]}(x) \end{array} \end{aligned} $$
need not be positive definite for d > 1. This can lead to problems if we define this Taylor polynomial to be our terminal cost Vf(x), because then the nonlinear program solver might return a negative cost VN(x) in (9). The way around this difficulty is to “complete the squares.”

Theorem

Suppose a polynomial V (x) is of degrees two through d + 1 in n variables x1, …, xn. If the quadratic part of V (x) is positive definite, then there exists a nonnegative definite polynomial W(x) of degrees two through 2d such that the part of W(x) that is of degrees two through d + 1 equals V (x). Moreover, we know that W(x) is nonnegative definite because it is a sum of n squares.

Proof

We start with the quadratic part of V (x); because it is positive definite, it must be of the form \({1\over 2}x'Px\) where P is a positive definite n × n matrix. We know that there is an orthogonal matrix T that diagonalizes P
$$\displaystyle \begin{aligned} \begin{array}{rcl} T'PT=\left[ \begin{array}{ccccccccc} \lambda_1&\displaystyle &\displaystyle 0\\&\displaystyle \ddots &\displaystyle \\0&\displaystyle &\displaystyle \lambda_n \end{array}\right] \end{array} \end{aligned} $$
where λ1 ≥ λ2 ≥… ≥ λn > 0. We make the linear change of coordinates x = Tz. We shall show that V (z) = V (Tz) can be extended with higher degree terms to a polynomial W(z) of degrees two through 2d which is a sum of n squares. We do this degree by degree. We have already shown that the degree two part of V (z) is a sum of n squares
$$\displaystyle \begin{aligned} \begin{array}{rcl} {1\over 2} \sum_{i=1}^n \lambda_i z_i^2&\displaystyle =&\displaystyle {1\over 2}\left( \lambda_1 z_1^2+\cdots +\lambda_n z_n^2\right) \end{array} \end{aligned} $$
The degrees two and three parts of V (z) are of the form
$$\displaystyle \begin{aligned} \begin{array}{rcl} {1\over 2} \sum_{i=1}^n \lambda_i z_i^2+\sum_{i_1=1}^n\sum_{i_2=i_1}^n\sum_{i_3=i_2}^n \gamma_{i_1,i_2,i_3}z_{i_1}z_{i_2}z_{i_3} \end{array} \end{aligned} $$
Consider the expression
$$\displaystyle \begin{aligned} \begin{array}{rcl} \delta_1(z)&\displaystyle =&\displaystyle z_1 + \sum_{i_2=1}^n\sum_{i_3=i_2}^n \delta_{1,i_2,i_3}z_{i_2}z_{i_3} \end{array} \end{aligned} $$
then
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\lambda_1\over 2} (\delta_1(z))^2&\displaystyle =&\displaystyle {\lambda_1\over 2}\left(z_1 + \sum_{i_2=1}^n\sum_{i_3=i_2}^n \delta_{1,i_2,i_3}z_{i_2}z_{i_3}\right)^2\\ &\displaystyle =&\displaystyle {\lambda_1 \over 2} \left( z_1^2 +2 \sum_{i_2=1}^n\sum_{i_3=i_2}^n \delta_{1,i_2,i_3}z_1z_{i_2}z_{i_3}+\left( \sum_{i_2=1}^n\sum_{i_3=i_2}^n \delta_{1,i_2,i_3}z_{i_2}z_{i_3} \right)^2\right) \end{array} \end{aligned} $$
Let \(\delta _{1,i_2,i_3}= {\gamma _{1,i_2,i_3}\over \lambda _1}\); then the degrees two and three parts of
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} V(z)-{\lambda_1\over 2}(\delta_1(z))^2 \end{array} \end{aligned} $$
have no terms involving z1.
Next consider the expression
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \delta_2(z)&\displaystyle =&\displaystyle z_2 + \sum_{i_2=2}^n\sum_{i_3=i_2}^n \delta_{2,i_2,i_3}z_{i_2}z_{i_3} \end{array} \end{aligned} $$
then
$$\displaystyle \begin{aligned} &{\lambda_2\over 2} (\delta_2(z))^2\\ &\quad = {\lambda_2 \over 2} \left( z_2^2 +2 \sum_{i_2=2}^n\sum_{i_3=i_2}^n \delta_{2,i_2,i_3}z_2z_{i_2}z_{i_3}\right.\\ &\quad \left.+\left( \sum_{i_2=2}^n\sum_{i_3=i_2}^n \delta_{2,i_2,i_3}z_{i_2}z_{i_3} \right)^2\right) \end{aligned} $$
Let \(\delta _{2,i_2,i_3}= {\gamma _{2,i_2,i_3}\over \lambda _2}\); then the degrees two and three parts of
$$\displaystyle \begin{aligned} \begin{array}{rcl} V(z)-{\lambda_1\over 2}(\delta_1(z))^2-{\lambda_2\over 2}(\delta_2(z))^2 \end{array} \end{aligned} $$
have no terms involving either z1 or z2.
We continue on in this fashion defining δ3(z), …, δn(z) such that
$$\displaystyle \begin{aligned} \begin{array}{rcl} V(z)-\sum_{i=1}^n{\lambda_i\over 2}(\delta_i(z))^2&\displaystyle =&\displaystyle \sum_{i_1=1}^n\sum_{i_2=i_1}^n\sum_{i_3=i_2}^n \sum_{i_4=i_3}^n \gamma_{i_1,i_2,i_3,i_4}z_{i_1}z_{i_2}z_{i_3}z_{i_4}+O(z)^5 \end{array} \end{aligned} $$
has no terms of degree two or three.
We redefine
$$\displaystyle \begin{aligned} \delta_1(z)&= z_1 + \sum_{i_2=1}^n\sum_{i_3=i_2}^n \delta_{1,i_2,i_3}z_{i_2}z_{i_3}\\ &\quad + \sum_{i_2=1}^n\sum_{i_3=i_2}^n\sum_{i_4=i_3}^n \delta_{1,i_2,i_3,i_4}z_{i_2}z_{i_3}z_{i_4} \end{aligned} $$
This does not change the degree two and three terms of \({\lambda _1\over 2}(\delta _1(z))^2\), and its degree four terms are of the form
$$\displaystyle \begin{aligned} \begin{array}{rcl} \lambda_1\ \sum_{i_2=1}^n\sum_{i_3=i_2}^n\sum_{i_4=i_3}^n \delta_{1,i_2,i_3,i_4}z_1z_{i_2}z_{i_3}z_{i_4} \end{array} \end{aligned} $$
If we let \(\delta _{1,i_2,i_3,i_4}={ \gamma _{1,i_2,i_3,i_4}\over \lambda _1}\), then we cancel the degree four terms involving z1 in
$$\displaystyle \begin{aligned} \begin{array}{rcl} V(z)-\sum_{j=1}^n{\lambda_j\over 2}(\delta_j(z))^2 \end{array} \end{aligned} $$
Next we redefine
$$\displaystyle \begin{aligned} \delta_2(z)&=z_2 + \sum_{i_2=2}^n\sum_{i_3=i_2}^n \delta_{2,i_2,i_3}z_{i_2}z_{i_3}\\ &\quad + \sum_{i_2=2}^n\sum_{i_3=i_2}^n\sum_{i_4=i_3}^n \delta_{2,i_2,i_3,i_4}z_{i_2}z_{i_3}z_{i_4} \end{aligned} $$
Again this does not change the degree two and three terms of \({\lambda _2\over 2}(\delta _2(z))^2\), and its degree four terms are of the form
$$\displaystyle \begin{aligned} \begin{array}{rcl} \lambda_2\ \sum_{i_2=2}^n\sum_{i_3=i_2}^n\sum_{i_4=i_3}^n \delta_{2,i_2,i_3,i_4}z_2z_{i_2}z_{i_3}z_{i_4} \end{array} \end{aligned} $$
If we let \(\delta _{2,i_2,i_3,i_4}={ \gamma _{2,i_2,i_3,i_4}\over \lambda _2}\) then we cancel the degree four terms involving z2 in
$$\displaystyle \begin{aligned} \begin{array}{rcl} V(z)-\sum_{j=1}^n{\lambda_j \over2}(\delta_j(z))^2 \end{array} \end{aligned} $$

We continue on in this fashion. The result is a sum of squares whose degree two through four terms equal V (z).

Eventually we define
$$\displaystyle \begin{aligned} \begin{array}{rcl} \delta_j(z)&\displaystyle =&\displaystyle z_j+ \sum_{i_2=j}^n\sum_{i_3=i_2}^n \delta_{j,i_2,i_3}z_{i_2}z_{i_3}+\cdots\\ &\displaystyle &\displaystyle + \sum_{i_2=j}^n \ldots \sum_{i_d=i_{d-1}}^n\delta_{j,i_2,\ldots,i_d}z_{i_2}\cdots z_{i_d} \end{array} \end{aligned} $$
and
$$\displaystyle \begin{aligned} \begin{array}{rcl} W(z)&\displaystyle =&\displaystyle \sum_{j=1}^n {\lambda_j\over 2}\left( \delta_j(z)\right)^2 \end{array} \end{aligned} $$

At degree three, we solved a linear equation from the quadratic coefficients of δ1(z), …, δn(z) to the cubic coefficients of V (z). We restricted the domain of this mapping by requiring that the quadratic part of δi(z) does not depend on z1, …, zi−1. This made the restricted mapping square; the dimensions of the domain and the range of the linear mapping are the same. We showed that the restricted mapping has a unique solution.

If we drop this restriction, then at degree d the overall dimension of the domain is
$$\displaystyle \begin{aligned} \begin{array}{rcl} n\left(\begin{array}{ccc} n+d-1\\ d\end{array}\right) \end{array} \end{aligned} $$
while the dimension of the range is
$$\displaystyle \begin{aligned} \begin{array}{rcl} \left(\begin{array}{ccc} n+d\\ d+1\end{array}\right) \end{array} \end{aligned} $$

So the unrestricted mapping has more unknowns than equations, and hence there are multiple solutions.

But the restricted solution that we constructed is a least squares solution to the unrestricted equations because λ1 ≥ λ2 ≥… ≥ λn > 0. To see this, consider the coefficient γ1,1,2 of \(z_1^2z_2\). If we allow δ2(z) to have a term of the form \(\delta _{2,1,1} z_1^2\), we can also cancel γ1,1,2 by choosing δ1,1,2 and δ2,1,1 so that
$$\displaystyle \begin{aligned} \begin{array}{rcl} \gamma_{1,1,2}&\displaystyle =&\displaystyle \lambda_1\delta_{1,1,2}+ \lambda_2\delta_{2,1,1} \end{array} \end{aligned} $$
Because λ1 ≥ λ2, a least squares solution to this equation is \(\delta _{1,1,2}={\gamma _{1,1,2}\over \lambda _1}\) and δ2,1,1 = 0. Because T is orthogonal, the polynomial W(T′x) in the original coordinates is also a least squares solution.
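The construction can be checked symbolically on small examples. The Python sketch below takes an assumed two-variable polynomial V of degrees two and three (so d = 2), builds δ1, δ2, and W as in the proof, and verifies that W − V contains only terms of degree four and higher, so that W agrees with V through degree three and is nonnegative as a sum of squares.

import sympy as sp

z1, z2 = sp.symbols('z1 z2')
lam1, lam2 = 2, 1                                    # eigenvalues of P, lam1 >= lam2 > 0
V = sp.Rational(1, 2)*(lam1*z1**2 + lam2*z2**2) + z1**2*z2 + z2**3

# delta_1 absorbs the cubic coefficient of the term containing z1 (gamma_{1,1,2} = 1),
# delta_2 the remaining cubic coefficient in z2 alone (gamma_{2,2,2} = 1), as in the proof.
delta1 = z1 + sp.Rational(1, lam1)*z1*z2
delta2 = z2 + sp.Rational(1, lam2)*z2**2
W = sp.Rational(lam1, 2)*delta1**2 + sp.Rational(lam2, 2)*delta2**2

remainder = sp.expand(W - V)
assert all(sp.Poly(term).total_degree() >= 4 for term in remainder.as_ordered_terms())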

Example

Suppose we wish to stabilize a double pendulum to straight up. The first two states are the angles between the two links and straight up measured in radians counterclockwise. The other two states are their angular velocities. The controls are the torques applied at the base of the lower link and at the joint between the links. The links are assumed to be massless, the base link is one meter long, and the other link is two meters long. There is a mass of two kilograms at the joint between the links and a mass of one kilogram at the tip of the upper link. There is linear damping at the base and the joint with both coefficients equal to 0.5 s−1. The resulting continuous time dynamics is discretized using Euler’s method with time step 0.1 s.

We simulated AHMPC with two different terminal costs and terminal feedbacks. In both cases the Lagrangian was
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} {0.1\over 2}\left(|x|{}^2+|u|{}^2\right) \end{array} \end{aligned} $$
(23)

The first pair \(V^2_f(x), \kappa ^1_f(x)\) was found by solving the infinite horizon LQR problem obtained by taking the linear part of the dynamics around the operating point x = 0 and the quadratic Lagrangian (23). Then \(V^2_f(x)\) is quadratic and positive definite and \(\kappa ^1_f(x)\) is linear.

The second pair \(V^6_f(x), \kappa ^5_f(x)\) was found using the discrete time version of Al’brekht’s method. Then \(V^6_f(x)\) is the Taylor polynomial of the optimal cost to degree 6 and \(\kappa ^5_f(x)\) is the Taylor polynomial of the optimal feedback to degree 5. But \(V^6_f(x)\) is not positive definite, so we completed the squares as above to get a degree 10 polynomial \(V^{10}_f(x)\) which is positive definite.

In all the simulations we imposed the control constraint |u|≤ 4 and started at x(0) = (0.9π, 0.9π, 0, 0) with an initial horizon of N = 50 time steps. The extended horizon was kept constant at M = 5. The class K function was taken to be α(s) = s2∕10.

If the Lyapunov or feasibility conditions were violated, then the horizon N was increased by 5 and the finite horizon nonlinear program was solved again without advancing the system. If after three tries the Lyapunov or feasibility conditions were still not satisfied, then the first value of the control sequence was used, the simulation was advanced one time step, and the horizon was increased by 5 (Fig. 1).
Fig. 1 Angles, d = 1 on left, d = 5 on right

If the Lyapunov and feasibility conditions were satisfied over the extended horizon, then the simulation was advanced one time step, and the horizon N was decreased by 1.

The simulations were first run with no noise, and the results are shown in the following figures. Both methods stabilized the links to straight up in about t = 80 time steps (8 s). The degree 2d = 10 terminal cost and the degree d = 5 terminal feedback seem to do it a little more smoothly, with no overshoot and with a shorter maximum horizon, N = 65 versus N = 75 for LQR (d = 1) (Fig. 2).
Fig. 2 Controls, d = 1 on left, d = 5 on right

The simulations were done on a MacBook Pro with a 3.1 GHz Intel Core i5 using MATLAB’s fmincon.m with its default settings. We did supply fmincon.m the gradients of the objective functions, but we did not give it the Hessians. The cpu time for the degree 2d = 10 terminal cost and the degree d = 5 terminal feedback was 5.01 s. This is probably too slow to control the pendulum in real time because the solver needs to return u(0) in a fraction of a time step. But by using a faster solver than fmincon.m and coding the objective, the gradient, and the Hessian in C and compiling them, we could probably control the double pendulum in real time. The cpu time for the LQR terminal cost and terminal feedback was 24.56 s, so it is probably not possible to control the double pendulum in real time using the LQR terminal cost and terminal feedback (Fig. 3).

Then we added noise to the simulations. At each advancing step, a Gaussian random vector with mean zero and covariance 0.0001 times the identity was added to the state. The next figures show the results using the degree 10 terminal cost and degree 5 terminal feedback. Notice that the horizon starts at N = 50 but immediately jumps to N = 65, declines to N = 64, then jumps to N = 69 before decaying monotonically to zero. When the horizon reaches N = 0, the terminal feedback is used. The LQR terminal cost and feedback failed to stabilize the noisy pendulum; we stopped the simulation when N > 1000 (Fig. 4).
Fig. 3 Horizons, d = 1 on left, d = 5 on right

Fig. 4 Noisy angles, controls and horizons

We also considered d = 3, so that after completing the squares the terminal cost is of degree 2d = 6 and the terminal feedback is of degree d = 3. It stabilized the noiseless simulation with a maximum horizon of N = 80, which is greater than the maximum horizons for both d = 1 and d = 5. But it did not stabilize the noisy simulation. Perhaps the reason is revealed by the Taylor polynomial approximations to \(\sin x\) shown in Fig. 5. The linear approximation in green overestimates the magnitude of \(\sin x\), so the linear feedback is stronger than it needs to be to overcome gravity. The cubic approximation in blue underestimates the magnitude of \(\sin x\), so the cubic feedback is weaker than it needs to be to overcome gravity. The quintic approximation in orange overestimates the magnitude of \(\sin x\), so the quintic feedback is also stronger than it needs to be, but by a lesser margin than the linear feedback. This may explain why the degree 5 feedback stabilizes the noise-free pendulum in a smoother fashion than the linear feedback (Fig. 5).
Fig. 5 Taylor approximations to \(y=\sin x\)

Conclusion

Adaptive Horizon Model Predictive Control is a scheme for varying the horizon length in Model Predictive Control as the stabilization process evolves. It adapts the horizon in real time by testing Lyapunov and feasibility conditions on extensions of optimal trajectories returned by the nonlinear program solver. In this way it seeks the shortest horizons consistent with stabilization.

AHMPC requires a terminal cost and terminal feedback that stabilize the plant in some neighborhood of the operating point, but that neighborhood need not be known explicitly. Higher degree Taylor polynomial approximations to the optimal cost and the optimal feedback of the corresponding infinite horizon optimal control problem can be found by an extension of Al’brekht’s method (Aguilar and Krener 2012). The higher degree Taylor polynomial approximations to the optimal cost need not be positive definite, but they can be extended to nonnegative definite polynomials by completing the squares. These nonnegative definite extensions and the Taylor polynomial approximations to the optimal feedback can be used as terminal costs and terminal feedbacks in AHMPC. We have shown by an example that a higher degree terminal cost and feedback can outperform using LQR to define a degree two terminal cost and a degree one terminal feedback.

References

  1. Aguilar C, Krener AJ (2012) Numerical solutions to the dynamic programming equations of optimal control. In: Proceedings of the 2012 American control conference
  2. Antsaklis PJ, Michel AN (1997) Linear systems. McGraw Hill, New York
  3. Al’brecht EG (1961) On the optimal stabilization of nonlinear systems. PMM-J Appl Math Mech 25:1254–1266
  4. Droge G, Egerstedt M (2011) Adaptive time horizon optimization in model predictive control. In: Proceedings of the 2011 American control conference
  5. Giselsson P (2010) Adaptive nonlinear model predictive control with suboptimality and stability guarantees. In: Proceedings of the 2010 conference on decision and control
  6. Grimm G, Messina MJ, Tuna SE, Teel AR (2005) Model predictive control: for want of a local control Lyapunov function, all is not lost. IEEE Trans Autom Control 50:546–558
  7. Gruene L (2012) NMPC without terminal constraints. In: Proceedings of the 2012 IFAC conference on nonlinear model predictive control, pp 1–13
  8. Gruene L, Pannek J, Seehafer M, Worthmann K (2010) Analysis of unconstrained nonlinear MPC schemes with time varying control horizon. SIAM J Control Optim 48:4938–4962
  9. Khalil HK (1996) Nonlinear systems, 2nd edn. Prentice Hall, Englewood
  10. Kim T-H, Sugie T (2008) Adaptive receding horizon predictive control for constrained discrete-time linear systems with parameter uncertainties. Int J Control 81:62–73
  11. Krener AJ (2018) Adaptive horizon model predictive control. In: Proceedings of the IFAC conference on modeling, identification and control of nonlinear systems, Guadalajara, 2018
  12. Michalska H, Mayne DQ (1993) Robust receding horizon control of constrained nonlinear systems. IEEE Trans Autom Control 38:1623–1633
  13. Page SF, Dolia AN, Harris CJ, White NM (2006) Adaptive horizon model predictive control based sensor management for multi-target tracking. In: Proceedings of the 2006 American control conference
  14. Pannek J, Worthmann K (2011) Reducing the predictive horizon in NMPC: an algorithm based approach. Proc IFAC World Congress 44:7969–7974
  15. Polak E, Yang TH (1993a) Moving horizon control of linear systems with input saturation and plant uncertainty part 1: robustness. Int J Control 58:613–638
  16. Polak E, Yang TH (1993b) Moving horizon control of linear systems with input saturation and plant uncertainty part 2: disturbance rejection and tracking. Int J Control 58:639–663
  17. Polak E, Yang TH (1993c) Moving horizon control of nonlinear systems with input saturation, disturbances and plant uncertainty. Int J Control 58:875–903
  18. Rawlings JB, Mayne DQ (2009) Model predictive control: theory and design. Nob Hill Publishing, Madison

Copyright information

© This is a U.S. Government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2020

Authors and Affiliations

  1. Department of Applied Mathematics, Naval Postgraduate School, Monterey, USA

Section editors and affiliations

  • Alberto Isidori
  1. Department of Computer, Control, Management Engng. “A. Ruberti”, University of Rome “La Sapienza”, Rome, Italy