Abstract
In this paper we extend dynamic programming techniques to the study of discrete-time infinite horizon optimal control problems on compact control invariant sets with state-independent best asymptotic average cost. To this end we analyse the interplay of dissipativity and optimal control, and propose novel recursive approaches for the solution of so-called shifted Bellman Equations.
1 Introduction
Dynamic programming (DP) is a cornerstone of control theory which allows one to solve (in feedback form) optimal control problems formulated on horizons of increasing length through a suitable recursive formula for the computation of the so-called value function [1].
Remarkably, dynamic programming allows one to study problems formulated both on a finite horizon and on an infinite one, the latter under suitable technical assumptions, by studying the asymptotic properties of the recursion or by computing its fixed points. By now, the subject of dynamic programming and infinite horizon optimal control has been studied in depth by many authors, and several monographs on the subject exist both in the control domain [2,3,4] and in economics [5, 6].
While, in its naive form, DP is often associated with the curse of dimensionality, which may hinder its applicability to scenarios of practical relevance, the topic of its approximate and efficient numerical treatment has also gathered significant impetus, in particular in the context of machine learning [7]. Indeed, the dynamic programming or Bellman Equation is at the core of any (deep) reinforcement learning algorithm [8, 9].
The link between optimal control and dissipativity was already established by Willems in the seminal papers [10, 11], and in parallel in the study of inverse optimal regulators for nonlinear systems [12]. However, it was only brought to the forefront of the discourse on optimisation-based control in recent years [13, 14], thanks to its surprising connections to closed-loop stability of Economic Model Predictive Control [15, 16] and long-run average optimal control [17, 18]. In particular, [15] proposes a notion of optimal operation at steady-state and provides a sufficient condition for this property to hold, based on dissipativity of the associated system's dynamics with respect to a suitable supply function. The converse statement is investigated in [16], where an additional controllability assumption is needed in order to prove necessity of dissipativity. While generalizations of the above results, their relation to the so-called turnpike property, and extensions to periodic optimal solutions are provided in several subsequent works (e.g., [19] and [20]), the connection to Dynamic Programming and infinite horizon optimal control has remained elusive, due to the restrictive technical assumptions needed to make sense of undiscounted cost functionals.
In this paper we further explore connections between dissipativity and infinite horizon optimal control problems, while proposing new formulations and iterative methods for their solution that significantly expand the class of problems which can be meaningfully addressed by this approach. Our main contributions are:
- introducing a terminal penalty in infinite horizon optimal control, in the form of suitable storage functions with negative sign;
- proposing a shifted Bellman Equation to be used in optimal control problems with non-zero (yet state-independent) optimal long-run average performance (this includes systems with periodic, almost periodic or even chaotic regimes of operation, allowing general time-varying asymptotic cost along optimal solutions);
- proposing two novel recursions whose fixed points are solutions of shifted Bellman Equations (of any shift);
- analysing the convergence properties of such recursions under fairly general technical assumptions, allowing simultaneous computation of the best average performance and of the associated value function;
- showing that the apparent trade-off between transient cost and asymptotic average performance is in fact non-existent.
The rest of the paper is organized as follows: Sect. 2 introduces the problem formulation, basic notation and some preliminary results; Sect. 3 introduces the shifted Bellman Equation and the novel recursion operators, whose properties are investigated in Sect. 4. Section 5 provides a general convergence result under suitable conditions on the controllability of the system's dynamics, while Sect. 6 relaxes some continuity assumptions needed for the convergence analysis by approaching the recursion from specific initialisations. Examples and counter-examples are shown in Sect. 7, and Sect. 8 draws some conclusions and points to further open research directions. Important intermediate technical results are collected in Appendix A.
2 Problem Formulation and Preliminary Results
Consider the discrete-time finite-dimensional nonlinear control system described by the following difference equation:
$$\begin{aligned} x(t+1) = f(x(t),u(t)), \qquad t \in {\mathbb {N}}, \end{aligned}$$
where \(x(t) \in {\mathbb {X}} \subset {\mathbb {R}}^n\) is the state variable, taking values in some compact control invariant set \({\mathbb {X}}\), \(u(t) \in {\mathbb {R}}^m\) is the control input and \(f: {\mathbb {Z}} \rightarrow {\mathbb {X}}\) is the continuous transition map. We denote by \({\mathbb {U}}(\cdot ): {\mathbb {X}} \rightarrow 2^{{\mathbb {R}}^m} \) the upper semicontinuous set-valued mapping defined below:
$$\begin{aligned} {\mathbb {U}}(x) := \{ u \in {\mathbb {R}}^m : (x,u) \in {\mathbb {Z}} \}, \end{aligned}$$
which corresponds to the set of feasible control inputs in state x, given the compact state/input constraint set \({\mathbb {Z}}\). Moreover, we assume, without loss of generality,
for all \(x \in {\mathbb {X}}\). For an input sequence \({{\textbf {u}}}= \{ u(t)\}_{t=0}^{\infty }\), we denote by \(\phi (t,x,{{\textbf {u}}})\) the state at time t, from initial condition \(x(0)=x\), as given by iteration (2.1). We also extend definition (2.2), to allow feasible control sequences of length \(\tau \), as follows:
Our contribution is twofold: namely, to define optimal control problems over an infinite horizon for a significantly larger set of system dynamics and associated cost functionals than is currently possible with existing formulations, and, at the same time, to propose a dynamic programming approach for their solution. To this end we consider a continuous stage cost \(\ell : {\mathbb {Z}} \rightarrow {\mathbb {R}}\), and formulate the following cost functional:
where \(\psi : {\mathbb {X}} \rightarrow {\mathbb {R}}\) is a continuous function called the terminal cost. Terminal costs significantly affect the solution of an optimal control problem and a key insight of our paper will be providing guidelines for their selection in order to allow the formulation of infinite horizon optimal control problems. A finite horizon optimal control problem is then defined as follows:
For each value of the initial condition \(x \in {\mathbb {X}}\), a solution of (2.6) is guaranteed to exist thanks to the compactness and non-emptiness properties of the feasible set, control invariance of \({\mathbb {X}}\), and continuity of the cost function.
On the other hand, when the control problem has no natural termination time, one might want to define an infinite horizon optimisation problem. This often has the additional appealing feature that the optimum is achieved through implementation of a time-invariant feedback policy. However, making sense of an infinite horizon formulation of (2.6) typically entails strong assumptions on the kind of system dynamics and cost functionals that are allowed.
One strategy for avoiding such kind of limitations is, at least in practice, to introduce a discounting factor \(0< \gamma < 1\) in the cost function:
which for \(\gamma \approx 1\) provides a good approximation to some form of infinite horizon (average) cost. While this approach has some appealing features, for instance making optimal solutions invariant with respect to translation of \(\ell \) by any finite constant value, having to settle on a specific value of \(\gamma \) less than unity is unsatisfactory, as it always leaves open the question of how optimal control policies would be affected by variations in \(\gamma \), e.g., if higher values were to be considered. Moreover, as shown later in Sect. 7.5, adoption of a discounting factor may introduce non-existent trade-offs between optimisation of steady-state and transient costs.
An alternative approach is to resort to average, rather than summed costs:
Taking the average yields well-defined costs even when summed costs would diverge to \(\pm \infty \) or fail to converge (for instance by oscillating), which constitute the main obstructions in the definition of infinite horizon control problems for general dynamics and costs. On the other hand, time-shift invariance of average costs along any solution implies that this approach disregards transient costs, which therefore will not be minimised and might be arbitrarily large even for optimal feedback policies (see again the example in Sect. 7.5). It is worth pointing out that similar notions have also attracted considerable interest in the Markov Decision Process literature, where optimal average cost is usually referred to as gain optimality, in contrast to bias optimality, which captures optimal transient behaviour (see for instance [21]).
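To make the contrast between discounted and average criteria concrete, consider the following small numerical sketch; the two policies and their stage costs are hypothetical illustrations, not taken from the paper. Policy A pays stage cost 1 forever, while policy B pays 2 once and then 0.5 forever: the discounted ranking flips with \(\gamma \), whereas the average cost always prefers B and is blind to the transient.

```python
def discounted(head, tail, gamma):
    """Discounted cost of a stage-cost sequence: a finite transient 'head',
    followed by the constant value 'tail' forever."""
    transient = sum(gamma**t * c for t, c in enumerate(head))
    return transient + gamma**len(head) * tail / (1 - gamma)

def average(head, tail):
    """Long-run average cost: any finite transient is forgotten."""
    return tail

A = ([], 1.0)      # policy A: constant stage cost 1
B = ([2.0], 0.5)   # policy B: expensive transient, cheap tail

# For small gamma the discounted cost prefers A, for gamma near 1 it prefers B ...
prefers_A_low = discounted(*A, gamma=0.5) < discounted(*B, gamma=0.5)
prefers_B_high = discounted(*A, gamma=0.9) > discounted(*B, gamma=0.9)
# ... while the average cost ranks B first regardless, ignoring the transient.
ranking_by_average = average(*B) < average(*A)
```

Here B is better for \(\gamma > 2/3\) but worse for smaller \(\gamma \): settling on one discount factor silently decides such rankings.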
Our proposed solution and novel contribution is to provide fairly general conditions on the terminal cost \(\psi \) to make sure that the functional:
is well-defined. To this end the notion of dissipativity will play an interesting role. This notion was originally introduced by Willems in [10, 11] and has recently received a surge in interest for its crucial role in the analysis of closed-loop Economic Model Predictive Control schemes [13,14,15,16]. In a nutshell, a system such as (2.1) is said to be dissipative with respect to the supply function \(\ell (x,u)\) if there exists a continuous storage function \(\lambda : {\mathbb {X}} \rightarrow {\mathbb {R}}\) such that:
$$\begin{aligned} \lambda (f(x,u)) \le \lambda (x) + \ell (x,u), \qquad \forall \, (x,u) \in {\mathbb {Z}}. \end{aligned}$$
This inequality is normally interpreted in “energetic” terms as enforcing, for a dissipative system, that the energy stored at the next state cannot exceed the energy at the current state plus the energy externally supplied through the supply function \(\ell (x,u)\). In the context of optimal control, where the objective is to minimize a cost functional, \(\lambda (x)\) can be interpreted as the value of the state x, and the dissipation inequality guarantees that the gain in value for any feasible control action u and state x cannot exceed the corresponding stage cost. Notice that, while optimal control sequences over any finite control horizon (or over an infinite control horizon with discount factor \(\gamma \)) are invariant with respect to cost translation, viz. \({\tilde{\ell }} (x,u ):= \ell (x,u) - c\) for any constant \(c \in {\mathbb {R}}\), dissipativity is not a shift-invariant property. In fact, it can always be guaranteed by a sufficiently negative value of c, given compactness of \({\mathbb {Z}}\). Trivially, if \({\tilde{\ell }} (x,u) \ge 0\) for all \((x,u) \in {\mathbb {Z}}\), dissipativity is ensured just by defining \(\lambda (x)=0\) for all \(x \in {\mathbb {X}}\). Our first result is stated below.
Proposition 2.1
Assume that system (2.1) is dissipative with continuous storage function \(\lambda (\cdot )\) with respect to the supply \(\ell (x,u)\), and let \(\psi (x) = - \lambda (x)\). Then the limit
$$\begin{aligned} V^{\psi }_{\infty } (x) := \lim _{\tau \rightarrow \infty } V^{\psi }_{\tau } (x) \end{aligned}$$
exists for all \(x \in {\mathbb {X}}\), possibly assuming the value \(+\infty \).
Proof. Consider any feasible solution \(x^*_{\tau +1}\), \(u^*_{\tau +1}\) (with \(x^*(0) = x\)) which achieves the optimal cost \(V^{\psi }_{\tau +1} (x)\). By definition,
$$\begin{aligned} V^{\psi }_{\tau +1} (x) = \sum _{t=0}^{\tau } \ell (x^*(t),u^*(t)) - \lambda (x^*(\tau +1)) \ge \sum _{t=0}^{\tau -1} \ell (x^*(t),u^*(t)) - \lambda (x^*(\tau )) \ge V^{\psi }_{\tau } (x), \end{aligned}$$
where the first inequality holds by the dissipativity assumption, and the second because \(x^*, u^*\) is a feasible solution also over the shorter horizon \([0,\tau ]\). Hence, \(V^{\psi }_{\tau } (x)\) is monotone non-decreasing with respect to \(\tau \) and the limit (2.10) exists. \(\square \)
It is important to realise that Proposition 2.1 only guarantees existence of the limit, not actual boundedness of the cost \(V^{\psi }_{\infty } (x)\). In fact, typically the cost would be \(+ \infty \) unless a suitably shifted version of \(\ell (x,u)\) is considered. In particular, there is only a single value of this shift that might result in a finite cost. This can be found, by alternative means, by looking for the optimal average cost,
Under suitable technical conditions, for instance global controllability assumptions, the optimal average cost is independent of x, and its value can be found [18, 22] via an infinite dimensional linear program, viz. by solving the following optimisation problem:
where:
We note that this approach has similarities to the effective Hamiltonian approach in continuous-time ergodic optimal control, see [23]. Dynamic programming allows one to solve optimal control problems through iteration of a suitably defined operator, which computes the optimal cost for increasing values of the control horizon. To this end, for summed costs without exponential rescaling, the following Bellman operator \(T: {\mathcal {C}} ( {\mathbb {X}} ) \rightarrow {\mathcal {C}} ({\mathbb {X}} )\) is normally defined:
$$\begin{aligned} T \psi (x) := \min _{u \in {\mathbb {U}}(x)} \, [ \ell (x,u) + \psi (f(x,u)) ]. \end{aligned}$$
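On a finite surrogate of the problem, the operator T and its iteration (value iteration) can be sketched as follows; the two-state system, costs and horizon below are hypothetical illustrations, not taken from the paper.

```python
X = [0, 1]                                                 # finite surrogate of the set X
U = {0: [0, 1], 1: [0, 1]}                                 # feasible inputs U(x)
f = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}           # transition map f(x,u)
l = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.5}   # stage cost l(x,u)

def T(psi):
    """One application of the Bellman operator: (T psi)(x) = min_u [l(x,u) + psi(f(x,u))]."""
    return {x: min(l[x, u] + psi[f[x, u]] for u in U[x]) for x in X}

# Value iteration V_tau = T^tau psi, started from the terminal cost psi = 0.
V = {x: 0.0 for x in X}
for _ in range(5):
    V = T(V)
```

Note that the iterates keep growing by roughly 0.5 per step, the best achievable average cost of this toy system, foreshadowing the role of the shift discussed in Sect. 3.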
The following result characterizes \(V_{\infty }^{\psi }(x)\) as a fixed-point of the Bellman operator.
Proposition 2.2
Assume that \(\psi = - \lambda \) for some storage function \(\lambda \in {\mathcal {C}} ( {\mathbb {X}} )\) and that the following limit exists and is finite:
$$\begin{aligned} V^{\psi }_{\infty } (x) = \lim _{\tau \rightarrow \infty } V^{\psi }_{\tau } (x). \end{aligned}$$
Then, \(V_{\infty }^{\psi }\) is a lower semi-continuous solution of the Bellman Equation, viz. \(T V_{\infty }^{\psi } (x) = V_{\infty }^{\psi } (x)\).
Proof
To see this, recall that \(V_{\tau }^{\psi }(x)\) is non-decreasing with respect to \(\tau \). Hence:
Since \(\tau \) is arbitrary, we see that:
This proves that \(V^{\psi }_{\infty }\) is lower semicontinuous, being the pointwise supremum of the continuous functions \(V^{\psi }_{\tau }\). Hence the minimum of
is achieved, for some optimal feedback policy \(u^*(x)\). Moreover it fulfills:
On the other hand:
Let \(x \in {\mathbb {X}}\) be fixed and arbitrary. Since \(V_\tau \) is continuous in x (by induction over \(\tau \), continuity of \(\psi \) and u.s.c. of \({\mathbb {U}}(x)\)), for each \(\tau >0\) and the current fixed value of x there exists a minimizer \(u_\tau (x) \in {\mathbb {U}}(x)\) for this last expression. Since \({\mathbb {U}}(x)\) is compact, we find a sequence \(\tau _n\rightarrow \infty \) (possibly x-dependent) such that \(u_{\tau _n}\) converges to a control value \(u_\infty (x) \in {\mathbb {U}}(x)\). For each \(\tau >0\) this implies
Since \(V_{\tau }^{\psi } (x) \rightarrow V_{\infty }^{\psi } (x)\), for each \(\varepsilon >0\) there exists \(\tau _{\varepsilon } (x)>0\) such that \(V_{\tau _{\varepsilon }(x)}^{\psi } (x) \ge V^{\psi }_{\infty } (x) - \varepsilon \). Hence we see, starting from (2.15):
Since \(x\in {\mathbb {X}}\) and \(\varepsilon >0\) were arbitrary, the assertion \(V_{\infty }^{\psi } (x) \ge T V_{\infty }^{\psi } (x)\) follows for all \(x \in {\mathbb {X}}\) and this, combined with the complementary inequality \(V_{\infty }^{\psi } (x) \le T V_{\infty }^{\psi } (x)\) previously shown, completes the proof. \(\square \)
We remark that Proposition 2.2 is connected to the result in [24], where the role of linear penalty functions is explored in guaranteeing asymptotic stability of Economic MPC.
3 Shifted Bellman Equation and Operators
In the literature, different constructive approaches for computing storage functions are described, most notably the classical constructions of the available storage and the required supply, which go back to [10] and are easily adapted to the discrete-time case (see, e.g., [16, 19] for the available storage). For this reason, a possible, but ultimately unsatisfactory, way to approach an infinite horizon optimal control problem would be according to the following steps:
1. computing the minimal average cost \(V^{\text {avg}}\);
2. defining a shifted stage cost \({\tilde{\ell }} (x,u) = \ell (x,u) - V^{\text {avg}}\), so as to yield 0 optimal average;
3. computing a storage function \(\lambda \) for the supply function \({\tilde{\ell }}(x,u)\);
4. defining \(\psi := - \lambda \) as a terminal penalty for the infinite horizon optimal control problem, with shifted stage costs \({\tilde{\ell }}\);
5. using the standard Bellman iteration to asymptotically compute the value function over an infinite horizon, or directly looking for a solution of the associated Bellman Equation.
This procedure is non-ideal for several reasons: first of all, computation of the optimal average cost involves a limiting operation, and therefore typically only approximate values of \(V^{\text {avg}}\) can ever be achieved. However, using approximate values in the iteration of the Bellman operator yields optimal costs diverging over an infinite horizon, either to \(+\infty \) or to \(-\infty \), depending on whether the optimal average cost has been under- or overestimated. In addition, Step 3 is bound to fail whenever the average optimal cost \(V^{\text {avg}}\) has been overestimated (in other words, a storage function might exist only for \({\tilde{\ell }}(x,u)=\ell (x,u) - c\) with \(c \le V^{\text {avg}}\)).
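This sensitivity to the shift can be observed numerically. In the hypothetical two-state example below (an illustration, not from the paper), the optimal average cost is 0.5: iterating the Bellman operator with the shifted stage cost \(\ell - c\) produces bounded iterates only for the exact shift \(c = 0.5\), and linearly drifting values for any over- or underestimate.

```python
X = [0, 1]
U = [0, 1]
f = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
l = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.5}  # optimal average cost: 0.5

def iterate(c, steps=50):
    """Iterate the Bellman operator with the shifted stage cost l - c,
    starting from the zero function, and return the final value table."""
    V = {x: 0.0 for x in X}
    for _ in range(steps):
        V = {x: min(l[x, u] - c + V[f[x, u]] for u in U) for x in X}
    return V

V_exact = iterate(0.5)   # exact shift: the iterates settle down
V_under = iterate(0.0)   # shift underestimated: values drift towards +infinity
V_over = iterate(1.0)    # shift overestimated: values drift towards -infinity
```

After 50 iterations the mis-shifted runs have drifted by roughly \(50 \cdot |c - 0.5|\), while the exact shift has already reached a fixed point.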
The goal of this section is to propose two operators, the \(\min \)-shifted and the \(\max \)-shifted Bellman operator, whose iterations converge to the optimal infinite horizon cost and, at the same time, yield the optimal average cost as a by-product.
To this end, we need additional notation. Given continuous \(\psi _1: {\mathbb {X}} \rightarrow {\mathbb {R}}\) and \(\psi _2: {\mathbb {X}} \rightarrow {\mathbb {R}}\), we define the following:
$$\begin{aligned} c ( \psi _1, \psi _2 ) := \frac{ \max _{x \in {\mathbb {X}}} \, [ \psi _1 (x) - \psi _2 (x) ] + \min _{x \in {\mathbb {X}}} \, [ \psi _1 (x) - \psi _2 (x) ] }{2}, \end{aligned}$$
viz. the median value of the difference \(\psi _1- \psi _2\). The following distance notion is also defined:
$$\begin{aligned} d ( \psi _1, \psi _2 ) := \min _{b \in {\mathbb {R}}} \, \Vert \psi _1 - \psi _2 + b \Vert _{\infty }. \end{aligned}$$
Notice that \(d( \psi _1 + c_1, \psi _2 + c_2 ) = d ( \psi _1, \psi _2 )\) for all \(c_1, c_2 \in {\mathbb {R}}\). Indeed,
where the second equality follows by the change of variables \({\tilde{b}}= c_1 - c_2 + b\). Moreover:
viz. the infinity norm of the difference \(\psi _1 - \psi _2\) once its median value has been subtracted. In fact, an equivalent alternative definition for \(d(\psi _1,\psi _2)\) is as follows:
$$\begin{aligned} d ( \psi _1, \psi _2 ) = \frac{ \max _{x \in {\mathbb {X}}} \, [ \psi _1 (x) - \psi _2 (x) ] - \min _{x \in {\mathbb {X}}} \, [ \psi _1 (x) - \psi _2 (x) ] }{2}. \end{aligned}$$
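On a sampled grid these two quantities reduce to simple max/min computations. The following sketch (hypothetical sampling, for illustration only) uses the fact that the minimising shift recentres the difference, so that \(d\) equals half the spread of \(\psi _1 - \psi _2\):

```python
def c(psi1, psi2):
    """Median value of the difference psi1 - psi2 (midpoint of its range)."""
    diff = [a - b for a, b in zip(psi1, psi2)]
    return (max(diff) + min(diff)) / 2

def d(psi1, psi2):
    """d(psi1, psi2) = min_b ||psi1 - psi2 + b||_inf: the optimal b recentres
    the difference at zero, leaving half the spread of its range."""
    diff = [a - b for a, b in zip(psi1, psi2)]
    return (max(diff) - min(diff)) / 2

psi1 = [0.0, 1.0, 4.0]
psi2 = [1.0, 0.0, 2.0]   # difference: [-1, 1, 2]

# d is invariant under adding constants to either argument:
invariant = d([p + 3.0 for p in psi1], psi2) == d(psi1, psi2)
```

For the sampled pair above the difference has range \([-1, 2]\), so its median is 0.5 and the distance is 1.5.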
Recall the Bellman operator \(T: {\mathcal {C}} ( {\mathbb {X}} ) \rightarrow {\mathcal {C}} ({\mathbb {X}} )\) previously introduced:
$$\begin{aligned} T \psi (x) = \min _{u \in {\mathbb {U}}(x)} \, [ \ell (x,u) + \psi (f(x,u)) ]. \end{aligned}$$
Definition 3.1
Define the \(\min \)-shifted Bellman operator \({\hat{T}}: {\mathcal {C}} ( {\mathbb {X}}) \rightarrow {\mathcal {C}} ( {\mathbb {X}} )\) as:
Similarly, we may consider the following operator.
Definition 3.2
Define the \(\max \)-shifted Bellman operator \({\check{T}}: {\mathcal {C}} ( {\mathbb {X}}) \rightarrow {\mathcal {C}} ( {\mathbb {X}} )\) as:
It is straightforward to see that:
for all \(k \in {\mathbb {N}}\). Opposite inequalities hold in the case of the \({\check{T}}\) operator:
Remark 3.3
By induction, and exploiting the \(\min \) commutativity property, the following formula can be proved (see Appendix B.2):
Along similar lines the following inequality can be shown by induction for the \({\check{T}}\) operator:
The following result holds:
Proposition 3.4
A function \({\bar{\psi }} \in {\mathcal {C}}( {\mathbb {X}})\) is a fixed point of \({\hat{T}}\) or \({\check{T}}\) if and only if there exists \(b \in {\mathbb {R}}\) such that \({\bar{\psi }}\) is a solution of the following shifted Bellman Equation:
$$\begin{aligned} T {\bar{\psi }} (x) = {\bar{\psi }} (x) + b, \qquad \forall \, x \in {\mathbb {X}}. \end{aligned}$$
Proof
Assume that \({\bar{\psi }}\) fulfills the shifted Bellman Equation (3.6). Then, direct computation shows:
where the equality follows since by definition \(c ( {\bar{\psi }}, {\bar{\psi }} + {b} ) = -{b}\). Conversely, assume \({\hat{T}} \bar{ \psi } = {\bar{\psi }}\):
Hence, the following inequality holds:
We claim that more is true, namely:
Assume by contradiction:
where the \(\min \) exists by continuity of \({\bar{\psi }}\), \(T \bar{ \psi }\) and compactness of \({\mathbb {X}}\). By inequality (3.7) we also know that:
Taking a convex combination of the two previous inequalities yields:
which is a contradiction. Hence, (3.8) holds, and \(\bar{ \psi }\) is a solution of a shifted Bellman Equation. A similar proof applies to the \({\check{T}}\) operator. \(\square \)
The following proposition shows that the feedback control that can be generated from a solution \({{\bar{\psi }}}\) of the shifted Bellman equation guarantees the optimal average cost b, provided \({{\bar{\psi }}}\) is bounded.
Proposition 3.5
Let \({{\bar{\psi }}}\) be a bounded solution of the shifted Bellman equation (3.6) for some \(b\in {\mathbb {R}}\). Let \(x^\star (t)\), \(u^\star (t)\) be a solution of (2.1) which realizes the minimum in (2.13) for all \(t\ge 0\), i.e.,
$$\begin{aligned} T {\bar{\psi }} (x^\star (t)) = \ell (x^\star (t),u^\star (t)) + {\bar{\psi }} (x^\star (t+1)) \end{aligned}$$
for all \(t\ge 0\). Then the average cost satisfies
$$\begin{aligned} J^{\textrm{avg}} (x^\star (\cdot ),u^\star (\cdot )) = b, \end{aligned}$$
and this is the best possible value.
Moreover \(x^*(\cdot ), u^*(\cdot )\) is optimal with respect to the following finite and infinite horizon costs:
$$\begin{aligned} \sum _{t=0}^{\tau -1} \, [ \ell (x(t),u(t)) - b] + {\bar{\psi }} ( x(\tau )) \quad \text {and} \quad \liminf _{\tau \rightarrow \infty } \, \sum _{t=0}^{\tau -1} \, [ \ell (x(t),u(t)) - b]. \end{aligned}$$
Proof
The assumptions yield for all \(t\ge 0\)
$$\begin{aligned} {\bar{\psi }} (x^\star (t)) + b = T {\bar{\psi }} (x^\star (t)) = \ell (x^\star (t),u^\star (t)) + {\bar{\psi }} (x^\star (t+1)), \end{aligned}$$
implying
$$\begin{aligned} \ell (x^\star (t),u^\star (t)) - b = {\bar{\psi }} (x^\star (t)) - {\bar{\psi }} (x^\star (t+1)). \end{aligned}$$
Summing up this equation yields
$$\begin{aligned} \sum _{t=0}^{\tau -1} \, [ \ell (x^\star (t),u^\star (t)) - b ] = {\bar{\psi }} (x^\star (0)) - {\bar{\psi }} (x^\star (\tau )), \end{aligned}$$
which implies
$$\begin{aligned} J^{\textrm{avg}} (x^\star (\cdot ),u^\star (\cdot )) = \lim _{\tau \rightarrow \infty } \frac{1}{\tau } \sum _{t=0}^{\tau -1} \ell (x^\star (t),u^\star (t)) = b, \end{aligned}$$
because by the boundedness assumption on \({{\bar{\psi }}}\) the expression \(({{\bar{\psi }}}(x^\star (0)) - {{\bar{\psi }}}(x^\star (\tau )))/\tau \) converges to 0 as \(\tau \rightarrow +\infty \).
For any other solution \(x(\cdot )\), \(u(\cdot )\) of (2.1), the definition of T in (2.13) implies
Then the same computations as above yield \(J^\textrm{avg}(x(\cdot ),u(\cdot )) \ge b\), showing optimality of the average cost b. Alternatively, equality (3.9) may be restated as
Similarly to the case of average costs, for any other feasible solution x(t), u(t) with \(x^\star (0)=x(0)\) we have:
thus proving optimality of \(x^\star (\cdot ),u^\star (\cdot )\) with respect to finite horizon costs \( \sum _{t=0}^{\tau -1} \, [ \ell (x(t),u(t)) - b] + {\bar{\psi }} ( x{(\tau )})\) for all \(\tau \in {\mathbb {N}}\). Letting \(\tau \) go to infinity in the above (in)equalities shows optimality of \(x^\star (\cdot )\), \(u^\star (\cdot )\) on infinite horizon costs:
It is worth pointing out that \(\liminf \) in the infinite horizon cost could also be replaced by \(\limsup \), since the cost of the optimal solution admits a limit for all initial conditions. In addition, removing the bias term b from the finite horizon costs only shifts them by a constant and does not affect optimality of solutions. As we will see later in Proposition 4.1, the value b for which the shifted Bellman equation can have a solution is unique (even without assuming boundedness of the solution), which is in line with the fact that the minimal average cost is unique. \(\square \)
Remark 3.6
- (i) A similar computation as in the proof of Proposition 3.5 shows that for the shifted cost \(\ell (x,u)-b\) the identity
$$\begin{aligned} J_\tau ^\psi (x^\star (\cdot ),u^\star (\cdot )) = {{\bar{\psi }}}(x^\star (0)) - {{\bar{\psi }}}(x^\star (\tau )) + \psi (x^\star (\tau )) \end{aligned}$$holds. If \({{\bar{\psi }}}\) and \(\psi \) are bounded, then the lim sup of this expression is bounded, implying that \(V_\infty ^\psi \) from (2.14) (if existing) is bounded from above. Particularly, in the situation of Proposition 2.1, where \(V_\infty ^\psi \) exists but may be \(+\infty \), the existence of a bounded solution of the shifted Bellman equation implies finiteness of \(V_\infty ^\psi \) when \(\ell (x,u)\) is replaced by \(\ell (x,u)-b\).
- (ii) Conversely, if \(V_\infty ^\psi \) exists and is finite for the shifted cost \(\ell (x,u)-b\), then a standard dynamic programming proof (see, e.g., [25, Theorem 4.4]) shows that \(V_\infty ^\psi \) solves the Bellman equation (i.e., the shifted Bellman equation (3.6) with \(b=0\)) and thus the shifted Bellman equation (3.6) for the original \(\ell \).
- (iii) A crucial question is thus whether \(V_\infty ^\psi \) exists and is finite, or even bounded. For strictly dissipative systems (for a definition see (4.1), below), sufficiently fast (asymptotic) controllability to the equilibrium \((x^e,u^e)\) guarantees this, see [26, Assumption 6.1 and Theorem 6.4]. A similar condition applies to optimal control problems with an optimal periodic orbit, see [19, Assumption 10 and Theorem 16]. In both references the inequalities are shown for all sufficiently large finite horizons, but carry over to the infinite horizon limit. We conjecture that this condition can be extended to systems that have more complex optimal behavior than equilibria or periodic orbits, but this extension goes beyond the scope of this paper.
- (iv) Alternatively, one might wonder whether shifting the stage cost by the optimal average cost might be enough (without adoption of a terminal penalty function) to guarantee existence of the infinite horizon cost. For this to happen, typically, one needs two conditions: constant stage cost along a recurrent (e.g. periodic) average optimal solution, and some form of fast enough convergence of the cost to this constant value (e.g. exponential). Together the conditions guarantee a finite limit of the infinite horizon transient cost.
4 Properties of T, \({\hat{T}}\) and \({\check{T}}\) Operators
Throughout this section we recall some useful properties of the T operator and additionally provide original derivations of the properties of the \({\hat{T}}\) and \({\check{T}}\) operators. Some of the properties listed below are well known and can be found in [3]:
- Monotonicity:
$$\begin{aligned} [ \psi _1 (x) \le \psi _2 (x), \; \forall \, x \in {\mathbb {X}}] \Rightarrow [ T \psi _1 (x) \le T \psi _2 (x), \; \forall \, x \in {\mathbb {X}} ]; \end{aligned}$$
- Translation invariance:
$$\begin{aligned} T ( \psi + {b} ) = T \psi + {b} \end{aligned}$$
for any constant \({b} \in {\mathbb {R}}\);
- Minimum commutativity, for finite index set K:
$$\begin{aligned} T \left( \min _{k \in K} \{ \psi _k \} \right) = \min _{k \in K} \{ T \psi _k \}. \end{aligned}$$
To see this, notice:
$$\begin{aligned} T \left( \min _{k \in K} \{ \psi _k \} \right)&= \min _{u \in {\mathbb {U}}(x) } \left[ \ell (x,u) + \min _{k \in K} \{ \psi _k (f(x,u)) \} \right] \\&= \min _{u \in {\mathbb {U}}(x) } \min _{k \in K} \{ \ell (x,u) + \psi _k (f(x,u)) \} \\&= \min _{k \in K} \min _{u \in {\mathbb {U}}(x)} [ \ell (x,u) + \psi _k (f(x,u)) ] = \min _{k \in K} \{ T \psi _k \}; \end{aligned}$$
- Concavity: for all \(\alpha \in [0,1]\) and any \(\psi _1, \psi _2\) it holds:
$$\begin{aligned} T ( \alpha \psi _1 + (1-\alpha ) \psi _2 ) \ge \alpha T \psi _1 + (1- \alpha ) T \psi _2. \end{aligned}$$
To see this, notice:
$$\begin{aligned} T ( \alpha \psi _1 + (1-\alpha ) \psi _2 )&= \min _{u \in {\mathbb {U}}(x) } [ \ell (x,u) + \alpha \psi _1 (f(x,u)) + (1- \alpha ) \psi _2 (f(x,u)) ] \\&= \min _{u \in {\mathbb {U}} (x) } \big [ \alpha [ \ell (x,u) + \psi _1 ( f(x,u)) ] + (1- \alpha ) [ \ell (x,u) + \psi _2 (f(x,u) ) ] \big ] \\&\ge \min _{u \in {\mathbb {U}} (x) } \alpha [ \ell (x,u) + \psi _1 ( f(x,u)) ] + \min _{u \in {\mathbb {U}} (x) } (1- \alpha ) [ \ell (x,u) + \psi _2 (f(x,u) ) ] \\&= \alpha T \psi _1 + (1- \alpha ) T \psi _2; \end{aligned}$$
- Max-super-commutativity: the following inequality holds:
$$\begin{aligned} T \max \{ \psi _1, \psi _2 \} \ge \max \{ T \psi _1, T \psi _2 \}, \end{aligned}$$
and by induction, for any finite set K:
$$\begin{aligned} T \left( \max _{k \in K} \{ \psi _k \} \right) \ge \max _{k \in K} \{ T \psi _k \}; \end{aligned}$$
- Non-expansiveness: monotonicity and translation invariance can be exploited to show the following inequality, expressing (incremental) non-expansiveness of the T operator:
$$\begin{aligned} d (T \psi _1, T \psi _2 ) \le d ( \psi _1, \psi _2 ), \qquad \forall \, \psi _1, \psi _2 \in {\mathcal {C}}( {\mathbb {X}} ). \end{aligned}$$
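The listed properties can be spot-checked numerically. The sketch below (a hypothetical two-state example, not taken from the paper) verifies monotonicity, translation invariance, minimum commutativity and non-expansiveness in the distance d on concrete value tables:

```python
X = [0, 1]
U = [0, 1]
f = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
l = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.5}

def T(psi):
    """Bellman operator on value tables."""
    return {x: min(l[x, u] + psi[f[x, u]] for u in U) for x in X}

def d(p1, p2):
    """d(p1, p2) = min_b ||p1 - p2 + b||_inf = half the spread of p1 - p2."""
    diff = [p1[x] - p2[x] for x in X]
    return (max(diff) - min(diff)) / 2

p1 = {0: 0.0, 1: 2.0}
p2 = {0: 1.0, 1: 2.5}   # p1 <= p2 pointwise
p3 = {0: 2.0, 1: 0.0}   # crosses p1, so the minimum below is non-trivial

monotone = all(T(p1)[x] <= T(p2)[x] for x in X)
translation = all(T({x: p1[x] + 7.0 for x in X})[x] == T(p1)[x] + 7.0 for x in X)
min_commute = all(T({x: min(p1[x], p3[x]) for x in X})[x]
                  == min(T(p1)[x], T(p3)[x]) for x in X)
nonexpansive = d(T(p1), T(p2)) <= d(p1, p2)
```

Such finite checks cannot replace the proofs, but they are a convenient sanity test when implementing the operators.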
Next we derive some useful properties of the \({\hat{T}}\) and \({\check{T}}\) operators. Notice that for all \(b_1,b_2 \in {\mathbb {R}}\) the following holds:
$$\begin{aligned} c ( \psi _1 + b_1, \psi _2 + b_2 ) = c ( \psi _1, \psi _2 ) + b_1 - b_2. \end{aligned}$$
Hence the following translation invariance can be seen:
$$\begin{aligned} {\hat{T}} ( \psi + b ) = {\hat{T}} \psi + b \end{aligned}$$
for all \(b \in {\mathbb {R}}\). In fact,
The same property holds for \({\check{T}}\). The next proposition states that all solutions of a shifted Bellman Equation share the same shift value.
Proposition 4.1
Let \(\psi _1\) and \(\psi _2\) be continuous solutions of the shifted Bellman Equation (3.6), viz. \(T \psi _1 = \psi _1 + c_1\) and \(T \psi _2 = \psi _2 + c_2\) for suitable constants \(c_1\) and \(c_2\). Then, \(c_1=c_2\).
Proof
See Appendix B.1. \(\square \)
We show later, by means of an example, that while the shift is uniquely defined for all solutions of the shifted Bellman Equation, it is not true in general that \(d ( \psi _1, \psi _2) =0\), i.e. there may be multiple solutions of the shifted Bellman Equation, even after taking into account translation invariance. In the remainder of this section, we describe a situation in which the solution of the shifted Bellman Equation is unique, up to the addition of a constant. Again, a dissipativity inequality plays a role, but now a stronger one than (2.9). For an equilibrium \((x^e,u^e)\) (i.e., \(f(x^e,u^e)=x^e\)) we call the system strictly dissipative if there exists a storage function \(\lambda : {\mathbb {X}} \rightarrow {\mathbb {R}}\), bounded from below, and \(\alpha \in {\mathcal {K}}\) such that
$$\begin{aligned} \ell (x,u) - \ell (x^e,u^e) + \lambda (x) - \lambda (f(x,u)) \ge \alpha ( \Vert x-x^e\Vert ), \qquad \forall \, (x,u) \in {\mathbb {Z}}. \end{aligned}$$
We note that a positive definite stage cost, i.e., an \(\ell \) satisfying \(\ell (x,u) \ge \alpha (\Vert x-x^e\Vert )\) for all \((x,u)\in {\mathbb {Z}}\) and \(\ell (x^e,u^e)=0\), satisfies the inequality (4.1) for \(\lambda \equiv 0\). For this kind of stage costs, the following proposition holds.
Proposition 4.2
Suppose the stage cost \(\ell \) satisfies \(\ell (x,u) \ge \alpha (\Vert x-x^e\Vert )\) for all \((x,u)\in {\mathbb {Z}}\) and some \(\alpha \in {\mathcal {K}}\), and \(\ell (x^e,u^e)=0\). Then, up to the addition of a constant, there exists at most one continuous solution of the shifted Bellman Equation.
Proof
Let \(\psi _1\) and \(\psi _2\) be two continuous solutions of the shifted Bellman Equation (3.6) that are bounded from below. By adding suitable constants, we can assume that \(\psi _1(x^e)=\psi _2(x^e)=0\). From (2.13) we obtain that
$$\begin{aligned} \psi _i (x^e) + c = T \psi _i (x^e) \le \ell (x^e,u^e) + \psi _i (f(x^e,u^e)) = \psi _i (x^e), \end{aligned}$$
implying \(c\le 0\).
For each \(x\in {\mathbb {X}}\), let \(u_i^*(x)\in {\mathbb {U}}(x)\) be a control that realizes the minimum in the Bellman operator (2.13) for \(\psi =\psi _i\), \(i=1,2\). Such a \(u_i^*(x)\) exists because \(\ell \), f, and \(\psi _i\) are continuous and \({\mathbb {U}}(x)\) is compact. Then from the shifted Bellman Equation we obtain that
implying
Now, given \(x_i^*(0)\in {\mathbb {X}}\), by \(x_i^*(k)\) we denote the sequence generated by \(x_i^*(k+1) = f(x_i^*(k),u_i^*(x_i^*(k)))\). Then (4.2) implies
Since \(\psi _i\) is bounded from below in \({\mathbb {X}}\), this sum must converge, implying that \(\alpha (\Vert x_i^*(k)-x^e\Vert )\rightarrow 0\) and thus \(x_i^*(k)\rightarrow x^e\) as \(k\rightarrow \infty \). Since \(\psi _i(x^e)=0\) and \(\psi _i\) is continuous, we also obtain \(\psi _j(x_i^*(k))\rightarrow 0\) as \(k\rightarrow \infty \) for \(i=1,2\) and \(j=1,2\).
Now pick an arbitrary \(x\in {\mathbb {X}}\). We show that for each \(\varepsilon >0\) and for both choices \(i=1\), \(j=2\) and \(i=2\), \(j=1\) the inequality
$$\begin{aligned} \psi _j (x) \le \psi _i (x) + \varepsilon \end{aligned}$$
holds, which shows \(\psi _1(x)=\psi _2(x)\) and thus the assertion.
To this end, consider the sequence \(x_i^*(k)\) with \(x_i^*(0)=x\). For each \(k\ge 0\) we obtain, using that c must be the same in the shifted Bellman Equation for \(\psi _j\) and \(\psi _i\) due to Proposition 4.1,
Iterating this inequality we thus obtain
for all \(k\ge 0\). Since we know that \(\psi _j(x_i^*(k))\rightarrow 0\) and \(\psi _i(x_i^*(k))\rightarrow 0\) as \(k\rightarrow \infty \), there is \(k\in {\mathbb {N}}\) such that both \(|\psi _j(x_i^*(k))|<\varepsilon /2\) and \(|\psi _i(x_i^*(k))|<\varepsilon /2\) hold, implying \(\psi _j(x_i^*(k)) - \psi _i(x_i^*(k))<\varepsilon \) and thus (4.3). \(\square \)
Now for a strictly dissipative system satisfying (4.1) we consider the “rotated” stage cost
$$\begin{aligned} {\tilde{\ell }} (x,u) = \ell (x,u) - \ell (x^e,u^e) + \lambda (x) - \lambda (f(x,u)) \end{aligned}$$
and observe that it satisfies the conditions on \(\ell \) from Proposition 4.2. The corresponding Bellman operator, defined by
$$\begin{aligned} \widetilde{T} \psi (x) := \min _{u \in {\mathbb {U}}(x)} \, [ {\tilde{\ell }} (x,u) + \psi (f(x,u)) ], \end{aligned}$$
satisfies the following property.
Lemma 4.3
For any continuous function \(\lambda :{\mathbb {X}}\rightarrow {\mathbb {R}}\) the identity
$$\widetilde{T}(\psi + \lambda )(x) = T\psi (x) + \lambda (x) - \ell (x^e,u^e) \quad \text {for all } x\in {\mathbb {X}}$$
holds. Particularly, if \(\psi \) is a solution of the shifted Bellman Equation for T and some c, then \({{\tilde{\psi }}} = \psi +\lambda \) is a solution of the shifted Bellman Equation for \(\widetilde{T}\) and \({\tilde{c}} = c - \ell (x^e,u^e)\).
Proof
For all \(x\in {\mathbb {X}}\) we have that
$$\begin{aligned} \widetilde{T}(\psi +\lambda )(x)&= \min _{u\in {\mathbb {U}}(x)} \big \{ {{\tilde{\ell }}}(x,u) + \psi (f(x,u)) + \lambda (f(x,u)) \big \} \\&= \min _{u\in {\mathbb {U}}(x)} \big \{ \ell (x,u) - \ell (x^e,u^e) + \lambda (x) - \lambda (f(x,u)) + \psi (f(x,u)) + \lambda (f(x,u)) \big \} \\&= T\psi (x) + \lambda (x) - \ell (x^e,u^e). \end{aligned}$$
This proves the first statement. Now, if \(\psi \) is a solution of the shifted Bellman Equation for T, then
$$\widetilde{T}{{\tilde{\psi }}}(x) = T\psi (x) + \lambda (x) - \ell (x^e,u^e) = \psi (x) + c + \lambda (x) - \ell (x^e,u^e) = {{\tilde{\psi }}}(x) + {\tilde{c}},$$
i.e. \({{\tilde{\psi }}}\) is a solution of the shifted Bellman Equation for \(\widetilde{T}\). \(\square \)
Theorem 4.4
Consider an optimal control problem for which strict dissipativity (4.1) holds with a continuous storage function \(\lambda \).
Then, up to the addition of a constant, there exists at most one continuous solution of the shifted Bellman Equation.
Proof
Let \(\psi _1\) and \(\psi _2\) be two solutions of the shifted Bellman Equation satisfying the assumption. Then \({{\tilde{\psi }}}_i = \psi _i+\lambda \), \(i=1,2\) satisfy the assumption of Proposition 4.2 since \(\lambda \) is continuous and bounded from below. Hence, applying Proposition 4.2 to \(\widetilde{T}\) yields that \(\psi _1 + \lambda \) and \(\psi _2 + \lambda \) coincide up to the addition of a constant, implying the same for \(\psi _1\) and \(\psi _2\). \(\square \)
We note that non-strict dissipativity is not enough to obtain this uniqueness result up to additions of constants, as the example in Sect. 7.2.1 shows.
Remark 4.5
If the optimal control problem is strictly dissipative, then under a mild controllability condition it has the so-called turnpike property at the equilibrium \((x^e,u^e)\), see, e.g., [13]. This in particular means that the infinite horizon optimal trajectoryFootnote 2 from Proposition 3.5 satisfies \((x^\star (t),u^\star (t)) \rightarrow (x^e,u^e)\) as \(t\rightarrow \infty \). Moreover, the optimal control problem with cost \({{\tilde{\ell }}}(x,u) = \ell (x,u)-\ell (x^e,u^e)+\lambda (x)-\lambda (f(x,u))\) from (4.4) also has the turnpike property at \((x^e,u^e)\), which is here easily seen directly, since this cost is positive definite with respect to the equilibrium \((x^e,u^e)\).
By adding appropriate constants to \({{\bar{\psi }}}\) and \(\lambda \) we can assume without loss of generality that \({{\bar{\psi }}}(x^e)=0\) and \(\lambda (x^e)=0\). In the following, we moreover assume that \({{\bar{\psi }}}\) and \(\lambda \) are continuous near \(x^e\). For the cost \({{\tilde{\ell }}}\) defined above we obtain
i.e., minimizing the sum over \({{\tilde{\ell }}}\) is equivalent to minimizing the sum over \(\ell \) with terminal cost \(\psi =-\lambda \). Now the fact that \({{\tilde{\ell }}}\) is positive definite with respect to the equilibrium \((x^e,u^e)\) implies that any candidate for an optimal trajectory of (4.5) must eventually be close to \((x^e,u^e)\), implying \(\lambda (x(\tau ))\rightarrow 0\). This means that a trajectory is an approximate minimizer of (4.5) if and only if it approximately minimizes
among all solutions satisfying \({\lambda (x(\tau ))}\approx 0\). The closer the value of (4.6) is to the minimum of this sum and the closer \({\lambda (x(\tau ))}\) is to 0, the closer (4.5) is to its minimum.
Now consider the optimal trajectories \((x^\star (t),u^\star (t))\) from Proposition 3.5 minimizing
Since the problem has the turnpike property, we have that \(x(\tau )\approx x^e\) and thus \({\bar{\psi }} ( x(\tau ))\approx 0\) for large \(\tau \), i.e.,
This means that \((x^\star (t),u^\star (t))\) approximately minimizes (4.6) and satisfies \(x^\star (\tau )\approx x^e\). Consequently, it approximately minimizes (4.5), with the approximation error tending to 0 as \(\tau \rightarrow \infty \). Conversely, the finite horizon optimal trajectories minimizing (4.5) can be extended to near-optimal infinite horizon optimal trajectories for the stage cost \(\ell \), with the gap to optimality decreasing to 0 as the length of the finite horizon in (4.5) tends to infinity. This explains from an optimal control point of view why the finite horizon optimal value functions \(V_\tau ^\psi \) with terminal cost \(\psi =-\lambda \) converge to a solution of the Bellman equation (if they converge at all), as stated in Proposition 2.2.
5 Convergence Analysis Under Equicontinuity
In order to prove convergence of the \({\hat{T}}\) and \({\check{T}}\) iterations to a solution of the shifted Bellman Equation, we restrict the dynamics to fulfill suitable equicontinuity assumptions. Moreover, we provide sufficient conditions, in the form of controllability assumptions, which lead to the needed equicontinuity properties both for the iteration \(T^k \psi \) and \({\hat{T}}^k \psi \).
In order to have convergence guarantees for a sequence of functions, the following notion of equicontinuity is adopted.
Definition 5.1
A sequence of functions \(\{ \psi _k \}_{k=0}^{+ \infty }\), \(\psi _k: {\mathbb {X}} \rightarrow {\mathbb {R}}\) is said to be equicontinuous, if there exists a function \(\gamma \in {\mathcal {K}}_{\infty }\) such that:
$$| \psi _k (x_1) - \psi _k (x_2) | \le \gamma ( |x_1 - x_2| ) \qquad \forall \, x_1,x_2 \in {\mathbb {X}}, \; \forall \, k \in {\mathbb {N}}.$$
To carry out our analysis, we will need the following assumption.
Assumption 5.2
The sequence \(\{ T^k \psi \}_{k = 0}^{+\infty }\) is equicontinuous.
The following lemma shows that this assumption immediately carries over to \({\hat{T}}^k\psi \).
Lemma 5.3
The sequence \(\{ {\hat{T}}^k \psi \}_{k=0}^{+ \infty }\) is equicontinuous provided \(\{ T^k \psi \}_{k=0}^{+ \infty }\) is such.
Proof
The lemma is a consequence of formula (3.4). In particular, for any \(x_2 \in {\mathbb {X}}\) there exists \(\tau ^*(x_2)\) such that:
Therefore, for any \(x_1 \in {\mathbb {X}}\) we see:
where the last inequality holds by the assumption of equicontinuity of the \(T^k \psi \) sequence. The symmetric inequality \({\hat{T}}^k \psi (x_2) - {\hat{T}}^k \psi (x_1) \le \gamma (|x_1-x_2|)\) can be proved along the same lines. Therefore equicontinuity holds with respect to the same function \(\gamma \). \(\square \)
Our main convergence results under equicontinuity are now stated in the following two theorems.
Theorem 5.4
Let \(\psi \in {\mathcal {C}} ( {\mathbb {X}} )\) be such that \(T^k \psi (x)\) fulfills Assumption 5.2. Then, if a continuous solution of the shifted Bellman Equation exists, the sequence \({\hat{T}}^k \psi (x)\) converges uniformly to one such solution.
Proof
Consider the sequence \([{\hat{T}}^k \psi ]_n\). By Lemma A.5 this sequence is bounded since:
Moreover, by Lemma 5.3 it is equicontinuous. Hence, by the Arzelà–Ascoli Theorem, it admits a nonempty set of accumulation points (with respect to the uniform topology),
Moreover, each accumulation point in \(\omega (\psi )\) is continuous and fulfills the same continuity inequality,
By Lemma A.8, the function \(W( [\psi ]_n)= W (\psi ):= d(\psi ,T \psi )\) is non-increasing along the iteration of \({\hat{T}}\), viz. \(W({\hat{T}}^k \psi )\) is a non-increasing sequence, bounded from below by 0. In addition W is continuous in the topology of uniform convergence. Hence, the limit \(\lim _{k \rightarrow + \infty } W ({\hat{T}}^k \psi )\) exists, and we denote it by \({\bar{W}}\). Because of continuity of W and uniform convergence to the limit points we also have \(W ( {\bar{\psi }} ) = {\bar{W}}\) for all \({\bar{\psi }} \in \omega ( \psi )\). Notice that \(\omega (\psi )\) is invariant with respect to \({\hat{T}}\). Let, in the following, \({\bar{\psi }}\) denote an arbitrary element in \(\omega (\psi )\). For any \(k \in {\mathbb {N}}\) we have \(W ( {\hat{T}}^k {\bar{\psi }} ) = {\bar{W}}\). By combined inequalities (A.9) and (A.8) we see that \(W ( {\hat{T}}^k {\bar{\psi }} )\) can be constant only provided \(\min _{x \in {\mathbb {X}}} {\hat{T}}^k {\bar{\psi }} (x) - T {\hat{T}}^k {\bar{\psi }}(x)\) and \(\max _{x \in {\mathbb {X}}} {\hat{T}}^k {\bar{\psi }} (x) - T {\hat{T}}^k {\bar{\psi }}(x)\) are constant with respect to k. By Corollary A.24, the sequence \({\hat{T}}^k {\bar{\psi }}\) is bounded and converges monotonically to an upper semi-continuous limit. Notice that, by invariance of \(\omega (\psi )\) and the fact that all elements of \(\omega (\psi )\) fulfill inequality (5.1), equicontinuity of \({\hat{T}}^k {\bar{\psi }}\) follows. Hence the limit \(\psi _{\infty } (x):= \lim _{k \rightarrow + \infty } {\hat{T}}^k {\bar{\psi }} (x)\) not only exists (as previously established), but is also continuous and, by Dini’s Theorem, convergence is uniform in \({\mathbb {X}}\). By continuity of the \({\hat{T}}\) operator with respect to uniform convergence, \(\psi _{\infty }(x)\) is a solution of the shifted Bellman Equation (cf. Lemma A.22) and \(0=d( \psi _{\infty }, T \psi _{\infty })=d( {\bar{\psi }}, T {\bar{\psi }} )\). 
This shows that any element of \(\omega (\psi )\) is a solution of the shifted Bellman Equation. It only remains to show that \(\omega (\psi )\) is a singleton. This follows from Lemma A.6. Indeed, the distance to any element \({\bar{\psi }}\) of \(\omega (\psi )\) is non-increasing along the iteration \({\hat{T}}^k \psi \). Since this distance converges to 0 along some subsequence \({\hat{T}}^{k_n} \psi \), it converges to 0 along the sequence \({\hat{T}}^k \psi \) itself. \(\square \)
Due to the lack of an analogue to formula (3.4) for the \({\check{T}}\) operator, there is no simple way of proving a version of Lemma 5.3 for \({\check{T}}^k \psi \). As a consequence, the analogue of Theorem 5.4 for \({\check{T}}\) is stated by directly assuming equicontinuity of \({\check{T}}^k \psi \).
Theorem 5.5
Let \(\psi \in {\mathcal {C}} ( {\mathbb {X}} )\) be such that \({\check{T}}^k \psi (x)\) fulfills Assumption 5.2. Then, if a continuous solution of the shifted Bellman Equation exists, the sequence \({\check{T}}^k \psi (x)\) converges uniformly to one such solution.
Proof
Consider the sequence \([{\check{T}}^k \psi ]_n\). This sequence is bounded since:
where the last inequality follows by Lemma A.7. Moreover, by assumption, it is equicontinuous. Hence, by the Arzelà–Ascoli Theorem, it admits a nonempty set of limit points (with respect to the uniform topology),
Note that each limit point in \(\omega (\psi )\) is continuous and fulfills the same continuity inequality,
By Lemma A.11, the function \(W( [\psi ]_n)= W (\psi ):= d(\psi ,T \psi )\) is non-increasing along the iteration of \({\check{T}}\), viz. \(W({\check{T}}^k \psi )\) is a non-increasing sequence, bounded from below by 0. In addition W is continuous in the topology of uniform convergence. Hence, the limit \(\lim _{k \rightarrow + \infty } W ({\check{T}}^k \psi )\) exists, and we denote it by \({\bar{W}}\). Because of continuity of W and uniform convergence to the limit points we also have \(W ( {\bar{\psi }} ) = {\bar{W}}\) for all \({\bar{\psi }} \in \omega ( \psi )\). Notice that \(\omega (\psi )\) is invariant with respect to \({\check{T}}\). Hence, for any \({\bar{\psi }} \in \omega (\psi )\) and any \(k \in {\mathbb {N}}\) we have \(W ( {\check{T}}^k {\bar{\psi }} ) = {\bar{W}}\). By combined inequalities (A.12) and (A.13) we see that \(W ( {\check{T}}^k {\bar{\psi }} )\) can be constant only provided \(\min _{x \in {\mathbb {X}}} {\check{T}}^k {\bar{\psi }} (x) - T {\check{T}}^k {\bar{\psi }}(x)\) and \(\max _{x \in {\mathbb {X}}} {\check{T}}^k {\bar{\psi }} (x) - T {\check{T}}^k {\bar{\psi }}(x)\) are constant with respect to k. By Corollary A.26, the sequence \({\check{T}}^k {\bar{\psi }}\) is bounded and converges monotonically to a lower semi-continuous limit. Notice that, by invariance of \(\omega (\psi )\) and the fact that all elements of \(\omega (\psi )\) fulfill inequality (5.2), equicontinuity of \({\check{T}}^k {\bar{\psi }}\) follows. Hence the limit \(\psi _{\infty } (x):= \lim _{k \rightarrow + \infty } {\check{T}}^k {\bar{\psi }} (x)\) not only exists (as previously established), but is also continuous and, by Dini’s Theorem, convergence is uniform in \({\mathbb {X}}\). By continuity of the \({\check{T}}\) operator with respect to uniform convergence, \(\psi _{\infty }(x)\) is a solution of the shifted Bellman Equation and \(0=d( \psi _{\infty }, T \psi _{\infty })=d( {\bar{\psi }}, T {\bar{\psi }} )\).
This shows that any element of \(\omega (\psi )\) is a solution of the shifted Bellman Equation. It only remains to show that \(\omega (\psi )\) is a singleton. This follows from Lemma A.7. Indeed, the distance to any element \({\bar{\psi }}\) of \(\omega (\psi )\) is non-increasing along the iteration \({\check{T}}^k \psi \). Since this distance converges to 0 along some subsequence \({\check{T}}^{k_n} \psi \), it converges to 0 along the sequence \({\check{T}}^k \psi \) itself. \(\square \)
In the remainder of this section we derive a sufficient condition for Assumption 5.2, which is based on a controllability condition.
Definition 5.6
Given a system as in (2.1) and the associated state and input constraint sets \({\mathbb {X}}\) and \({\mathbb {U}}(x)\), we say that the system fulfills Uniform Incremental Continuous Controllability, if there exists \(N \in {\mathbb {N}}\), and a class \({\mathcal {K}}_{\infty }\) function \(\delta \), such that, for all \(x_1, x_2 \in {\mathbb {X}}\), and for all \({{\textbf {u}}}_1 \in {\mathbb {U}}_N (x_1)\), there exists \({{\textbf {u}}}_2 \in {\mathbb {U}}_N (x_2)\) such that \(\phi (N,x_1,{{\textbf {u}}}_1) = \phi (N,x_2,{{\textbf {u}}}_2)\), and in addition: \(\Vert {{\textbf {u}}}_1 - {{\textbf {u}}}_2 \Vert \le \delta (|x_1-x_2|)\).
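As a concrete illustration, this definition can be checked on a hypothetical scalar system \(x^+ = -x+u\) with \({\mathbb {U}}(x) = [x-2,x+2]\) (resembling, but not identical to, the example of Sect. 7.1; the dynamics and the construction of \({{\textbf {u}}}_2\) below are our own illustrative assumptions). With \(N=1\), the choice \(u_2 = u_1 + (x_2-x_1)\) matches the endpoints and yields \(\delta (r)=r\):

```python
import random

# Hypothetical scalar dynamics x+ = -x + u with U(x) = [x-2, x+2] and
# state space X = [-2, 2]. For N = 1, the candidate control
# u2 = u1 + (x2 - x1) steers x2 to the same endpoint as (x1, u1),
# is admissible (u2 - x2 = u1 - x1 lies in [-2, 2]), and satisfies
# |u1 - u2| = |x1 - x2|, i.e. delta(r) = r in Definition 5.6.
def f(x, u):
    return -x + u

random.seed(0)
for _ in range(1000):
    x1 = random.uniform(-2, 2)
    x2 = random.uniform(-2, 2)
    u1 = x1 + random.uniform(-2, 2)              # admissible: u1 in [x1-2, x1+2]
    u2 = u1 + (x2 - x1)
    assert x2 - 2 <= u2 <= x2 + 2                # u2 admissible for x2
    assert abs(f(x1, u1) - f(x2, u2)) < 1e-12    # identical endpoints
    assert abs(u1 - u2) <= abs(x1 - x2) + 1e-12  # delta(r) = r
print("Definition 5.6 verified on random samples")
```

Since the endpoint identity and the bound hold exactly here, this particular instance fulfills Uniform Incremental Continuous Controllability with \(N=1\) and \(\delta (r)=r\).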
A milder controllability assumption can be formulated by considering continuity with respect to the cost alone, rather than the control input. To this end, let \(J_N(x, {{\textbf {u}}})\), for \(x \in {\mathbb {X}}\) and \({{\textbf {u}}} \in {\mathbb {U}}_N (x)\), denote the following:
Definition 5.7
Given a system as in (2.1) and the associated state and input constraint sets \({\mathbb {X}}\) and \({\mathbb {U}}(x)\), we say that the system fulfills Uniform Incremental Controllability Continuous in Cost, if there exists \(N \in {\mathbb {N}}\), and a class \({\mathcal {K}}_{\infty }\) function \(\delta \), such that, for all \(x_1, x_2 \in {\mathbb {X}}\), and for all \({{\textbf {u}}}_1 \in {\mathbb {U}}_N (x_1)\), there exists \({{\textbf {u}}}_2 \in {\mathbb {U}}_N (x_2)\) such that \(\phi (N,x_1,{{\textbf {u}}}_1) = \phi (N,x_2,{{\textbf {u}}}_2)\), and in addition: \(| J_N (x_1,{{\textbf {u}}}_1) - J_N(x_2,{{\textbf {u}}}_2) | \le \delta (|x_1-x_2|)\).
Remark 5.8
Notice that Uniform Incremental Continuous Controllability implies Uniform Incremental Controllability Continuous in Cost. This is because the considered stage-cost function and the dynamics are both continuous; moreover, the cost is considered only over a finite interval of length N. The converse implication is not true in general.
The following proposition now shows that Uniform Incremental Controllability Continuous in Cost implies the equicontinuity in Assumption 5.2 required in Theorem 5.4.
Proposition 5.9
Assume that system (2.1) fulfills the controllability assumption in Definition 5.7. Then, for any continuous function \(\psi : {\mathbb {X}} \rightarrow {\mathbb {R}}\), the sequence \(\{ T^k \psi \}_{k=0}^{+ \infty }\) is equicontinuous, i.e., Assumption 5.2 is fulfilled.
Proof
Consider any \(k\in {\mathbb {N}}\), and arbitrary \(x_1,x_2 \in {\mathbb {X}}\). Let \({{\textbf {u}}}^*_1 \in {\mathbb {U}}_{k+N}(x_1)\) be any optimal control sequence corresponding to the optimal control problem with terminal penalty function \(\psi \) and horizon \(k+N\), with initial condition \(x_1\). Then, from the optimality principle:
Let now, \({{\textbf {u}}}_2\) be as in Definition 5.7. Clearly, applying \({{\textbf {u}}}_2\) is, in general, suboptimal from initial condition \(x_2\). Hence, the inequality below holds:
Combining equations (5.3) and (5.4) yields:
where the first equality follows because \(\phi (N,x_1,{{\textbf {u}}}^*_1) = \phi (N,x_2,{{\textbf {u}}}_2)\), and the last inequality from Definition 5.7. Symmetric inequalities can be obtained by swapping \(x_1\) and \(x_2\), yielding \(|T^{k+N} \psi (x_1) - T^{k+N} \psi (x_2)| \le \delta (|x_1-x_2|)\). This shows that equicontinuity holds on the tail of the sequence \(T^k \psi \). However, \(\{ T^k \psi \}_{k=0}^{N-1}\) is a finite family of continuous functions defined over a compact set (thus also fulfilling an equicontinuity property), and therefore equicontinuity of the whole sequence follows. \(\square \)
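For clarity, the combination of the dynamic programming identity (5.3) with the suboptimality bound (5.4) can be written out as follows (a sketch consistent with the notation above, using that the two trajectories share the endpoint \(\phi (N,x_1,{{\textbf {u}}}^*_1) = \phi (N,x_2,{{\textbf {u}}}_2)\)):
$$\begin{aligned} T^{k+N} \psi (x_2) - T^{k+N} \psi (x_1)&\le J_N (x_2,{{\textbf {u}}}_2) + T^k \psi \big ( \phi (N,x_2,{{\textbf {u}}}_2) \big ) - J_N (x_1,{{\textbf {u}}}^*_1) - T^k \psi \big ( \phi (N,x_1,{{\textbf {u}}}^*_1) \big ) \\&= J_N (x_2,{{\textbf {u}}}_2) - J_N (x_1,{{\textbf {u}}}^*_1) \le \delta (|x_1-x_2|). \end{aligned}$$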
Unfortunately, due to the lack of a counterpart of Lemma 5.3, we currently do not have a controllability condition for ensuring the equicontinuity needed in Theorem 5.5 for the \({\check{T}}\) operator.
Remark 5.10
The operators \({\hat{T}}\) and \({\check{T}}\) are the main novel technical tool that we propose to compute solutions of shifted Bellman Equations and solve infinite horizon control problems. Notice that they come with different a priori guarantees of continuity of the limiting solution, i.e., upper semi-continuous and lower semi-continuous, respectively. As regularity of solutions of the Bellman Equation is normally not available a priori, it is hard to state a criterion for choosing one or the other. Nevertheless, it is worth pointing out that, due to the fact that these operators share Lyapunov functionals (i.e., \(W (\psi ) = d(\psi , T\psi )\), see Lemmas A.8 and A.11, or Lemmas A.6 and A.7), one could also apply them in switched combinations or in random order while typically still retaining (or even improving) convergence. This topic is outside the scope of the present manuscript.
6 Convergence Analysis Without Continuity
In this section we provide a convergence result for the iteration using the \({{\hat{T}}}\) operator without assuming any continuity. This is possible if we assume a dissipativity condition and start the iteration from the negative storage function. The result can thus be seen as an extension of Proposition 2.2 to the shifted Bellman Equation with nontrivial shift \({b}\ne 0\). The only property we impose in this section on the iterates \(\psi :{\mathbb {X}}\rightarrow {\mathbb {R}}\) is that they attain a minimum on \({\mathbb {X}}\). Then we can define
$$[\psi ]_n(x) := \psi (x) - \min _{y \in {\mathbb {X}}} \psi (y).$$
We note that \({[\psi ]_n}\ge 0\) and \(\min _{x\in {\mathbb {X}}} {[\psi ]_n}(x)=0\) as well as \([\psi +{b}]_n = [\psi ]_n\) for all \({b} \in {\mathbb {R}}\) and start our analysis with a little auxiliary lemma.
Lemma 6.1
For any \({b}\in {\mathbb {R}}\) it holds that
Proof
We have that
This implies the assertion since \([{{\hat{T}}} \psi +b]_n = [{{\hat{T}}}\psi ]_n\) for all \(b\in {\mathbb {R}}\). A similar computation works for T in place of \({{\hat{T}}}\). \(\square \)
We now first consider the case where \(\ell \ge 0\). To this end, we make the following assumption.
Assumption 6.2
There exists a nonempty set \(N\subset {\mathbb {X}}\) such that for any \(\psi :{\mathbb {X}}\rightarrow {\mathbb {R}}\) with \(\psi \ge 0\) and \(\psi |_N\equiv 0\) we have that \(T\psi |_N\equiv 0\).
We denote by \(\psi |_N\) the restriction of the function \(\psi \) to the set N, and by \(\equiv 0\) the fact that the function is identically 0 on its domain. We note that this assumption is satisfied for instance if \(\ell \ge 0\) and there is an equilibrium \((x^e,u^e)\) with \(\ell (x^e,u^e)=0\). Then one can choose \(N=\{x^e\}\).
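Indeed, for \(N=\{x^e\}\) this can be verified in one line: if \(\psi \ge 0\) and \(\psi (x^e)=0\), then
$$0 \le T\psi (x^e) = \min _{u \in {\mathbb {U}}(x^e)} \big \{ \ell (x^e,u) + \psi (f(x^e,u)) \big \} \le \ell (x^e,u^e) + \psi (f(x^e,u^e)) = 0 + \psi (x^e) = 0,$$
where the first inequality uses \(\ell \ge 0\) and \(\psi \ge 0\), and the final equalities use \(\ell (x^e,u^e)=0\) and \(f(x^e,u^e)=x^e\).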
Lemma 6.3
Assume \(\ell \ge 0\) and let Assumption 6.2 hold for some set N. Then for \(\psi ^0\equiv 0\) the sequence of functions \(\psi ^k:= [{{\hat{T}}}^k \psi ^0]_n\), \(k\in {\mathbb {N}}\), satisfies the following properties for all \(k\in {\mathbb {N}}\):
Proof
By applying Lemma 6.1 inductively we see that \(\psi ^{k+1} = [{{\hat{T}}} \psi ^k]_n\). Moreover, we observe for all \(\psi :{\mathbb {X}}\rightarrow {\mathbb {R}}\) the equality
Now we prove (a)–(d) by induction over k.
For \(k=0\), (a) and (c) hold trivially, while (b) and (d) hold because \(\psi ^0\equiv 0\) and \(T\psi ^0\ge 0\) (since \(\ell \ge 0\)) and \(\psi ^1\ge 0\) (by definition of the \([\cdot ]_n\) operator).
For \(k\rightarrow k+1\), assume that (a), (b), and (c) hold for \(\psi ^k\). We now prove these three properties for \(\psi ^{k+1}\) and start with (c). By the above computation it holds that
By induction assumption (b) we have that \(T\psi ^k\ge \psi ^k\) implying that \(c(\psi ^k,T\psi ^k)\le 0\) and thus \(\psi ^k-c(\psi ^k,T\psi ^k)\ge 0\). Since \(\ell \ge 0\) and \(\psi ^k\ge 0\) we moreover have \(T\psi ^k\ge 0\). By induction assumption (c) we know that \(\psi ^k|_N\equiv 0\). Thus, Assumption 6.2 yields \(T\psi ^k|_N\equiv 0\). Together this implies that \(\min \{\psi ^k-c(\psi ^k,T\psi ^k),T\psi ^k\}\ge 0\) and is equal to 0 on N. This implies that
and thus \(\psi ^{k+1}|_N\equiv 0\), i.e., (c) for \(k+1\).
Next we prove (b) for \(k+1\). Using (6.1) as well as the min commutativity and the translation invariance of T we obtain
Now using the induction assumption for (b) and the monotonicity of T we obtain \(T\psi ^k\ge \psi ^k\) and \(TT\psi ^k \ge T\psi ^k\), implying, using (6.1) once more
This shows (b) for \(k+1\). From the induction assumption (a) and (b) and monotonicity of T we obtain
which shows (a) for \(k+1\).
Finally, for showing (d), we use that the induction assumption for (b) yields \(c(\psi ^k,T\psi ^k)\le 0\) and \(T\psi ^k\ge \psi ^k\). Together with (6.1) we obtain
Proposition 6.4
Assume \(\ell \ge 0\), let Assumption 6.2 hold and assume that \(V_\infty ^{\psi ^0}\) is finite for \(\psi ^0\equiv 0\). Then the sequence of functions \(\psi ^k={[{{\hat{T}}}^k \psi ^0]_n}\), \(k\in {\mathbb {N}}\), converges to \(V_\infty ^{\psi ^0}\), i.e., in particular to a solution of the Bellman Equation.
Proof
From Lemma 6.3 it follows that \(\psi ^k\) is increasing and bounded from above by \(V_\infty ^{\psi ^0}\). Hence, it converges to some limit function \(\psi ^\infty \le V_\infty ^{\psi ^0}\). Now from \(T\psi ^k\ge \psi ^k\) we obtain that
implying that
Since \(T\psi ^k\ge \psi ^k\) we moreover obtain that \(T\psi ^k(x)\ge \frac{1}{2} (\psi ^k(x) + T\psi ^k(x))\) for all \(x \in {\mathbb {X}}\). Inserting these inequalities into (6.1) then yields
and using this inequality and \(T(\psi _1/2 + \psi _2/2) \ge (T\psi _1)/2 + (T\psi _2)/2\) yields
which by induction yields the general formula
Since \(\sum _{l=0}^{k} \left( {\begin{array}{c}k\\ l\end{array}}\right) =2^k\) grows exponentially in k while for each fixed \(p\in {\mathbb {N}}\) the sum \(\sum _{l=0}^{p-1} \left( {\begin{array}{c}k\\ l\end{array}}\right) \) grows only polynomially in k, we have that
as \(k\rightarrow \infty \). Combining this with \(T^q\psi ^0\ge T^p\psi ^0\ge 0\) for \(q\ge p\ge 0\), we obtain that for each \(C\in (0,1)\) and \(p\in {\mathbb {N}}\) there is \(k_{C,p}\in {\mathbb {N}}\) with
for all \(k\ge k_{C,p}\). This implies that
for any \(C\in (0,1)\). Since C can be chosen arbitrarily close to 1, this implies \(\psi ^\infty \ge V_\infty ^{\psi ^0}\), which finishes the proof. \(\square \)
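The monotone convergence \(\psi ^k \nearrow V_\infty ^{\psi ^0}\) at the heart of this proof can be observed on a minimal hypothetical example: a two-state deterministic system with \(\ell \ge 0\) and a zero-cost equilibrium. For simplicity, the sketch iterates the plain Bellman operator T from \(\psi ^0 \equiv 0\), an illustrative simplification of the normalized \({{\hat{T}}}\) iteration considered in the statement:

```python
# Hypothetical system: state 0 is a zero-cost equilibrium; from state 1
# one can either move to 0 at cost 1 or stay at cost 0.5. All stage
# costs are nonnegative, and V_infty = (0, 1) is finite.
INF = float("inf")
cost = [[0.0, INF],   # transitions from state 0 (self-loop only)
        [1.0, 0.5]]   # transitions from state 1

def T(psi):
    """Bellman operator T psi(x) = min_y { cost[x][y] + psi[y] }."""
    return [min(c + psi[y] for y, c in enumerate(row) if c < INF)
            for row in cost]

psi = [0.0, 0.0]
history = [psi]
for _ in range(5):
    psi = T(psi)
    history.append(psi)
print(history)  # second component increases 0 -> 0.5 -> 1 and then stays at 1
```

The iterates increase monotonically and reach the finite infinite-horizon value after two steps, mirroring the squeeze argument in the proof.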
Now we extend our results to dissipative stage costs. The dissipativity inequality here is similar to (2.9), where we explicitly include a shift of the cost function by b in the inequality.
Assumption 6.5
There exists a continuous storage function \(\lambda :{\mathbb {X}}\rightarrow {\mathbb {R}}\) and a value \({b}\in {\mathbb {R}}\) such that
For such a function \(\lambda \), similar to (4.4) we define the rotated cost
and the corresponding operators \(\widetilde{T}\) and \(\hat{{\widetilde{T}}}\). The next lemma extends Lemma 4.3.
Lemma 6.6
For any continuous function \(\lambda :{\mathbb {X}}\rightarrow {\mathbb {R}}\) and for all \(k\in {\mathbb {N}}\) the identities
hold.
Proof
The first identity follows by a proof analogous to that of Lemma 4.3, followed by induction over k. For the second identity we compute
From this, the statement for \(\hat{{\widetilde{T}}}^k\psi \) follows by induction over k. \(\square \)
Assumption 6.7
There exists a nonempty set \(N\subset {\mathbb {X}}\) such that for any \(\psi :{\mathbb {X}}\rightarrow {\mathbb {R}}\) with \(\psi \ge - \lambda \) and \(\psi (x) = - \lambda (x)\) for all \(x\in N\) we have that \(T\psi (x) = c - \lambda (x)\) for all \(x\in N\).
Somewhat similar to Assumption 6.2, for dissipative optimal control problems Assumption 6.7 holds with \(N=\{x^e\}\) for an equilibrium \((x^e,u^e)\) with \(\ell (x^e,u^e)=c\). This is because dissipativity implies \({{\tilde{\ell }}} \ge 0\) and \({{\tilde{\ell }}}(x^e,u^e)=0\). Together this yields for all \(u\in {\mathbb {U}}(x^e)\) that
$$\ell (x^e,u) + \psi (f(x^e,u)) \ge \ell (x^e,u) - \lambda (f(x^e,u)) = {{\tilde{\ell }}}(x^e,u) + c - \lambda (x^e) \ge c - \lambda (x^e),$$
while for \(u=u^e\) we get
$$\ell (x^e,u^e) + \psi (f(x^e,u^e)) = c + \psi (x^e) = c - \lambda (x^e),$$
implying that this is the minimum and hence \(T\psi (x^e) = c - \lambda (x^e)\). The situation just described in particular occurs for strictly dissipative problems, cf. eq. (4.1).
Theorem 6.8
Assume that the optimal control problem is dissipative in the sense of Assumption 6.5, that Assumption 6.7 holds and that there is \(M>0\) with \(T^k(\psi ^0)\le M + ck\) for all \(k\in {\mathbb {N}}\) and \(\psi ^0 = -\lambda \). Then the sequence of functions \(\psi ^k={[{{\hat{T}}}^k \psi ^0]_n}\), \(k\in {\mathbb {N}}\), converges to a solution of the shifted Bellman Equation.
Proof
The assumptions together with Lemma 6.6 imply that the operator \(\hat{{\widetilde{T}}}\) corresponding to the cost \({{\tilde{\ell }}}\) from (6.3) satisfies all assumptions of Proposition 6.4. Hence, for \({{\tilde{\psi }}}^0\equiv 0\) the sequence \({{{\tilde{\psi }}}^k=[\hat{{{\widetilde{T}}}^k}{{\tilde{\psi }}}^0]_n}\) converges to a solution \({{\tilde{\psi }}}^\infty \) of the Bellman Equation for \(\tilde{\ell }\), i.e., \(\widetilde{T}{{\tilde{\psi }}}^\infty ={{\tilde{\psi }}}^\infty \). Because of Lemma 6.6 and using that \({[\psi +\phi ]_n = [[\psi ]_n+\phi ]_n}\) for all functions \(\psi ,\phi :{\mathbb {X}}\rightarrow {\mathbb {R}}\) attaining a minimum on \({\mathbb {X}}\) we obtain that
implying that
From this we get, again using Lemma 6.6 and \(w:={[{{\tilde{\psi }}}^\infty - \lambda ]_n}-{{\tilde{\psi }}}^\infty {+} \lambda \),
This finishes the proof. \(\square \)
7 Examples and Counterexamples
In this section we illustrate the performance of the iterations proposed and discussed in this paper with various examples.
7.1 Comparison of Solution Methods
The examples in Sect. 7.1 are meant to illustrate different approaches for the formulation and solution of infinite horizon optimal control problems using dynamic programming. In particular, they emphasize the need for a terminal penalty function and highlight the benefits of using the \({\hat{T}}\) and \({\check{T}}\) operators for their solution.
7.1.1 Need for Terminal Penalty Function
We consider the following scalar linear system:
along with state x taking values in \({\mathbb {X}} = [-2,2]\), and input constraints \({\mathbb {U}} (x) = [-2+x,2+x]\). The stage cost is piecewise linear and defined as:
Notice that the state-dependent part of the cost has two local minima, at x equal to \(-1\) and \(+1\). Moreover, for \(u=0\) solutions are 2-periodic and fulfill \(x(t)= (-1)^t x(0)\). It is possible to show that the optimal average cost is 0, achieved by the solution \(x(t) = (-1)^t\) corresponding to \(u(t)=0\). We show that using \(\psi =0\) does not lead to a convergent sequence of cost-to-go functions, see Fig. 1. In particular, \(T^k \psi \) converges to a period 2 oscillation between two distinct piecewise linear functions after 2 iterations. Accordingly, the optimal state-feedback (which is bang-bang) does not converge and will differ at least in some regions of the state-space depending on whether a horizon of odd or even length is considered.
In order to obtain meaningful infinite horizon costs and feedback policies we need to use a suitable penalty function for the final state, as shown in Proposition 2.2, in particular by letting \(\psi =- \lambda \) where \(\lambda \) is a storage function. The intuition is that letting \(\tau \) go to infinity without a terminal penalty function is akin to minimising costs up to time \(\tau \) assuming that time stops afterwards, so that the final state reached does not carry any value. This is, generally speaking, unsuitable, as the optimal costs might be affected by boundary effects that propagate all the way back to decisions taken at time 0, as in the current example, where different decisions are optimal depending on whether an even or odd time horizon is considered. The flexibility afforded by a terminal cost (which of course never materializes in the case of infinite horizon) is to allow costs to converge to a steady-state expression and decisions to asymptotically converge to a time-invariant policy. For the considered example one can show that the function:
is a storage function. Figure 2(left) shows that the iteration initialized with \(\psi ={-} \lambda _1\) converges. Notice that the cost monotonically converges in 3 steps to its infinite horizon value. It is well known that storage functions need not be unique. For instance the following function is another storage function:
Our results show that any storage function can be used in order to define a suitable infinite horizon cost, provided the resulting cost is finite. We show in Fig. 2(right) how choosing a different penalty function \(\psi =-\lambda _2\) still leads, for this particular example, to the same infinite horizon cost, with convergence in just one time step. It is important to emphasize that convergence of the cost iteration to a steady-state solution does not translate to “optimal trajectories being or converging to an equilibrium”. In fact, in this example, optimal trajectories converge to period 2 solutions of the type previously described. On the other hand, convergence of the iteration to a steady state allows to extract a time-invariant optimal policy in the form of state-feedback. Solutions achieving the minimum of the Bellman operator are optimal in many respects, as clarified in Proposition 3.5. In this particular case, since convergence of the iteration to its steady-state solution is achieved in a finite number of steps (rather than asymptotically), it is possible to show that such solutions are also optimal with respect to infinite horizon costs with terminal penalty functions \(-\lambda _1\) and \(- \lambda _2\), respectively.
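The role of the terminal penalty \(\psi =-\lambda \) can also be seen on an even smaller hypothetical analogue of the phenomenon above (this two-state instance is our own illustrative construction, not the example of this section): a system with forced alternation and stage costs \(+1\) and \(-1\), so that the optimal average cost is again 0. Without a terminal penalty the iteration oscillates with period 2; with \(\psi =-\lambda \) for the storage function \(\lambda =(0,1)\) it is already at a fixed point:

```python
# Hypothetical two-state analogue: x(t+1) = 1 - x(t) (no control choice),
# stage cost ell = (+1, -1), optimal average cost 0. lam = (0, 1) is a
# storage function: ell[x] >= lam[1-x] - lam[x] holds with equality.
ell = [1.0, -1.0]

def T(psi):
    return [ell[x] + psi[1 - x] for x in (0, 1)]

psi = [0.0, 0.0]          # no terminal penalty: period-2 oscillation
for _ in range(2):
    psi = T(psi)          # [1, -1] -> [0, 0] -> [1, -1] -> ...

lam = [0.0, 1.0]
psi = [-l for l in lam]   # terminal penalty psi = -lambda
print(T(psi) == psi)      # fixed point of T: the iteration has converged
```

As in the example above, the uncorrected iteration keeps oscillating, while initializing with the negative storage function removes the boundary effect at once.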
7.1.2 Solution with \({\hat{T}}\) Operator
We consider below the same system and constraints as in the previous example, namely
along with state x taking values in \({\mathbb {X}} = [-2,2]\), and input constraints \({\mathbb {U}} (x) = [-2+x,2+x]\). The stage cost is merely a shifted version of the previous piecewise linear cost:
Example 7.1.1 was crafted to have optimal average cost equal to 0 so that iteration convergence could be achieved just by choosing a suitable terminal cost function. When formulating a generic optimal control problem one cannot expect to have 0 optimal average cost, except for very special cases. Introducing a shifted version of the cost is meant to highlight the power of the approach, which is genuinely “shift-invariant” and does not rely on a priori knowledge of the optimal average cost. Therefore, rather than applying ad hoc considerations trying to figure out the optimal average performance (which in this case is \(-7/2\)) and correspondingly shifting \(\ell \) in order to make the problem into its previous version with optimal 0 average, we directly apply the operator \({\hat{T}}\) to an arbitrary initialization \(\psi (x)=0\). We show in Fig. 3 the resulting non-increasing sequence of functions \({\hat{T}}^k \psi \), and the corresponding limit, which is a solution of the shifted Bellman Equation. The value of the shift applied, \(c ({\hat{T}}^k \psi ,T {\hat{T}}^k \psi )\), is displayed in Fig. 4. Notice that the shifts converge to 7/2, which is indeed the positive translation needed in order to compensate for the optimal infinite horizon average performance of \(-7/2\). To highlight the power of the \({\hat{T}}\) iteration, which simultaneously adjusts to the right value of shift and asymptotic cost, we show in Fig. 5 its evolution for a different initialisation \(\psi (x)=- \sin (x)\).
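The mechanism by which the shifts converge to the optimal average cost can be reproduced on a minimal hypothetical finite-state instance. The sketch below uses the shift rule \(c(\psi ,T\psi ) = \min _x (T\psi (x)-\psi (x))\), which is one admissible choice in the spirit of the \({\hat{T}}\)/\({\check{T}}\) iterations; the operators studied in this paper involve a further pointwise modification, so this captures the idea rather than their exact definition:

```python
# Hypothetical 3-state deterministic system: from each state x one may
# jump to any state y at cost[x][y]. The cheap self-loop at state 0
# makes the optimal average cost equal to 1.
cost = [[1, 5, 5],
        [2, 9, 9],
        [9, 3, 9]]

def bellman(psi):
    """T psi(x) = min_y { cost[x][y] + psi[y] }."""
    n = len(psi)
    return [min(cost[x][y] + psi[y] for y in range(n)) for x in range(n)]

psi = [0.0, 0.0, 0.0]
for _ in range(50):
    t_psi = bellman(psi)
    # shift rule (an illustrative choice): c = min_x (T psi(x) - psi(x))
    c = min(t - p for t, p in zip(t_psi, psi))
    psi = [t - c for t in t_psi]

print(psi, c)  # psi solves the shifted Bellman Equation T psi = psi + c
```

After a few iterations the shift settles at \(c=1\), the optimal average cost, while \(\psi \) satisfies \(T\psi = \psi + 1\), mirroring the convergence of the shifts to 7/2 observed in Fig. 4.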
7.1.3 Solution with \({\check{T}}\) Operator
We provide next numerical evidence of convergence using the \({\check{T}}\) operator in Fig. 6 (left). It is also interesting to remark that both the \({\hat{T}}\) and \({\check{T}}\) operators are robust with respect to the definition of the shift term \(c( \psi , T \psi )\). Specifically, any strict convex combination (\(\alpha \in (0,1)\) ):
yields convergence, although at a possibly different speed. We show the iteration corresponding to \(\alpha =3/4\) in Fig. 6 (right).
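One plausible concrete form of such a convex combination (an assumption on our part, since several variants are possible) interpolates between the two extreme shift choices, the minimum and the maximum of \(\psi - T\psi \) over the state space:

```python
import numpy as np

def shift_alpha(psi, Tpsi, alpha):
    """Hypothetical convex combination of the two extreme shift choices:
    alpha weighting min_x[psi - Tpsi], (1 - alpha) weighting max_x[psi - Tpsi]."""
    d = psi - Tpsi
    return alpha * d.min() + (1.0 - alpha) * d.max()

# For alpha = 1 or alpha = 0 the two extreme shifts are recovered.
psi = np.array([0.0, 1.0, 2.0])
Tpsi = np.array([0.5, 0.0, 1.0])
print(shift_alpha(psi, Tpsi, 0.75))
```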
7.2 Non-Uniqueness of Optimal Solutions
The following examples illustrate non-uniqueness phenomena arising when dealing with infinite horizon control problems. In particular, they emphasize non-uniqueness of the fixed points of the Bellman Equation and/or of the associated optimal feedback policies.
7.2.1 Example With Multiple Solutions of the Bellman Equation
Consider the scalar linear system:
along with the state constraint: \({\mathbb {X}} = [-2,2]\) and input constraints \({\mathbb {U}} (x) = [-2+x,2 + x]\). We consider a piecewise linear stage cost defined as:
for some constant \(\varepsilon \) which will need to be sufficiently small. Any function \(\psi (x) = \alpha |x| + \varepsilon x/2\) is a solution of the (shifted) Bellman Equation, as long as \(0 \le \alpha < 1- \varepsilon \). In fact:
We notice that if \(0 \le \alpha <1- \varepsilon \) then the optimal value is achieved for \(u=0\), since the slope of the absolute value |u| dominates the slope of the other terms. In particular, substituting \(u=0\) yields \(T \psi (x) = \alpha |x| + \varepsilon x/2\). Hence there are infinitely many (even continuous) solutions to the shifted Bellman Equation (3.6) (although the associated optimal feedback policies happen to be the same). We remark that, by Theorem 4.4, this implies that the problem is not strictly dissipative. Applying the \({\check{T}}\) iteration from the initial condition \(\psi \equiv 0\) converges in one step to a fixed point of the type considered above, here with \(\alpha =\varepsilon =0.1\). The final fixed point depends in a non-trivial way on the initialisation. For instance, applying the \({\hat{T}}\) iteration from the non-monotone function \(\psi (x)=-\sin (x) + 2 \cos (3x)\) converges to a fixed point of different shape, hinting at an even richer variety of possible steady-state solutions, see Fig. 7.
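A hypothetical reconstruction of this example (dynamics \(x^+ = x - u\) and stage cost \(\ell (x,u) = |u| + \varepsilon u/2\), both assumed here for illustration; for this reconstruction the whole range \(0 \le \alpha < 1\) works) allows the family of fixed points to be checked numerically on a grid:

```python
import numpy as np

# Assumed reconstruction: x+ = x - u with u in [-2+x, 2+x] (so x+ spans
# X = [-2, 2]), stage cost ell(x,u) = |u| + (eps/2) u.  With these choices
# every psi(x) = a|x| + (eps/2) x with 0 <= a < 1 satisfies T psi = psi.
X = np.linspace(-2.0, 2.0, 201)
eps = 0.1

def T(psi_vals):
    U = X[:, None] - X[None, :]          # u = x - x+, one column per next state
    L = np.abs(U) + 0.5 * eps * U        # stage cost ell(x, u)
    return np.min(L + psi_vals[None, :], axis=1)

for a in (0.0, 0.1, 0.5, 0.8):
    psi = a * np.abs(X) + 0.5 * eps * X
    assert np.allclose(T(psi), psi)      # a whole family of fixed points
```

The check passes for every tested \(a\), exhibiting numerically the continuum of solutions discussed above.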
7.2.2 Example with Multiple Optimal Feedback Policies
We consider the following scalar linear system:
along with the state constraint \({\mathbb {X}} = [-1,1]\) and input constraints \({\mathbb {U}} (x) =[-1-x,1-x]\). We consider a piecewise linear stage cost defined as:
Notice that, for each given \(x \in {\mathbb {X}}\), \(u=0\) minimizes the stage cost and makes x an equilibrium for the system. Hence, since maximizing |x| minimizes \(\ell \), the optimal average performance is achieved at the equilibrium solutions \(x= \pm 1\) with zero input applied. Consider the following terminal penalty functions:
As seen in Fig. 8, the functions \(\psi _1\) and \(\psi _2\) assign different terminal costs to the two optimal equilibria. In particular \(\psi _1\) favours \(-1\), with 0 terminal cost, while \(\psi _2\) favours \(+1\).
Both functions fulfill the Bellman Equation. In fact:
which is achieved for \(u_1^*(x) = -1 -x\). Similarly one can show that \(u_2^*(x)=1-x\) achieves the optimum for \(\psi _2\) and that \(\psi _2\) is a solution of the Bellman Equation. Notice that:
is also a legitimate choice of terminal penalty function. In fact, this is the infimum element in \(\Psi \), and is therefore the terminal penalty function that corresponds to the cheapest infinite horizon transient cost. As shown in Proposition 4.1, feedback policies corresponding to different solutions of the shifted Bellman Equation share the same infinite horizon average cost. Notice, in addition, that for any constants \(c_1\) and \(c_2\), the function:
solves the shifted Bellman Equation. In fact, in this case, it can be shown that every solution of the shifted Bellman Equation is of this form. This result is likely to admit an extension to more general control set-ups.
7.3 Regularity of Fixed-Points of Bellman Equation
The following examples are meant to illustrate potential discontinuity and unboundedness issues of the fixed-point of the (shifted) Bellman Equation.
7.3.1 Example With Lower Semi-Continuous Solution of the Bellman Equation
Consider the following bilinear scalar system:
with state taking values in \({\mathbb {X}} =[-2,2]\) and input constraints:
Let the stage cost be piecewise linear defined according to:
Notice that for \(u=0\) every point is an equilibrium. Hence, simply letting \(u=0\) whenever the initial condition is \(\le 0\) achieves the minimum average cost. If the initial condition is positive, the best control action is \(u=-1\). Indeed, an input \(u \le -1\) is needed in order to leave the set of positive states and enter the negative semi-axis, where the optimal average performance can be achieved. Hence, the best choice, given the penalty |u| on inputs, is to have \(u=-1\). Moreover, waiting to apply such a control action does not pay off as the same cost will need to be incurred at some point in the future in order to switch to negative states. The following function is a lower semi-continuous solution of the associated Bellman Equation:
which is achieved for the following control policy:
We show in Fig. 9, how the iterations of the operators \({\hat{T}}\) and \({\check{T}}\) behave when initialised from \(\psi (x)=0\).
It is worth pointing out that while both sequences seem to asymptotically approximate the correct ‘shape’ of the infinite-horizon cost, the theory confirms that \({\hat{T}}^k \psi (x)\) cannot be bounded, since its pointwise limit is known to be at least upper semi-continuous, which is not the case for the solution in the considered example. Essentially two possibilities remain: either numerical round-off errors make \({\hat{T}}\) appear to converge to something approximately correct, or \({\hat{T}}^k \psi \) diverges to \(- \infty \), though very slowly.
7.3.2 Example With Unbounded Infinite Horizon Cost
Consider the following bilinear scalar system:
with state \(x \in [0,1]:= {\mathbb {X}}\) and \(u \in [1/2,1]\). Consider the stage cost:
We claim that the optimal average cost is 0. In fact, the control sequence \(u(t) = 1/2\) for \(t = 0, \ldots , K-1\) and \(u(t) = 1\) for \(t \ge K\) yields \(x(K) = x(0)/2^K\) and \(x(t)=x(K)\) for \(t \ge K\). Notice that \(\ell (x(t),u(t)) = |x(0)|/2^K \le 1/ 2^K\) for all \(t \ge K\). Hence the average cost can be made less than or equal to \(1/2^K\) for any positive integer K, and this, together with the inequality \(\ell (x,u) \ge 0\), proves that the optimal average cost is 0. We show next that the optimal infinite horizon cost is unbounded.
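The construction above can be checked by direct simulation. The stage cost \(\ell (x,u) = |x| + (1-u)\) used below is an assumption consistent with the properties stated in the text (\(\ell \ge 1-u\), \(\ell \ge 0\), and \(\ell (x,1) = |x|\)):

```python
# Assumed stage cost ell(x,u) = |x| + (1 - u) for the bilinear dynamics
# x+ = u x: halve the state K times, then hold it with u = 1.
def average_cost(x0, K, N):
    x, total = x0, 0.0
    for t in range(N):
        u = 0.5 if t < K else 1.0
        total += abs(x) + (1.0 - u)
        x = u * x                   # bilinear dynamics x+ = u x
    return total / N

for K in (1, 5, 10):
    print(K, average_cost(1.0, K, 200000))
# as N grows, the long-run average approaches x0 / 2**K
```

Increasing K drives the average cost to 0, while the transient contribution (the K steps with \(u = 1/2\)) stays bounded, exactly as in the argument above.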
By induction, \(x(k) = x(0) \prod _{t=0}^{k-1} u(t)\). For the infinite horizon cost to be bounded we need to find an input such that \(x(k) \rightarrow 0\) as \(k \rightarrow + \infty \). Hence, the input needs to fulfill \(\prod _{t=0}^{k-1} u(t) \rightarrow 0\). On the other hand:
and therefore, for the cost to be bounded we need:
as \(k \rightarrow + \infty \). However, on the interval [1/2, 1], concavity of the \(\log \) function yields:
Using the inequality above shows:
As a consequence, for the infinite horizon cost to be bounded we need:
as \(k \rightarrow + \infty \). This, however, contradicts boundedness of the cost as \(\ell (x,u) \ge 1 - u\).
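In compact form, with the chord constant of the concavity bound worked out explicitly, the argument sketched above reads:

```latex
% Concavity of log on [1/2, 1]: the graph lies above the chord through
% (1/2, \log(1/2)) and (1, 0), i.e.  \log u \ge -2\log(2)\,(1-u).
\begin{aligned}
&x(k) = x(0)\textstyle\prod_{t=0}^{k-1} u(t) \to 0
  \;\Longrightarrow\; \sum_{t=0}^{k-1} -\log u(t) \to +\infty ,\\
&-\log u \le 2\log (2)\,(1-u) \quad \text{for } u \in [1/2,1],\\
&\Longrightarrow\; \sum_{t=0}^{k-1} \bigl(1-u(t)\bigr)
  \ge \frac{1}{2\log 2} \sum_{t=0}^{k-1} -\log u(t) \to +\infty .
\end{aligned}
```

Since \(\ell (x,u) \ge 1-u\), the accumulated cost \(\sum _t \ell (x(t),u(t))\) then diverges along any input driving the state to 0.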
It is worth pointing out that the optimal steady state for the considered example is \(x_s=0\) and \(u_s=1\). This steady state is not reachable in finite time, though. Notice also that this is trivially a dissipative system with storage function \(\lambda (x)=0\) due to the non-negativity of the cost. As a consequence no bounded solution of the shifted Bellman Equation exists.
7.3.3 Example With Continuous and Discontinuous Solutions
Consider the autonomous nonlinear system:
along with the cost functional \(\ell (x,u)=0\). Choose \({\mathbb {X}} = [-1,1]\), which is a forward invariant set for the dynamics, with 3 equilibria at \(-1\), 0 and 1. The equilibrium at 0 is antistable, while the equilibria at \(\pm 1\) are asymptotically stable with basins of attraction (0, 1) and \((-1,0)\), respectively. Clearly, \(\psi (x) \equiv 0\) is a solution of the Bellman Equation. Any function of the form:
is also a solution. Consider next an arbitrary continuous increasing initialisation \(\psi \) for the \({\hat{T}}\) and \({\check{T}}\) iterations. It can be seen that \(T \psi \) is also increasing, since f is increasing on the interval \([-1,1]\). As a consequence \({\hat{T}} \psi \) and \({\check{T}} \psi \) are also increasing. Moreover, \(T \psi (0) = \psi (0)\) and \(T \psi ( \pm 1) = \psi (\pm 1)\). Thus, \({\hat{T}} \psi (0) - {\hat{T}} \psi (-1) = \psi (0)- \psi (-1)\) and \({\hat{T}} \psi (1) - {\hat{T}} \psi (0) = \psi (1) - \psi (0)\). By induction, \({\hat{T}}^k \psi (x)\) is increasing with respect to x for all k, and so is \({\check{T}}^k \psi (x)\). It can be shown that for \(\psi (x)=x\) it holds that \(c({\hat{T}}^k \psi ,T {\hat{T}}^k \psi )=0\) for all k. In particular, \({\hat{T}}^k \psi \) converges to:
Numerical simulations indeed confirm this claim, see Fig. 10. This shows that even if \({\hat{T}}\) (or \({\check{T}}\)) admit continuous fixed points, the iteration of \({\hat{T}}^k \psi \) does not necessarily converge to a solution of the Bellman Equation. Similarly, considering the iteration \({\check{T}}^k \psi \), for the same initial function \(\psi (x)=x\), it holds \(c({\check{T}}^k \psi , T {\check{T}}^k \psi )=0\) for all k and \({\check{T}}^k \psi \) converges to:
7.3.4 Example With Upper Semi-Continuous Solution
We slightly modify the previous example to include a scalar control input and induce an upper semi-continuous solution. Consider the nonlinear system:
with state-space \({\mathbb {X}}=[0,1]\), scalar input u constrained in \({\mathbb {U}}(x) = [0,1]\) along with the cost functional
Notice that:
Hence, the optimal average performance is 0, achieved for \(u(\cdot )=1\). The function \({\bar{\psi }}\) defined below:
is a solution of the Bellman Equation. To see this, notice, assuming \(x \ne 0\):
For \(x=0\), it is easy to verify that \( T {\bar{\psi }} (0) = 0\). We show in Fig. 11 the iteration converging to \({\bar{\psi }}\). Notice that, even though \({\bar{\psi }}\) is upper semi-continuous, does not admit a minimum in [0, 1], and has its discontinuity point \(x=0\) reachable from all states in \({\mathbb {X}}\) within a single step, the minimum in the definition of the operator \(T {\bar{\psi }}\) is still achieved. More generally, we see that the iteration \({\hat{T}}^k \psi \) converges, for \(x>0\), to \(\psi (1)-1 + x\).
We are not aware of examples of optimal control problems in which the only solutions are upper semi-continuous (and not continuous), or in which the minimum defining \(T {\bar{\psi }}\) is not achieved. It is worth pointing out that \({\bar{\psi }}(x)=x\) is also a solution of the Bellman Equation.
7.4 Complex Optimal Regime of Operation
We consider examples where the optimal average performance is not achieved at steady state, but for more exotic types of behaviour. It is worth pointing out that working with a terminal penalty function allows such examples to be treated without the need for an a priori known absorbing terminal state or absorbing terminal set. Moreover, the optimal regime of operation does not entail a constant (or zero) optimal stage cost in steady state.
7.4.1 Example With Chaotic Optimal Regime
Consider the scalar nonlinear system:
with scalar state \(x \in {\mathbb {X}}:= [0,1]\) and input \(u \in {\mathbb {U}}: = [0,4]\). We consider the stage-cost:
Notice that \(\ell (x(k),u(k)) = x(k)^2 - x(k+1)^2 + |u(k)-18/5|\). Therefore, along arbitrary solutions we have:
In particular, computing asymptotic time averages we see:
The optimal average performance is therefore 0, and is achieved, for instance, for any input u(k) converging to 18/5. Notice that for \(u=18/5\) the considered dynamical system is known to have chaotic solutions. Moreover, \(u(k) = 18/5\) is potentially an optimal infinite horizon control policy. This policy corresponds to the solution \({\bar{\psi }} (x) = x^2\) of the Bellman Equation. Indeed,
Numerical solution using the \({\hat{T}}\) operator is shown in Fig. 12, starting from two distinct initializations, \(\psi (x) =0\) and \(\psi (x) = \sin (4x)\). The optimal average performance is correctly estimated to be 0 and \({\hat{T}}^k \psi \) converges to a shifted version of \(x^2\) in both cases. The numerical solution using the \({\check{T}}\) operator is slightly different and is shown in Fig. 13. While it is hard to write an explicit analytic expression for the limiting function, due to the presence of somewhat unexpected spikes, we believe that the numerical results hint at the presence of multiple solutions of the corresponding Bellman Equation. These solutions match \(x^2\) over most of the interval [0, 1] but appear to allow for piecewise linear spikes that might correspond to transient costs in regions not visited by the chaotic attractor. It seems more plausible that these are true solutions rather than artifacts of the numerical approximation. The optimal average performance is identified with very good precision in both cases. In particular, for the \({\hat{T}}\) iteration the error is lower than \(10^{-16}\). See Fig. 14 for the shift sequence achieved for the \({\check{T}}\) operator when \(\psi (x) \equiv 0\).
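The telescoping structure in the identity \(\ell (x(k),u(k)) = x(k)^2 - x(k+1)^2 + |u(k)-18/5|\) can be verified by simulation; logistic dynamics \(x^+ = u\,x(1-x)\) are assumed here, consistent with chaotic behaviour at \(u = 18/5\):

```python
# Under the assumed logistic dynamics x+ = u x (1 - x), the stage-cost
# identity makes the running average telescope: it equals
# (x(0)^2 - x(N)^2)/N for the constant input u = 18/5 = 3.6.
def avg_cost(x0, N, u=3.6):
    x, total = x0, 0.0
    for _ in range(N):
        xn = u * x * (1.0 - x)
        total += x * x - xn * xn + abs(u - 3.6)
        x = xn
    return total / N

print(avg_cost(0.3, 10**5))   # tends to 0 as N grows, despite chaotic x(k)
```

The average vanishes like 1/N regardless of the (chaotic) trajectory, illustrating why the constant input 18/5 attains the optimal average performance of 0.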
7.4.2 Two-Dimensional Example With Periodic Optimal Regime
We consider next the following two-dimensional linear system:
with state \(x \in {\mathbb {X}}:= [-1,1]^2\), and input \(u \in {\mathbb {U}} (x):= [-1-x_2,1-x_2]\). Consider the stage-cost
Notice that this cost is not positive definite. In particular, the optimal average performance can be expected to be negative: the zero solution is feasible with zero input and yields 0 average cost, while the stage cost can be made negative for some values of \(x_1 \ne 0\). The zero-input responses of the system are (feasible) period 4 oscillations. Moreover, the system is controllable, which guarantees an optimal average performance independent of the initial condition (regardless of the adopted stage cost \(\ell (x,u)\)). We show in Fig. 15 a solution of the shifted Bellman Equation. The iterations resulting from the \({\hat{T}}\) operator and the \({\check{T}}\) operator starting from \(\psi ^0\equiv 0\) are shown in Fig. 16.
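One concrete realisation consistent with the stated constraints (an assumption on our part, not necessarily the paper's matrices) is \(x_1^+ = x_2 + u\), \(x_2^+ = -x_1\): the constraint \(u \in [-1-x_2, 1-x_2]\) then keeps \(x_1^+\) in \([-1,1]\), and zero-input responses are period 4 rotations of the state:

```python
# Hypothetical realisation: x1+ = x2 + u, x2+ = -x1.
# With u = 0 the map is a quarter-turn rotation, hence period 4.
def step(x1, x2, u=0.0):
    return x2 + u, -x1

x = (0.5, -0.3)
orbit = [x]
for _ in range(4):
    x = step(*x)
    orbit.append(x)
print(orbit[0], orbit[4])   # zero-input solutions return after 4 steps
```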
7.5 Inefficiency of Exponential Discounting Factors
We end our example section with a discounted optimal control problem, which shows that ensuring well-posedness of infinite horizon optimal control problems by means of discounting can have unwanted side effects, making the proposed approach via the shifted Bellman Equation an attractive alternative. To this end, we consider a scalar infinite horizon linear quadratic optimal control problem with exponential discounting. In particular, the system’s dynamics are given as:
with x and u taking values in \({\mathbb {R}}\). The stage cost is:
Since this choice will not give rise to bounded costs over an infinite horizon we use a discounting factor \(\gamma \in (0,1)\):
The optimal infinite horizon cost fulfills the following Bellman Equation:
It is possible to show that this equation admits a solution:
where \(\alpha \), \(\beta \) and \(\delta \) fulfill the conditions:
The optimal feedback is affine in x and expressed as:
This feedback globally asymptotically stabilizes a unique equilibrium \(x_e(\gamma )\):
Notice that the optimal average performance is achieved at equilibrium, for \(x=1/2\) and \(u=1/2\), which yields \(V^{avg} = (1/2)^2 + (1/2)^2 = 1/2\). On the other hand, the equilibrium \(x_e (\gamma )\) only approaches the value 1/2 as \(\gamma \rightarrow 1\) (see Fig. 17). This shows that the long-run average performance achieved by introducing a discounting factor is in general suboptimal. Moreover, the discounting factor introduces an artificial trade-off between optimising the transient cost and the steady-state (average) cost, which persists for \(\gamma \) arbitrarily close to 1. This trade-off can be avoided by the approach pursued in this paper. On the other hand, any feedback \(u=k(x)\) (for instance affine, \(u= k_1 x + k_2\)) which stabilizes the equilibrium 1/2 clearly achieves optimal average performance (and is therefore optimal with respect to the cost functional \(J^{\text {avg}}\)), but, at the same time, it is not necessarily optimal from the point of view of transient costs. We refer to [28,29,30] for more examples of this kind and an in-depth study of the stability properties of discounted optimal equilibria. On a related note, [31] highlights how, by suitably adapting the stage cost, discounted and undiscounted formulations of Markov Decision Processes can yield the same optimal control law.
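The qualitative mechanism can be sketched with hypothetical dynamics \(x^+ = 1 - u\), chosen only because its equilibria satisfy \(x + u = 1\), so that the stage cost \(x^2 + u^2\) is minimized over equilibria at \(x = u = 1/2\). With this stand-in the discounted-optimal input and the induced equilibrium admit closed forms:

```python
import numpy as np

# Stand-in dynamics x+ = 1 - u: equilibria satisfy x + u = 1, so the best
# equilibrium for ell = x^2 + u^2 is x = u = 1/2 with average cost 1/2.
# For discount factor g the (state-independent) optimal input minimizes
# u^2 + g*(1-u)^2, giving u*(g) = g/(1+g) and x_e(g) = 1/(1+g) > 1/2.
def x_e(gamma):
    u_star = gamma / (1.0 + gamma)
    return 1.0 - u_star

# numerical sanity check of the argmin for one discount factor
g = 0.9
u_grid = np.linspace(0.0, 1.0, 100001)
u_num = u_grid[np.argmin(u_grid**2 + g * (1.0 - u_grid)**2)]

for g_ in (0.5, 0.9, 0.99, 0.999):
    print(g_, x_e(g_))   # the induced equilibrium approaches 1/2 only as g -> 1
```

Even in this toy version, the discounted-optimal equilibrium \(x_e(\gamma ) = 1/(1+\gamma )\) is bounded away from the average-optimal equilibrium 1/2 for every \(\gamma < 1\), reproducing the suboptimality phenomenon discussed above.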
8 Conclusions and Outlook
Two novel recursion operators are proposed for the simultaneous computation of value functions and of the minimal average asymptotic cost in discrete-time infinite horizon optimal control problems. The recursive formulas can be readily applied when the average asymptotic cost is independent of initial conditions, a situation referred to as the ergodic case in [22]. The approach renders dynamic programming for infinite horizon control problems invariant with respect to additive constants on the stage cost, as is naturally the case on finite horizons. The recursions converge, under fairly relaxed technical assumptions, to solutions of a shifted Bellman Equation, whose shift value is not a priori determined but is computed asymptotically alongside the value function. The approach removes the need for absorbing states and zero-cost conditions on the absorbing sets, which have often hindered the applicability of such techniques, as well as the need for discounting factors, which introduce unnecessary trade-offs between transient cost and asymptotic average performance. While the approach is developed for deterministic systems only, its extension to stochastic settings appears of potential interest. Finally, this may serve as a first step towards understanding the more general question of a shift-invariant approach to infinite horizon optimal control problems in the non-ergodic case, [22, 32].
References
Bellman, R.: The theory of dynamic programming. Bull. Am. Math. Soc. 60, 503–516 (1954)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. I, 4th edn. Athena Scientific (2017)
Bertsekas, D.P.: Abstract Dynamic Programming, 2nd edn. Athena Scientific, USA (2018)
Carlson, D.A., Haurie, A.B., Leizarowitz, A.: Infinite Horizon Optimal Control: Deterministic and Stochastic Systems. Springer, Berlin (1991)
Stokey, N.L., Lucas, R.E.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, MA (1989)
Ljungqvist, L., Sargent, T.: Recursive Macroeconomic Theory, 3rd edn. MIT Press, USA (2012)
Bertsekas, D.P.: Dynamic Programming and Optimal Control: Approximate Dynamic Programming, vol. II, 4th edn. Athena Scientific, USA (2012)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, USA (2018)
Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: A brief survey. IEEE Signal Process. Magaz. 34(6), 26–38 (2017)
Willems, J.C.: Dissipative dynamical systems part I: General theory. Arch. Rational Mech. Anal. 45, 321–351 (1972)
Willems, J.C.: Dissipative dynamical systems part II: Linear systems with quadratic supply rates. Arch. Rational Mech. Anal. 45, 352–393 (1972)
Moylan, P.J., Anderson, B.D.O.: Nonlinear regulator theory and an inverse optimal control problem. IEEE Trans. Automat. Control 18, 460–465 (1973)
Grüne, L., Müller, M.A.: On the relation between strict dissipativity and turnpike properties. Syst. Control Lett. 90, 45–53 (2016)
Grüne, L.: Dissipativity and optimal control: Examining the turnpike phenomenon. IEEE Control Syst. Magaz. 42(2), 74–87 (2022)
Angeli, D., Amrit, R., Rawlings, J.B.: On average performance and stability of economic model predictive control. IEEE Trans. Automat. Control 57(7), 1615–1626 (2012)
Müller, M.A., Angeli, D., Allgöwer, F.: On necessity and robustness of dissipativity in economic model predictive control. IEEE Trans. Automat. Control 60(6), 1671–1676 (2015)
Finlay, L., Gaitsgory, V., Lebedev, I.: Duality in linear programming problems related to deterministic long run average problems of optimal control. SIAM J. Control Opt. 47(4), 1667–1700 (2008)
Gaitsgory, V., Parkinson, A., Shvartsman, I.: Linear programming formulations of deterministic infinite horizon optimal control problems in discrete time. Discrete Contin. Dyn. Syst. Ser B 22(10), 3821–3838 (2017)
Müller, M.A., Grüne, L.: Economic model predictive control without terminal constraints for optimal periodic behavior. Automatica 70, 128–139 (2016)
Müller, M.A.: Dissipativity in economic model predictive control: beyond steady-state optimality. In: Recent advances in model predictive control—theory, algorithms, and applications, volume 485 of Lect. Notes Control Inf. Sci., pages 27–43. Springer, Cham, (2021)
Lewis, M.E., Puterman, M.L.: Bias Optimality, pp. 89–111. Springer, US, Boston, MA (2002)
Borkar, V.S., Gaitsgory, V., Shvartsman, I.: Lp formulations of discrete time long-run average optimal control problems: The nonergodic case. SIAM J. Control Opt. 57(3), 1783–1817 (2019)
Alvarez, O., Bardi, M., Marchi, C.: Multiscale problems and homogenization for second-order Hamilton-Jacobi equations. J. Differ. Eqs. 243(2), 349–387 (2007)
Faulwasser, T., Zanon, M.: Asymptotic stability of economic nmpc: The importance of adjoints. IFAC-PapersOnLine 51(20), 157–168 (2018). (6th IFAC Conference on Nonlinear Model Predictive Control NMPC 2018)
Grüne, L., Pannek, J.: Nonlinear Model Predictive Control, Theory and Algorithms, 2nd edn. Springer-Verlag, London (2017)
Grüne, L.: Economic receding horizon control without terminal constraints. Automatica 49(3), 725–734 (2013)
Grüne, L., Kellett, C.M., Weller, S.R.: On the relation between turnpike properties for finite and infinite horizon optimal control problems. J. Optim. Theory Appl. 173(3), 727–745 (2017)
Gaitsgory, V., Grüne, L., Höger, M., Kellett, C.M., Weller, S.R.: Stabilization of strictly dissipative discrete time systems with discounted optimal control. Automatica 93, 311–320 (2018)
Grüne, L., Krügel, L.: Local turnpike analysis using local dissipativity for discrete time discounted optimal control. Appl. Math. Optim. 84(suppl. 2), S1585–S1606 (2021)
Zanon, M., Gros, S.: A new dissipativity condition for asymptotic stability of discounted economic mpc. Automatica 141, 110287 (2022)
Zanon, M., Gros, S., Palladino, M.: Stability-constrained markov decision processes using mpc. Automatica 143, 110399 (2022)
Borkar, V.S., Gaitsgory, V.: Linear programming formulation of long run average optimal control problem. J. Optim. Theory Appl. 181, 101–125 (2019)
Funding
The authors declare that no funds, grants, or other support were received for the research presented in this manuscript. The authors have no relevant financial or non-financial interests to disclose.
Ethics declarations
Competing Interests
The authors have not disclosed any competing interests.
Appendices
Appendix: Technical Results
In order to analyse the convergence properties of the newly introduced operators \({\check{T}}\) and \({\hat{T}}\) it is useful to explore inequalities involving the \(\max \) and \(\min \) operators applied to a finite set of functions. The next two lemmas provide such tools.
Lemma A.1
Let \(\psi _i \in {\mathcal {C}}( {\mathbb {X}} )\) for \(i=1,\ldots ,N\). Then, the following holds:
Proof
Let \(x^*\) in \({\mathbb {X}}\) be such that:
for some \({\bar{\imath }}\) in \(\{1, \ldots , N\}\). By monotonicity of the \(\min \) operator, we see that:
Combining the latter inequality with the previous equality yields:
\(\square \)
The following lemma provides a similar bound for the \(\min \) operator.
Lemma A.2
Let \(\psi _i\), be continuous functions of \(x \in {\mathbb {X}}\), for \(i=1,\ldots ,N\). Then the following holds:
Proof
Let \(x^*\) in \({\mathbb {X}}\) be such that:
for some \({\bar{\imath }}\) in \(\{1, \ldots , N\}\). By monotonicity of the \(\max \) operator, we see that:
Combining the above inequalities imply:
\(\square \)
Existence of solutions of the shifted Bellman Equation can be used to establish useful upper and lower bounds on the rate of growth of the \(T^k\) operator applied to any initial condition \(\psi \in {\mathcal {C}}({\mathbb {X}})\). This is stated in the following lemma.
Lemma A.3
Assume that there exists a continuous solution \({\bar{\psi }}\) to the shifted Bellman Equation, viz. \(T {\bar{\psi }} = {\bar{\psi }} + c\), for some \(c \in {\mathbb {R}}\). Then, for any positive integer k and any function \(\psi \in {\mathcal {C}}({\mathbb {X}})\), the following holds:
Proof
To see the first inequality, notice:
Hence, exploiting monotonicity of the \(\min \) operator we get:
The second inequality can be proved along similar lines. \(\square \)
A direct consequence of Lemma A.3 is that:
Moreover, we can state the following corollary:
Corollary A.4
Assume there exists a continuous solution \({\bar{\psi }}\) to the shifted Bellman Equation, viz. \(T {\bar{\psi }} = {\bar{\psi }} + c\), for some \(c \in {\mathbb {R}}\). Then, for any \(\psi \in {\mathcal {C}}( {\mathbb {X}} )\) the following holds:
Proof
The result follows dividing by k both sides of the inequalities in Lemma A.3, and taking the limit as \(k \rightarrow + \infty \). \(\square \)
Notice that, by construction, if the sequence \({\hat{T}}^k \psi \) is bounded it converges to an upper semi-continuous function. Analogously, if \({\check{T}}^k \psi \) is bounded it converges to a lower semi-continuous function. If a continuous solution of the shifted Bellman Equation exists, both iterations might be suitable for determining such a function; however, if no continuous solution exists, then it is not a priori clear which operator is most suitable for the analysis. In fact, solutions of the shifted Bellman Equation might be either upper or lower semi-continuous (or neither), despite the operator T being in principle defined only on lower semi-continuous functions.
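The semi-continuity of such limits is easy to visualise: a non-increasing sequence of continuous functions can converge pointwise to a function that is upper semi-continuous but not continuous, as in the sketch below (the specific sequence is our illustrative choice).

```python
import numpy as np

# A non-increasing sequence of continuous functions whose pointwise limit is
# upper semi-continuous but not continuous: psi_k(x) = max(0, 1 - k|x|).
X = np.linspace(-1.0, 1.0, 2001)

def psi_k(k):
    return np.maximum(0.0, 1.0 - k * np.abs(X))

limit = psi_k(10**6)   # numerically indistinguishable from the pointwise limit
i0 = np.argmin(np.abs(X))
# the limit is ~1 at x = 0 and 0 everywhere else on the grid
print(limit[i0], int((limit > 0.0).sum()))
```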
The next lemma shows that iterates of the \({\hat{T}}\) operator have a bounded excursion between their maximum and minimum value, provided a continuous fixed-point of the Bellman Equation exists.
Lemma A.5
Assume that there exists a continuous solution to the shifted Bellman Equation, viz. \(T {\bar{\psi }} = {\bar{\psi }} + c\), for some \(c \in {\mathbb {R}}\). Then, the solution \({\hat{T}}^k \psi \) fulfills the bound:
Proof
To see this, notice that, by the \(\min \)-commutativity property, a simple induction argument shows that \({\hat{T}}^k \psi ( x) = \min _{h \in \{0, \ldots , k \} } T^h \psi (x) + c_h\), for suitable values of \(c_h \in {\mathbb {R}}\) with \(c_0=0\). By Lemma A.2 we have:
Canceling out the constant terms and exploiting Lemma A.3 yields:
This last inequality completes the proof of the lemma. \(\square \)
Our subsequent analysis will rely on a combination of monotonicity and Lyapunov-based arguments. To this end it is useful to show that the \({\hat{T}}\) and \({\check{T}}\) operators yield non-increasing iterations with respect to suitable Lyapunov functionals. Exploiting Lemma A.2 yields the following:
Lemma A.6
Assume that there exists a continuous solution to the shifted Bellman Equation, viz. \(T {\bar{\psi }} = \bar{ \psi } + c\), for some \(c \in {\mathbb {R}}\). Define the Lyapunov functional:
Then, for any continuous \(\psi \) the following holds:
Proof
Let \(\psi \in {\mathcal {C}}( {\mathbb {X}})\) be arbitrary. The inequality can be derived as follows:
where the first inequality follows by Lemma A.2, and the second follows because \(d (T \psi ,T {\bar{\psi }} ) \le d ( \psi , {\bar{\psi }} )\). \(\square \)
Lemma A.7
Assume that there exists a continuous solution to the shifted Bellman Equation, viz. \(T {\bar{\psi }} = \bar{ \psi } + c\), for some \(c \in {\mathbb {R}}\). Define the Lyapunov functional:
Then, for any continuous \(\psi \) the following holds:
Proof
The inequality can be derived as follows:
where the first inequality follows by Lemma A.1, and the second follows because \(d (T \psi ,T {\bar{\psi }} ) \le d ( \psi , {\bar{\psi }} )\). \(\square \)
An alternative Lyapunov functional for the operator \({\hat{T}}\) can be stated as follows:
The following lemma proves that this functional is non-increasing along iterations of \({\hat{T}}(\cdot )\).
Lemma A.8
Consider the function \(W(\psi )\) defined in (A.5). For any real valued continuous function \(\psi :{\mathbb {X}} \rightarrow {\mathbb {R}}\) the following holds:
Proof
To prove the lemma consider the following inequalities:
In addition, by definition of \({\hat{T}} \psi \), we see that:
By monotonicity and translation invariance, applying the T operator to all sides of the former inequality yields:
We are now ready to estimate \(W ({\hat{T}} \psi )\) by combining inequalities (A.6) and (A.7):
\(\square \)
Remark A.9
The same argument used to prove Lemma A.8 can be used to prove the following decoupled inequalities:
and:
Our analysis indicates that, regardless of whether the sequence of functions \({\hat{T}}^k \psi (x)\) converges, the real-valued sequence of applied shifts \(c( {\hat{T}}^k \psi , T {\hat{T}}^k \psi )\) is always bounded and convergent.
Lemma A.10
The sequence \(c ( {\hat{T}}^k \psi , T {\hat{T}}^k \psi )\) is bounded and convergent, viz. there exists \({\hat{c}}_{\infty } \in {\mathbb {R}}\) such that:
Proof
By induction, and by Remark A.9, the real-valued sequence \(\max _{x \in {\mathbb {X}}} [{\hat{T}}^k \psi (x) - T {\hat{T}}^k \psi (x)] \) is monotonically non-increasing (and bounded from below by \(\min _{x \in {\mathbb {X}} } [{\hat{T}}^k \psi (x) - T {\hat{T}}^k \psi (x) ]\)). Similarly, \(\min _{x \in {\mathbb {X}} } [{\hat{T}}^k \psi (x) - T {\hat{T}}^k \psi (x)]\) is monotonically non-decreasing (and bounded from above by \(\max _{x \in {\mathbb {X}}} [{\hat{T}}^k \psi (x) - T {\hat{T}}^k \psi (x)] \)). Hence, both sequences admit a limit:
By definition of \(c({\hat{T}}^k \psi ,T {\hat{T}}^k \psi )\) we see that:
which completes the proof of the lemma. \(\square \)
We turn next to establishing similar inequalities for the \({\check{T}}\) operator.
Lemma A.11
Consider the function W defined in (A.5). For any real valued continuous function \(\psi :{\mathbb {X}} \rightarrow {\mathbb {R}}\) the following holds:
Proof
To see the inequality consider that we have:
In addition, by definition of \({\check{T}} \psi \), we see that:
By monotonicity and translation invariance, applying the T operator to all sides of the former inequality yields:
We are now ready to bound \(W ({\check{T}} \psi )\) from above by combining inequalities (A.10) and (A.11):
\(\square \)
Remark A.12
The same argument used to prove Lemma A.11 can also be used to prove the following decoupled inequalities:
and:
A proof similar to that of Lemma A.10 yields the following result:
Lemma A.13
The sequence \(c ( {\check{T}}^k \psi , T {\check{T}}^k \psi )\) is bounded and convergent, viz. there exists \({\check{c}}_{\infty } \in {\mathbb {R}}\) such that:
It is important to relate the values of \({\hat{c}}_{\infty }\) and \({\check{c}}_{\infty }\) to the optimal average infinite horizon cost \(V^{avg}\). The following result shows that \(-{\hat{c}}_{\infty }\) is always an upper bound on the optimal average cost.
Lemma A.14
Assume that a solution of the shifted Bellman Equation exists, viz. \(T {\bar{\psi }} = {\bar{\psi }} + c\) for some \(c \in {\mathbb {R}}\). Then, for any \(\psi \in {\mathcal {C}}({\mathbb {X}})\) it holds:
Proof
We argue by contradiction. Assume that \(c+ {\hat{c}}_{\infty } > 0\). Then, there exist \(\varepsilon >0\) and \(Q \in {\mathbb {N}}\) such that:
for all \(k \ge Q\). Moreover, there exists \(N \in {\mathbb {N}}\) such that for any \(x \in {\mathbb {X}}\)
We claim that, under these assumptions, \({\hat{T}}^k \psi (x)\) converges to a fixed point within a finite number of iterations. In fact, for any \(m \ge N\) we see that:
where the last equality holds because for \(\tau \ge N\) application of Lemma A.3 and inequality (A.15) yields:
Hence \({\hat{T}}^{Q+N} \psi (x) = \lim _{k \rightarrow + \infty } {\hat{T}}^k \psi (x)\) where convergence is in a finite number of steps (uniform over \(x \in {\mathbb {X}}\)). Moreover,
Therefore, \({\hat{T}}^{Q+N} \psi (x)\) is a (continuous) fixed point of the \({\hat{T}}\) operator, and by virtue of Proposition 3.4 it is a solution of the shifted Bellman Equation for some \(c = - c ( {\hat{T}}^{Q+N} \psi , T {\hat{T}}^{Q+N} \psi )\). This implies \(c + {\hat{c}}_{\infty } =0\), which is a contradiction. \(\square \)
Whenever the sequence \({\hat{T}}^k \psi \) is pointwise convergent, one can show that the converse inequality also holds, and therefore \(-{\hat{c}}_{\infty }\) equals the optimal average performance. The next lemma is instrumental in deriving this result.
Lemma A.15
Let \(J_k (u): {\mathbb {U}} \rightarrow {\mathbb {R}}\) be a monotonically non-increasing sequence of continuous functions, converging pointwise to \({\hat{J}}(u)\), and let \({\mathbb {U}}\) be a compact set. Then the following holds:
Proof
Note that the function \({\hat{J}}\) is upper semi-continuous, but not necessarily lower semi-continuous. Hence its minimum might, a priori, not be well-defined. By monotonicity of the minimum operator:
for all \(k \in {\mathbb {N}}\). Hence, the limit \(\lim _{k \rightarrow + \infty } \min _{u \in {\mathbb {U}} } J_k (u)\), exists. Moreover, by monotonicity of the \(\inf \) operator we see:
which holds for all \(k \in {\mathbb {N}}\). Letting k go to infinity in the previous inequality shows:
We need to show the converse inequality. To this end, denote by \(u_n\) any sequence in \({\mathbb {U}}\) such that \({\hat{J}} (u_n) - 2^{-n} \le \inf _{u \in {\mathbb {U}}} {\hat{J}}(u)\). Clearly, for any n, there exists \(k_n > n\) such that \(J_{k_n} (u_n) \le {\hat{J}} (u_n) + 2^{-n}\). Overall we see:
Letting n go to infinity in the previous inequality yields:
This completes the proof of the lemma. \(\square \)
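The content of Lemma A.15 can be checked numerically. The sketch below uses the compact set \({\mathbb {U}} = [0,1]\) and the non-increasing sequence \(J_k(u) = u^2 + e^{-ku}\), an illustrative choice not taken from the paper: its pointwise limit equals 1 at \(u=0\) and \(u^2\) for \(u>0\), hence is upper but not lower semicontinuous, and its infimum 0 is not attained, which is precisely why the lemma is stated with an infimum.

```python
import numpy as np

# U = [0,1] discretized; J_k(u) = u^2 + exp(-k u) is continuous and
# monotonically non-increasing in k (illustrative choice of J_k).
U = np.linspace(0.0, 1.0, 100001)

mins = [float(np.min(U**2 + np.exp(-k * U))) for k in range(1, 401)]

# min_u J_k(u) is non-increasing in k ...
assert all(a >= b for a, b in zip(mins, mins[1:]))
# ... and converges to inf_u J_hat(u) = 0, although no u attains the inf
assert 0.0 < mins[-1] < 1e-3
```

Here \(\min _u J_k(u)\) decreases to \(\inf _u {\hat{J}}(u) = 0\) even though the limit function is discontinuous at the boundary of its sublevel sets.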
It is sometimes useful to consider the extension of operator T to functions \(\psi \) bounded from below (and not necessarily continuous). To this end, if \(\psi : {\mathbb {X}} \rightarrow {\mathbb {R}}\) is bounded from below, we denote by \(T \psi \) the following:
Lemma A.16
Assume that the function \({\hat{T}}^k \psi (x)\) converges pointwise to \({\hat{\psi }} (x)\), bounded from below. Then the following holds:
Proof
To prove the lemma notice that:
where the last equality follows by applying Lemma A.15 to the sequence of x-parameterized functions \(J_k(x,u):= \ell (x,u) + {\hat{T}}^k \psi (f(x,u))\). \(\square \)
We are now ready to prove the converse inequality.
Lemma A.17
Assume that the sequence \({\hat{T}}^k \psi (x)\) is pointwise convergent to some bounded function \({\hat{\psi }} (x)\). If a solution \({\bar{\psi }}\) of the shifted Bellman Equation exists, viz. \(T {\bar{\psi }} = {\bar{\psi }} + c\) for some \(c \in {\mathbb {R}}\) and some \({\bar{\psi }}: {\mathbb {X}} \rightarrow {\mathbb {R}}\), the following holds:
Proof
By Lemma A.16, we see that \({\hat{\psi }} (x) \le T {\hat{\psi }} + {\hat{c}}_{\infty }.\) Monotonicity of T together with shift-invariance yields, by induction for \(k \in {\mathbb {N}}\):
In particular then, for any continuous \(\psi \ge {\hat{\psi }}\):
Dividing both sides of the previous inequality by k and letting k tend to infinity yields:
\(\square \)
A similar analysis can be carried out for the iteration \({\check{T}}^k \psi \) and the corresponding limiting value of the shift \({\check{c}}_{\infty }\). However, not all results extend along the same lines, due to the lack of an analogue of formula (3.4). We first state the analogue of Lemma A.15.
Lemma A.18
Let \(J_k (u): {\mathbb {U}} \rightarrow {\mathbb {R}}\) be a monotonically non-decreasing sequence of (lower semi-)continuous functions, converging pointwise to \({\check{J}}(u)\), and let \({\mathbb {U}}\) be a compact set. Then the following holds:
Proof
Note that the function \({\check{J}}\) is lower semicontinuous, hence its minimum is well defined. By monotonicity of the minimum operator:
for all \(k \in {\mathbb {N}}\). Hence, the limit \(\lim _{k \rightarrow + \infty } \min _{u \in {\mathbb {U}} } J_k (u)\), exists. Moreover, again by monotonicity of the \(\min \) operator we see:
which holds for all \(k \in {\mathbb {N}}\). Letting k go to infinity in the previous inequality shows:
We need to show the converse inequality. To this end, denote by \(u_n\) any element of \({\mathbb {U}}\) such that \(J_n (u_n) = \min _{u \in {\mathbb {U}}} J_n(u)\). For any \(k \in {\mathbb {N}}\) and any \(n \ge k\) we see that \(J_n (u_n) \ge J_k (u_n)\). In particular, then:
for some limit point \(u^* \in {\mathbb {U}}\) of the sequence \(u_n\). Hence:
for all \(k \in {\mathbb {N}}\), and letting k go to infinity in the right hand side of the previous inequality yields:
This completes the proof of the lemma. \(\square \)
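The limit-point argument in this proof can also be visualized numerically. The sketch below is an illustrative choice (not from the paper): on \({\mathbb {U}} = [0,1]\), the non-decreasing sequence \(J_k(u) = u^2 - u/k\) converges pointwise to the continuous (hence lower semicontinuous) limit \({\check{J}}(u) = u^2\), and both the minima and the minimizers of \(J_k\) converge to those of the limit.

```python
import numpy as np

# U = [0,1] discretized; J_k(u) = u^2 - u/k is non-decreasing in k
# (illustrative choice), with pointwise limit J_check(u) = u^2.
U = np.linspace(0.0, 1.0, 200001)

mins, argmins = [], []
for k in range(1, 101):
    Jk = U**2 - U / k
    i = int(Jk.argmin())
    mins.append(float(Jk[i]))
    argmins.append(float(U[i]))

# min_u J_k(u) = -1/(4 k^2) is non-decreasing and tends to 0 = min J_check
assert all(a <= b for a, b in zip(mins, mins[1:]))
assert abs(mins[-1]) < 1e-4
# the minimizers u_k = 1/(2k) accumulate at the minimizer u* = 0 of the limit
assert argmins[-1] < 1e-2
```

The sequence of minimizers \(u_k = 1/(2k)\) plays exactly the role of the sequence \(u_n\) in the proof: its limit point \(u^* = 0\) is a minimizer of the limit function.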
Corollary A.19
Assume that \({\check{T}}^k \psi (x)\) converges point-wise to a lower semi-continuous limit \({\check{\psi }} (x)\), for all \(x \in {\mathbb {X}}\). Applying Lemma A.18 to the x-parameterised sequence of cost functions:
admitting the limit:
with \(u \in {\mathbb {U}}(x)\), yields the following point-wise convergence result:
Lemma A.20
Assume that \({\check{T}}^k \psi (x)\) converges pointwise to a lower semi-continuous limit \({\check{\psi }} (x)\), for all \(x \in {\mathbb {X}}\). Then, \({\check{\psi }} (x)\) fulfills:
If in addition the limit \({\check{\psi }}(x)\) is continuous, then it is a solution of a shifted Bellman Equation.
Proof
For all \(k \in {\mathbb {N}}\) we see:
Hence, by Corollary A.19, letting \(k \rightarrow + \infty \) in the right-hand side of the latter inequality yields:
In addition, if \({\check{\psi }} \in {\mathcal {C}}( {\mathbb {X}})\) then, by Dini’s theorem, convergence is uniform and \({\check{\psi }}\) is a fixed point of \({\check{T}}\) by continuity of the \({\check{T}}\) operator in the topology of uniform convergence. \(\square \)
We are now ready to state the analogue of Lemma A.17.
Lemma A.21
Assume that the sequence \({\check{T}}^k \psi (x)\) is pointwise convergent to some bounded function \({\check{\psi }} (x)\). If a solution \({\bar{\psi }}\) of the shifted Bellman Equation exists, viz. \(T {\bar{\psi }} = {\bar{\psi }} + c\) for some \(c \in {\mathbb {R}}\), the following holds:
Proof
By Lemma A.20, we see that \(\check{ \psi } (x) \ge T {\check{\psi }} + {\check{c}}_{\infty }.\) Monotonicity of T together with shift-invariance yields, by induction for \(k \in {\mathbb {N}}\):
In particular then, for any continuous \(\psi \le {\check{\psi }}\):
Dividing both sides of the previous inequality by k and letting k tend to infinity yields:
\(\square \)
A stronger claim can be achieved when the \({\hat{T}}^k \psi \) and \({\check{T}}^k \psi \) sequences admit a continuous limit.
Lemma A.22
Let \(\psi (x)\) be a continuous function, and assume that \({\hat{T}}^{k} \psi (x)\) (or \({\check{T}}^k \psi (x)\)) converges point-wise to a continuous limit \({\hat{\psi }} (x)\) (respectively \({\check{\psi }} (x)\)). Then, \({\hat{\psi }}\) (respectively \({\check{\psi }}\)) is a solution of the shifted Bellman Equation.
Proof
By construction \({\hat{T}}^k \psi \) is monotone non-increasing with respect to k. Hence, by Dini's Theorem, convergence to \({\hat{\psi }}\) is uniform. The result follows by continuity of the \(T(\cdot )\) and \(c(\cdot ,\cdot )\) operators with respect to the topology of uniform convergence. A symmetric argument, using that \({\check{T}}^k \psi \) is monotone non-decreasing, applies to \({\check{\psi }}\). \(\square \)
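The role of continuity of the limit in the Dini argument can be seen on a textbook example (not from the paper): \(\psi _k(x) = x^k\) decreases pointwise on \([0,1]\). On \([0, 0.9]\) the limit (identically 0) is continuous and convergence is uniform; on \([0,1]\) the limit is discontinuous at \(x=1\) and uniformity fails.

```python
import numpy as np

# Two compact domains for the monotone sequence psi_k(x) = x^k.
X1 = np.linspace(0.0, 0.9, 10001)   # limit continuous here
X2 = np.linspace(0.0, 1.0, 10001)   # limit discontinuous at x = 1

sup1 = [float(np.max(X1**k)) for k in range(1, 200)]
sup2 = [float(np.max(X2**k)) for k in range(1, 200)]

# Uniform convergence on [0, 0.9]: sup gap = 0.9^k vanishes
assert sup1[-1] < 1e-8
# No uniform convergence on [0, 1]: sup gap stays equal to 1
assert abs(sup2[-1] - 1.0) < 1e-12
```

This is why Lemma A.22 (and the continuity hypothesis in Lemma A.20) cannot be dispensed with: pointwise monotone convergence alone does not allow passing to the limit inside the \(T\) and \(c\) operators.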
The convergence properties of the \({\hat{T}}^k \psi \) and \({\check{T}}^k \psi \) sequences will be established through a combination of LaSalle-style and monotonicity-based arguments. The following lemmas are crucial to understanding the implications of certain Lyapunov functionals being constant along iterations of the \({\hat{T}}\) and \({\check{T}}\) maps.
Lemma A.23
Let \(\psi \) be a continuous function such that:
Then the following holds:
- the sets achieving the minimum are nested:
  $$\begin{aligned} \arg \min _{x \in {\mathbb {X}}} {\hat{T}} \psi (x) - T {\hat{T}} \psi (x) \subseteq \arg \min _{x \in {\mathbb {X}}} \psi (x) - T \psi (x) \end{aligned}$$
- the operator \({\hat{T}}\) does not alter the value of \(\psi \) in the \(\arg \min \) set:
  $$\begin{aligned} \psi (x_m) = {\hat{T}} \psi (x_m) \qquad \forall \, x_m \in \arg \min _{x \in {\mathbb {X}}} {\hat{T}} \psi (x) - T {\hat{T}} \psi (x) \end{aligned}$$
- \(T {\hat{T}} \psi (x_m) = T\psi (x_m)\) for all \( x_m \in \arg \min _{x \in {\mathbb {X}}} {\hat{T}} \psi (x) - T {\hat{T}} \psi (x)\).
Proof
To prove the lemma, notice that inequality (A.9) holds and can be derived from the following two inequalities: \(T {\hat{T}} \psi (x) \le T \psi (x)\) and \({\hat{T}} \psi (x) \ge T \psi (x) + c(\psi ,T\psi ) - d ( \psi ,T \psi )\). If (A.9) is an equality, both inequalities must hold with equality at any \(x_m \in \arg \min _{x \in {\mathbb {X}}} {\hat{T}} \psi (x) - T {\hat{T}} \psi (x) \). Hence, it holds:
and
Since \(d ( \psi ,T \psi ) \ge 0\), the first equality proves that \({\hat{T}} \psi (x_m) = \psi (x_m)\). Moreover, by assumption:
This shows that \(x_m \in \arg \min _{x \in {\mathbb {X}}} \psi (x) - T \psi (x).\) Since \(x_m\) was arbitrary to start with, inclusion of the \(\arg \min \) sets follows, which concludes the proof of the lemma. \(\square \)
Corollary A.24
Assume that a continuous solution \({\bar{\psi }}\) of the shifted Bellman Equation exists. If for some continuous \(\psi \) and all \(k \in {\mathbb {N}}\) it holds
then, \(\lim _{k \rightarrow + \infty } {\hat{T}}^k \psi \) exists and is an upper-semicontinuous function.
Proof
By virtue of Lemma A.23, if equation (A.17) holds there exists \(x_m \in {\mathbb {X}}\) such that \({\hat{T}}^k \psi (x_m) = \psi (x_m)\) for all \(k \in {\mathbb {N}}\). In particular,
which in combination with the inequality proved in Lemma A.5 and existence of a solution of the shifted Bellman Equation implies boundedness and pointwise convergence of the \({\hat{T}}\) iteration. Moreover, as \({\hat{T}}^k \psi \) is non-increasing the limiting function is upper-semicontinuous. \(\square \)
A symmetric argument can be used to establish the following lemma.
Lemma A.25
Let \(\psi \) be a continuous function such that:
Then the following holds:
- the sets achieving the maximum are nested:
  $$\begin{aligned} \arg \max _{x \in {\mathbb {X}}} {\check{T}} \psi (x) - T {\check{T}} \psi (x) \subseteq \arg \max _{x \in {\mathbb {X}}} \psi (x) - T \psi (x) \end{aligned}$$
- the operator \({\check{T}}\) does not alter the value of \(\psi \) in the \(\arg \max \) set:
  $$\begin{aligned} \psi (x_m) = {\check{T}} \psi (x_m) \qquad \forall \, x_m \in \arg \max _{x \in {\mathbb {X}}} {\check{T}} \psi (x) - T {\check{T}} \psi (x) \end{aligned}$$
- \(T {\check{T}} \psi (x_m) = T\psi (x_m)\) for all \( x_m \in \arg \max _{x \in {\mathbb {X}}} {\check{T}} \psi (x) - T {\check{T}} \psi (x)\).
A version of Corollary A.24 can be proved for the \({\check{T}}\) operator.
Corollary A.26
Assume that a continuous solution \({\bar{\psi }}\) of the shifted Bellman Equation exists. If for some continuous \(\psi \) and all \(k \in {\mathbb {N}}\) it holds
then, \(\lim _{k \rightarrow + \infty } {\check{T}}^k \psi \) exists and is a lower-semicontinuous function.
Proof
By virtue of Lemma A.25, if equation (A.18) holds there exists \(x_m \in {\mathbb {X}}\) such that \({\check{T}}^k \psi (x_m) = \psi (x_m)\) for all \(k \in {\mathbb {N}}\). In particular,
We show next that the sequence \({\check{T}}^k \psi \) is bounded from above:
where the last inequality follows from (A.20) and the first from Lemma A.7. Hence, pointwise convergence of the \({\check{T}}^k \psi \) sequence to a lower semi-continuous function follows by boundedness and monotonicity (viz. by \({\check{T}}^k \psi \) being non-decreasing in k). \(\square \)
Appendix: Additional Proofs
1.1 Proof of Proposition 4.1
Let \(k \in {\mathbb {N}}\) be arbitrary. By induction it is possible to see that:
The claim is trivial for \(k=1\). Assume this holds for k, we will show it is true for \(k+1\):
A similar argument applies to \(\psi _2\). In particular then:
Moreover, we know that:
for all \(k \in {\mathbb {N}}\) and all \(x_0\). Assume that \(|\psi _1(x) - \psi _2 (x)| \le M\) for all \(x \in {\mathbb {X}}\), which is always fulfilled for sufficiently large M due to boundedness of \(\psi _1\) and \(\psi _2\). Then \(V_k^{\psi _2} (x_0) \le V_k^{\psi _1} (x_0) + M\), as follows by noting that the optimal solution relative to the terminal penalty \(\psi _1\) can be used as a feasible solution to estimate the optimal cost of the problem with terminal cost \(\psi _2\). A symmetric argument yields \(V_{k}^{\psi _1} (x_0) \le V_k^{\psi _2} (x_0) + M\). This shows \(| V_k^{\psi _1} (x_0) - V_k^{\psi _2} (x_0)| \le M\) for all k. Dividing by k and letting k go to infinity yields:
which completes our proof. \(\square \)
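The feasible-solution argument in this proof can be tested numerically. The sketch below is illustrative only: it uses an invented finite system and two invented bounded terminal penalties, and checks that \(|V_k^{\psi _1}(x_0) - V_k^{\psi _2}(x_0)| \le M\) for all horizons, so that dividing by k gives the same average cost.

```python
# Illustrative finite system (states {0,1,2}, controls {0,1}); the
# dynamics f and stage cost l are assumptions made for this sketch.
X, U = [0, 1, 2], [0, 1]
f = {(x, u): (x + u) % 3 for x in X for u in U}              # dynamics
l = {(x, u): (x - 1) ** 2 + 0.5 * u for x in X for u in U}   # stage cost

def V(psi, k):
    """k-step optimal cost with terminal penalty psi, by backward DP."""
    v = dict(psi)
    for _ in range(k):
        v = {x: min(l[x, u] + v[f[x, u]] for u in U) for x in X}
    return v

psi1 = {0: 3.0, 1: 0.0, 2: 5.0}
psi2 = {0: -1.0, 1: 2.0, 2: 0.0}
M = max(abs(psi1[x] - psi2[x]) for x in X)   # here M = 5

# |V_k^{psi1} - V_k^{psi2}| <= M uniformly in the horizon k ...
for k in (1, 5, 20, 50):
    vk1, vk2 = V(psi1, k), V(psi2, k)
    assert all(abs(vk1[x] - vk2[x]) <= M + 1e-12 for x in X)

# ... so the averages V_k/k differ by at most M/k and share the same limit
v1, v2 = V(psi1, 200), V(psi2, 200)
assert all(abs(v1[x] - v2[x]) / 200 <= M / 200 + 1e-12 for x in X)
```

The bound is exact rather than asymptotic: the optimal trajectory for one terminal penalty is a feasible trajectory for the other, exactly as in the proof above.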
1.2 Proof of Formula (3.4)
The formula is trivially fulfilled for \(k=0\), remarking that by definition \(\sum _{s \in \emptyset } (\cdot ) = 0\). In fact:
Arguing by induction, and assuming the formula holds for an arbitrary value of k, we derive it for \(k+1\) through the following steps:
Angeli, D., Grüne, L. Dissipativity in Infinite Horizon Optimal Control and Dynamic Programming. Appl Math Optim 89, 42 (2024). https://doi.org/10.1007/s00245-024-10103-y