1 Introduction

Dynamic programming (DP) is a cornerstone of control theory: it makes it possible to solve optimal control problems (in feedback form) formulated on horizons of increasing length, through a suitable recursive formula for the computation of the so-called value function [1].

Remarkably, dynamic programming allows one to study problems formulated on both finite and infinite horizons, the latter under suitable technical assumptions, by studying the asymptotic properties of the recursion or by computing its fixed points. By now, the subject of dynamic programming and infinite horizon optimal control has been studied in depth by many authors, and several monographs on the subject exist both in the control domain [2,3,4] and in economics [5, 6].

While, in its naive form, DP is often associated with the curse of dimensionality, which may hinder its applicability to scenarios of practical relevance, the topic of its approximate and efficient numerical treatment has also gained significant momentum, in particular in the context of machine learning [7]. Indeed, the dynamic programming, or Bellman, Equation is at the core of any (deep) reinforcement learning algorithm [8, 9].

The link between optimal control and dissipativity was established by Willems in the seminal papers [10, 11] and, in parallel, in the study of inverse optimal regulators for nonlinear systems [12]. However, it was only brought to the forefront of the discourse on optimisation-based control in recent years [13, 14], thanks to its surprising connections to closed-loop stability of Economic Model Predictive Control [15, 16] and long-run average optimal control [17, 18]. In particular, [15] proposes a notion of optimal operation at steady state and provides a sufficient condition for this property to hold, based on dissipativity of the associated system’s dynamics with respect to a suitable supply function. The converse statement is investigated in [16], where an additional controllability assumption is needed in order to prove necessity of dissipativity. While generalizations of these results, their relation to the so-called turnpike property, and extensions to periodic optimal solutions are provided in several subsequent works (e.g., [19] and [20]), the connection to Dynamic Programming and infinite horizon optimal control has remained elusive, due to the restrictive technical assumptions needed to make sense of undiscounted cost functionals.

In this paper we further explore connections between dissipativity and infinite horizon optimal control problems, while proposing new formulations and iterative methods for their solution that significantly expand the class of problems which can be meaningfully addressed by this approach. Our main contributions are:

  • introducing a terminal penalty in infinite horizon optimal control, in the form of suitable storage functions with negative sign;

  • proposing a shifted Bellman Equation to be used in optimal control problems with non-zero (yet state-independent) optimal long run average performance (this includes systems with periodic, almost periodic or even chaotic regimes of operation allowing general time-varying asymptotic cost along optimal solutions);

  • proposing two novel recursions whose fixed points are solutions of a shifted Bellman Equation (for an arbitrary shift);

  • analysing the convergence properties of such recursions under fairly general technical assumptions, allowing simultaneous computation of the best average performance and of the associated value function;

  • eliminating the spurious trade-off between transient cost and asymptotic average performance that discounted formulations may introduce.

The rest of the paper is organized as follows: Sect. 2 introduces the problem formulation, basic notation and some preliminary results; Sect. 3 introduces the shifted Bellman Equation and the novel recursion operators, whose properties are investigated in Sect. 4. Section 5 provides a general convergence result under suitable conditions on the controllability of the system’s dynamics, while Sect. 6 relaxes some of the continuity assumptions needed for the convergence analysis by approaching the recursion from specific initialisations. Examples and counter-examples are shown in Sect. 7, and Sect. 8 draws some conclusions and points to further open research directions. Important intermediate technical results are collected in the appendix in Sect. A.

2 Problem Formulation and Preliminary Results

Consider the discrete-time finite-dimensional nonlinear control system described by the following difference equation:

$$\begin{aligned} x (t+1) = f(x(t),u(t)) \end{aligned}$$
(2.1)

where \(x(t) \in {\mathbb {X}} \subset {\mathbb {R}}^n\) is the state variable, taking values in some compact control invariant set \({\mathbb {X}}\), \(u(t) \in {\mathbb {R}}^m\) is the control input and \(f: {\mathbb {Z}} \rightarrow {\mathbb {X}}\) is the continuous transition map. We denote by \({\mathbb {U}}(\cdot ): {\mathbb {X}} \rightarrow 2^{{\mathbb {R}}^m} \) the upper semicontinuous set-valued mapping defined below:

$$\begin{aligned} {\mathbb {U}} (x):= \{ u \in {\mathbb {R}}^m: (x,u) \in {\mathbb {Z}} \}, \end{aligned}$$
(2.2)

which corresponds to the set of feasible control inputs in state x, given the compact state/input constraint set \({\mathbb {Z}}\). Moreover, we assume, without loss of generality,

$$\begin{aligned} f(x, {\mathbb {U}}(x) ) \subset {\mathbb {X}}, \end{aligned}$$
(2.3)

for all \(x \in {\mathbb {X}}\). For an input sequence \({{\textbf {u}}}= \{ u(t)\}_{t=0}^{\infty }\), we denote by \(\phi (t,x,{{\textbf {u}}})\) the state at time t, from initial condition \(x(0)=x\), as given by iteration (2.1). We also extend definition (2.2), to allow feasible control sequences of length \(\tau \), as follows:

$$\begin{aligned} {\mathbb {U}}_{\tau } (x):= & {} \{ {{\textbf {u}}}=\{ u(t) \}_{t=0}^{\tau -1} \in {\mathbb {R}}^{m \tau }: (\phi (t,x,{{\textbf {u}}}), u(t)) \in {\mathbb {Z}},\forall \, t \in \{0, \ldots , \tau -1\} \}.\nonumber \\ \end{aligned}$$
(2.4)

Our contribution is twofold: namely, to define optimal control problems over an infinite horizon for a significantly larger class of system dynamics and associated cost functionals than existing formulations can address, and, at the same time, to propose a dynamic programming approach for their solution. To this end we consider a continuous stage cost \(\ell : {\mathbb {Z}} \rightarrow {\mathbb {R}}\) and formulate the following cost functional:

$$\begin{aligned} {J^{\psi }_{\tau } (x(\cdot ),u(\cdot ) ): = \sum _{t =0}^{\tau -1} \ell (x(t),u(t)) + \psi (x(\tau )) } \end{aligned}$$
(2.5)

where \(\psi : {\mathbb {X}} \rightarrow {\mathbb {R}}\) is a continuous function called the terminal cost. Terminal costs significantly affect the solution of an optimal control problem, and a key insight of our paper is to provide guidelines for their selection so as to allow the formulation of infinite horizon optimal control problems. A finite horizon optimal control problem is then defined as follows:

$$\begin{aligned} \begin{array}{rl} V^{\psi }_{\tau } (x): = \min _{x(\cdot ), u( \cdot )} &{} J^{\psi }_{\tau } (x(\cdot ),u(\cdot )) \\ \text {subject to} &{} \\ x(0) &{} = x \\ x(t+1) &{} = f(x(t),u(t)) \qquad t \in \{0,1, \ldots , \tau -1 \} \\ (x(t),u(t))&{} \in {\mathbb {Z}} \qquad t \in \{0,1, \ldots , \tau -1 \} \\ x(\tau ) &{} \in {\mathbb {X}} \end{array} \end{aligned}$$
(2.6)

For each value of the initial condition \(x \in {\mathbb {X}}\), a solution of (2.6) is guaranteed to exist thanks to the compactness and non-emptiness properties of the feasible set, control invariance of \({\mathbb {X}}\), and continuity of the cost function.
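For concreteness, the finite-horizon problem (2.6) can be solved by backward dynamic programming once states and inputs are discretized. The sketch below uses a hypothetical scalar example (dynamics \(f(x,u)=0.5x+u\), quadratic stage cost, zero terminal cost) that is not taken from the paper; successor states are snapped to the grid so that the invariance condition (2.3) holds by construction.

```python
import numpy as np

X = np.linspace(-1.0, 1.0, 21)   # compact state grid, standing in for X
U = np.linspace(-0.5, 0.5, 11)   # feasible inputs, here state-independent

def f_idx(i, u):
    """Grid index of the successor state f(x, u) = 0.5*x + u (snapped to X)."""
    xn = np.clip(0.5 * X[i] + u, X[0], X[-1])
    return int(np.argmin(np.abs(X - xn)))

def ell(x, u):
    """Illustrative quadratic stage cost."""
    return x**2 + u**2

def V(tau, psi):
    """V_tau^psi on the grid, via tau backward steps of dynamic programming."""
    v = np.array([psi(x) for x in X])
    for _ in range(tau):
        v = np.array([min(ell(X[i], u) + v[f_idx(i, u)] for u in U)
                      for i in range(len(X))])
    return v

V5 = V(5, lambda x: 0.0)   # zero terminal cost; V5[10] is the value near x = 0
```

Since \(\ell \ge 0\) and \(\psi = 0\) here, the computed \(V_{\tau }\) is non-decreasing in \(\tau \), in line with Proposition 2.1 below with the trivial storage function.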

On the other hand, when the control problem has no natural termination time, one might want to define an infinite horizon optimisation problem. This often has the additional appealing feature of being achievable through the implementation of a time-invariant feedback policy. However, making sense of an infinite horizon formulation of (2.6) typically entails strong assumptions on the kinds of system dynamics and cost functionals that are allowed.

One strategy for avoiding such limitations, at least in practice, is to introduce a discount factor \(0< \gamma < 1\) in the cost function:

$$\begin{aligned} J_\gamma (x(\cdot ),u(\cdot ) ): = \sum _{t =0}^{\infty } \gamma ^t \ell (x(t),u(t)), \end{aligned}$$
(2.7)

which for \(\gamma \approx 1\) provides a good approximation to some form of infinite horizon (average) cost. While this approach has some appealing features, for instance making optimal solutions invariant with respect to translation of \(\ell \) by any finite constant, having to settle on a specific value of \(\gamma \) less than unity is unsatisfactory, as it always leaves open the question of how optimal control policies would be affected by variations in \(\gamma \), i.e., if higher values were to be considered. Moreover, as shown later in Sect. 7.5, adoption of a discount factor may introduce non-existent trade-offs between the optimisation of steady-state and transient costs.
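The translation invariance just mentioned is easy to observe numerically. The sketch below runs discounted value iteration for (2.7) on a small, randomly generated tabular problem (all tables are hypothetical): shifting the stage cost by a constant \(c\) shifts the value function by \(c/(1-\gamma )\) and leaves the greedy policy unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
nX, nU, gamma = 6, 3, 0.9
succ = rng.integers(0, nX, size=(nX, nU))   # successor table: f(x, u)
cost = rng.random((nX, nU))                 # stage cost table: l(x, u)

def value_iteration(c_table, iters=2000):
    """Iterate the discounted Bellman operator to numerical convergence."""
    v = np.zeros(nX)
    for _ in range(iters):
        v = (c_table + gamma * v[succ]).min(axis=1)
    policy = (c_table + gamma * v[succ]).argmin(axis=1)
    return v, policy

v0, pi0 = value_iteration(cost)
v1, pi1 = value_iteration(cost - 0.5)   # stage cost shifted by c = 0.5
# v1 ~ v0 - 0.5/(1 - gamma), while pi1 == pi0
```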

An alternative approach is to resort to average, rather than summed costs:

$$\begin{aligned} J^{\text {avg}} (x(\cdot ),u(\cdot ) ): = \limsup _{\tau \rightarrow + \infty } \frac{ \sum _{t =0}^{\tau -1} \ell (x(t),u(t)) }{\tau }. \end{aligned}$$
(2.8)

Taking the average yields well-defined costs even when summed costs would diverge to \(\pm \infty \) or fail to converge (for instance, by oscillating); these are the main obstructions to defining infinite horizon control problems for general dynamics and costs. On the other hand, the time-shift invariance of average costs along any solution implies that this approach disregards transient costs, which therefore will not be minimised and might be arbitrarily large even for optimal feedback policies (see again the example in Sect. 7.5). It is worth pointing out that similar notions have also attracted considerable interest in the Markov Decision Process literature, where optimal average cost is usually referred to as gain optimality, in contrast to bias optimality, which captures optimal transient behaviour (see for instance [21]).

Our proposed solution and novel contribution is to provide fairly general conditions on the terminal cost \(\psi \) ensuring that the limit:

$$\begin{aligned} V^{\psi }_{\infty } (x):= \lim _{\tau \rightarrow + \infty } V^{\psi }_{\tau } (x) \end{aligned}$$

is well-defined. To this end the notion of dissipativity will play a central role. This notion was originally introduced by Willems in [10, 11] and has recently seen a surge of interest for its crucial role in the analysis of closed-loop Economic Model Predictive Control schemes [13,14,15,16]. In a nutshell, a system of the form (2.1) is said to be dissipative with respect to the supply function \(\ell (x,u)\) if there exists a continuous storage function \(\lambda : {\mathbb {X}} \rightarrow {\mathbb {R}}\) such that:

$$\begin{aligned} \lambda ( f(x,u) ) \le \lambda (x) + \ell (x,u) \qquad \forall \, (x,u) \in {\mathbb {Z}}. \end{aligned}$$
(2.9)

This inequality is normally interpreted in “energetic” terms: for a dissipative system, the energy stored at the next state cannot exceed the energy at the current state plus the energy externally supplied through the supply function \(\ell (x,u)\). In the context of optimal control, where the objective is to minimize a cost functional, \(\lambda (x)\) can be interpreted as the value of the state x, and the dissipation inequality guarantees that the gain in value for any feasible control action u and state x cannot exceed the corresponding stage cost. Notice that, while optimal control sequences over any finite control horizon (or over an infinite control horizon with discount factor \(\gamma \)) are invariant with respect to cost translations, viz. \({\tilde{\ell }} (x,u ):= \ell (x,u) - c\) for any constant \(c \in {\mathbb {R}}\), dissipativity is not a shift-invariant property. In fact, it can always be guaranteed by a sufficiently negative value of c, given compactness of \({\mathbb {Z}}\). Trivially, if \({\tilde{\ell }} (x,u) \ge 0\) for all \((x,u) \in {\mathbb {Z}}\), dissipativity is ensured simply by defining \(\lambda (x)=0\) for all \(x \in {\mathbb {X}}\). Our first result is stated below.
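The shift dependence of dissipativity can be checked numerically on a grid over \({\mathbb {Z}}\). The example below is hypothetical (dynamics \(f(x,u)=0.5x+u\), cost \(\ell (x,u)=x^2+u^2\)): since \(\ell \ge 0\), the zero storage function certifies dissipativity, while for the shifted supply \(\ell - c\) with \(c>0\) this particular certificate already fails at \((x,u)=(0,0)\).

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 41)    # sampled states
us = np.linspace(-0.5, 0.5, 21)    # sampled inputs; the grid stands in for Z

def dissipative(lam, supply, tol=1e-12):
    """Check lam(f(x,u)) <= lam(x) + supply(x,u) on the sampled set."""
    return all(lam(0.5 * x + u) <= lam(x) + supply(x, u) + tol
               for x in xs for u in us)

ell = lambda x, u: x**2 + u**2
ok_zero_shift = dissipative(lambda x: 0.0, ell)                          # True
ok_pos_shift = dissipative(lambda x: 0.0, lambda x, u: ell(x, u) - 0.1)  # False
```

Of course, failure of the zero storage function does not rule out dissipativity with some other \(\lambda \); a grid check of this kind only certifies or refutes a given candidate.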

Proposition 2.1

Assume that system (2.1) is dissipative with respect to the supply \(\ell (x,u)\), with continuous storage function \(\lambda (\cdot )\), and let \(\psi (x) = - \lambda (x)\). Then the limit:

$$\begin{aligned} \lim _{\tau \rightarrow + \infty } V^{\psi }_{\tau } (x) \end{aligned}$$
(2.10)

exists for all \(x \in {\mathbb {X}}\), possibly assuming the value \(+\infty \).

Proof. Consider any optimal solution \(x^*(\cdot )\), \(u^*(\cdot )\) over the horizon \(\tau +1\) (with \(x^*(0) = x\)), achieving the optimal cost \(V^{\psi }_{\tau +1} (x)\). By definition,

$$\begin{aligned} V^{\psi }_{\tau +1} (x)= & {} \sum _{t = 0}^{\tau } \ell ( x^*(t), u^*(t) ) - \lambda (x^*(\tau +1) ) \\= & {} \left( \sum _{t = 0}^{\tau -1} \ell ( x^*(t), u^*(t) ) \right) \\{} & {} + \ell (x^*(\tau ),u^*(\tau )) - \lambda (x^*(\tau +1) ) \\\ge & {} \left( \sum _{t = 0}^{\tau -1} \ell ( x^*(t), u^*(t) ) \right) - \lambda (x^*(\tau ) ) \ge V^{\psi }_{\tau } (x), \end{aligned}$$

where the first inequality holds by the dissipativity assumption, and the second because \(x^*, u^*\) is a feasible solution also over the shorter horizon \([0,\tau ]\). Hence, \(V^{\psi }_{\tau } (x)\) is monotone non-decreasing with respect to \(\tau \) and the limit (2.10) exists. \(\square \)

It is important to realise that Proposition 2.1 only guarantees existence of the limit, not actual boundedness of the cost \(V^{\psi }_{\infty } (x)\). In fact, the cost would typically be \(+ \infty \) unless a suitably shifted version of \(\ell (x,u)\) is considered. In particular, there is only a single value of this shift that might result in a finite cost. This value can be found, by alternative means, by looking for the optimal average cost,

$$\begin{aligned} \begin{array}{rl} V^{\text {avg}} (x) = \inf _{x(\cdot ), u ( \cdot ) } &{} J^{\text {avg}} (x(\cdot ),u(\cdot ) ) \\ \text {subject to} &{} \\ x(0) &{} = x \\ x(t+1) &{} = f(x(t),u(t)) \qquad t \in {\mathbb {N}} \\ (x(t),u(t)) &{} \in {\mathbb {Z}} \qquad t \in {\mathbb {N}}. \end{array} \end{aligned}$$
(2.11)

Under suitable technical conditions, for instance global controllability assumptions, the optimal cost is independent of x, and its value can be found [18, 22] via an infinite-dimensional linear program, viz. by solving the following optimisation problem:

$$\begin{aligned} \begin{array}{l} V^{\text {avg}} = \sup _{\lambda (\cdot ) \in {\mathcal {C}} ({\mathbb {X}}) } \; c \\ \qquad \qquad \text {subject to} \\ \qquad \qquad \lambda (f(x,u)) \le \lambda (x) + \ell (x,u) - c \qquad \quad \forall \, (x,u) \in {\mathbb {Z}} \\ \end{array} \end{aligned}$$
(2.12)

where:

$$\begin{aligned} {\mathcal {C}} ( {\mathbb {X}} ):= \{ \lambda : {\mathbb {X}} \rightarrow {\mathbb {R}}: \lambda \text { is continuous } \}. \end{aligned}$$

We note that this approach has similarities to the effective Hamiltonian approach in continuous-time ergodic optimal control; see [23]. Dynamic programming allows one to solve optimal control problems through iteration of a suitably defined operator, which computes the optimal cost for increasing values of the control horizon. To this end, for summed costs without exponential rescaling, the following Bellman operator \(T: {\mathcal {C}} ( {\mathbb {X}} ) \rightarrow {\mathcal {C}} ({\mathbb {X}} )\) is normally defined:

$$\begin{aligned} T \psi (x):= \min _{u \in {\mathbb {U}} (x)} \ell (x,u) + \psi ( f(x,u) ). \end{aligned}$$
(2.13)

The following result characterizes \(V_{\infty }^{\psi }(x)\) as a fixed-point of the Bellman operator.

Proposition 2.2

Assume that \(\psi = - \lambda \) for some storage function \(\lambda \in {\mathcal {C}} ( {\mathbb {X}} )\) and that the following limit exists and is finite:

$$\begin{aligned} V_{\infty }^{\psi } (x ) = \lim _{\tau \rightarrow + \infty } V_{\tau }^{\psi } (x). \end{aligned}$$
(2.14)

Then, \(V_{\infty }^{\psi }\) is a lower semi-continuous solution of the Bellman Equation, viz. \(T V_{\infty }^{\psi } (x) = V_{\infty }^{\psi } (x)\).

Proof

To see this, recall that \(V_{\tau }^{\psi }(x)\) is non-decreasing with respect to \(\tau \). Hence:

$$\begin{aligned} \liminf _{x \rightarrow x_0} V_{\infty }^{\psi } (x) = \liminf _{x \rightarrow x_0} \lim _{\tau \rightarrow + \infty } V_{\tau }^{\psi } (x) \ge \liminf _{x \rightarrow x_0} V_{\tau }^{\psi } (x) = V_{\tau }^{\psi } (x_0) \qquad \forall \, \tau \in {\mathbb {N}} \end{aligned}$$

Since \(\tau \) is arbitrary, we see that:

$$\begin{aligned} \liminf _{x \rightarrow x_0} V_{\infty }^{\psi } (x) \ge \lim _{\tau \rightarrow + \infty } V_{\tau }^{\psi } (x_0) = V_{\infty }^{\psi } (x_0). \end{aligned}$$

This proves that \(V^{\psi }_{\infty }\) is lower semicontinuous. Hence the minimum of

$$\begin{aligned} \min _{u \in {\mathbb {U}}(x)} \ell (x,u) + V_{\infty }^{\psi } ( f(x,u) ), \end{aligned}$$

is achieved, for some optimal feedback policy \(u^*(x)\). Moreover it fulfills:

$$\begin{aligned} T V_{\infty }^{\psi } (x)= & {} \ell (x,u^*(x)) + V_{\infty }^{\psi } ( f(x,u^*(x)) ) \; = \; \lim _{\tau \rightarrow + \infty } \ell (x,u^*(x)) + V_{\tau }^{\psi } ( f(x,u^*(x))) \\\ge & {} \lim _{\tau \rightarrow + \infty } V_{\tau +1}^{\psi } (x) = V_{\infty }^{\psi } (x). \end{aligned}$$

On the other hand:

$$\begin{aligned} V_{\infty }^{\psi } (x) = \lim _{\tau \rightarrow + \infty } V_{\tau +1}^{\psi } (x) = \lim _{\tau \rightarrow + \infty } T V_{\tau }^{\psi } (x) = \lim _{\tau \rightarrow + \infty } \min _{u \in {\mathbb {U}}(x) } \ell (x,u) + V_{\tau }^{\psi } ( f(x,u) ). \end{aligned}$$

Let \(x \in {\mathbb {X}}\) be fixed and arbitrary. Since \(V_\tau \) is continuous in x (by induction over \(\tau \), continuity of \(\psi \) and u.s.c. of \({\mathbb {U}}(x)\)), for each \(\tau >0\) and the current fixed value of x there exists a minimizer \(u_\tau (x) \in {\mathbb {U}}(x)\) for this last expression. Since \({\mathbb {U}}(x)\) is compact, we find a sequence \(\tau _n\rightarrow \infty \) (possibly x-dependent) such that \(u_{\tau _n}\) converges to a control value \(u_\infty (x) \in {\mathbb {U}}(x)\). For each \(\tau >0\) this implies

$$\begin{aligned} V_{\infty }^{\psi } (x)= & {} \lim _{n \rightarrow + \infty } \ell (x,u_{\tau _n}(x)) + V_{\tau _n}^{\psi } ( f(x,u_{\tau _n}(x)) )\nonumber \\\ge & {} \lim _{n \rightarrow + \infty } \ell (x,u_{\tau _n}(x)) + V_{\tau }^{\psi } ( f(x,u_{\tau _n}(x)) ) \; = \; \ell (x,u_{\infty }(x)) + V_{\tau }^{\psi } ( f(x,u_{\infty }(x)) ).\nonumber \\ \end{aligned}$$
(2.15)

Since \(V_{\tau }^{\psi } (x) \rightarrow V_{\infty }^{\psi } (x)\), for each \(\varepsilon >0\) there exists \(\tau _{\varepsilon } (x)>0\) such that \(V_{\tau _{\varepsilon }(x)}^{\psi } (x) \ge V^{\psi }_{\infty } (x) - \varepsilon \). Hence we see, starting from (2.15):

$$\begin{aligned} V_{\infty }^{\psi } (x)\ge & {} \ell (x,u_{\infty }(x)) + V_{\tau _\varepsilon (f(x,u_{\infty }(x)))}^{\psi } ( f(x,u_{\infty }(x)) ) \; \ge \; \ell (x,u_{\infty }(x)) \\{} & {} + V_{\infty }^{\psi } ( f(x,u_{\infty }(x)) ) - \varepsilon \\\ge & {} \inf _{u \in {\mathbb {U}}(x) } \ell (x,u) + V_{\infty }^{\psi } ( f(x,u) ) - \varepsilon \; = \; T V_{\infty }^{\psi } (x) - \varepsilon . \end{aligned}$$

Since \(x\in {\mathbb {X}}\) and \(\varepsilon >0\) were arbitrary, the assertion \(V_{\infty }^{\psi } (x) \ge T V_{\infty }^{\psi } (x)\) follows for all \(x \in {\mathbb {X}}\) and this, combined with the complementary inequality \(V_{\infty }^{\psi } (x) \le T V_{\infty }^{\psi } (x)\) previously shown, completes the proof. \(\square \)
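As a sanity check of Proposition 2.2, consider a hypothetical two-state problem in which state 0 is absorbing with zero cost and state 1 can either move to 0 or stay put, each at unit cost. With \(\lambda = 0\) (a valid storage function since \(\ell \ge 0\)) and \(\psi = -\lambda = 0\), the limit \(V_{\infty }^{\psi }\) exists, is finite, and is a fixed point of \(T\):

```python
import numpy as np

succ = np.array([[0, 0], [0, 1]])          # successor of each (state, input)
cost = np.array([[0.0, 0.0], [1.0, 1.0]])  # stage costs l(x, u)

def T(psi):
    """Tabular Bellman operator (2.13)."""
    return (cost + psi[succ]).min(axis=1)

V = np.zeros(2)                            # psi = 0
for _ in range(50):
    V = T(V)                               # V_tau^psi for growing tau
# V settles at (0, 1), which satisfies T V = V
```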

We remark that Proposition 2.2 is connected to the result in [24], where the role of linear penalty functions is explored in guaranteeing asymptotic stability of Economic MPC.

3 Shifted Bellman Equation and Operators

In the literature, different constructive approaches for computing storage functions are described, most notably the classical constructions of the available storage and the required supply, which go back to [10] and are easily adapted to the discrete-time case (see, e.g., [16, 19] for the available storage). For this reason, a possible, but ultimately unsatisfactory, way to approach an infinite horizon optimal control problem would be to follow these steps:

  1. Compute the minimal average cost, \(V^{\text {avg}}\);

  2. Define a shifted stage cost, \({\tilde{\ell }} (x,u) = \ell (x,u) - V^{\text {avg}}\), so as to yield zero optimal average cost;

  3. Compute a storage function \(\lambda \) for the supply function \({\tilde{\ell }}(x,u)\);

  4. Define \(\psi := - \lambda \) as a terminal penalty for the infinite horizon optimal control problem, with shifted stage costs \({\tilde{\ell }}\);

  5. Use the standard Bellman iteration to asymptotically compute the value function over an infinite horizon, or directly look for a solution of the associated Bellman Equation.

This procedure is not ideal for several reasons. First of all, computation of the optimal average cost involves a limiting operation, and therefore typically only approximate values of \(V^{\text {avg}}\) can ever be achieved. However, using approximate values in the iteration of the Bellman operator yields optimal costs that diverge over an infinite horizon, to \(-\infty \) or \(+\infty \) depending on whether the optimal average cost has been over- or underestimated. In addition, Step 3 is bound to fail whenever the average optimal cost \(V^{\text {avg}}\) has been overestimated (in other words, a storage function might exist only for \({\tilde{\ell }}(x,u)=\ell (x,u) - c\) with \(c \le V^{\text {avg}}\)).
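The divergence caused by a misestimated shift is easy to reproduce. The sketch below iterates the plain Bellman operator on a hypothetical two-state cycle (state 0 to state 1 and back, stage costs 1 and 3, hence optimal average cost 2) with a slightly wrong shift \(c\); the iterates drift linearly instead of settling.

```python
import numpy as np

cost = np.array([1.0, 3.0])   # stage cost in states 0 and 1
nxt = np.array([1, 0])        # deterministic cycle 0 -> 1 -> 0

def iterate(c, k):
    """k applications of the Bellman operator for the shifted cost l - c."""
    psi = np.zeros(2)
    for _ in range(k):
        psi = (cost - c) + psi[nxt]   # one feasible input per state: no min
    return psi

high = iterate(2.1, 200)   # shift overestimated: drifts towards -infinity
low = iterate(1.9, 200)    # shift underestimated: drifts towards +infinity
```

After 200 steps the iterates sit at \(\mp 20\) respectively (a drift of \(\pm 0.1\) per step), while for the exact shift \(c = 2\) they would remain bounded.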

The goal of this section is to propose two operators, the \(\min \)-shifted and the \(\max \)-shifted Bellman operator, whose iterations converge to the optimal infinite horizon cost and, at the same time, yield the optimal average cost as a by-product.

To this end, we need additional notation. Given continuous functions \(\psi _1: {\mathbb {X}} \rightarrow {\mathbb {R}}\) and \(\psi _2: {\mathbb {X}} \rightarrow {\mathbb {R}}\), we define the following:

$$\begin{aligned} c ( \psi _1, \psi _2 ):= \frac{1}{2} \max _{x \in {\mathbb {X}}} [\psi _1 (x) - \psi _2 (x)] + \frac{1}{2} \min _{x \in {\mathbb {X}} } [\psi _1 (x) - \psi _2(x)], \end{aligned}$$
(3.1)

viz. the median value of the difference \(\psi _1- \psi _2\). The following distance notion is also defined:

$$\begin{aligned} d( \psi _1, \psi _2 ):= \min _{b \in {\mathbb {R}} } \Vert \psi _1 - \psi _2 + b \Vert _{\infty }. \end{aligned}$$

Notice that \(d( \psi _1 + c_1, \psi _2 + c_2 ) = d ( \psi _1, \psi _2 )\) for all \(c_1, c_2 \in {\mathbb {R}}\). Indeed,

$$\begin{aligned} d ( \psi _1 + c_1, \psi _2 + c_2 ){} & {} = \min _{b \in {\mathbb {R}} } \Vert \psi _1 - \psi _2 + c_1 - c_2 + b \Vert _{\infty }\\ {}{} & {} = \min _{{\tilde{b}} \in {\mathbb {R}} } \Vert \psi _1 - \psi _2 + {\tilde{b}} \Vert _{\infty } = d(\psi _1,\psi _2) \end{aligned}$$

where the second equality follows by the change of variables \({\tilde{b}}= c_1 - c_2 + b\). Moreover:

$$\begin{aligned} d( \psi _1, \psi _2 ) = \Vert \psi _1 - \psi _2 - c ( \psi _1,\psi _2 ) \Vert _{\infty }, \end{aligned}$$

viz. the infinity norm of the shifted version of signal \(\psi _1 - \psi _2\) once its median value has been subtracted. In fact, an equivalent alternative definition for \(d(\psi _1,\psi _2)\) is as follows:

$$\begin{aligned} d( \psi _1,\psi _2 ) = \frac{1}{2} \max _{x \in {\mathbb {X}}} [ \psi _1 (x) - \psi _2 (x) ] - \frac{1}{2} \min _{x \in {\mathbb {X}}} [ \psi _1 (x) - \psi _2 (x) ]. \end{aligned}$$
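The quantities \(c(\psi _1,\psi _2)\) and \(d(\psi _1,\psi _2)\) are straightforward to evaluate on a finite sample of \({\mathbb {X}}\), and both identities above can be verified numerically (the sample values below are arbitrary):

```python
import numpy as np

def c_med(p1, p2):
    """c(psi1, psi2) from (3.1): median value of the difference psi1 - psi2."""
    diff = p1 - p2
    return 0.5 * diff.max() + 0.5 * diff.min()

def d(p1, p2):
    """d(psi1, psi2) = min_b ||psi1 - psi2 + b||_inf = (max - min)/2 of the difference."""
    diff = p1 - p2
    return 0.5 * diff.max() - 0.5 * diff.min()

rng = np.random.default_rng(1)
p1, p2 = rng.random(50), rng.random(50)
# shift invariance and the norm identity:
# d(p1 + 3, p2 - 1.5) == d(p1, p2) == max |p1 - p2 - c_med(p1, p2)|
```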

Recall the Bellman operator \(T: {\mathcal {C}} ( {\mathbb {X}} ) \rightarrow {\mathcal {C}} ({\mathbb {X}} )\) previously introduced:

$$\begin{aligned} T \psi (x):= \min _{u \in {\mathbb {U}} (x)} \ell (x,u) + \psi ( f(x,u) ). \end{aligned}$$

Definition 3.1

Define the \(\min \)-shifted Bellman operator \({\hat{T}}: {\mathcal {C}} ( {\mathbb {X}}) \rightarrow {\mathcal {C}} ( {\mathbb {X}} )\) as:

$$\begin{aligned} {\hat{T}} \psi := \min \{ \psi , T \psi + c ( \psi , T \psi ) \}. \end{aligned}$$
(3.2)

Similarly, we may consider the following operator.

Definition 3.2

Define the \(\max \)-shifted Bellman operator \({\check{T}}: {\mathcal {C}} ( {\mathbb {X}}) \rightarrow {\mathcal {C}} ( {\mathbb {X}} )\) as:

$$\begin{aligned} {\check{T}} \psi := \max \{ \psi , T \psi + c ( \psi , T \psi ) \}. \end{aligned}$$
(3.3)

It is straightforward to see that:

$$\begin{aligned} \psi (x) \ge {\hat{T}} ( \psi ) (x) \ge {\hat{T}}^2 ( \psi ) (x) \ge \ldots \ge {\hat{T}}^k ( \psi ) (x) \ge \ldots \end{aligned}$$

for all \(k \in {\mathbb {N}}\). Opposite inequalities hold in the case of the \({\check{T}}\) operator:

$$\begin{aligned} \psi (x) \le {\check{T}} ( \psi ) (x) \le {\check{T}}^2 ( \psi ) (x) \le \ldots \le {\check{T}}^k ( \psi ) (x) \le \ldots \end{aligned}$$
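The monotone chains above can be observed directly on a small tabular problem. The following sketch (with hypothetical random successor and cost tables) implements \(T\), \({\hat{T}}\) and \({\check{T}}\) and iterates them from a random initial \(\psi \):

```python
import numpy as np

rng = np.random.default_rng(0)
nX, nU = 6, 3
succ = rng.integers(0, nX, size=(nX, nU))   # f(x, u) as a successor table
cost = rng.random((nX, nU))                 # l(x, u)

def T(psi):
    """Tabular Bellman operator (2.13)."""
    return (cost + psi[succ]).min(axis=1)

def c_med(p1, p2):
    """c(psi1, psi2) from (3.1)."""
    diff = p1 - p2
    return 0.5 * diff.max() + 0.5 * diff.min()

def T_hat(psi):
    """min-shifted Bellman operator (3.2)."""
    Tp = T(psi)
    return np.minimum(psi, Tp + c_med(psi, Tp))

def T_check(psi):
    """max-shifted Bellman operator (3.3)."""
    Tp = T(psi)
    return np.maximum(psi, Tp + c_med(psi, Tp))

psi0 = rng.random(nX)
down, up = [psi0], [psi0]
for _ in range(20):
    down.append(T_hat(down[-1]))   # pointwise non-increasing chain
    up.append(T_check(up[-1]))     # pointwise non-decreasing chain
```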

Remark 3.3

By induction, and exploiting the \(\min \) commutativity property, the following formula can be proved (see Appendix B.2):

$$\begin{aligned} {\hat{T}}^k \psi (x) = \min _{\tau \in \{0, \ldots , k \}} \left\{ T^{\tau } \psi (x) + \min _{S \subseteq \{0,\ldots ,k-1 \}, |S|= \tau } \sum _{s \in S} c ( {\hat{T}}^s \psi , T {\hat{T}}^s \psi ) \right\} . \end{aligned}$$
(3.4)

Along similar lines the following inequality can be shown by induction for the \({\check{T}}\) operator:

$$\begin{aligned} {\check{T}}^k \psi (x) \ge \max _{\tau \in \{0, \ldots , k \}} \left\{ T^{\tau } \psi (x) + \max _{ S \subseteq \{0,\ldots ,k-1 \}, |S|= \tau } \sum _{s \in S} c ( {\check{T}}^s \psi , T {\check{T}}^s \psi ) \right\} . \end{aligned}$$
(3.5)

The following result holds:

Proposition 3.4

A function \({\bar{\psi }} \in {\mathcal {C}}( {\mathbb {X}})\) is a fixed point of \({\hat{T}}\) or \({\check{T}}\) if and only if there exists \(b \in {\mathbb {R}}\) such that \({\bar{\psi }}\) is a solution of the following shifted Bellman Equation:

$$\begin{aligned} T {\bar{\psi }} = {\bar{\psi }} + {b}. \end{aligned}$$
(3.6)

Proof

Assume that \({\bar{\psi }}\) fulfills the shifted Bellman Equation (3.6). Then, direct computation shows:

$$\begin{aligned} {\hat{T}} {\bar{\psi }} = \min \{ {\bar{\psi }}, T {\bar{\psi }} + c ( \bar{ \psi }, T {\bar{\psi }} ) \} = \min \{ {\bar{\psi }}, {\bar{\psi }} + {b} + c ( {\bar{\psi }}, {\bar{\psi }} + {b} ) \} = \min \{ {\bar{\psi }}, {\bar{\psi }} \} = {\bar{\psi }}, \end{aligned}$$

where the equality follows since by definition \(c ( {\bar{\psi }}, {\bar{\psi }} + {b} ) = -{b}\). Conversely, assume \({\hat{T}} \bar{ \psi } = {\bar{\psi }}\):

$$\begin{aligned} {\bar{\psi }} = \min \{ {\bar{\psi }}, T {\bar{\psi }} + c ( {\bar{\psi }}, T {\bar{\psi }} ) \}. \end{aligned}$$

Hence, the following inequality holds:

$$\begin{aligned} {\bar{\psi }} (x) \le T {\bar{\psi }} (x) + c ( {\bar{\psi }}, T {\bar{\psi }} ) \; \quad \forall \, x \in {\mathbb {X}}. \end{aligned}$$
(3.7)

We claim that more is true, namely:

$$\begin{aligned} {\bar{\psi }} (x) - T {\bar{\psi }} (x) = c ( \bar{ \psi }, T {\bar{\psi }} ) \qquad \forall \, x \in {\mathbb {X}}. \end{aligned}$$
(3.8)

Assume, for the sake of contradiction, that:

$$\begin{aligned} \min _{x \in {\mathbb {X}} } [ {\bar{\psi }} (x) - T {\bar{\psi }} (x) ] < c ( {\bar{\psi }}, T {\bar{\psi }} ), \end{aligned}$$

where the \(\min \) exists by continuity of \({\bar{\psi }}\), \(T \bar{ \psi }\) and compactness of \({\mathbb {X}}\). By inequality (3.7) we also know that:

$$\begin{aligned} \max _{x \in {\mathbb {X}} } [ {\bar{\psi }} (x) - T {\bar{\psi }} (x) ] \le c ( {\bar{\psi }}, T {\bar{\psi }} ). \end{aligned}$$

Taking a convex combination of the two previous inequalities yields:

$$\begin{aligned} c( {\bar{\psi }}, T {\bar{\psi }} ) = \frac{1}{2} \min _{x \in {\mathbb {X}} } [ {\bar{\psi }} (x) - T {\bar{\psi }} (x) ] + \frac{1}{2} \max _{x \in {\mathbb {X}} } [ {\bar{\psi }} (x) - T {\bar{\psi }} (x) ] < c ( \bar{ \psi }, T {\bar{\psi }} ), \end{aligned}$$

which is a contradiction. Hence, (3.8) holds, and \(\bar{ \psi }\) is a solution of a shifted Bellman Equation. A similar proof applies to the \({\check{T}}\) operator. \(\square \)
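Proposition 3.4 can be checked by hand on a hypothetical two-state cycle (0 → 1 → 0 with stage costs 1 and 3, one feasible input per state): \({\bar{\psi }} = (0,1)\) solves \(T{\bar{\psi }} = {\bar{\psi }} + b\) with \(b = 2\), and is accordingly a fixed point of both shifted operators.

```python
import numpy as np

cost = np.array([1.0, 3.0])
nxt = np.array([1, 0])

def T(psi):
    return cost + psi[nxt]        # one feasible input per state: no min needed

def c_med(p1, p2):
    """c(psi1, psi2) from (3.1)."""
    diff = p1 - p2
    return 0.5 * diff.max() + 0.5 * diff.min()

def T_hat(psi):
    """min-shifted Bellman operator (3.2)."""
    Tp = T(psi)
    return np.minimum(psi, Tp + c_med(psi, Tp))

def T_check(psi):
    """max-shifted Bellman operator (3.3)."""
    Tp = T(psi)
    return np.maximum(psi, Tp + c_med(psi, Tp))

psi_bar = np.array([0.0, 1.0])
# T psi_bar = psi_bar + 2, hence T_hat psi_bar = T_check psi_bar = psi_bar
```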

The following proposition shows that the feedback control that can be generated from a solution \({{\bar{\psi }}}\) of the shifted Bellman equation guarantees the optimal average cost b, provided \({{\bar{\psi }}}\) is bounded.

Proposition 3.5

Let \({{\bar{\psi }}}\) be a bounded solution of the shifted Bellman equation (3.6) for some \(b\in {\mathbb {R}}\). Let \(x^\star (t)\), \(u^\star (t)\) be a solution of (2.1) which realizes the minimum in (2.13) for all \(t\ge 0\), i.e.,

$$\begin{aligned} \ell (x^\star (t),u^\star (t)) + {{\bar{\psi }}}(f(x^\star (t),u^\star (t))) = T {{\bar{\psi }}}(x^\star (t)) \end{aligned}$$

for all \(t\ge 0\). Then the average cost satisfies

$$\begin{aligned}J^{\textrm{avg}}(x^\star (\cdot ),u^\star (\cdot )) = b\end{aligned}$$

and this is the best possible value.

Moreover, \(x^\star (\cdot ), u^\star (\cdot )\) is optimal with respect to the following finite and infinite horizon costs:

$$\begin{aligned} \sum _{t = 0}^{\tau -1} \, [ \ell (x(t),u(t)) - b ] + {\bar{\psi }}( x(\tau )) \\ \liminf _{\tau \rightarrow + \infty } \; \sum _{t = 0}^{\tau -1} \, [ \ell (x(t),u(t)) - b ] + {\bar{\psi }}( x(\tau )).\end{aligned}$$

Proof

The assumptions yield for all \(t\ge 0\)

$$\begin{aligned} \ell (x^\star (t),u^\star (t)) + {{\bar{\psi }}}(\underbrace{f(x^\star (t),u^\star (t))}_{=x^\star (t+1)}) = T {{\bar{\psi }}}(x^\star (t)) = {{\bar{\psi }}}(x^\star (t)) + b,\end{aligned}$$

implying

$$\begin{aligned} \ell (x^\star (t),u^\star (t)) = b + {{\bar{\psi }}}(x^\star (t)) - {{\bar{\psi }}}(x^\star (t+1)).\end{aligned}$$

Summing this equation over t yields

$$\begin{aligned} \sum _{t=0}^{\tau -1}\ell (x^\star (t),u^\star (t))= & {} \sum _{t=0}^{\tau -1}\left[ b + {{\bar{\psi }}}(x^\star (t)) - {{\bar{\psi }}}(x^\star (t+1))\right] \nonumber \\= & {} \tau b + {{\bar{\psi }}}(x^\star (0)) - {{\bar{\psi }}}(x^\star (\tau )), \end{aligned}$$
(3.9)

which implies

$$\begin{aligned}J^{\textrm{avg}}(x^\star (\cdot ),u^\star (\cdot )) = \limsup _{\tau \rightarrow \infty }\frac{\tau b + {{\bar{\psi }}}(x^\star (0)) - {{\bar{\psi }}}(x^\star (\tau ))}{\tau } = b,\end{aligned}$$

because by the boundedness assumption on \({{\bar{\psi }}}\) the expression \(({{\bar{\psi }}}(x^\star (0)) - {{\bar{\psi }}}(x^\star (\tau )))/\tau \) converges to 0 as \(\tau \rightarrow +\infty \).

For any other solution \(x(\cdot )\), \(u(\cdot )\) of (2.1), the definition of T in (2.13) implies

$$\begin{aligned} \ell (x(t),u(t)) + {{\bar{\psi }}}(f(x(t),u(t))) \ge T {{\bar{\psi }}}(x(t)). \end{aligned}$$

Then the same computations as above yield \(J^\textrm{avg}(x(\cdot ),u(\cdot )) \ge b\), showing optimality of the average cost b. Alternatively, equality (3.9) may be restated as

$$\begin{aligned} \sum _{t=0}^{\tau -1} \, [ \ell (x^\star (t),u^\star (t)) - b] + {\bar{\psi }} ( x^\star (\tau )) = {\bar{\psi }}(x^\star (0) ). \end{aligned}$$

Similarly to the case of average costs, for any other feasible solution x(t), u(t) with \(x^\star (0)=x(0)\) we have:

$$\begin{aligned} \sum _{t=0}^{\tau -1} \, [ \ell (x(t),u(t)) - b] + {\bar{\psi }} ( x(\tau )) \ge {\bar{\psi }}(x(0) )={\bar{\psi }}(x^\star (0) ) \end{aligned}$$

thus proving optimality of \(x^\star (\cdot ),u^\star (\cdot )\) with respect to the finite horizon costs \( \sum _{t=0}^{\tau -1} \, [ \ell (x(t),u(t)) - b] + {\bar{\psi }} ( x(\tau ))\) for all \(\tau \in {\mathbb {N}}\). Letting \(\tau \) go to infinity in the above (in)equalities shows optimality of \(x^\star (\cdot )\), \(u^\star (\cdot )\) with respect to the infinite horizon cost:

$$\begin{aligned} \liminf _{\tau \rightarrow + \infty } \left\{ \sum _{t=0}^{\tau -1} \, [ \ell (x(t),u(t)) - b] + {\bar{\psi }} ( x(\tau )) \right\} . \end{aligned}$$

It is worth pointing out that the \(\liminf \) in the infinite horizon cost could also be replaced by a \(\limsup \), since the cost of the optimal solution admits a limit for all initial conditions. In addition, removing the bias term b from the finite horizon costs only shifts them by a constant and does not affect optimality of solutions. As we will see later in Proposition 4.1, the value b for which the shifted Bellman equation can have a solution is unique (even without assuming boundedness of the solution), which is in line with the uniqueness of the minimal average cost. \(\square \)
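As a sanity check, the telescoping identity (3.9) and the resulting average cost can be reproduced numerically on a toy periodic orbit. The two-state example below is made up for illustration and is not taken from the paper.

```python
# Hypothetical 2-periodic optimal orbit: x alternates 0 -> 1 -> 0 -> ...
# Stage costs along the orbit and a candidate solution psi of the
# shifted Bellman equation, ell(x) = b + psi(x) - psi(next(x)).
ell = {0: 1.0, 1: 3.0}      # stage cost at the two states of the orbit
b = 2.0                     # average cost (mean of 1 and 3)
psi = {0: 0.0, 1: 1.0}      # psi(1) chosen so that the identity holds
nxt = {0: 1, 1: 0}          # dynamics along the orbit

# the identity ell(x) = b + psi(x) - psi(next(x)) holds on the orbit
for x in (0, 1):
    assert abs(ell[x] - (b + psi[x] - psi[nxt[x]])) < 1e-12

# partial sums telescope as in (3.9):
# sum_{t<tau} ell = tau*b + psi(x(0)) - psi(x(tau))
x, total = 0, 0.0
for tau in range(1, 1001):
    total += ell[x]
    x = nxt[x]
    assert abs(total - (tau * b + psi[0] - psi[x])) < 1e-9

avg = total / 1000          # average cost over 1000 steps
print(avg)  # -> 2.0
```

Dividing the telescoped sum by \(\tau \) makes the boundary terms vanish, which is exactly the argument used in the proof above.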

Remark 3.6

  1. (i)

    A similar computation as in the proof of Proposition 3.5 shows that for the shifted cost \(\ell (x,u)-b\) the identity

    $$\begin{aligned} J_\tau ^\psi (x^\star (\cdot ),u^\star (\cdot )) = {{\bar{\psi }}}(x^\star (0)) - {{\bar{\psi }}}(x^\star (\tau )) + \psi (x^\star (\tau )) \end{aligned}$$

holds. If \({{\bar{\psi }}}\) and \(\psi \) are bounded, then the \(\limsup \) of this expression is bounded, implying that \(V_\infty ^\psi \) from (2.14) (if it exists) is bounded from above. In particular, in the situation of Proposition 2.1, where \(V_\infty ^\psi \) exists but may be \(+\infty \), the existence of a bounded solution of the shifted Bellman equation implies finiteness of \(V_\infty ^\psi \) when \(\ell (x,u)\) is replaced by \(\ell (x,u)-b\).

  2. (ii)

    Conversely, if \(V_\infty ^\psi \) exists and is finite for the shifted cost \(\ell (x,u)-b\), then a standard dynamic programming proof (see, e.g., [25, Theorem 4.4]) shows that \(V_\infty ^\psi \) solves the Bellman equation (i.e., the shifted Bellman equation (3.6) with \(b=0\)) and thus the shifted Bellman equation (3.6) for the original \(\ell \).

  3. (iii)

    A crucial question is thus whether \(V_\infty ^\psi \) exists and is finite, or even bounded. For strictly dissipative systems (for a definition see (4.1), below), sufficiently fast (asymptotic) controllability to the equilibrium \((x^e,u^e)\) guarantees this, see [26, Assumption 6.1 and Theorem 6.4]. A similar condition applies to optimal control problems with an optimal periodic orbit, see [19, Assumption 10 and Theorem 16]. In both references the inequalities are shown for all sufficiently large finite horizons, but carry over to the infinite horizon limit. We conjecture that this condition can be extended to systems that have more complex optimal behavior than equilibria or periodic orbits, but this extension goes beyond the scope of this paper.

  4. (iv)

    Alternatively, one might wonder whether shifting the stage cost by the optimal average cost is enough (without adopting a terminal penalty function) to guarantee existence of the infinite horizon cost. For this to happen one typically needs two conditions: a constant stage cost along a recurrent (e.g. periodic) average optimal solution, and a sufficiently fast (e.g. exponential) convergence of the cost to this constant value. Together, these conditions guarantee a finite limit of the infinite horizon transient cost.

4 Properties of T, \({\hat{T}}\) and \({\check{T}}\) Operators

Throughout this section we recall some useful properties of the T operator and additionally provide original derivations for the properties of the \({\hat{T}}\) and \({\check{T}}\) operators. Some of the properties listed below are well known and can be found in [3]:

  • Monotonicity:

    $$\begin{aligned} \left[ \psi _1 (x) \le \psi _2 (x), \; \forall \, x \in {\mathbb {X}}\right] \Rightarrow \left[ T \psi _1 (x) \le T \psi _2 (x), \; \forall \, x \in {\mathbb {X}} \right] \end{aligned}$$
  • Translation invariance:

    $$\begin{aligned} T ( \psi + {b} ) = T \psi + {b}, \end{aligned}$$

    for any constant \({b} \in {\mathbb {R}}\);

  • Minimum commutativity, for finite index set K:

    $$\begin{aligned} T \left( \min _{k \in K} \{ \psi _k \} \right) = \min _{k \in K} \{ T \psi _k \} \end{aligned}$$

    To see the last one, notice:

    $$\begin{aligned} T \left( \min _{k \in K} \{ \psi _k \} \right)= & {} \min _{u \in {\mathbb {U}}(x) } \left[ \ell (x,u) + \min _{k \in K} \{ \psi _k (f(x,u)) \} \right] \\= & {} \min _{u \in {\mathbb {U}}(x) } \min _{k \in K} \{ \ell (x,u) + \psi _k (f(x,u)) \}\\= & {} \min _{k \in K} \min _{u \in {\mathbb {U}}(x)} \{ \ell (x,u) + \psi _k (f(x,u)) \} \; = \; \min _{k \in K} \{ T \psi _k \}. \end{aligned}$$
  • Concavity:

    For all \(\alpha \in [0,1]\) and any \(\psi _1, \psi _2\) it holds:

    $$\begin{aligned} T ( \alpha \psi _1 + (1-\alpha ) \psi _2 ) \ge \alpha T \psi _1 + (1- \alpha ) T \psi _2. \end{aligned}$$

    To see this, notice:

    $$\begin{aligned} T ( \alpha \psi _1 + (1-\alpha ) \psi _2 )= & {} \min _{u \in {\mathbb {U}}(x) } \ell (x,u) + \alpha \psi _1 (f(x,u)) + (1- \alpha ) \psi _2 (f(x,u)) \\= & {} \min _{u \in {\mathbb {U}} (x) } \alpha [ \ell (x,u) + \psi _1 ( f(x,u)) ] + (1- \alpha ) [ \ell (x,u) + \psi _2 (f(x,u) ) ] \\\ge & {} \min _{u \in {\mathbb {U}} (x) } \alpha [ \ell (x,u) + \psi _1 ( f(x,u)) ] + \min _{u \in {\mathbb {U}} (x) }(1- \alpha ) [ \ell (x,u) + \psi _2 (f(x,u) ) ]\\= & {} \alpha T \psi _1 + (1- \alpha ) T \psi _2. \end{aligned}$$
  • Max-super-commutativity: the following inequality holds:

    $$\begin{aligned} T \max \{ \psi _1, \psi _2 \} \ge \max \{ T \psi _1, T \psi _2 \}, \end{aligned}$$

    and by induction, for any finite set K:

    $$\begin{aligned} T \left( \max _{k \in K} \{ \psi _k(x) \} \right) \ge \max _{k \in K} \{ T \psi _k(x) \}. \end{aligned}$$
  • Non-expansiveness: monotonicity and translation invariance can be exploited to show the following inequality, expressing (incremental) non-expansiveness of the T operator:

    $$\begin{aligned} d (T \psi _1, T \psi _2 ) \le d ( \psi _1, \psi _2 ), \qquad \forall \, \psi _1, \psi _2 \in {\mathcal {C}}( {\mathbb {X}} ). \end{aligned}$$
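All of the listed properties are straightforward to verify numerically on a small finite system. The sketch below uses a made-up three-state example; it also assumes that d is the span seminorm \(d(\psi _1,\psi _2) = \max _x (\psi _1-\psi _2) - \min _x (\psi _1-\psi _2)\), a choice consistent with non-expansiveness modulo constants, although the paper's formal definition of d may differ.

```python
import random

X = [0, 1, 2]                        # states (toy example, made up)
U = {0: [0, 1], 1: [0, 1], 2: [0]}   # admissible controls
f = {(0,0): 1, (0,1): 2, (1,0): 0, (1,1): 2, (2,0): 2}           # dynamics
ell = {(0,0): 1.0, (0,1): 4.0, (1,0): 0.5, (1,1): 2.0, (2,0): 3.0}

def T(psi):                          # Bellman operator: min_u ell + psi(f)
    return {x: min(ell[x,u] + psi[f[x,u]] for u in U[x]) for x in X}

def d(p, q):                         # span seminorm of p - q (assumed metric)
    diff = [p[x] - q[x] for x in X]
    return max(diff) - min(diff)

random.seed(0)
p1 = {x: random.uniform(-1, 1) for x in X}
p2 = {x: random.uniform(-1, 1) for x in X}
p_hi = {x: p1[x] + random.uniform(0, 1) for x in X}           # p_hi >= p1

assert all(T(p1)[x] <= T(p_hi)[x] for x in X)                 # monotonicity
b = 0.7                                                        # translation invariance
assert all(abs(T({x: p1[x] + b for x in X})[x] - (T(p1)[x] + b)) < 1e-12 for x in X)
Tmin = T({x: min(p1[x], p2[x]) for x in X})                    # min commutativity
assert all(abs(Tmin[x] - min(T(p1)[x], T(p2)[x])) < 1e-12 for x in X)
a = 0.3                                                        # concavity
Tmix = T({x: a * p1[x] + (1 - a) * p2[x] for x in X})
assert all(Tmix[x] >= a * T(p1)[x] + (1 - a) * T(p2)[x] - 1e-12 for x in X)
Tmax = T({x: max(p1[x], p2[x]) for x in X})                    # max super-commutativity
assert all(Tmax[x] >= max(T(p1)[x], T(p2)[x]) - 1e-12 for x in X)
assert d(T(p1), T(p2)) <= d(p1, p2) + 1e-12                    # non-expansiveness
print("all properties verified")
```

Since the properties hold for every pair of functions, the random choices of \(\psi _1, \psi _2\) above are immaterial; any seed gives the same verdict.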

Next we derive some useful properties of the \({\hat{T}}\) and \({\check{T}}\) operators. Notice that for all \(b_1,b_2 \in {\mathbb {R}}\) the following holds:

$$\begin{aligned} c ( \psi _1+b_1, \psi _2+b_2 ) = c( \psi _1, \psi _2) + b_1 - b_2. \end{aligned}$$

Hence the following translation invariance can be seen:

$$\begin{aligned} {\hat{T}} ( \psi + b ) = {\hat{T}} \psi + b, \end{aligned}$$

for all \(b \in {\mathbb {R}}\). In fact,

$$\begin{aligned} {\hat{T}} ( \psi + {b} )= & {} \min \{ \psi + {b}, T ( \psi + {b}) + c ( \psi + {b}, T ( \psi + {b} ) ) \} \\= & {} \min \{ \psi + {b}, T \psi + {b} + c ( \psi + {b}, T \psi + {b} ) \} \\= & {} \min \{ \psi + {b}, T \psi + {b} + c ( \psi , T \psi ) \} \; \\= & {} \min \{ \psi , T \psi + c ( \psi , T \psi ) \} + {b} \; = \; {\hat{T}} \psi + {b}. \end{aligned}$$
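The computation above only uses the displayed shift identity for c, so it can be replayed in code. In the sketch below, \(c(\psi _1,\psi _2) = \min _x (\psi _1(x)-\psi _2(x))\) is a hypothetical choice satisfying that identity (the paper's actual c may differ), and the finite system is made up.

```python
import random

X = [0, 1, 2]
U = {0: [0, 1], 1: [0, 1], 2: [0]}
f = {(0,0): 1, (0,1): 2, (1,0): 0, (1,1): 2, (2,0): 2}
ell = {(0,0): 1.0, (0,1): 4.0, (1,0): 0.5, (1,1): 2.0, (2,0): 3.0}

def T(psi):
    return {x: min(ell[x,u] + psi[f[x,u]] for u in U[x]) for x in X}

def c(p, q):        # hypothetical offset; satisfies c(p+b1, q+b2) = c(p,q)+b1-b2
    return min(p[x] - q[x] for x in X)

def That(psi):      # T-hat: pointwise min of psi and the offset-shifted T psi
    Tp = T(psi)
    off = c(psi, Tp)
    return {x: min(psi[x], Tp[x] + off) for x in X}

random.seed(1)
psi = {x: random.uniform(-2, 2) for x in X}
b = 1.234
lhs = That({x: psi[x] + b for x in X})          # T-hat(psi + b)
rhs = {x: That(psi)[x] + b for x in X}          # T-hat(psi) + b
assert all(abs(lhs[x] - rhs[x]) < 1e-12 for x in X)
print("T-hat translation invariance verified")
```

The check succeeds for any offset function c with the shift property, mirroring the fact that the derivation above never uses the concrete form of c.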

The same property holds for \({\check{T}}\). The next proposition states that all solutions of a shifted Bellman Equation share the same shift value.

Proposition 4.1

Let \(\psi _1\) and \(\psi _2\) be continuous solutions of the shifted Bellman Equation (3.6), viz. \(T \psi _1 + c_1 = \psi _1\) and \(T \psi _2 + c_2 = \psi _2\) for suitable constants \(c_1\) and \(c_2\). Then, \(c_1=c_2\).

Proof

See Appendix B.1. \(\square \)

We show later, by means of an example, that while the shift is uniquely defined for all solutions of the shifted Bellman Equation, it is not true in general that \(d ( \psi _1, \psi _2) =0\), i.e. there may be multiple solutions of the shifted Bellman Equation, even after taking into account translation invariance. In the remainder of this section, we describe a situation in which the solution of the shifted Bellman Equation is unique, up to the addition of a constant. Again, a dissipativity inequality plays a role, but now a stronger one than (2.9). For an equilibrium \((x^e,u^e)\) (i.e., \(f(x^e,u^e)=x^e\)) we call the system strictly dissipative, if there exists a storage function \(\lambda : {\mathbb {X}} \rightarrow {\mathbb {R}}\), bounded from below, and a function \(\alpha \in {\mathcal {K}}\) such that

$$\begin{aligned} \lambda ( f(x,u) ) \le \lambda (x) + \ell (x,u) - \ell (x^e,u^e)-\alpha (\Vert x-x^e\Vert ) \qquad \forall \, (x,u) \in {\mathbb {Z}}. \end{aligned}$$
(4.1)

We note that a positive definite stage cost, i.e., an \(\ell \) satisfying \(\ell (x,u) \ge \alpha (\Vert x-x^e\Vert )\) for all \((x,u)\in {\mathbb {Z}}\) and \(\ell (x^e,u^e)=0\), satisfies inequality (4.1) with \(\lambda \equiv 0\). For this kind of stage cost, the following proposition holds.

Proposition 4.2

Suppose the stage cost \(\ell \) satisfies \(\ell (x,u) \ge \alpha (\Vert x-x^e\Vert )\) for all \((x,u)\in {\mathbb {Z}}\) and some \(\alpha \in {\mathcal {K}}\), and \(\ell (x^e,u^e)=0\). Then, up to the addition of a constant, there exists at most one continuous solution of the shifted Bellman Equation.

Proof

Let \(\psi _1\) and \(\psi _2\) be two continuous solutions of the shifted Bellman Equation (3.6) that are bounded from below. By adding suitable constants, we can assume that \(\psi _1(x^e)=\psi _2(x^e)=0\). From (2.13) we obtain that

$$\begin{aligned} \psi _i(x^e) + c = T\psi _i(x^e) = \min _{u \in {\mathbb {U}} (x^e)} \ell (x^e,u) + \psi _i ( f(x^e,u) ) \le \ell (x^e,u^e) + \psi _i ( f(x^e,u^e) ) = \psi _i(x^e), \end{aligned}$$

implying \(c\le 0\).

For each \(x\in {\mathbb {X}}\), let \(u_i^*(x)\in {\mathbb {U}}(x)\) be a control that realizes the minimum in the Bellman operator (2.13) for \(\psi =\psi _i\), \(i=1,2\). Such a \(u_i^*(x)\) exists because \(\ell \), f, and \(\psi _i\) are continuous and \({\mathbb {U}}(x)\) is compact. Then from the shifted Bellman Equation we obtain that

$$\begin{aligned} \psi _i(x) + c = \ell (x,u_i^*(x)) + \psi _i(f(x,u_i^*(x))),\end{aligned}$$

implying

$$\begin{aligned} \psi _i(f(x,u_i^*(x))) = \psi _i(x) + c - \ell (x,u_i^*(x)) \le \psi _i(x) - \alpha (\Vert x-x^e\Vert ).\end{aligned}$$
(4.2)

Now, given \(x_i^*(0)\in {\mathbb {X}}\), we denote by \(x_i^*(k)\) the sequence generated by \(x_i^*(k+1) = f(x_i^*(k),u_i^*(x_i^*(k)))\). Then (4.2) implies

$$\begin{aligned} \psi _i(x_i^*(k)) \le \psi _i(x_i^*(0)) - \sum _{k'=0}^{k-1} \alpha (\Vert x_i^*(k')-x^e\Vert ).\end{aligned}$$

Since \(\psi _i\) is bounded from below in \({\mathbb {X}}\), this sum must converge, implying that \(\alpha (\Vert x_i^*(k)-x^e\Vert )\rightarrow 0\) and thus \(x_i^*(k)\rightarrow x^e\) as \(k\rightarrow \infty \). Since \(\psi _i(x^e)=0\) and \(\psi _i\) is continuous, we also obtain \(\psi _j(x_i^*(k))\rightarrow 0\) as \(k\rightarrow \infty \) for \(i=1,2\) and \(j=1,2\).

Now pick an arbitrary \(x\in {\mathbb {X}}\). We show that for each \(\varepsilon >0\) and for both choices \(i=1\), \(j=2\) and \(i=2\), \(j=1\) the inequality

$$\begin{aligned} \psi _j(x) - \psi _i(x) < \varepsilon \end{aligned}$$
(4.3)

holds, which shows \(\psi _1(x)=\psi _2(x)\) and thus the assertion.

To this end, consider the sequence \(x_i^*(k)\) with \(x_i^*(0)=x\). For each \(k\ge 0\) we obtain, using that c must be the same in the shifted Bellman Equation for \(\psi _j\) and \(\psi _i\) due to Proposition 4.1,

$$\begin{aligned}{} & {} \psi _j(x_i^*(k)) - \psi _i(x_i^*(k))\\{} & {} \qquad = T\psi _j(x_i^*(k)) + c - (T\psi _i(x_i^*(k)) + c)\\{} & {} \qquad = \underbrace{\min _{u \in {\mathbb {U}} (x_i^*(k))} \ell (x_i^*(k),u) + \psi _j ( f(x_i^*(k),u) )}_{\le \, \ell (x_i^*(k),u_i^*(x_i^*(k))) + \psi _j ( f(x_i^*(k),u_i^*(x_i^*(k))))} \; - \underbrace{\min _{u \in {\mathbb {U}} (x_i^*(k))} \ell (x_i^*(k),u) + \psi _i ( f(x_i^*(k),u) )}_{= \, \ell (x_i^*(k),u_i^*(x_i^*(k))) + \psi _i ( f(x_i^*(k),u_i^*(x_i^*(k))))}\\{} & {} \qquad \le \psi _j ( f(x_i^*(k),u_i^*(x_i^*(k)))) - \psi _i ( f(x_i^*(k),u_i^*(x_i^*(k)))) \; = \; \psi _j(x_i^*(k+1)) - \psi _i(x_i^*(k+1)). \end{aligned}$$

Iterating this inequality we thus obtain

$$\begin{aligned} \psi _j(x) - \psi _i(x) \le \psi _j(x_i^*(k)) - \psi _i(x_i^*(k))\end{aligned}$$

for all \(k\ge 0\). Since we know that \(\psi _j(x_i^*(k))\rightarrow 0\) and \(\psi _i(x_i^*(k))\rightarrow 0\) as \(k\rightarrow \infty \), there is \(k\in {\mathbb {N}}\) such that both \(|\psi _j(x_i^*(k))|<\varepsilon /2\) and \(|\psi _i(x_i^*(k))|<\varepsilon /2\) hold, implying \(\psi _j(x_i^*(k)) - \psi _i(x_i^*(k))<\varepsilon \) and thus (4.3). \(\square \)
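The mechanism of this proof — the greedy trajectory generated by a solution is driven to the equilibrium because the stage cost is positive definite, cf. (4.2) — can be illustrated on a small discretized example. All numbers below are made up, and value iteration stands in for an exact solution of the Bellman equation.

```python
# Made-up discretized 1-D problem: grid on [-1, 1], controls shift the
# state by up to 2 grid cells, stage cost x^2 + u^2 is positive definite
# with respect to the equilibrium x^e = 0 (grid index `half`).
n, half = 41, 20
h = 2.0 / (n - 1)                       # grid spacing
x_of = lambda i: -1.0 + i * h
moves = [-2, -1, 0, 1, 2]               # control u = k*h shifts by k cells

def step(i, k):                         # f: shift, clipped to the grid
    return min(max(i + k, 0), n - 1)

def cost(i, k):                         # ell(x,u) = x^2 + u^2
    return x_of(i) ** 2 + (k * h) ** 2

V = [0.0] * n
for _ in range(500):                    # value iteration; converges since
    V = [min(cost(i, k) + V[step(i, k)] for k in moves) for i in range(n)]
    # ell >= 0 everywhere and ell(x^e, u^e) = 0 at the equilibrium

# the greedy trajectory accumulates finite cost, so it must end up where
# the stage cost vanishes, i.e. at the equilibrium (mirroring (4.2))
i = 0                                   # start at the boundary x = -1
for _ in range(100):
    i = step(i, min(moves, key=lambda k: cost(i, k) + V[step(i, k)]))
print(i == half)  # -> True
```

Any cycle away from the equilibrium would accumulate strictly positive cost forever, contradicting finiteness of the value, so the greedy state must reach the middle grid point and stay there.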

Now for a strictly dissipative system satisfying (4.1) we consider the “rotated” stage cost

$$\begin{aligned} {{\tilde{\ell }}}(x,u) = \ell (x,u) - \ell (x^e,u^e) + \lambda (x) - \lambda (f(x,u)) \end{aligned}$$
(4.4)

and observe that it satisfies the conditions on \(\ell \) from Proposition 4.2. The corresponding Bellman operator defined by

$$\begin{aligned}\widetilde{T}\psi (x):= \min _{u \in {\mathbb {U}} (x)} {{\tilde{\ell }}} (x,u) + \psi ( f(x,u) )\end{aligned}$$

satisfies the following property.

Lemma 4.3

For any continuous function \(\lambda :{\mathbb {X}}\rightarrow {\mathbb {R}}\) the identity

$$\begin{aligned} \widetilde{T}\psi = T(\psi -\lambda )+\lambda - \ell (x^e,u^e)\end{aligned}$$

holds. Particularly, if \(\psi \) is a solution of the shifted Bellman Equation for T and some c, then \({{\tilde{\psi }}} = \psi +\lambda \) is a solution of the shifted Bellman Equation for \(\widetilde{T}\) and \({\tilde{c}} = c - \ell (x^e,u^e)\).

Proof

For all \(x\in {\mathbb {X}}\) we have that

$$\begin{aligned} \widetilde{T}\psi (x)= & {} \min _{u \in {\mathbb {U}} (x)} \{ {{\tilde{\ell }}} (x,u) + \psi ( f(x,u) ) \}\\= & {} \min _{u \in {\mathbb {U}} (x)} \{\ell (x,u) - \ell (x^e,u^e) + \lambda (x) - \lambda (f(x,u)) + \psi ( f(x,u) ) \} \\= & {} \min _{u \in {\mathbb {U}} (x)} \{ \ell (x,u) + \psi ( f(x,u) ) - \lambda (f(x,u))\} + \lambda (x) - \ell (x^e,u^e)\\= & {} T(\psi -\lambda )(x) +\lambda (x) - \ell (x^e,u^e). \end{aligned}$$

This proves the first statement. Now, if \(\psi \) is a solution of the shifted Bellman Equation for T, then

$$\begin{aligned} \widetilde{T}{{\tilde{\psi }}}= & {} T({{\tilde{\psi }}} - \lambda ) + \lambda - \ell (x^e,u^e) \\= & {} T\psi + \lambda - \ell (x^e,u^e) = \psi + c + \lambda - \ell (x^e,u^e) = {{\tilde{\psi }}} + c - \ell (x^e,u^e),\end{aligned}$$

i.e. \({{\tilde{\psi }}}\) is a solution of the shifted Bellman Equation for \(\widetilde{T}\). \(\square \)
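Since Lemma 4.3 is a purely algebraic identity, it can be checked mechanically on any finite system. In the sketch below the dynamics, costs, and storage function \(\lambda \) are all made up; the assertion verifies the identity pointwise.

```python
import random

# Made-up finite system with an equilibrium at (x^e, u^e) = (1, 0)
X = [0, 1, 2]
U = {0: [0, 1], 1: [0, 1], 2: [0, 1]}
f = {(0,0): 1, (0,1): 2, (1,0): 1, (1,1): 2, (2,0): 0, (2,1): 1}
ell = {(0,0): 2.0, (0,1): 4.0, (1,0): 1.0, (1,1): 2.5, (2,0): 3.0, (2,1): 0.5}
xe, ue = 1, 0                          # f(1,0) = 1, so (1,0) is an equilibrium

random.seed(2)
lam = {x: random.uniform(-1, 1) for x in X}    # arbitrary storage function
psi = {x: random.uniform(-1, 1) for x in X}    # arbitrary test function

def T(psi, stage):                     # Bellman operator for a given stage cost
    return {x: min(stage[x,u] + psi[f[x,u]] for u in U[x]) for x in X}

# rotated stage cost (4.4)
ell_rot = {(x,u): ell[x,u] - ell[xe,ue] + lam[x] - lam[f[x,u]]
           for x in X for u in U[x]}

lhs = T(psi, ell_rot)                  # T-tilde psi
Tshift = T({y: psi[y] - lam[y] for y in X}, ell)
rhs = {x: Tshift[x] + lam[x] - ell[xe,ue] for x in X}   # T(psi-lam)+lam-ell^e
assert all(abs(lhs[x] - rhs[x]) < 1e-12 for x in X)
print("Lemma 4.3 identity verified")
```

Because the identity holds term by term inside the minimum, it is exact for every \(\psi \) and \(\lambda \); the random choices above only make the check non-trivial.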

Theorem 4.4

Consider an optimal control problem for which strict dissipativity (4.1) holds with a continuous storage function \(\lambda \).

Then, up to the addition of a constant, there exists at most one continuous solution of the shifted Bellman Equation.

Proof

Let \(\psi _1\) and \(\psi _2\) be two continuous solutions of the shifted Bellman Equation. By Lemma 4.3, \({{\tilde{\psi }}}_i = \psi _i+\lambda \), \(i=1,2\), are solutions of the shifted Bellman Equation for \(\widetilde{T}\), whose stage cost \({{\tilde{\ell }}}\) satisfies the assumption of Proposition 4.2; moreover, the \({{\tilde{\psi }}}_i\) are continuous since \(\lambda \) is. Hence, applying Proposition 4.2 to \(\widetilde{T}\) yields that \(\psi _1 + \lambda \) and \(\psi _2 + \lambda \) coincide up to the addition of a constant, implying the same for \(\psi _1\) and \(\psi _2\). \(\square \)

We note that non-strict dissipativity is not enough to obtain this uniqueness result up to additions of constants, as the example in Sect. 7.2.1 shows.

Remark 4.5

If the optimal control problem is strictly dissipative, then under a mild controllability condition it has the so-called turnpike property at the equilibrium \((x^e,u^e)\), see, e.g., [13]. This in particular means that the infinite horizon optimal trajectory from Proposition 3.5 satisfies \((x^\star (t),u^\star (t)) \rightarrow (x^e,u^e)\) as \(t\rightarrow \infty \). Moreover, the optimal control problem with cost \({{\tilde{\ell }}}(x,u) = \ell (x,u)-\ell (x^e,u^e)+\lambda (x)-\lambda (f(x,u))\) from (4.4) also has the turnpike property at \((x^e,u^e)\), which is here easily seen directly, since this cost is positive definite with respect to the equilibrium \((x^e,u^e)\).

By adding appropriate constants to \({{\bar{\psi }}}\) and \(\lambda \) we can assume without loss of generality that \({{\bar{\psi }}}(x^e)=0\) and \(\lambda (x^e)=0\). In the following, we moreover assume that \({{\bar{\psi }}}\) and \(\lambda \) are continuous near \(x^e\). For the cost \({{\tilde{\ell }}}\) defined above we obtain

$$\begin{aligned} \sum _{t=0}^{\tau -1} {{\tilde{\ell }}}(x(t),u(t)) = \lambda (x(0)) - \tau \ell (x^e,u^e) + \sum _{t=0}^{\tau -1} \ell (x(t),u(t)) - \lambda (x(\tau )), \end{aligned}$$
(4.5)

i.e., minimizing the sum over \({{\tilde{\ell }}}\) is equivalent to minimizing the sum over \(\ell \) with terminal cost \(\psi =-\lambda \). Now the fact that \({{\tilde{\ell }}}\) is positive definite with respect to the equilibrium \((x^e,u^e)\) implies that any candidate for an optimal trajectory of (4.5) must eventually be close to \((x^e,u^e)\), implying \(\lambda (x(\tau ))\rightarrow 0\). This means that a trajectory is an approximate minimizer of (4.5) if and only if it approximately minimizes

$$\begin{aligned} \sum _{t=0}^{\tau -1} \ell (x(t),u(t)) \end{aligned}$$
(4.6)

among all solutions satisfying \({\lambda (x(\tau ))}\approx 0\). The closer the value of (4.6) is to the minimum of this sum and the closer \({\lambda (x(\tau ))}\) is to 0, the closer (4.5) is to its minimum.

Now consider the optimal trajectories \((x^\star (t),u^\star (t))\) from Proposition 3.5 minimizing

$$\begin{aligned} \sum _{t=0}^{\tau -1} \, [ \ell (x(t),u(t)) - b] + {\bar{\psi }} ( x(\tau )) = -\tau b + \sum _{t=0}^{\tau -1} \ell (x(t),u(t)) + {\bar{\psi }} ( x(\tau )). \end{aligned}$$

Since the problem has the turnpike property, we have \(x^\star (\tau )\approx x^e\) and thus \({\bar{\psi }} ( x^\star (\tau ))\approx 0\) for large \(\tau \), i.e.,

$$\begin{aligned}\sum _{t=0}^{\tau -1} \, [ \ell (x^\star (t),u^\star (t)) - b] + {\bar{\psi }} ( x^\star (\tau )) \approx -\tau b + \sum _{t=0}^{\tau -1} \ell (x^\star (t),u^\star (t)).\end{aligned}$$

This means that \((x^\star (t),u^\star (t))\) approximately minimizes (4.6) and satisfies \(x^\star (\tau )\approx x^e\). Consequently, it approximately minimizes (4.5), with the approximation error tending to 0 as \(\tau \rightarrow \infty \). Conversely, the finite horizon optimal trajectories minimizing (4.5) can be extended to near-optimal infinite horizon trajectories for the stage cost \(\ell \), with the gap to optimality decreasing to 0 as the length of the finite horizon in (4.5) tends to infinity. This explains from an optimal control point of view why the finite horizon optimal value functions \(V_\tau ^\psi \) with terminal cost \(\psi =-\lambda \) converge to a solution of the Bellman equation (if they converge at all), as stated in Proposition 2.2.

5 Convergence Analysis Under Equicontinuity

In order to prove convergence of the \({\hat{T}}\) and \({\check{T}}\) iterations to a solution of the shifted Bellman Equation, we require the iterates to fulfill suitable equicontinuity assumptions. Moreover, we provide sufficient conditions, in the form of controllability assumptions, which lead to the needed equicontinuity properties for both the iterations \(T^k \psi \) and \({\hat{T}}^k \psi \).

In order to have convergence guarantees for a sequence of functions, the following notion of equicontinuity is adopted.

Definition 5.1

A sequence of functions \(\{ \psi _k \}_{k=0}^{+ \infty }\), \(\psi _k: {\mathbb {X}} \rightarrow {\mathbb {R}}\) is said to be equicontinuous, if there exists a function \(\gamma \in {\mathcal {K}}_{\infty }\) such that:

$$\begin{aligned} \forall \, k \in {\mathbb {N}}, \; \forall \, x_1, x_2 \in {\mathbb {X}}: \qquad | \psi _k (x_1) - \psi _k (x_2 ) |\le \gamma (| x_1 - x_2|). \end{aligned}$$

To carry out our analysis, we will need the following assumption.

Assumption 5.2

The sequence \(\{ T^k \psi \}_{k = 0}^{+\infty }\) is equicontinuous.

The following lemma shows that this assumption immediately carries over to \({\hat{T}}^k\psi \).

Lemma 5.3

The sequence \(\{ {\hat{T}}^k \psi \}_{k=0}^{+ \infty }\) is equicontinuous provided \(\{ T^k \psi \}_{k=0}^{+ \infty }\) is equicontinuous.

Proof

The lemma is a consequence of formula (3.4). In particular, for any \(x_2 \in {\mathbb {X}}\) there exists \(\tau ^*(x_2)\) such that:

$$\begin{aligned} {\hat{T}}^k \psi (x_2) = T^{\tau ^*(x_2)} \psi (x_2) + \min _{S \subset \{0,\ldots ,k-1\}: |S|= \tau ^*(x_2)} \sum _{s \in S} c({\hat{T}}^s \psi , T {\hat{T}}^s \psi ). \end{aligned}$$

Therefore, for any \(x_1 \in {\mathbb {X}}\) we see:

$$\begin{aligned}{} & {} {\hat{T}}^k \psi (x_1) - {\hat{T}}^k \psi (x_2) = {\hat{T}}^k \psi (x_1) \\{} & {} \quad - \left[ T^{\tau ^*(x_2)} \psi (x_2) + \min _{S \subset \{0,\ldots ,k-1\}: |S|= \tau ^*(x_2)} \sum _{s \in S} c({\hat{T}}^s \psi , T {\hat{T}}^s \psi ) \right] \\{} & {} \quad \le \left[ T^{\tau ^*(x_2)} \psi (x_1) + \min _{S \subset \{0,\ldots ,k-1\}: |S|= \tau ^*(x_2)} \sum _{s \in S} c({\hat{T}}^s \psi , T {\hat{T}}^s \psi ) \right] \\{} & {} \qquad \qquad - \left[ T^{\tau ^*(x_2)} \psi (x_2) + \min _{S \subset \{0,\ldots ,k-1\}: |S|= \tau ^*(x_2)} \sum _{s \in S} c({\hat{T}}^s \psi , T {\hat{T}}^s \psi ) \right] \\{} & {} \qquad \qquad = T^{\tau ^*(x_2)} \psi (x_1) - T^{\tau ^*(x_2)} \psi (x_2) \le \gamma (|x_1 - x_2|), \end{aligned}$$

where the last inequality holds by the assumption of equicontinuity of the \(T^k \psi \) sequence. The symmetric inequality \({\hat{T}}^k \psi (x_2) - {\hat{T}}^k \psi (x_1) \le \gamma (|x_1-x_2|)\) can be proved along the same lines. Therefore equicontinuity holds with respect to the same function \(\gamma \). \(\square \)

Our main convergence results under equicontinuity are now stated in the following two theorems.

Theorem 5.4

Let \(\psi \in {\mathcal {C}} ( {\mathbb {X}} )\) be such that \(\{T^k \psi \}_{k=0}^{+\infty }\) fulfills Assumption 5.2. Then, if a continuous solution of the shifted Bellman Equation exists, the sequence \({\hat{T}}^k \psi \) converges uniformly to one such solution.

Proof

Consider the sequence \([{\hat{T}}^k \psi ]_n\). By Lemma A.5 this sequence is bounded since:

$$\begin{aligned} 0\le & {} [{\hat{T}}^k \psi (x)]_n \; \le \; \max _{x \in {\mathbb {X}}} {\hat{T}}^k \psi (x) - \min _{x \in {\mathbb {X}}} {\hat{T}}^k \psi (x) \\\le & {} \left[ \max _x {\bar{\psi }} (x) - \min _x {\bar{\psi }} (x) \right] + \left[ \max _x [ \psi (x) - {\bar{\psi }} (x) ]- \min _x [ \psi (x) - {\bar{\psi }} (x) ] \right] .\end{aligned}$$

Moreover, by Lemma 5.3 it is equicontinuous. Hence, by the Arzelà-Ascoli Theorem, it admits a nonempty set of accumulation points (with respect to the uniform topology),

$$\begin{aligned} \omega (\psi ):= \{ {\bar{\psi }} \in {\mathcal {C}}({\mathbb {X}}): \exists \{ k_n \}_{n=1}^{+\infty }, k_n \rightarrow + \infty : {\bar{\psi }} = \lim _{n \rightarrow + \infty } {\hat{T}}^{k_n} \psi \}. \end{aligned}$$

Moreover, each accumulation point in \(\omega (\psi )\) is continuous and fulfills the same continuity inequality,

$$\begin{aligned} |\psi (x_1)- \psi (x_2)| \le \gamma (|x_1-x_2|) \end{aligned}$$
(5.1)

By Lemma A.8, the function \(W( [\psi ]_n)= W (\psi ):= d(\psi ,T \psi )\) is non-increasing along the iteration of \({\hat{T}}\), viz. \(W({\hat{T}}^k \psi )\) is a non-increasing sequence, bounded from below by 0. In addition W is continuous in the topology of uniform convergence. Hence, the limit \(\lim _{k \rightarrow + \infty } W ({\hat{T}}^k \psi )\) exists, and we denote it by \({\bar{W}}\). Because of continuity of W and uniform convergence to the limit points we also have \(W ( {\bar{\psi }} ) = {\bar{W}}\) for all \({\bar{\psi }} \in \omega ( \psi )\). Notice that \(\omega (\psi )\) is invariant with respect to \({\hat{T}}\). Let, in the following, \({\bar{\psi }}\) denote an arbitrary element in \(\omega (\psi )\). For any \(k \in {\mathbb {N}}\) we have \(W ( {\hat{T}}^k {\bar{\psi }} ) = {\bar{W}}\). By combined inequalities (A.9) and (A.8) we see that \(W ( {\hat{T}}^k {\bar{\psi }} )\) can be constant only provided \(\min _{x \in {\mathbb {X}}} {\hat{T}}^k {\bar{\psi }} (x) - T {\hat{T}}^k {\bar{\psi }}(x)\) and \(\max _{x \in {\mathbb {X}}} {\hat{T}}^k {\bar{\psi }} (x) - T {\hat{T}}^k {\bar{\psi }}(x)\) are constant with respect to k. By Corollary A.24, the sequence \({\hat{T}}^k {\bar{\psi }}\) is bounded and converges monotonically to an upper semi-continuous limit. Notice that, by invariance of \(\omega (\psi )\) and the fact that all elements of \(\omega (\psi )\) fulfill inequality (5.1), equicontinuity of \({\hat{T}}^k {\bar{\psi }}\) follows. Hence the limit \(\psi _{\infty } (x):= \lim _{k \rightarrow + \infty } {\hat{T}}^k {\bar{\psi }} (x)\) not only exists (as previously established), but is also continuous and, by Dini’s Theorem, convergence is uniform in \({\mathbb {X}}\). By continuity of the \({\hat{T}}\) operator with respect to uniform convergence, \(\psi _{\infty }(x)\) is a solution of the shifted Bellman Equation (cf. Lemma A.22) and \(0=d( \psi _{\infty }, T \psi _{\infty })=d( {\bar{\psi }}, T {\bar{\psi }} )\). 
This shows that any element of \(\omega (\psi )\) is a solution of the shifted Bellman Equation. We only need to show that \(\omega (\psi )\) is a singleton. This follows because of Lemma A.6. Indeed, the distance to any element \({\bar{\psi }}\) of \(\omega (\psi )\) is non increasing along the iteration \({\hat{T}}^k \psi \). Since such distance is converging to 0 along some subsequence \({\hat{T}}^{k_n} \psi \), then it is converging to 0 along the sequence \({\hat{T}}^k \psi \) itself. \(\square \)

Due to the lack of an analogue to formula (3.4) for the \({\check{T}}\) operator, there is no simple way of proving a version of Lemma 5.3 for \({\check{T}}^k \psi \). As a consequence, the analogue of Theorem 5.4 for \({\check{T}}\) is stated by directly assuming equicontinuity of \({\check{T}}^k \psi \).

Theorem 5.5

Let \(\psi \in {\mathcal {C}} ( {\mathbb {X}} )\) be such that the sequence \(\{{\check{T}}^k \psi \}_{k=0}^{+\infty }\) fulfills the equicontinuity property of Assumption 5.2. Then, if a continuous solution of the shifted Bellman Equation exists, the sequence \({\check{T}}^k \psi \) converges uniformly to one such solution.

Proof

Consider the sequence \([{\check{T}}^k \psi ]_n\). This sequence is bounded since:

$$\begin{aligned} 0\le & {} [{\check{T}}^k \psi (x)]_n \; \le \; \max _{x \in {\mathbb {X}}} {\check{T}}^k \psi (x) - \min _{x \in {\mathbb {X}}} {\check{T}}^k \psi (x) \\\le & {} \left[ \max _x {\bar{\psi }} (x) - \min _x {\bar{\psi }} (x) \right] + \left[ \max _x [ {\check{T}}^k \psi (x) - {\bar{\psi }} (x) ]- \min _x [ {\check{T}}^k \psi (x) - {\bar{\psi }} (x) ] \right] \\\le & {} \left[ \max _x {\bar{\psi }} (x) - \min _x {\bar{\psi }} (x) \right] + \left[ \max _x [ \psi (x) - {\bar{\psi }} (x) ]- \min _x [ \psi (x) - {\bar{\psi }} (x) ] \right] , \end{aligned}$$

where the last inequality follows by Lemma A.7. Moreover, by assumption, it is equicontinuous. Hence, by the Arzelà-Ascoli Theorem, it admits a nonempty set of limit points (with respect to the uniform topology),

$$\begin{aligned} \omega (\psi ):= \{ {\bar{\psi }} \in {\mathcal {C}}({\mathbb {X}}): \exists \{ k_n \}_{n=1}^{+\infty }, k_n \rightarrow + \infty : {\bar{\psi }} = \lim _{n \rightarrow + \infty } {\check{T}}^{k_n} \psi (x) \}. \end{aligned}$$

Note that each limit point in \(\omega (\psi )\) is continuous and fulfills the same continuity inequality,

$$\begin{aligned} |\psi (x_1)- \psi (x_2)| \le \gamma (|x_1-x_2|) \end{aligned}$$
(5.2)

By Lemma A.11, the function \(W( [\psi ]_n)= W (\psi ):= d(\psi ,T \psi )\) is non-increasing along the iteration of \({\check{T}}\), viz. \(W({\check{T}}^k \psi )\) is a non-increasing sequence, bounded from below by 0. In addition W is continuous in the topology of uniform convergence. Hence, the limit \(\lim _{k \rightarrow + \infty } W ({\check{T}}^k \psi )\) exists, and we denote it by \({\bar{W}}\). Because of continuity of W and uniform convergence to the limit points we also have \(W ( {\bar{\psi }} ) = {\bar{W}}\) for all \({\bar{\psi }} \in \omega ( \psi )\). Notice that \(\omega (\psi )\) is invariant with respect to \({\check{T}}\). Hence, for any \({\bar{\psi }} \in \omega (\psi )\) and any \(k \in {\mathbb {N}}\) we have \(W ( {\check{T}}^k {\bar{\psi }} ) = {\bar{W}}\). By the combined inequalities (A.12) and (A.13) we see that \(W ( {\check{T}}^k {\bar{\psi }} )\) can be constant only provided \(\min _{x \in {\mathbb {X}}} {\check{T}}^k {\bar{\psi }} (x) - T {\check{T}}^k {\bar{\psi }}(x)\) and \(\max _{x \in {\mathbb {X}}} {\check{T}}^k {\bar{\psi }} (x) - T {\check{T}}^k {\bar{\psi }}(x)\) are constant with respect to k. By Corollary A.26, the sequence \({\check{T}}^k {\bar{\psi }}\) is bounded and converges monotonically to a lower semi-continuous limit. Notice that, by invariance of \(\omega (\psi )\) and the fact that all elements of \(\omega (\psi )\) fulfill inequality (5.2), equicontinuity of \({\check{T}}^k {\bar{\psi }}\) follows. Hence the limit \(\psi _{\infty } (x):= \lim _{k \rightarrow + \infty } {\check{T}}^k {\bar{\psi }} (x)\) not only exists (as previously established), but is also continuous and, by Dini's Theorem, convergence is uniform in \({\mathbb {X}}\). By continuity of the \({\check{T}}\) operator with respect to uniform convergence, \(\psi _{\infty }(x)\) is a solution of the shifted Bellman Equation and \(0=d( \psi _{\infty }, T \psi _{\infty })=d( {\bar{\psi }}, T {\bar{\psi }} )\). 
This shows that any element of \(\omega (\psi )\) is a solution of the shifted Bellman Equation. We only need to show that \(\omega (\psi )\) is a singleton. This follows because of Lemma A.7. Indeed, the distance to any element \({\bar{\psi }}\) of \(\omega (\psi )\) is non-increasing along the iteration \({\check{T}}^k \psi \). Since such distance is converging to 0 along some subsequence \({\check{T}}^{k_n} \psi \), it is converging to 0 along the sequence \({\check{T}}^k \psi \) itself. \(\square \)

In the remainder of this section we derive a sufficient condition for Assumption 5.2, which is based on a controllability condition.

Definition 5.6

Given a system as in (2.1) and the associated state and input constraint sets \({\mathbb {X}}\) and \({\mathbb {U}}(x)\), we say that the system fulfills Uniform Incremental Continuous Controllability, if there exists \(N \in {\mathbb {N}}\), and a class \({\mathcal {K}}_{\infty }\) function \(\delta \), such that, for all \(x_1, x_2 \in {\mathbb {X}}\), and for all \({{\textbf {u}}}_1 \in {\mathbb {U}}_N (x_1)\), there exists \({{\textbf {u}}}_2 \in {\mathbb {U}}_N (x_2)\) such that \(\phi (N,x_1,{{\textbf {u}}}_1) = \phi (N,x_2,{{\textbf {u}}}_2)\), and in addition: \(\Vert {{\textbf {u}}}_1 - {{\textbf {u}}}_2 \Vert \le \delta (|x_1-x_2|)\).

A milder controllability assumption can be formulated by considering continuity with respect to the cost alone, rather than the control input. To this end, let \(J_N(x, {{\textbf {u}}})\), for \(x \in {\mathbb {X}}\) and \({{\textbf {u}}} \in {\mathbb {U}}_N (x)\) denote the following:

$$\begin{aligned} J_N (x, {{\textbf {u}}} ) = \sum _{t=0}^{N-1} \ell (\phi (t,x,{{\textbf {u}}}),u(t) ). \end{aligned}$$
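In code, \(J_N\) is a plain rollout sum; the one-dimensional dynamics and stage cost below are hypothetical placeholders, not taken from the paper.

```python
# Made-up one-dimensional dynamics and stage cost
f = lambda x, u: 0.5 * x + u
ell = lambda x, u: x ** 2 + u ** 2

def phi(t, x, useq):
    """State reached after t steps from x under the control sequence useq."""
    for u in useq[:t]:
        x = f(x, u)
    return x

def J_N(x, useq):
    """Finite-horizon cost of the rollout, as in the displayed formula."""
    return sum(ell(phi(t, x, useq), useq[t]) for t in range(len(useq)))

print(J_N(1.0, [0.0, 0.0]))  # -> 1.25  (= 1.0^2 + 0.5^2)
```

Recomputing \(\phi (t,x,{{\textbf {u}}})\) inside the sum keeps the code close to the formula at the price of quadratic work in N; a production implementation would simulate the trajectory once.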

Definition 5.7

Given a system as in (2.1) and the associated state and input constraint sets \({\mathbb {X}}\) and \({\mathbb {U}}(x)\), we say that the system fulfills Uniform Incremental Controllability Continuous in Cost, if there exists \(N \in {\mathbb {N}}\), and a class \({\mathcal {K}}_{\infty }\) function \(\delta \), such that, for all \(x_1, x_2 \in {\mathbb {X}}\), and for all \({{\textbf {u}}}_1 \in {\mathbb {U}}_N (x_1)\), there exists \({{\textbf {u}}}_2 \in {\mathbb {U}}_N (x_2)\) such that \(\phi (N,x_1,{{\textbf {u}}}_1) = \phi (N,x_2,{{\textbf {u}}}_2)\), and in addition: \(| J_N (x_1,{{\textbf {u}}}_1) - J_N(x_2,{{\textbf {u}}}_2) | \le \delta (|x_1-x_2|)\).

Remark 5.8

Notice that Uniform Incremental Continuous Controllability implies Uniform Incremental Controllability Continuous in Cost. This is because the stage cost and the dynamics are both continuous, and the cost is considered only over a finite interval of length N. The converse implication is not true in general.

The following proposition now shows that Uniform Incremental Controllability Continuous in Cost implies the equicontinuity in Assumption 5.2 required in Theorem 5.4.

Proposition 5.9

Assume that system (2.1) fulfills the controllability assumption in Definition 5.7. Then, for any continuous function \(\psi : {\mathbb {X}} \rightarrow {\mathbb {R}}\), the sequence \(\{ T^k \psi \}_{k=0}^{+ \infty }\) is equicontinuous, i.e., Assumption 5.2 is fulfilled.

Proof

Consider any \(k\in {\mathbb {N}}\), and arbitrary \(x_1,x_2 \in {\mathbb {X}}\). Let \({{\textbf {u}}}^*_1 \in {\mathbb {U}}_{k+N}(x_1)\) be any optimal control sequence corresponding to the optimal control problem with terminal penalty function \(\psi \) and horizon \(k+N\), with initial condition \(x_1\). Then, from the optimality principle:

$$\begin{aligned} T^{k+N} \psi (x_1) = J_N (x_1, {{\textbf {u}}}^*_1) + T^k \psi (\phi (N,x_1,{{\textbf {u}}}^*_1) ). \end{aligned}$$
(5.3)

Let now \({{\textbf {u}}}_2\) be as in Definition 5.7. Clearly, applying \({{\textbf {u}}}_2\) is, in general, suboptimal from the initial condition \(x_2\). Hence, the inequality below holds:

$$\begin{aligned} T^{k+N} \psi (x_2) \le J_N (x_2, {{\textbf {u}}}_2 ) + T^k \psi (\phi (N,x_2,{{\textbf {u}}}_2) ). \end{aligned}$$
(5.4)

Combining equations (5.3) and (5.4) yields:

$$\begin{aligned} T^{k+N} \psi (x_2 ) - T^{k+N} \psi (x_1)\le & {} J_N (x_2, {{\textbf {u}}}_2 ) + T^k \psi (\phi (N,x_2,{{\textbf {u}}}_2) ) \\{} & {} - J_N (x_1, {{\textbf {u}}}^*_1) - T^k \psi (\phi (N,x_1,{{\textbf {u}}}^*_1) ) \\= & {} J_N (x_2, {{\textbf {u}}}_2 ) - J_N (x_1, {{\textbf {u}}}^*_1) \; \le \; \delta (|x_1-x_2|), \end{aligned}$$

where the first equality follows because \(\phi (N,x_1,{{\textbf {u}}}^*_1) = \phi (N,x_2,{{\textbf {u}}}_2)\), and the last inequality from Definition 5.7. Symmetric inequalities can be obtained swapping \(x_1\) and \(x_2\), yielding \(|T^{k+N} \psi (x_1) - T^{k+N} \psi (x_2)| \le \delta (|x_1-x_2|)\). This shows that equicontinuity holds on the tail of the sequence \(T^k \psi \). However, \(\{ T^k \psi \}_{k=0}^{N-1}\) is a finite family of continuous functions defined over a compact set (thus also fulfilling an equicontinuity property), and therefore equicontinuity of the whole sequence follows. \(\square \)

Unfortunately, due to the lack of a counterpart of Lemma 5.3, we currently do not have a controllability condition for ensuring the equicontinuity needed in Theorem 5.5 for the \({\check{T}}\) operator.

Remark 5.10

The operators \({\hat{T}}\) and \({\check{T}}\) are the main novel technical tool that we propose to compute solutions of shifted Bellman Equations and solve infinite horizon control problems. Notice that they come with different a priori guarantees on the continuity of the limiting solution, namely upper semi-continuity and lower semi-continuity, respectively. As regularity of solutions of the Bellman Equation is normally not available a priori, it is hard to state a criterion for choosing one or the other. Nevertheless, it is worth pointing out that, since these operators share Lyapunov functionals (i.e., \(W (\psi ) = d(\psi , T\psi )\), see Lemmas A.8 and A.11, or Lemmas A.6 and A.7), one could also apply them in switched combinations or in random order while typically still retaining (or even improving) convergence. This topic is outside the scope of the present manuscript.

6 Convergence Analysis Without Continuity

In this section we provide a convergence result for the iteration using the \({{\hat{T}}}\) operator without assuming any continuity. This is possible if we assume a dissipativity condition and start the iteration from the negative storage function. The result can thus be seen as an extension of Proposition 2.2 to the shifted Bellman Equation with nontrivial shift \({b}\ne 0\). The only property we impose in this section on the iterates \(\psi :{\mathbb {X}}\rightarrow {\mathbb {R}}\) is that they attain a minimum on \({\mathbb {X}}\). Then we can define

$$\begin{aligned} {[\psi ]_n}(x):= \psi (x) - \min _{{\tilde{x}} \in {\mathbb {X}}} \psi ({\tilde{x}}). \end{aligned}$$

We note that \({[\psi ]_n}\ge 0\) and \(\min _{x\in {\mathbb {X}}} {[\psi ]_n}(x)=0\) as well as \([\psi +{b}]_n = [\psi ]_n\) for all \({b} \in {\mathbb {R}}\) and start our analysis with a little auxiliary lemma.
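Numerically, the normalization \([\,\cdot \,]_n\) amounts to subtracting the minimum; a short sketch (representing functions on \({\mathbb {X}}\) by arrays of grid values) confirms the properties just listed.

```python
import numpy as np

def normalize(psi):
    """[psi]_n = psi - min psi, so that min [psi]_n = 0."""
    return psi - psi.min()

psi = np.array([3.0, 1.5, 2.0, 4.0])      # psi on a 4-point grid
print(normalize(psi))                      # [1.5 0.  0.5 2.5]
print(np.allclose(normalize(psi + 7.0), normalize(psi)))  # True: [psi+b]_n = [psi]_n
```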

Lemma 6.1

For any \({b}\in {\mathbb {R}}\) it holds that

$$\begin{aligned} {[}T(\psi + {b})]_n = [T\psi ]_n \quad \text{ and } \quad [{{\hat{T}}}(\psi + {b})]_n = [{{\hat{T}}}\psi ]_n.\end{aligned}$$

Proof

We have that

$$\begin{aligned} {{\hat{T}}} (\psi +b)= & {} \min \{ \psi +b, \underbrace{T(\psi +b)}_{=T\psi + b} + \underbrace{c(\psi +b, T(\psi +b))}_{=c(\psi ,T\psi )}\} \\= & {} \min \{\psi , T\psi + c(\psi , T\psi )\}+b \; = \; {{\hat{T}}}\psi + b. \end{aligned}$$

This implies the assertion since \([{{\hat{T}}} \psi +b]_n = [{{\hat{T}}}\psi ]_n\) for all \(b\in {\mathbb {R}}\). A similar computation works for T in place of \({{\hat{T}}}\). \(\square \)

We now first consider the case where \(\ell \ge 0\). To this end, we make the following assumption.

Assumption 6.2

There exists a nonempty set \(N\subset {\mathbb {X}}\) such that for any \(\psi :{\mathbb {X}}\rightarrow {\mathbb {R}}\) with \(\psi \ge 0\) and \(\psi |_N\equiv 0\) we have that \(T\psi |_N\equiv 0\).

We denote by \(\psi |_N\) the restriction of the function \(\psi \) to the set N, and by \(\equiv 0\) the fact that the function is identically 0 on its domain. We note that this assumption is satisfied, for instance, if \(\ell \ge 0\) and there is an equilibrium \((x^e,u^e)\) with \(\ell (x^e,u^e)=0\). Then one can choose \(N=\{x^e\}\).

Lemma 6.3

Assume \(\ell \ge 0\) and let Assumption 6.2 hold for some set N. Then for \(\psi ^0\equiv 0\) the sequence of functions \(\psi ^k:= [{{\hat{T}}}^k \psi ^0]_n\), \(k\in {\mathbb {N}}\), satisfies the following properties for all \(k\in {\mathbb {N}}\):

$$\begin{aligned} \begin{array}{llll} \text{(a) } &{} T^k \psi ^0 \ge \psi ^k, \qquad &{} \text{(b) } &{} T\psi ^k \ge \psi ^k, \\ \text{(c) } &{} \psi ^k|_N=0, &{} \text{(d) } &{} \psi ^{k+1} \ge \psi ^k. \end{array} \end{aligned}$$

Proof

By applying Lemma 6.1 inductively we see that \(\psi ^{k+1} = [{{\hat{T}}} \psi ^k]_n\). Moreover, we observe for all \(\psi :{\mathbb {X}}\rightarrow {\mathbb {R}}\) the equality

$$\begin{aligned} [{{\hat{T}}} \psi ]_n= & {} [ \min \{\psi ,T\psi +c(\psi ,T\psi )\} ]_n \; = \; [\min \{\psi -c(\psi ,T\psi ),T\psi \}+c(\psi ,T\psi ) ]_n \\= & {} [\min \{\psi -c(\psi ,T\psi ),T\psi \}]_n. \end{aligned}$$

Now we prove (a)–(d) by induction over k.

For \(k=0\), (a) and (c) hold trivially, while (b) and (d) hold because \(\psi ^0\equiv 0\) and \(T\psi ^0\ge 0\) (since \(\ell \ge 0\)) and \(\psi ^1\ge 0\) (by definition of the \([\cdot ]_n\) operator).

For \(k\rightarrow k+1\), assume that (a), (b), and (c) hold for \(\psi ^k\). We now prove these three properties for \(\psi ^{k+1}\) and start with (c). By the above computation it holds that

$$\begin{aligned} \psi ^{k+1} = [ {{\hat{T}}} \psi ^k ]_n= [ \min \{\psi ^k-c(\psi ^k,T\psi ^k),T\psi ^k\} ]_n.\end{aligned}$$

By induction assumption (b) we have that \(T\psi ^k\ge \psi ^k\) implying that \(c(\psi ^k,T\psi ^k)\le 0\) and thus \(\psi ^k-c(\psi ^k,T\psi ^k)\ge 0\). Since \(\ell \ge 0\) and \(\psi ^k\ge 0\) we moreover have \(T\psi ^k\ge 0\). By induction assumption (c) we know that \(\psi ^k|_N\equiv 0\). Thus, Assumption 6.2 yields \(T\psi ^k|_N\equiv 0\). Together this implies that \(\min \{\psi ^k-c(\psi ^k,T\psi ^k),T\psi ^k\}\ge 0\) and is equal to 0 on N. This implies that

$$\begin{aligned} \psi ^{k+1} = [ \min \{\psi ^k-c(\psi ^k,T\psi ^k),T\psi ^k\} ]_n = \min \{\psi ^k-c(\psi ^k,T\psi ^k),T\psi ^k\} \end{aligned}$$
(6.1)

and thus \(\psi ^{k+1}|_N\equiv 0\), i.e., (c) for \(k+1\).

Next we prove (b) for \(k+1\). Using (6.1) as well as the min commutativity and the translation invariance of T we obtain

$$\begin{aligned} T\psi ^{k+1}= & {} T \min \{\psi ^k-c(\psi ^k,T\psi ^k),T\psi ^k\} \\= & {} \min \{T\psi ^k-c(\psi ^k,T\psi ^k),TT\psi ^k\}. \end{aligned}$$

Now using the induction assumption for (b) and the monotonicity of T we obtain \(T\psi ^k\ge \psi ^k\) and \(TT\psi ^k \ge T\psi ^k\), implying, using (6.1) once more

$$\begin{aligned} \min \{T\psi ^k-c(\psi ^k,T\psi ^k),TT\psi ^k\} \ge \min \{\psi ^k-c(\psi ^k,T\psi ^k),T\psi ^k\} = \psi ^{k+1}. \end{aligned}$$

This shows (b) for \(k+1\). From the induction assumption (a) and (b) and monotonicity of T we obtain

$$\begin{aligned} T^{k+1}\psi ^0 = T T^k\psi ^0 \ge T\psi ^k \ge \psi ^k, \end{aligned}$$

which shows (a) for \(k+1\).

Finally, for showing (d), we use that the induction assumption for (b) yields \(c(\psi ^k,T\psi ^k)\le 0\) and \(T\psi ^k\ge \psi ^k\). Together with (6.1) we obtain

$$\begin{aligned} \psi ^{k+1} = \min \{\psi ^k-c(\psi ^k,T\psi ^k),T\psi ^k\} \ge \min \{\psi ^k,\psi ^k\} = \psi ^k. \square \end{aligned}$$

Proposition 6.4

Assume \(\ell \ge 0\), let Assumption 6.2 hold and assume that \(V_\infty ^{\psi ^0}\) is finite for \(\psi ^0\equiv 0\). Then the sequence of functions \(\psi ^k={[{{\hat{T}}}^k \psi ^0]_n}\), \(k\in {\mathbb {N}}\), converges to \(V_\infty ^{\psi ^0}\), i.e., in particular to a solution of the Bellman Equation.

Proof

From Lemma 6.3 it follows that \(\psi ^k\) is increasing and bounded from above by \(V_\infty ^{\psi ^0}\). Hence, it converges to some limit function \(\psi ^\infty \le V_\infty ^{\psi ^0}\). Now from \(T\psi ^k\ge \psi ^k\) we obtain that

$$\begin{aligned} c(\psi ^k,T\psi ^k) \le - \frac{1}{2} \max _{{{\tilde{x}}}\in {\mathbb {X}}} [T\psi ^k({{\tilde{x}}})-\psi ^k({{\tilde{x}}})],\end{aligned}$$

implying that

$$\begin{aligned} \psi ^k(x)-c(\psi ^k,T\psi ^k) \ge \psi ^k(x) + \frac{1}{2} \max _{{{\tilde{x}}}\in {\mathbb {X}}} [T\psi ^k({{\tilde{x}}})-\psi ^k({{\tilde{x}}}) ] \ge \frac{1}{2} (\psi ^k(x) + T\psi ^k(x)). \end{aligned}$$

Since \(T\psi ^k\ge \psi ^k\) we moreover obtain that \(T\psi ^k(x)\ge \frac{1}{2} (\psi ^k(x) + T\psi ^k(x))\). Inserting these inequalities into (6.1) then yields

$$\begin{aligned} \psi ^{k+1} = \min \{\psi ^k-c(\psi ^k,T\psi ^k),T\psi ^k\} \ge \frac{1}{2} (\psi ^k + T\psi ^k)\end{aligned}$$

and using this inequality and \(T(\psi _1/2 + \psi _2/2) \ge (T\psi _1)/2 + (T\psi _2)/2\) yields

$$\begin{aligned} \psi ^1\ge & {} \frac{1}{2} \psi ^0 + \frac{1}{2} T\psi ^0\\ \psi ^2\ge & {} \frac{1}{2}\psi ^1 + \frac{1}{2} T\psi ^1 \; \ge \; \frac{1}{2} \left( \frac{1}{2} \psi ^0 + \frac{1}{2} T\psi ^0\right) + \frac{1}{2} \left( \frac{1}{2} T\psi ^0 + \frac{1}{2} T^2\psi ^0\right) \\= & {} \frac{1}{4}(\psi ^0 + 2T\psi ^0 + T^2\psi ^0)\\ \psi ^3\ge & {} \frac{1}{2}\psi ^2 + \frac{1}{2} T\psi ^2 \; \ge \; \frac{1}{2} \left( \frac{1}{2} \psi ^1 + \frac{1}{2} T\psi ^1\right) + \frac{1}{2} \left( \frac{1}{2} T\psi ^1 + \frac{1}{2} T^2\psi ^1\right) \\\ge & {} \frac{1}{8}(\psi ^0+3T\psi ^0+3T^2\psi ^0+T^3\psi ^0)\\&\vdots&\end{aligned}$$

which by induction yields the general formula

$$\begin{aligned} \psi ^k \ge \frac{1}{2^k}\sum _{l=0}^{k} \left( {\begin{array}{c}k\\ l\end{array}}\right) T^l\psi ^0.\end{aligned}$$

Since \(\sum _{l=0}^{k} \left( {\begin{array}{c}k\\ l\end{array}}\right) =2^k\) grows exponentially in k while for each fixed \(p\in {\mathbb {N}}\) the sum \(\sum _{l=0}^{p-1} \left( {\begin{array}{c}k\\ l\end{array}}\right) \) grows only polynomially in k, we have that

$$\begin{aligned} \frac{\sum _{l=p}^k \left( {\begin{array}{c}k\\ l\end{array}}\right) }{2^k} = 1 - \underbrace{\frac{\sum _{l=0}^{p-1} \left( {\begin{array}{c}k\\ l\end{array}}\right) }{2^k}}_{\rightarrow 0} \rightarrow 1\end{aligned}$$

as \(k\rightarrow \infty \). Combining this with \(T^q\psi ^0\ge T^p\psi ^0\ge 0\) for \(q\ge p\ge 0\), we obtain that for each \(C\in (0,1)\) and \(p\in {\mathbb {N}}\) there is \(k_{C,p}\in {\mathbb {N}}\) with

$$\begin{aligned} \psi ^k \ge \frac{1}{2^k}\sum _{l=0}^{k} \left( {\begin{array}{c}k\\ l\end{array}}\right) T^l\psi ^0 \ge \frac{1}{2^k}\sum _{l=p}^{k} \left( {\begin{array}{c}k\\ l\end{array}}\right) T^l\psi ^0 \ge \frac{1}{2^k}\sum _{l=p}^{k} \left( {\begin{array}{c}k\\ l\end{array}}\right) T^p\psi ^0 \ge C T^p\psi ^0\end{aligned}$$

for all \(k\ge k_{C,p}\). This implies that

$$\begin{aligned} \psi ^\infty = \lim _{k\rightarrow \infty }\psi ^k \ge C\lim _{p\rightarrow \infty }T^p \psi ^0 = CV_\infty ^{\psi ^0}\end{aligned}$$

for any \(C\in (0,1)\). Since C can be chosen arbitrarily close to 1, this implies \(\psi ^\infty \ge V_\infty ^{\psi ^0}\), which finishes the proof. \(\square \)
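The combinatorial fact used in this proof can also be checked numerically: for fixed p, the normalized binomial tail \(2^{-k}\sum _{l=p}^{k} \left( {\begin{array}{c}k\\ l\end{array}}\right) \) tends to 1 as k grows. A minimal sketch:

```python
from math import comb

def tail_mass(k, p):
    """2^{-k} * sum_{l=p}^{k} C(k, l): the binomial tail used in Proposition 6.4."""
    return sum(comb(k, l) for l in range(p, k + 1)) / 2**k

p = 5
for k in (10, 50, 200, 1000):
    print(k, tail_mass(k, p))   # increases towards 1 as k grows
```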

Now we extend our results to dissipative stage costs. The dissipativity inequality here is similar to (2.9), where we explicitly include a shift of the cost function by b in the inequality.

Assumption 6.5

There exists a continuous storage function \(\lambda :{\mathbb {X}}\rightarrow {\mathbb {R}}\) and a value \({b}\in {\mathbb {R}}\) such that

$$\begin{aligned} \lambda ( f(x,u) ) \le \lambda (x) + \ell (x,u) - {b} \qquad \forall \, (x,u) \in {\mathbb {Z}} \end{aligned}$$
(6.2)

For such a function \(\lambda \), similar to (4.4) we define the rotated cost

$$\begin{aligned} {{\tilde{\ell }}}(x,u) = \ell (x,u) - {b} + \lambda (x) - \lambda (f(x,u)) \end{aligned}$$
(6.3)

and the corresponding operators \(\widetilde{T}\) and \(\hat{{\widetilde{T}}}\). The next lemma extends Lemma 4.3.

Lemma 6.6

For any continuous function \(\lambda :{\mathbb {X}}\rightarrow {\mathbb {R}}\) and for all \(k\in {\mathbb {N}}\) the identities

$$\begin{aligned} \widetilde{T}^k\psi = T^k(\psi -\lambda )+\lambda - k {b} \quad \text{ and } \quad \hat{{{\widetilde{T}}}^k}\psi = {{\hat{T}}}^k(\psi -\lambda )+\lambda \end{aligned}$$

hold.

Proof

The first identity follows with an analogous proof as for Lemma 4.3 followed by induction over k. For the second identity we compute

$$\begin{aligned} \hat{{\widetilde{T}}}\psi= & {} \min \{ \psi ,\widetilde{T}\psi + c(\psi ,\widetilde{T}\psi )\} \\= & {} \min \{ \psi , T(\psi -\lambda ) + \lambda - {b} + \underbrace{c(\psi ,T(\psi -\lambda ) + \lambda - {b})}_{=c(\psi -\lambda ,T(\psi -\lambda )) + {b}} \}\\= & {} \min \{ \psi - \lambda , T(\psi -\lambda ) + c(\psi -\lambda ,T(\psi -\lambda )) \} + \lambda \\= & {} {{\hat{T}}}(\psi -\lambda ) + \lambda . \end{aligned}$$

From this, the statement for \(\hat{{{\widetilde{T}}}^k}\psi \) follows by induction over k. \(\square \)

Assumption 6.7

There exists a nonempty set \(N\subset {\mathbb {X}}\) such that for any \(\psi :{\mathbb {X}}\rightarrow {\mathbb {R}}\) with \(\psi \ge - \lambda \) and \(\psi (x) = - \lambda (x)\) for all \(x\in N\) we have that \(T\psi (x) = {b} - \lambda (x)\) for all \(x\in N\).

Somewhat similar to Assumption 6.2, for dissipative optimal control problems Assumption 6.7 holds with \(N=\{x^e\}\) for an equilibrium \((x^e,u^e)\) with \(\ell (x^e,u^e)={b}\). This is because dissipativity implies \({{\tilde{\ell }}} \ge 0\) and \({{\tilde{\ell }}}(x^e,u^e)=0\). Together this yields for all \(u\in {\mathbb {U}}(x^e)\) that

$$\begin{aligned} \ell (x^e,u) + \psi (f(x^e,u)) \ge \ell (x^e,u) - \lambda (f(x^e,u)) = {{\tilde{\ell }}}(x^e,u) + {b} - \lambda (x^e) \ge {b} - \lambda (x^e),\end{aligned}$$

while for \(u=u^e\) we get

$$\begin{aligned} \ell (x^e,u^e) + \psi (f(x^e,u^e)) = {b} - \lambda (x^e), \end{aligned}$$

implying that this is the minimum and hence \(T\psi (x^e) = {b} - \lambda (x^e)\). The situation just described in particular occurs for strictly dissipative problems, cf. eq. (4.1).

Theorem 6.8

Assume that the optimal control problem is dissipative in the sense of Assumption 6.5, that Assumption 6.7 holds and that there is \(M>0\) with \(T^k(\psi ^0)\le M + {b}k\) for all \(k\in {\mathbb {N}}\) and \(\psi ^0 = -\lambda \). Then the sequence of functions \(\psi ^k={[{{\hat{T}}}^k \psi ^0]_n}\), \(k\in {\mathbb {N}}\), converges to a solution of the shifted Bellman Equation.

Proof

The assumptions together with Lemma 6.6 imply that the operator \(\hat{{\widetilde{T}}}\) corresponding to the cost \({{\tilde{\ell }}}\) from (6.3) satisfies all assumptions of Proposition 6.4. Hence, for \({{\tilde{\psi }}}^0\equiv 0\) the sequence \({{{\tilde{\psi }}}^k=[\hat{{{\widetilde{T}}}^k}{{\tilde{\psi }}}^0]_n}\) converges to a solution \({{\tilde{\psi }}}^\infty \) of the Bellman Equation for \(\tilde{\ell }\), i.e., \(\widetilde{T}{{\tilde{\psi }}}^\infty ={{\tilde{\psi }}}^\infty \). Because of Lemma 6.6 and using that \({[\psi +\phi ]_n = [[\psi ]_n+\phi ]_n}\) for all functions \(\psi ,\phi :{\mathbb {X}}\rightarrow {\mathbb {R}}\) attaining a minimum on \({\mathbb {X}}\) we obtain that

$$\begin{aligned} \psi ^k = [{{\hat{T}}}^k ({{\tilde{\psi }}}^0 - \lambda )]_n = [\hat{{{\widetilde{T}}}^k}({{\tilde{\psi }}}^0) - \lambda ]_n = [[\hat{{{\widetilde{T}}}^k}({{\tilde{\psi }}}^0)]_n - \lambda ]_n = [{{\tilde{\psi }}}^k - \lambda ]_n \end{aligned}$$

implying that

$$\begin{aligned} \psi ^\infty = {[{{\tilde{\psi }}}^\infty - \lambda ]_n.} \end{aligned}$$

From this we get, again using Lemma 6.6 and the constant \(w:={[{{\tilde{\psi }}}^\infty - \lambda ]_n}-{{\tilde{\psi }}}^\infty {+} \lambda = - \min _{x \in {\mathbb {X}}} ({{\tilde{\psi }}}^\infty (x) - \lambda (x))\),

$$\begin{aligned} T\psi ^\infty= & {} T ([{{\tilde{\psi }}}^\infty - \lambda ]_n) \; = \; T ({{\tilde{\psi }}}^\infty - \lambda + w)\\= & {} T ({{\tilde{\psi }}}^\infty - \lambda ) + w \; = \; \widetilde{T}{{\tilde{\psi }}}^\infty - \lambda + w + {b}\\= & {} {{\tilde{\psi }}}^\infty - \lambda + w + {b} \; = \; [{{\tilde{\psi }}}^\infty - \lambda ]_n + {b} \; = \; \psi ^\infty + {b}. \end{aligned}$$

This finishes the proof. \(\square \)

7 Examples and Counterexamples

In this section we illustrate the performance of the iterations proposed and discussed in this paper with various examples.

7.1 Comparison of Solution Methods

The examples in Sect. 7.1 are meant to illustrate different approaches for the formulation and solution of infinite horizon optimal control problems using dynamic programming. In particular, they emphasize the need for a terminal penalty function and highlight the benefits of using the \({\hat{T}}\) and \({\check{T}}\) operators for their solution.

7.1.1 Need for Terminal Penalty Function

We consider the following scalar linear system:

$$\begin{aligned} x^+ = - x + u \end{aligned}$$
(7.1)

along with state x taking values in \({\mathbb {X}} = [-2,2]\), and input constraints \({\mathbb {U}} (x) = [-2+x,2+x]\). The stage cost is piecewise linear and defined as:

$$\begin{aligned} \ell (x,u) = \min \left\{ |x-1| - \frac{1}{4}, |x+1| + \frac{1}{4} \right\} + |u|. \end{aligned}$$
(7.2)

Notice that the state-dependent part of the cost has two local minima, at \(x=-1\) and \(x=+1\). Moreover, for \(u=0\) solutions are 2-periodic and fulfill \(x(t)= (-1)^t x(0)\). It is possible to show that the optimal average cost is 0, achieved by the solution \(x(t) = (-1)^t\) corresponding to \(u(t)=0\). We show that using \(\psi =0\) does not lead to a convergent sequence of cost-to-go functions, see Fig. 1. In particular, after 2 iterations \(T^k \psi \) settles into a period-2 oscillation between two distinct piecewise linear functions. Accordingly, the optimal state-feedback (which is bang-bang) does not converge and will differ, at least in some regions of the state-space, depending on whether a horizon of odd or even length is considered.
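This period-2 behaviour can be reproduced with a straightforward grid discretization. In the sketch below (an illustration, with an arbitrarily chosen grid of 161 points), the Bellman operator is written as a minimization over the successor state \(x' = -x+u \in {\mathbb {X}}\); since the grid contains all kink points of the piecewise linear data, the computation is exact up to floating-point rounding.

```python
import numpy as np

X = np.linspace(-2.0, 2.0, 161)   # uniform grid on X = [-2, 2], step 0.025

def ell(x, u):
    return np.minimum(np.abs(x - 1) - 0.25, np.abs(x + 1) + 0.25) + np.abs(u)

# cost[i, j]: stage cost at X[i] for the input u = X[j] + X[i] steering X[i] to X[j]
cost = ell(X[:, None], X[None, :] + X[:, None])

def T(psi):
    """Bellman operator (T psi)(x) = min_{x'} [ ell(x, x + x') + psi(x') ]."""
    return (cost + psi[None, :]).min(axis=1)

psi = np.zeros_like(X)            # terminal penalty psi = 0
its = [psi]
for _ in range(20):
    its.append(T(its[-1]))

print(np.abs(its[20] - its[18]).max())   # ~0: the iterates are 2-periodic
print(np.abs(its[19] - its[18]).max())   # stays bounded away from 0
```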

Fig. 1

Sequence of cost-to-go functions \(T^k \psi \), from \(\psi =0\) for (7.1), (7.2)

Fig. 2

Sequence of cost-to-go functions \(T^k \psi \), from \(\psi =-\lambda _1\) (left) and \(\psi =-\lambda _2\) (right) for (7.1), (7.2)

In order to obtain meaningful infinite horizon costs and feedback policies we need to use a suitable penalty function for the final state, as shown in Proposition 2.2, in particular by letting \(\psi =- \lambda \) where \(\lambda \) is a storage function. The intuition is that letting \(\tau \) go to infinity without a terminal penalty function is akin to minimising costs up to time \(\tau \) under the assumption that time stops afterwards, so that the final state reached does not carry any value. This is, generally speaking, unsuitable, as the optimal costs might be affected by boundary effects that propagate all the way back to decisions taken at time 0; in the current example, different decisions are optimal depending on whether the time horizon is even or odd. The flexibility afforded by a terminal cost (which of course never materializes in the case of an infinite horizon) is to allow costs to converge to a steady-state expression and decisions to asymptotically converge to a time-invariant policy. For the considered example one can show that the function:

$$\begin{aligned} \lambda _1(x) = \min \left\{ |x-1|+ \frac{1}{2}, |x+1| \right\} / 2 \end{aligned}$$

is a storage function. Figure 2(left) shows that the iteration initialized with \(\psi ={-} \lambda _1\) converges. Notice that the cost monotonically converges in 3 steps to its infinite horizon value. It is well known that storage functions need not be unique. For instance the following function is another storage function:

$$\begin{aligned} \lambda _2 (x) = - \min \left\{ |x+1| +\frac{1}{4}, |x-1| - \frac{1}{4}, 2 |x+1| \right\} \end{aligned}$$

Our results show that any storage function can be used in order to define a suitable infinite horizon cost, provided the latter is finite. We show in Fig. 2(right) how choosing a different penalty function \(\psi =-\lambda _2\) still leads, for this particular example, to the same infinite horizon cost, with convergence in just one time step. It is important to emphasize that convergence of the cost iteration to a steady-state solution does not mean that optimal trajectories are, or converge to, an equilibrium. In fact, in this example, optimal trajectories converge to period-2 solutions of the type previously described. On the other hand, convergence of the iteration to a steady state allows one to extract a time-invariant optimal policy in the form of a state-feedback. Solutions achieving the minimum of the Bellman operator are optimal in many respects, as clarified in Proposition 3.5. In this particular case, since convergence of the iteration to its steady-state solution is achieved in a finite number of steps (rather than asymptotically), it is possible to show that such solutions are also optimal with respect to infinite horizon costs with terminal penalty functions \(-\lambda _1\) and \(- \lambda _2\), respectively.
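Both finite-step convergence claims can be checked with a simple grid discretization (an illustrative sketch; the uniform grid contains all kink points of \(\ell \), \(\lambda _1\) and \(\lambda _2\), so the iteration is exact up to rounding).

```python
import numpy as np

X = np.linspace(-2.0, 2.0, 161)   # uniform grid on X = [-2, 2]

def ell(x, u):
    return np.minimum(np.abs(x - 1) - 0.25, np.abs(x + 1) + 0.25) + np.abs(u)

cost = ell(X[:, None], X[None, :] + X[:, None])   # u = x' + x steers x to x'

def T(psi):
    return (cost + psi[None, :]).min(axis=1)      # Bellman operator on the grid

lam1 = np.minimum(np.abs(X - 1) + 0.5, np.abs(X + 1)) / 2
lam2 = -np.minimum(np.minimum(np.abs(X + 1) + 0.25, np.abs(X - 1) - 0.25),
                   2 * np.abs(X + 1))

for name, lam in (("lambda_1", lam1), ("lambda_2", lam2)):
    its = [-lam]                                   # terminal penalty psi = -lambda
    for _ in range(10):
        its.append(T(its[-1]))
    print(name, np.abs(its[10] - its[9]).max())    # ~0: a fixed point is reached
```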

7.1.2 Solution with \({\hat{T}}\) Operator

We consider below the same system and constraints as in the previous example, namely

$$\begin{aligned} x^+ = - x + u \end{aligned}$$
(7.3)

along with state x taking values in \({\mathbb {X}} = [-2,2]\), and input constraints \({\mathbb {U}} (x) = [-2+x,2+x]\). The stage cost is merely a shifted version of the previous piecewise linear cost:

$$\begin{aligned} \ell (x,u) = \min \left\{ |x-1| - \frac{15}{4}, |x+1| - \frac{13}{4} \right\} + |u|. \end{aligned}$$
(7.4)

Example 7.1.1 was crafted to have optimal average cost equal to 0, so that convergence of the iteration could be achieved just by choosing a suitable terminal cost function. When formulating a generic optimal control problem one cannot expect the optimal average cost to be 0, except in very special cases. Introducing a shifted version of the cost is meant to highlight the power of the approach, which is genuinely “shift-invariant” and does not rely on a priori knowledge of the optimal average cost. Therefore, rather than applying ad hoc considerations to figure out the optimal average performance (which in this case is \(-7/2\)) and correspondingly shifting \(\ell \) in order to recover the previous problem with optimal average 0, we directly apply the operator \({\hat{T}}\) to an arbitrary initialization \(\psi (x)=0\). We show in Fig. 3 the resulting non-increasing sequence of functions \({\hat{T}}^k \psi \), and the corresponding limit, which is a solution of the shifted Bellman Equation. The value of the shift applied, \(c ({\hat{T}}^k \psi ,T {\hat{T}}^k \psi )\), is displayed in Fig. 4. Notice that the shifts converge to 7/2, which is indeed the positive translation needed in order to compensate for the optimal infinite horizon average performance of \(-7/2\). To highlight the power of the \({\hat{T}}\) iteration, which simultaneously adjusts to the right value of the shift and of the asymptotic cost, we show in Fig. 5 its evolution for a different initialisation \(\psi (x)=- \sin (x)\).
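A compact numerical sketch of this iteration follows. It assumes the shift \(c(\psi _1,\psi _2) = \frac{1}{2}\big ( \max _{x}[\psi _1(x)-\psi _2(x)] + \min _{x}[\psi _1(x)-\psi _2(x)]\big )\) (i.e., the convex-combination shift with \(\alpha =1/2\), consistent with the bound used in the proof of Proposition 6.4) and \({\hat{T}}\psi = \min \{\psi , T\psi + c(\psi ,T\psi )\}\) as in the proof of Lemma 6.1.

```python
import numpy as np

X = np.linspace(-2.0, 2.0, 161)   # uniform grid on X = [-2, 2]

def ell(x, u):   # shifted stage cost (7.4)
    return np.minimum(np.abs(x - 1) - 15/4, np.abs(x + 1) - 13/4) + np.abs(u)

cost = ell(X[:, None], X[None, :] + X[:, None])   # u = x' + x steers x to x'

def T(psi):
    return (cost + psi[None, :]).min(axis=1)

def c(psi1, psi2):
    """Midpoint shift: average of max and min of psi1 - psi2 over the grid."""
    d = psi1 - psi2
    return 0.5 * (d.max() + d.min())

psi, shifts = np.zeros_like(X), []
for _ in range(300):
    Tpsi = T(psi)
    shifts.append(c(psi, Tpsi))
    psi = np.minimum(psi, Tpsi + shifts[-1])       # hat-T: min{psi, T psi + c}
print(shifts[-1])                                  # approaches 7/2
```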

Fig. 3

Sequence \({\hat{T}}^k \psi \) from initialisation \(\psi =0\) (left) and limiting function (right) for (7.3), (7.4)

Fig. 4

Sequence of shifts \(c ({\hat{T}}^k \psi ,T {\hat{T}}^k \psi )\) for (7.3), (7.4)

Fig. 5

Sequence \({\hat{T}}^k \psi \) from initialisation \(\psi =- \sin (x)\) for (7.3), (7.4)

Fig. 6

Sequence \({\check{T}}^k \psi \) from initialisation \(\psi =-\sin (x)\) with shift term c (left) and shift term \({{\tilde{c}}}\) with \(\alpha =3/4\) (right)

7.1.3 Solution with \({\check{T}}\) Operator

We provide next numerical evidence of convergence using the \({\check{T}}\) operator in Fig. 6 (left). It is also interesting to remark that both the \({\hat{T}}\) and \({\check{T}}\) operators show robustness with respect to the definition of the shift term \(c( \psi , T \psi )\). Specifically, for any \(\alpha \in (0,1)\), the strict convex combination:

$$\begin{aligned} {\tilde{c}}( \psi _1, \psi _2):= \alpha \max _{x \in {\mathbb {X}}} \big [ \psi _1 (x) - \psi _2 (x) \big ] + (1- \alpha ) \min _{x \in {\mathbb {X}}} \big [ \psi _1 (x) - \psi _2 (x) \big ] \end{aligned}$$

yields convergence, although at a possibly different speed. To illustrate this, we show the iteration corresponding to \(\alpha =3/4\) in Fig. 6 (right).
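As a hedged sketch of this robustness (illustrated here for the \({\hat{T}}\) operator with the modified shift \({\tilde{c}}\) and \(\alpha =3/4\)), the shifts for the example of Sect. 7.1.2 are still driven to 7/2:

```python
import numpy as np

X = np.linspace(-2.0, 2.0, 161)

def ell(x, u):   # shifted stage cost (7.4) of Sect. 7.1.2
    return np.minimum(np.abs(x - 1) - 15/4, np.abs(x + 1) - 13/4) + np.abs(u)

cost = ell(X[:, None], X[None, :] + X[:, None])

def T(psi):
    return (cost + psi[None, :]).min(axis=1)

def c_tilde(psi1, psi2, alpha):
    """Strict convex combination of max and min of psi1 - psi2."""
    d = psi1 - psi2
    return alpha * d.max() + (1 - alpha) * d.min()

alpha, psi, shifts = 0.75, np.zeros_like(X), []
for _ in range(300):
    Tpsi = T(psi)
    shifts.append(c_tilde(psi, Tpsi, alpha))
    psi = np.minimum(psi, Tpsi + shifts[-1])   # hat-T with the modified shift
print(shifts[-1])                              # still approaches 7/2
```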

7.2 Non Uniqueness of Optimal Solutions

The following examples illustrate non-uniqueness phenomena arising when dealing with infinite horizon control problems. In particular, they emphasize non uniqueness of the fixed-points of the Bellman Equation and/or of the associated optimal feedback policies.

7.2.1 Example With Multiple Solutions of the Bellman Equation

Consider the scalar linear system:

$$\begin{aligned} x^+ = - x + u \end{aligned}$$
(7.5)

along with the state constraint: \({\mathbb {X}} = [-2,2]\) and input constraints \({\mathbb {U}} (x) = [-2+x,2 + x]\). We consider a piecewise linear stage cost defined as:

$$\begin{aligned} \ell (x,u) = \varepsilon x + |u| \end{aligned}$$
(7.6)

for some constant \(\varepsilon \) which will need to be sufficiently small. Any function \(\psi (x) = \alpha |x| + \varepsilon x/2\) is a solution of the (shifted) Bellman Equation, as long as \(0 \le \alpha < 1- \varepsilon \). In fact:

$$\begin{aligned} T \psi{} & {} = \min _{u \in {\mathbb {U}} (x) } \varepsilon x + |u| + \alpha | -x + u | + \varepsilon (u-x) /2\\{} & {} = \varepsilon x / 2 + \min _{u \in {\mathbb {U}}(x) } |u| + \alpha | -x + u | + \varepsilon u. \end{aligned}$$

We notice that if \(0 \le \alpha <1- \varepsilon \) then the optimal value is achieved for \(u=0\), since the slope of \(|u|\) dominates the slopes of the other terms. In particular, substituting \(u=0\) yields \(T \psi (x) = \alpha |x| + \varepsilon x/2\). Hence there are infinitely many (even continuous) solutions to the shifted Bellman Equation (3.6) (although the associated optimal feedback policies happen to be the same). We remark that, because of Theorem 4.4, this implies that the problem is not strictly dissipative. Applying the \({\check{T}}\) iteration from the initial condition \(\psi \equiv 0\) converges in one step to a fixed point of the type previously considered, in particular the one with \(\alpha =\varepsilon =0.1\). The final fixed point depends in a non-trivial way on the initialisation. For instance, applying the \({\hat{T}}\) iteration from the non-monotone function \(\psi (x)=-\sin (x) + 2 \cos (3x)\) converges to a fixed point of a different shape, hinting at an even stronger variety of possible steady-state solutions, see Fig. 7.
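The family of fixed points is easy to verify directly on a grid; in the sketch below (with illustrative values of \(\alpha \)), \(T\psi \) coincides with \(\psi \) up to rounding for every admissible \(\alpha \).

```python
import numpy as np

X = np.linspace(-2.0, 2.0, 161)   # symmetric grid containing all kink points
eps = 0.1

def ell(x, u):                    # stage cost (7.6)
    return eps * x + np.abs(u)

cost = ell(X[:, None], X[None, :] + X[:, None])   # u = x' + x steers x to x'

def T(psi):
    return (cost + psi[None, :]).min(axis=1)

for alpha in (0.0, 0.1, 0.5, 0.85):               # any 0 <= alpha < 1 - eps works
    psi = alpha * np.abs(X) + eps * X / 2
    print(alpha, np.abs(T(psi) - psi).max())      # ~0: each psi is a fixed point
```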

Fig. 7

\({\hat{T}}\) iteration for non monotone initial function \(\psi \)

7.2.2 Example with Multiple Optimal Feedback Policies

We consider the following scalar linear system:

$$\begin{aligned} x^+ = x + u \end{aligned}$$
(7.7)

along with the state constraint \({\mathbb {X}} = [-1,1]\) and input constraints \({\mathbb {U}} (x) =[-1-x,1-x]\). We consider a piecewise linear stage cost defined as:

$$\begin{aligned} \ell (x,u) = 1 - |x| + |u|/2. \end{aligned}$$
(7.8)

Notice that, for each given \(x \in {\mathbb {X}}\), the input \(u=0\) minimizes the stage cost and makes x an equilibrium of the system. Hence, since minimizing \(\ell \) amounts to maximizing |x|, the optimal average performance is achieved by the equilibrium solutions \(x= \pm 1\) with zero input applied. Consider the following terminal penalty functions:

$$\begin{aligned} \psi _1 (x)&= 1 - |x| + (1+x)/2 \nonumber \\ \psi _2 (x)&= 1 - |x| + (1-x)/2 \end{aligned}$$
(7.9)

As seen in Fig. 8, the functions \(\psi _1\) and \(\psi _2\) assign different terminal costs to the two optimal equilibria. In particular \(\psi _1\) favours \(-1\), with 0 terminal cost, while \(\psi _2\) favours \(+1\).

Fig. 8

Multiple solutions of Bellman Equation

Both functions fulfill the Bellman Equation. In fact:

$$\begin{aligned} \min _{u \in {\mathbb {U}} (x) } \ell (x,u) + \psi _1 (f(x,u)){} & {} = \min _{u \in [-1-x,1-x]} 1 - |x| + |u|/2 + \big [ 1 - |x+u|\\{} & {} \qquad + (1+x+u)/2 \big ] \\{} & {} = 1 - |x| + (1+x)/2, \end{aligned}$$

which is achieved for \(u_1^*(x) = -1 -x\). Similarly one can show that \(u_2^*(x)=1-x\) achieves the optimum for \(\psi _2\) and that \(\psi _2\) is a solution of the Bellman Equation. Notice that:

$$\begin{aligned} {\hat{\psi }} (x) = \frac{3}{2} ( 1 -|x| ) = \min \{ \psi _1(x), \psi _2 (x) \} \end{aligned}$$

is also a legitimate choice of terminal penalty function. In fact, this is the infimum element in \(\Psi \), and is therefore the terminal penalty function that corresponds to the cheapest infinite horizon transient cost. As shown in Proposition 4.1, feedback policies corresponding to different solutions of the shifted Bellman Equation share the same infinite horizon average cost. Notice, in addition, that for any constants \(c_1\) and \(c_2\), the function:

$$\begin{aligned} \psi (x) = \min \{ \psi _1 (x) +c_1, \psi _2 (x) + c_2 \}, \end{aligned}$$

solves the shifted Bellman Equation. In fact, in this case, it can be shown that every solution of the shifted Bellman Equation is of this form. This result is likely to admit an extension to more general control set-ups.
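These claims can be verified numerically; in the sketch below, the constants \(c_1 = 0.3\) and \(c_2 = 0.7\) are arbitrary illustrative choices, and the grid contains all kink points, so the check is exact up to rounding.

```python
import numpy as np

X = np.linspace(-1.0, 1.0, 81)                    # grid on X = [-1, 1]

def ell(x, u):                                    # stage cost (7.8)
    return 1 - np.abs(x) + np.abs(u) / 2

cost = ell(X[:, None], X[None, :] - X[:, None])   # u = x' - x steers x to x'

def T(psi):
    return (cost + psi[None, :]).min(axis=1)

psi1 = 1 - np.abs(X) + (1 + X) / 2
psi2 = 1 - np.abs(X) + (1 - X) / 2
combo = np.minimum(psi1 + 0.3, psi2 + 0.7)        # min{psi1 + c1, psi2 + c2}

for psi in (psi1, psi2, combo):
    print(np.abs(T(psi) - psi).max())             # ~0: all are fixed points of T
```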

7.3 Regularity of Fixed-Points of Bellman Equation

The following examples are meant to illustrate potential discontinuity and unboundedness issues of the fixed-point of the (shifted) Bellman Equation.

7.3.1 Example With Lower Semi-Continuous Solution of the Bellman Equation

Consider the following bilinear scalar system:

$$\begin{aligned} x^+ = x (1 + u ) \end{aligned}$$
(7.10)

with state taking values in \({\mathbb {X}} =[-2,2]\) and input constraints:

$$\begin{aligned} {\mathbb {U}} (x) = [ -2,0 ]. \end{aligned}$$

Let the stage cost be piecewise linear, defined according to:

$$\begin{aligned} \ell (x,u) = \max \{ 0, x \} + |u|. \end{aligned}$$

Notice that for \(u=0\) every point is an equilibrium. Hence, simply letting \(u=0\) whenever the initial condition is \(\le 0\) achieves the minimum average cost. If the initial condition is positive, the best control action is \(u=-1\). Indeed, an input \(u \le -1\) is needed in order to leave the set of positive states and enter the negative semi-axis, where the optimal average performance can be achieved. Hence, the best choice, given the penalty |u| on inputs, is to have \(u=-1\). Moreover, waiting to apply such a control action does not pay off as the same cost will need to be incurred at some point in the future in order to switch to negative states. The following function is a lower semi-continuous solution of the associated Bellman Equation:

$$\begin{aligned} \psi (x) = \left\{ \begin{array}{cl} 0 &{} \text {if } x \le 0 \\ 1 + x &{} \text {if } x > 0 \end{array} \right. \end{aligned}$$

which is achieved for the following control policy:

$$\begin{aligned} u^*(x) = \left\{ \begin{array}{cl} 0 &{} \text {if } x \le 0 \\ -1 &{} \text {if } x > 0 \end{array} \right. \end{aligned}$$
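
This fixed-point claim is easy to probe numerically. The following sketch evaluates the Bellman operator at \(\psi \) on a grid of states, with \(\psi \) evaluated exactly rather than interpolated; the exact minimisers \(u=-1\) and \(u=0\) are included in the input grid so that the discontinuity of \(\psi \) at 0 does not pollute the minimisation:

```python
import numpy as np

def f(x, u):
    return x * (1 + u)                    # dynamics (7.10)

def ell(x, u):
    return max(0.0, x) + abs(u)           # stage cost

def psi(x):
    return 0.0 if x <= 0 else 1.0 + x     # candidate lower semi-continuous solution

# include the exact minimiser u = -1, so f(x, -1) = 0 exactly at the discontinuity
u_grid = np.append(np.linspace(-2.0, 0.0, 201), -1.0)
x_grid = np.linspace(-2.0, 2.0, 401)

# Bellman operator: (T psi)(x) = min_u ell(x, u) + psi(f(x, u))
T_psi = [min(ell(x, u) + psi(f(x, u)) for u in u_grid) for x in x_grid]
residual = max(abs(t - psi(x)) for t, x in zip(T_psi, x_grid))
# residual is at floating-point level: psi solves the Bellman Equation with shift 0
```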

We show in Fig. 9 how the iterations of the operators \({\hat{T}}\) and \({\check{T}}\) behave when initialised from \(\psi (x)=0\).

Fig. 9

Iteration of \({\check{T}}^k \psi \) and \({\hat{T}}^k \psi \)

It is worth pointing out that while both sequences seem to asymptotically approximate the correct ‘shape’ of the infinite-horizon cost, the theory confirms that \({\hat{T}}^k \psi (x)\) cannot remain bounded, since its pointwise limit is known to be at least upper semi-continuous, which is not the case for the solution in the considered example. Essentially, two possibilities are left: either numerical round-off errors make \({\hat{T}}\) converge to something that looks approximately correct, or \({\hat{T}}^k \psi \) diverges to \(- \infty \), though very slowly.

7.3.2 Example With Unbounded Infinite Horizon Cost

Consider the following bilinear scalar system:

$$\begin{aligned} x^+ = x u \end{aligned}$$
(7.11)

with state \(x \in {\mathbb {X}}:= [0,1]\) and input \(u \in [1/2,1]\). Consider the stage cost:

$$\begin{aligned} \ell (x,u) = |u-1| + |x|. \end{aligned}$$
(7.12)

We claim that the optimal average cost is 0. In fact, the control sequence \(u(t) = 1/2\) for \(t = 0, \ldots , K-1\) and \(u(t) = 1\) for \(t \ge K\) yields \(x(K) = x(0)/2^K\), and \(x(t)=x(K)\) for \(t \ge K\). Notice that \(\ell (x(t),u(t)) = |x(0)|/2^K \le 1/ 2^K\) for all \(t \ge K\). Hence, the average cost can be made less than or equal to \(1/2^K\) for any positive integer K, and this, together with the inequality \(\ell (x,u) \ge 0\), proves that the optimal average cost is 0. We show next that the optimal infinite horizon cost is unbounded.
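
The claimed bound on the achievable average cost is straightforward to reproduce in simulation. The sketch below (with the hypothetical choice \(x(0)=1\)) implements the switching input and confirms that the finite-horizon average approaches \(1/2^K\):

```python
def average_cost(K, T, x0=1.0):
    """Average cost of the policy u = 1/2 for t < K, u = 1 afterwards."""
    x, total = x0, 0.0
    for t in range(T):
        u = 0.5 if t < K else 1.0
        total += abs(u - 1.0) + abs(x)  # stage cost (7.12)
        x = x * u                       # dynamics (7.11)
    return total / T

# the long-run average approaches x0 / 2^K, which can be made arbitrarily small
```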

By induction, \(x(k) = x(0) \prod _{t=0}^{k-1} u(t)\). For the infinite horizon cost to be bounded we need to find an input such that \(x(k) \rightarrow 0\) as \(k \rightarrow + \infty \). Hence, the input needs to fulfill \(\prod _{t=0}^{k-1} u(t) \rightarrow 0\). On the other hand:

$$\begin{aligned} \prod _{t=0}^{k-1} u(t)= e^{ \sum _{t=0}^{k-1} \log ( u(t) ) } \end{aligned}$$

and therefore, for the cost to be bounded we need:

$$\begin{aligned} \sum _{t=0}^{k-1} \log ( u(t) ) \rightarrow - \infty \end{aligned}$$

as \(k \rightarrow + \infty \). However, on the interval [1/2, 1], concavity of the \(\log \) function yields:

$$\begin{aligned} \log (u ) \ge 2 \log (2) ( u -1 ). \end{aligned}$$

Using the inequality above shows:

$$\begin{aligned} \sum _{t=0}^{k-1} \log ( u(t) ) \ge 2 \log (2) \sum _{t=0}^{k-1} (u(t) -1 ). \end{aligned}$$

As a consequence, for the infinite horizon cost to be bounded we need:

$$\begin{aligned} \sum _{t=0}^{k-1} (u(t) -1 ) \rightarrow - \infty , \end{aligned}$$

as \(k \rightarrow + \infty \). This, however, contradicts boundedness of the cost as \(\ell (x,u) \ge 1 - u\).

It is worth pointing out that the optimal steady state for the considered example is \(x_s=0\) and \(u_s=1\). This steady state is not reachable in finite time, though. Notice also that this is trivially a dissipative system with storage function \(\lambda (x)=0\), due to the non-negativity of the cost. Nevertheless, no bounded solution of the shifted Bellman Equation exists.

7.3.3 Example With Continuous and Discontinuous Solutions

Consider the autonomous nonlinear system:

$$\begin{aligned} x^+ = \frac{3}{2} x - \frac{1}{2} x^3, \end{aligned}$$

along with the stage cost \(\ell (x,u)=0\). Choose \({\mathbb {X}} = [-1,1]\), which is a forward invariant set for the dynamics, with three equilibria, at \(-1\), 0 and 1. The equilibrium at 0 is antistable, while the equilibria at \(\pm 1\) are asymptotically stable, with basins of attraction (0, 1) and \((-1,0)\), respectively. Clearly, \(\psi (x) \equiv 0\) is a solution of the Bellman Equation. Any function of the form:

$$\begin{aligned} \psi (x) = \left\{ \begin{array}{cl} c_{1} &{} x < 0 \\ c_2 &{} x = 0 \\ c_3 &{} x>0 \end{array} \right. \end{aligned}$$

is also a solution. Consider next an arbitrary continuous increasing function \(\psi \) as initialisation of the \({\hat{T}}\) and \({\check{T}}\) maps. It can be seen that \(T \psi \) is also increasing, since f is increasing on the interval \([-1,1]\). As a consequence, \({\hat{T}} \psi \) and \({\check{T}} \psi \) are also increasing. Moreover, \(T \psi (0) = \psi (0)\) and \(T \psi ( \pm 1) = \psi (\pm 1)\). Thus, \({\hat{T}} \psi (0) - {\hat{T}} \psi (-1) = \psi (0)- \psi (-1)\) and \({\hat{T}} \psi (1) - {\hat{T}} \psi (0) = \psi (1) - \psi (0)\). By induction, \({\hat{T}}^k \psi (x)\) is increasing with respect to x for all k, and so is \({\check{T}}^k \psi (x)\). It can be shown that for \(\psi (x)=x\) it holds that \(c({\hat{T}}^k \psi ,T {\hat{T}}^k \psi )=0\) for all k. In particular, \({\hat{T}}^k \psi \) converges to:

$$\begin{aligned} {\hat{\psi }} (x) = \left\{ \begin{array}{cl} -1 &{} x < 0 \\ x &{} x \ge 0 \end{array} \right. \end{aligned}$$

Numerical simulations indeed confirm this claim (see Fig. 10). This shows that even if \({\hat{T}}\) (or \({\check{T}}\)) admits continuous fixed points, the iteration \({\hat{T}}^k \psi \) does not necessarily converge to a solution of the Bellman Equation. Similarly, for the iteration \({\check{T}}^k \psi \) with the same initial function \(\psi (x)=x\), it holds that \(c({\check{T}}^k \psi , T {\check{T}}^k \psi )=0\) for all k, and \({\check{T}}^k \psi \) converges to:

$$\begin{aligned} {\check{\psi }} (x) = \left\{ \begin{array}{cl} 1 &{} x > 0 \\ x &{} x \le 0 \end{array} \right. \end{aligned}$$
Fig. 10

Numerical \({\hat{T}}^k x\) iteration
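
Since \(\ell \equiv 0\) and the system has no input, the Bellman operator here reduces to composition with the dynamics, \(T\psi = \psi \circ f\), so the plain (unshifted) iteration \(T^k \psi \) can be sketched by simply iterating the map f on a grid. This is an illustration of the unshifted iteration only, not of the \({\hat{T}}\) and \({\check{T}}\) recursions: starting from \(\psi (x)=x\), it converges pointwise to the discontinuous solution of the family displayed above with \(c_1=-1\), \(c_2=0\) and \(c_3=1\):

```python
import numpy as np

def f(x):
    return 1.5 * x - 0.5 * x**3   # dynamics; [-1, 1] is forward invariant

xs = np.arange(-10, 11) / 10.0    # state grid, containing 0 and both endpoints exactly
psi = xs.copy()                   # initialisation psi(x) = x
for _ in range(100):
    psi = f(psi)                  # T psi = psi o f, since ell = 0 and there is no input

# pointwise limit: -1 for x < 0, 0 at x = 0, +1 for x > 0
```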

7.3.4 Example With Upper Semi-Continuous Solution

We slightly modify the previous example to include a scalar control input and induce an upper semi-continuous solution. Consider the nonlinear system:

$$\begin{aligned} x^+ = u \left( \frac{3}{2} x - \frac{1}{2} x^3 \right) =: f(x,u), \end{aligned}$$

with state space \({\mathbb {X}}=[0,1]\) and scalar input u constrained to \({\mathbb {U}}(x) = [0,1]\), along with the stage cost

$$\begin{aligned} \ell (x,u)= |u-1| - f(x,u) + x.\end{aligned}$$

Notice that:

$$\begin{aligned} \sum _{k=0}^{T-1} \ell (x(k),u(k)) = x(0) - x(T) + \sum _{k=0}^{T-1} |u(k)-1|. \end{aligned}$$

Hence, the optimal average performance is 0, achieved for \(u(\cdot )=1\). The function \({\bar{\psi }}\) defined below:

$$\begin{aligned} {\bar{\psi }} (x) = \left\{ \begin{array}{ll} -2 + x &{} \text {for } x \in (0,1] \\ 0 &{} \text {for } x=0 \end{array} \right. \end{aligned}$$

is a solution of the Bellman Equation. To see this, assume \(x \ne 0\) and notice:

$$\begin{aligned} T {\bar{\psi }} (x)= & {} \min _{u \in [0,1]} |u-1| + x - f(x,u) + {\bar{\psi }} ( f(x,u) ) \\= & {} \min \left\{ 1 + x + {\bar{\psi }} (0), \inf _{u \in (0,1]} |u-1| + x - f(x,u) + {\bar{\psi }} ( f(x,u) ) \right\} \\= & {} \min \left\{ 1 + x + {\bar{\psi }} (0), \inf _{u \in (0,1]} |u-1| + x - 2 \right\} \; = \; -2 + x. \end{aligned}$$

For \(x=0\), it is easy to verify that \( T {\bar{\psi }} (0) = 0\). We show in Fig. 11 the iteration converging to \({\bar{\psi }}\). Notice that, despite \({\bar{\psi }}\) being upper semi-continuous, not attaining a minimum on [0, 1], and having its discontinuity point \(x=0\) reachable from every state in \({\mathbb {X}}\) within a single step, the minimum in the definition of the operator \(T {\bar{\psi }}\) is still achieved. More generally, we see that the iteration \({\hat{T}}^k \psi \) converges, for \(x>0\), to \(\psi (1)-1 + x\).

Fig. 11

Numerical \({\hat{T}}^k (- x)\) iteration
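
That the minimum in \(T {\bar{\psi }}\) is indeed attained, despite the discontinuity of \({\bar{\psi }}\) at \(x=0\), can also be checked numerically. The sketch below evaluates the Bellman operator on a grid, with \({\bar{\psi }}\) evaluated exactly, and confirms \(T {\bar{\psi }} = {\bar{\psi }}\) up to floating-point error:

```python
import numpy as np

def f(x, u):
    return u * (1.5 * x - 0.5 * x**3)       # controlled dynamics

def ell(x, u):
    return abs(u - 1.0) - f(x, u) + x       # stage cost

def psi_bar(x):
    return 0.0 if x == 0 else -2.0 + x      # upper semi-continuous candidate

u_grid = np.linspace(0.0, 1.0, 101)         # contains the minimiser u = 1
residual = max(
    abs(min(ell(x, u) + psi_bar(f(x, u)) for u in u_grid) - psi_bar(x))
    for x in np.linspace(0.0, 1.0, 101)
)
# residual is at floating-point level: the minimum is attained, at u = 1
```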

We are not aware of examples of optimal control problems where the only solutions are upper semi-continuous (and not continuous), or where the minimum in the definition of \(T {\bar{\psi }}\) is not achieved. It is worth pointing out that \({\bar{\psi }}(x)=x\) is also a solution of the Bellman Equation.

7.4 Complex Optimal Regime of Operation

We consider examples where the optimal average performance is not achieved at steady-state, but for more exotic types of behaviour. It is worth pointing out that dealing with a terminal penalty function makes it possible to treat such examples without an a priori known terminal absorbing state or terminal absorbing set. Moreover, the optimal regime of operation does not entail a constant (or zero) optimal stage cost in steady-state.

7.4.1 Example With Chaotic Optimal Regime

Consider the scalar nonlinear system:

$$\begin{aligned} x^+ = u x (1-x) \end{aligned}$$
(7.13)

with scalar state \(x \in {\mathbb {X}}:= [0,1]\) and input \(u \in {\mathbb {U}}:= [0,4]\). We consider the stage cost:

$$\begin{aligned} \ell (x,u) = x^2 - [u x (1-x)]^2 + |u-18/5|. \end{aligned}$$

Notice that \(\ell (x(k),u(k)) = x(k)^2 - x(k+1)^2 + |u(k)-18/5|\). Therefore, along arbitrary solutions we have:

$$\begin{aligned} \sum _{k=0}^{\tau -1} \ell (x(k),u(k)) = x(0)^2 - x(\tau )^2 + \sum _{k=0}^{\tau -1} |u(k)-18/5|. \end{aligned}$$

In particular, computing asymptotic time averages we see:

$$\begin{aligned}{} & {} \lim _{\tau \rightarrow + \infty } \frac{ \sum _{k=0}^{\tau -1} \ell (x(k),u(k))}{\tau } = \lim _{\tau \rightarrow + \infty } \frac{ x(0)^2 - x(\tau )^2 + \sum _{k=0}^{\tau -1} |u(k)-18/5|}{\tau } \\{} & {} \qquad = \lim _{\tau \rightarrow + \infty } \frac{ \sum _{k=0}^{\tau -1} |u(k)-18/5|}{\tau }. \end{aligned}$$

The optimal average performance is therefore 0, and is achieved, for instance, for any input u(k) converging to 18/5. Notice that for \(u=18/5\) the considered dynamical system is known to have chaotic solutions. Moreover, \(u(k) = 18/5\) is potentially an optimal infinite horizon control policy. This policy corresponds to the solution \({\bar{\psi }} (x) = x^2\) of the Bellman Equation. Indeed,

$$\begin{aligned}{} & {} T {\bar{\psi }} = \min _{u \in [0,4] } x^2 - [u x (1-x)]^2 + |u-18/5| + {\bar{\psi }} ( u x (1-x) ) \\{} & {} \qquad \qquad = \min _{u \in [0,4]} x^2 + |u-18/5| = x^2 = {\bar{\psi }} (x). \end{aligned}$$
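
The fixed-point computation above is easily double-checked numerically. The sketch below evaluates the Bellman operator at \({\bar{\psi }}(x) = x^2\) over grids of states and inputs (making sure \(u = 18/5\) is included) and confirms a zero residual up to floating-point error:

```python
import numpy as np

def f(x, u):
    return u * x * (1.0 - x)                         # logistic dynamics (7.13)

def ell(x, u):
    return x**2 - f(x, u)**2 + abs(u - 3.6)          # stage cost, 18/5 = 3.6

def psi_bar(x):
    return x**2                                      # candidate solution

u_grid = np.append(np.linspace(0.0, 4.0, 401), 3.6)  # ensure u = 18/5 is hit exactly
residual = max(
    abs(min(ell(x, u) + psi_bar(f(x, u)) for u in u_grid) - psi_bar(x))
    for x in np.linspace(0.0, 1.0, 101)
)
# the telescoping terms cancel, so T psi_bar = psi_bar and u = 18/5 minimises
```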

The numerical solution using the \({\hat{T}}\) operator is shown in Fig. 12, starting from two distinct initializations, \(\psi (x) =0\) and \(\psi (x) = \sin (4x)\). The optimal average performance is correctly estimated to be 0, and \({\hat{T}}^k \psi \) converges to a shifted version of \(x^2\) in both cases. The numerical solution using the \({\check{T}}\) operator is slightly different and is shown in Fig. 13. While it is hard to write an explicit analytic expression for the limiting function, due to the presence of somewhat unexpected spikes, we believe that the numerical results hint at the presence of multiple solutions of the corresponding Bellman Equation. These solutions match \(x^2\) for most of the interval [0, 1], but appear to allow for piecewise linear spikes that might correspond to transient costs in regions which are not visited by the chaotic attractor. It seems more plausible that these are true solutions rather than artifacts due to numerical approximations. The optimal average performance is identified with very good precision in both cases. In particular, for the \({\hat{T}}\) iteration the error is lower than \(10^{-16}\). See Fig. 14 for the shift sequence achieved for the \({\check{T}}\) operator when \(\psi (x) \equiv 0\).

Fig. 12

Iteration using the \({\hat{T}}\) operator from \(\psi (x)=0\) and \(\psi (x)= \sin (4x)\)

Fig. 13

Iteration using the \({\check{T}}\) operator from \(\psi (x)=0\) and \(\psi (x)= \sin (4x)\)

Fig. 14

Sequence \(c ( {\check{T}}^k \psi , T {\check{T}}^k \psi )\), for \(\psi =0\)

7.4.2 Two-Dimensional Example With Periodic Optimal Regime

We consider next the following two-dimensional linear system:

$$\begin{aligned} x^+ = \left[ \begin{array}{cc} 0 &{} 1 \\ -1 &{} 0 \end{array} \right] \, x + \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] \, u, \end{aligned}$$
(7.14)

with state \(x \in {\mathbb {X}}:= [-1,1]^2\) and input \(u \in {\mathbb {U}} (x):= [-1-x_2,1-x_2]\). Consider the stage cost

$$\begin{aligned} \ell (x,u)= |u| + x_1^2 - |x_1|/2. \end{aligned}$$

Notice that this cost is not positive definite. The zero solution is feasible with zero input and yields 0 average cost; however, since for \(u=0\) the stage cost is negative whenever \(0< |x_1| < 1/2\), the optimal average performance can be expected to be negative. The zero-input responses of the system are (feasible) period-4 oscillations. Moreover, the system is controllable, which guarantees an optimal average performance independent of the initial condition (regardless of the adopted stage cost \(\ell (x,u)\)). We show in Fig. 15 a solution of the shifted Bellman Equation. The iterations resulting from the \({\hat{T}}\) and \({\check{T}}\) operators starting from \(\psi ^0\equiv 0\) are shown in Fig. 16.
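
The zero-input behaviour and the sign of the cost are straightforward to verify. The sketch below (for the arbitrarily chosen initial state \((1/4, 0)\)) checks that the zero-input response is 4-periodic and already achieves a negative average cost over one period:

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # zero-input dynamics of (7.14); A^4 = I

def ell(x, u):
    return abs(u) + x[0]**2 - abs(x[0]) / 2.0  # stage cost

x = np.array([0.25, 0.0])   # hypothetical initial state in X = [-1, 1]^2
x0 = x.copy()
costs = []
for _ in range(4):
    # u = 0 is feasible, since 0 lies in [-1 - x2, 1 - x2] whenever |x2| <= 1
    costs.append(ell(x, 0.0))
    x = A @ x                # zero-input update

period_ok = np.allclose(x, x0)   # the response returns to x0 after 4 steps
avg = sum(costs) / 4.0           # equals -1/32 for this initial state
```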

Fig. 15

Solution of the 2d Bellman Equation

Fig. 16

Iterations of the 2d \({\hat{T}}\) operator (left) and of the 2d \({\check{T}}\) operator (right)

7.5 Inefficiency of Exponential Discounting Factors

We end our example section with a discounted optimal control problem, which shows that ensuring well-posedness of infinite horizon optimal control problems by means of discounting can have unwanted side effects, making the proposed approach via the shifted Bellman Equation an attractive alternative. To this end, we consider a scalar infinite horizon linear quadratic optimal control problem with exponential discounting. In particular, the system’s dynamics are given as:

$$\begin{aligned} x^+ = (x+u)/2, \end{aligned}$$
(7.15)

with x and u taking values in \({\mathbb {R}}\). The stage cost is:

$$\begin{aligned} \ell (x,u) = (x-1)^2 + u^2. \end{aligned}$$

Since this choice will not give rise to bounded costs over an infinite horizon, we use a discounting factor \(\gamma \in (0,1)\):

$$\begin{aligned} J_\gamma = \sum _{k=0}^{+\infty } \gamma ^k \ell (x(k),u(k)). \end{aligned}$$

The optimal infinite horizon cost fulfills the following Bellman Equation:

$$\begin{aligned} J^*_{\gamma } (x) = \min _{u \in {\mathbb {R}}} \ell (x,u) + \gamma J^*_{\gamma } ( f(x,u) ). \end{aligned}$$

It is possible to show that this equation admits a solution:

$$\begin{aligned} J^*_{\gamma } (x) = \alpha x^2 + \beta x + \delta \end{aligned}$$

where \(\alpha \), \(\beta \) and \(\delta \) fulfill the conditions:

$$\begin{aligned} \begin{array}{rcl} \alpha (\gamma ) &{}=&{} \frac{\gamma - 2 + \sqrt{ \gamma ^2 + 4 }}{\gamma } \\ \beta (\gamma ) &{}=&{} - \frac{2 \alpha (\gamma ) \gamma + 8}{ \alpha (\gamma ) \gamma + 4 - 2 \gamma } \\ \delta (\gamma ) &{}=&{} \frac{4 \alpha (\gamma ) \gamma + 16 - \beta ^2(\gamma ) \gamma ^2 }{(4 \alpha (\gamma ) \gamma + 16)(1 - \gamma )} \end{array} \end{aligned}$$

The optimal feedback is affine in x and expressed as:

$$\begin{aligned} u^*(x) = - \frac{ \beta \gamma + \alpha \gamma x }{ \alpha \gamma + 4}. \end{aligned}$$

This feedback globally asymptotically stabilizes a unique equilibrium \(x_e(\gamma )\):

$$\begin{aligned} x_e (\gamma ) = - \frac{ \beta (\gamma ) \gamma }{2 \alpha (\gamma ) \gamma +4 }. \end{aligned}$$
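
These expressions are easy to evaluate. The sketch below computes \(\alpha \), \(\beta \), \(\delta \) and \(x_e(\gamma )\), with \(\alpha \) obtained directly as the positive root of the quadratic \(\gamma \alpha ^2 + (4-2\gamma )\alpha - 4 = 0\) that arises from matching the \(x^2\) coefficients in the discounted Bellman Equation (an assumption of our own re-derivation); it then checks the fixed-point property over a fine input grid, and evaluates \(x_e(\gamma )\):

```python
import numpy as np

def coeffs(gamma):
    # alpha: positive root of gamma*a^2 + (4 - 2*gamma)*a - 4 = 0
    a = (2.0 * gamma - 4.0 + 2.0 * np.sqrt(gamma**2 + 4.0)) / (2.0 * gamma)
    b = -(2.0 * a * gamma + 8.0) / (a * gamma + 4.0 - 2.0 * gamma)
    d = (4.0 * a * gamma + 16.0 - b**2 * gamma**2) / (
        (4.0 * a * gamma + 16.0) * (1.0 - gamma)
    )
    return a, b, d

def bellman_residual(gamma, x):
    # |min_u ell(x, u) + gamma * J(f(x, u)) - J(x)| on a fine grid of inputs
    a, b, d = coeffs(gamma)
    J = lambda z: a * z**2 + b * z + d
    u = np.linspace(-5.0, 5.0, 100001)
    return abs(np.min((x - 1.0)**2 + u**2 + gamma * J((x + u) / 2.0)) - J(x))

def x_eq(gamma):
    a, b, _ = coeffs(gamma)
    return -b * gamma / (2.0 * a * gamma + 4.0)

# the residual is at the level of the u-grid resolution, while x_eq(gamma)
# lies well below 1/2 for moderate gamma and approaches 1/2 only as gamma -> 1
```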

Notice that the optimal average performance is achieved at equilibrium, for \(x=1/2\) and \(u=1/2\), which yields \(V^{avg} = (1/2)^2 + (1/2)^2 = 1/2\). On the other hand, the equilibrium \(x_e (\gamma )\) only approaches the value 1/2 as \(\gamma \rightarrow 1\) (see Fig. 17). This shows that the long-run average performance achieved by introducing a discounting factor is in general suboptimal. Moreover, the discounting factor introduces a spurious trade-off between optimising transient and steady-state (average) costs, which persists for \(\gamma \) arbitrarily close to 1. This trade-off can be avoided by the approach pursued in this paper. On the other hand, any feedback \(u=k(x)\) (for instance affine, \(u= k_1 x + k_2\)) which stabilizes the equilibrium 1/2 clearly achieves optimal average performance (and is therefore optimal with respect to the cost functional \(J^{\text {avg}}\)), but, at the same time, it is not necessarily optimal from the point of view of transient costs. We refer to [28,29,30] for more examples of this kind and an in-depth study of the stability properties of discounted optimal equilibria. On a related note, [31] highlights how, by suitably adapting the stage cost, discounted and undiscounted formulations in Markov Decision Processes can yield the same optimal control law.

Fig. 17

Equilibrium \(x_{e}\) as a function of \(\gamma \) in [0, 1]

8 Conclusions and Outlook

Two novel recursion operators are proposed for the simultaneous computation of value functions and minimal average asymptotic cost in discrete-time infinite horizon optimal control problems. The recursive formulas can be readily applied when the average asymptotic cost is independent of initial conditions, a situation referred to as the ergodic case in [22]. The approach renders dynamic programming techniques for infinite horizon control problems invariant with respect to additive constants on the stage cost, as is naturally the case on finite horizons. The recursions converge, under fairly relaxed technical assumptions, to solutions of a shifted Bellman Equation, whose shift value is not a priori determined but is asymptotically computed alongside the value function. The approach removes the need for absorbing states and zero-cost conditions on the absorbing sets, which have often hindered the applicability of such techniques, as well as the need for discounting factors, which introduce unnecessary trade-offs between transient cost and asymptotic average performance. While the approach is developed for the case of deterministic systems only, its extension to stochastic settings appears of potential interest. Finally, this may serve as a first step in understanding the more general question of a shift-invariant approach to infinite horizon optimal control problems in the non-ergodic case, [22, 32].