Dissipativity in infinite horizon optimal control and dynamic programming

In this paper we extend dynamic programming techniques to the study of discrete-time infinite horizon optimal control problems on compact control invariant sets with state-independent best asymptotic average cost. To this end we analyse the interplay of dissipativity and optimal control, and propose novel recursive approaches for the solution of so-called shifted Bellman Equations.


Introduction
Dynamic programming (DP) is a cornerstone of control theory which makes it possible to solve (in feedback form) optimal control problems formulated on horizons of increasing length, through a suitable recursive formula for the computation of the so-called value function, [1].
Remarkably, dynamic programming allows one to study problems formulated on both a finite horizon and an infinite one, the latter under suitable technical assumptions, by studying the asymptotic properties of the recursion or by computing its fixed points. By now, the subject of dynamic programming and infinite horizon optimal control has been studied in depth by many authors, and several monographs on the subject exist both in the control domain [2,3,4] and in economics [5,6].
While, in its naive form, DP is often associated with the curse of dimensionality, which may hinder its applicability to scenarios of practical relevance, the topic of its approximate and efficient numerical treatment has also gathered significant impetus, in particular in the context of machine learning [7]. Indeed, the dynamic programming or Bellman Equation is at the core of any (deep) reinforcement learning algorithm [8,9].
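As a minimal illustration of the recursive formula underlying DP (a toy discretized example of our own, not taken from the cited works), the finite-horizon optimal cost can be computed by iterating V_{k+1}(x) = min_u [ℓ(x, u) + V_k(f(x, u))]:

```python
# Toy deterministic system on a discretized state set (hypothetical example):
# dynamics clipped so that X = {-2, ..., 2} is control invariant.
X = [-2, -1, 0, 1, 2]
U = [-1, 0, 1]

def f(x, u):
    return max(-2, min(2, x + u))      # clipped integrator dynamics

def cost(x, u):
    return x * x + u * u               # stage cost penalizing state and input

def value_iteration(T):
    """V_0 = 0; V_{k+1}(x) = min_u [cost(x,u) + V_k(f(x,u))]."""
    V = {x: 0.0 for x in X}
    for _ in range(T):
        V = {x: min(cost(x, u) + V[f(x, u)] for u in U) for x in X}
    return V

V5 = value_iteration(5)                # optimal 5-step costs for every state
```

Here the stage cost is nonnegative, so the iterates converge; the point developed below is precisely that for general costs this need not happen without a suitable terminal penalty.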
The link between optimal control and dissipativity was already established by Willems in the seminal papers [10,11], and in parallel in the study of inverse optimal regulators for nonlinear systems [12]. However, it was only brought to the forefront of the discourse on optimisation-based control in recent years [13,14], thanks to its surprising connections to closed-loop stability of Economic Model Predictive Control [15,16] and long-run average optimal control [17,18]. In particular, [15] proposes a notion of optimal operation at steady-state and provides a sufficient condition for this property to hold, based on dissipativity of the associated system's dynamics with respect to a suitable supply function. The converse statement is investigated in [16], where an additional controllability assumption is needed in order to prove necessity of dissipativity. While generalizations of the above results, their relation to the so-called turnpike property, and extensions to periodic optimal solutions are provided in several subsequent works (e.g. [19] and [20]), the connection to Dynamic Programming and infinite horizon optimal control has remained elusive, due to the restrictive technical assumptions needed to make sense of undiscounted cost functionals.
In this paper we further explore connections between dissipativity and infinite horizon optimal control problems, while proposing new formulations and iterative methods for their solution that significantly expand the class of problems which can be meaningfully addressed by this approach. Our main contributions are:
• introducing a terminal penalty in infinite horizon optimal control, in the form of suitable storage functions with negative sign;
• proposing a shifted Bellman Equation to be used in optimal control problems with non-zero (yet state-independent) optimal long-run average performance (this includes systems with periodic, almost periodic or even chaotic regimes of operation, allowing general time-varying asymptotic cost along optimal solutions);
• proposing two novel recursions whose fixed points are solutions of shifted Bellman Equations (of any shift);
• analysing the convergence properties of such recursions under fairly general technical assumptions, allowing simultaneous computation of the best average performance and of the associated value function;
• avoiding the artificial trade-off between transient cost and asymptotic average performance that discounted formulations may introduce.
The rest of the paper is organized as follows: Section 2 introduces the problem formulation, basic notation and some preliminary results; Section 3 introduces the shifted Bellman Equation and the novel recursion operators, whose properties are investigated in Section 4. Section 5 provides a general convergence result under suitable conditions on the controllability of the system's dynamics, while Section 6 relaxes some of the continuity assumptions needed for the convergence analysis by starting the recursion from specific initialisations. Examples and counterexamples are shown in Section 7, and Section 8 draws some conclusions and points to further open research directions. Important intermediate technical results are collected in the appendix in Section A.

Problem formulation and preliminary results
Consider the discrete-time finite dimensional nonlinear control system described by the following difference equation:

x(t + 1) = f(x(t), u(t)), (2.1)

where x(t) ∈ X ⊂ R^n is the state variable, taking values in some compact control invariant set X, u(t) ∈ R^m is the control input and f : Z → X is the continuous transition map. We denote by U(·) : X → 2^(R^m) the upper semicontinuous set-valued mapping defined below:

U(x) := {u ∈ R^m : (x, u) ∈ Z}, (2.2)

which corresponds to the set of feasible control inputs in state x, given the compact state/input constraint set Z. Moreover, we assume, without loss of generality,

U(x) ≠ ∅ (2.3)

for all x ∈ X. For an input sequence u = {u(t)}_{t=0}^∞, we denote by φ(t, x, u) the state at time t, from initial condition x(0) = x, as given by iteration (2.1). We also extend definition (2.2) to allow feasible control sequences of length T, as follows:

U_T(x) := {u : (φ(t, x, u), u(t)) ∈ Z, ∀ t ∈ {0, . . . , T − 1}}. (2.4)

Our contribution is twofold; namely, to define optimal control problems over an infinite horizon for a significantly larger class of system dynamics and associated cost functionals than can currently be addressed by existing formulations, and, at the same time, to propose a dynamic programming approach for their solution. To this end we consider a continuous stage cost ℓ : Z → R and formulate the following cost functional:

J_T(x(·), u(·)) := Σ_{t=0}^{T−1} ℓ(x(t), u(t)) + ψ(x(T)), (2.5)

where ψ : X → R is a continuous function called the terminal cost. Terminal costs significantly affect the solution of an optimal control problem, and a key insight of our paper will be providing guidelines for their selection in order to allow the formulation of infinite horizon optimal control problems. A finite horizon optimal control problem is then defined as follows:

V_T^ψ(x) := min_{x(·),u(·)} J_T(x(·), u(·))
subject to x(0) = x
x(t + 1) = f(x(t), u(t)), t ∈ {0, 1, . . . , T − 1}
(x(t), u(t)) ∈ Z, t ∈ {0, 1, . . . , T − 1}
x(T) ∈ X (2.6)

For each value of the initial condition x ∈ X, a solution of (2.6) is guaranteed to exist thanks to the compactness and non-emptiness properties of the feasible set and continuity of the cost function.
On the other hand, when the control problem has no natural termination time, one might want to define an infinite horizon optimisation problem. This often has the additional appealing feature of being solvable through implementation of a time-invariant feedback policy. However, making sense of an infinite horizon formulation of (2.6) typically entails strong assumptions on the kinds of system dynamics and cost functionals that are allowed.
One strategy for avoiding such limitations is, at least in practice, to introduce a discounting factor 0 < γ < 1 in the cost function:

J_γ(x(·), u(·)) := Σ_{t=0}^∞ γ^t ℓ(x(t), u(t)), (2.7)

which for γ ≈ 1 provides a good approximation to some form of infinite horizon (average) cost. While this approach has some appealing features, for instance making optimal solutions invariant with respect to translation of ℓ by any finite constant value, having to settle on a specific value of γ less than unity is unsatisfactory, as it always leaves open the question of how optimal control policies would be affected by variations in γ, i.e. if higher values were to be considered. Moreover, as shown later in Section 7.5, adoption of a discounting factor may introduce non-existent trade-offs between optimisation of steady-state and transient costs.
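A small numeric sketch (a hypothetical oscillating stage cost, not from the paper) makes the γ ≈ 1 approximation concrete: the plain sum of the stage costs below diverges, while the normalized discounted value (1 − γ)J_γ approaches the long-run average cost as γ → 1.

```python
# Discounted evaluation of an oscillating stage-cost sequence l(t) = (-1)^t + 1,
# whose plain sum diverges but whose long-run average equals 1.
def discounted_cost(stage_cost, gamma, T=10_000):
    # truncated evaluation of sum_{t < T} gamma^t * l(t); the tail is negligible here
    return sum(gamma**t * stage_cost(t) for t in range(T))

osc = lambda t: (-1)**t + 1.0

J_09 = discounted_cost(osc, 0.9)
J_099 = discounted_cost(osc, 0.99)
# (1 - gamma) * J_gamma tends to the average cost 1 as gamma -> 1
```

The closed form of the discounted sum is 1/(1 + γ) + 1/(1 − γ), so (1 − γ)J_γ = (1 − γ)/(1 + γ) + 1, which indeed tends to the average cost 1 as γ → 1.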
An alternative approach is to resort to average, rather than summed, costs:

J_avg(x(·), u(·)) := limsup_{T→+∞} (1/T) Σ_{t=0}^{T−1} ℓ(x(t), u(t)). (2.8)

Taking the average yields well-defined costs even when summed costs would diverge to ±∞ or fail to converge (for instance by oscillating), which constitute the main obstructions to the definition of infinite horizon control problems for general dynamics and costs. On the other hand, time-shift invariance of average costs along any solution implies that this approach disregards transient costs, which therefore will not be minimised and might be arbitrarily large even for optimal feedback policies (see again the example in Section 7.5). Our proposed solution and novel contribution is to provide fairly general conditions on the terminal cost ψ to make sure that the infinite horizon functional J_∞(x(·), u(·)) := lim_{T→+∞} J_T(x(·), u(·)) is well-defined. To this end the notion of dissipativity will play an interesting role. This notion was originally introduced by Willems in [10,11] and has recently received a surge in interest for its crucial role in the analysis of closed-loop Economic Model Predictive Control schemes [15,16,13,14]. In a nutshell, a system as in (2.1) is said to be dissipative with respect to the supply function ℓ(x, u) if there exists a continuous storage function λ : X → R such that:

λ(f(x, u)) ≤ λ(x) + ℓ(x, u), ∀ (x, u) ∈ Z. (2.9)

This inequality is normally interpreted in "energetic" terms as enforcing, for a dissipative system, that the energy stored within, at the next state, cannot exceed the energy at the current state plus the energy externally supplied through the supply function ℓ(x, u). In the context of optimal control, where the objective is to minimize a cost functional, λ(x) can be interpreted as the value of the state x, and the dissipation inequality guarantees that the gain in value for any feasible control action u at state x cannot exceed the corresponding stage cost.
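The dissipation inequality can be checked numerically on a discretized constraint set. The sketch below (our own toy data, not the paper's example) verifies (2.9) for the trivial storage function λ ≡ 0 and a nonnegative stage cost, and shows that a large downward shift of the cost destroys the inequality for that choice of λ.

```python
# Toy numeric check of the dissipation inequality (2.9):
# lambda(f(x,u)) <= lambda(x) + l(x,u) for all (x,u) in a discretized Z.
def is_dissipative(f, ell, lam, Z, tol=1e-9):
    return all(lam(f(x, u)) <= lam(x) + ell(x, u) + tol for (x, u) in Z)

Z = [(x, u) for x in (-1.0, 0.0, 1.0) for u in (-1.0, 0.0, 1.0)]
f = lambda x, u: u                       # next state equals the input
ell = lambda x, u: x * x + u * u         # nonnegative stage cost

ok_zero_storage = is_dissipative(f, ell, lambda x: 0.0, Z)   # lambda = 0 works
shifted = lambda x, u: ell(x, u) - 10.0  # large downward shift of the cost
ok_shifted = is_dissipative(f, shifted, lambda x: 0.0, Z)    # fails for lambda = 0
```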
Notice that, while optimal control sequences over any finite control horizon (or over an infinite control horizon with forgetting factor γ) are invariant with respect to cost translation, viz. ℓ̃(x, u) := ℓ(x, u) − c for any constant c ∈ R, dissipativity is not a shift-invariant property. In fact, it can always be guaranteed by a sufficiently negative value of c, given compactness of Z. Trivially, if ℓ̃(x, u) ≥ 0 for all (x, u) ∈ Z, dissipativity is ensured just by defining λ(x) = 0 for all x ∈ X. Our first result is stated below.
Proposition 2.1 Assume that system (2.1) is dissipative with continuous storage function λ(·) with respect to the supply ℓ(x, u), and let ψ(x) = −λ(x). Then the limit

V_∞^ψ(x) := lim_{T→+∞} V_T^ψ(x) (2.10)

exists for all x ∈ X, possibly assuming the value +∞.
Proof. Consider any feasible solution x*, u* (with x*(0) = x) which achieves the optimal cost V_{T+1}^ψ(x). By definition,

V_{T+1}^ψ(x) = Σ_{t=0}^{T} ℓ(x*(t), u*(t)) − λ(x*(T + 1)) ≥ Σ_{t=0}^{T−1} ℓ(x*(t), u*(t)) − λ(x*(T)) ≥ V_T^ψ(x),

where the first inequality holds by the dissipativity assumption (applied at t = T), and the second because x*, u* is a feasible solution also over the shorter horizon [0, T]. Hence, V_T^ψ(x) is monotone non-decreasing with respect to T, and the limit (2.10) exists.
It is important to realise that Proposition 2.1 only guarantees existence of the limit, not actual boundedness of the cost V_∞^ψ(x). In fact, typically the cost would be +∞ unless a suitably shifted version of ℓ(x, u) is considered. In particular, there is only a single value of this shift that might result in a finite cost. This can be found, by alternative means, looking for the optimal average cost:

V_avg(x) := min_u limsup_{T→+∞} (1/T) Σ_{t=0}^{T−1} ℓ(φ(t, x, u), u(t)). (2.11)

Under suitable technical conditions, for instance global controllability assumptions, the optimal average cost is independent of x, and its value can be found [21,18] by solving a suitable infinite dimensional linear program. We note that this approach has similarities to the effective Hamiltonian approach in continuous-time ergodic optimal control, see [22]. Dynamic programming allows one to solve optimal control problems through iteration of a suitably defined operator, which computes the optimal cost for increasing values of the control horizon. To this end, for summed costs without exponential rescaling, the following Bellman operator is normally defined:

T ψ(x) := min_{u∈U(x)} { ℓ(x, u) + ψ(f(x, u)) }, T : C(X) → C(X). (2.13)

The following result characterizes V_∞^ψ(x) as a fixed point of the Bellman operator.
Proposition 2.2 Assume that ψ = −λ for some storage function λ ∈ C(X) and that the following limit exists and is finite:

V_∞^ψ(x) := lim_{T→+∞} V_T^ψ(x). (2.14)

Then V_∞^ψ is a fixed point of the Bellman operator, viz. T V_∞^ψ = V_∞^ψ.

Proof. Recall that V_T^ψ(x) is non-decreasing with respect to T. Hence V_∞^ψ(x) = sup_{T≥0} V_T^ψ(x), i.e., V_∞^ψ is the pointwise supremum of a family of continuous functions. This proves that V_∞^ψ is lower semicontinuous. Hence the minimum in the expression T V_∞^ψ(x) is achieved, for some optimal feedback policy u*(x). Moreover, by monotonicity of the Bellman operator, it fulfills:

V_{T+1}^ψ(x) = T V_T^ψ(x) ≤ T V_∞^ψ(x),

so that taking the supremum over T yields V_∞^ψ(x) ≤ T V_∞^ψ(x). On the other hand, let x ∈ X be fixed and arbitrary. Since V_τ^ψ is continuous in x, for each τ > 0 and the current fixed value of x there exists a minimizer u_τ(x) ∈ U(x) of ℓ(x, u) + V_τ^ψ(f(x, u)). Since U(x) is compact, we find a sequence τ_n → ∞ (possibly x-dependent) such that u_{τ_n} converges to a control value u_∞(x) ∈ U(x). For each T > 0 and each ε > 0 this implies, for n large enough with τ_n ≥ T,

V_{τ_n+1}^ψ(x) = ℓ(x, u_{τ_n}(x)) + V_{τ_n}^ψ(f(x, u_{τ_n}(x))) ≥ ℓ(x, u_∞(x)) + V_T^ψ(f(x, u_∞(x))) − ε, (2.15)

by continuity of ℓ, f and V_T^ψ. Hence we see, starting from (2.15) and letting first n → ∞ and then T → ∞:

V_∞^ψ(x) ≥ ℓ(x, u_∞(x)) + V_∞^ψ(f(x, u_∞(x))) − ε ≥ T V_∞^ψ(x) − ε.

Since x ∈ X and ε > 0 were arbitrary, the assertion V_∞^ψ(x) ≥ T V_∞^ψ(x) follows for all x ∈ X.
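To make the connection between Bellman iterates and the optimal average cost (2.11) concrete, the following sketch (a toy two-state problem of our own, not from the paper) monitors V_k(x)/k along the iteration V_{k+1} = T V_k:

```python
# Toy two-state system: the input selects the next state; staying put costs 1
# per step, moving costs 2, so the optimal average cost equals 1.
X = [0, 1]
f = lambda x, u: u
ell = lambda x, u: 1.0 + abs(x - u)

def bellman(V):
    return {x: min(ell(x, u) + V[f(x, u)] for u in X) for x in X}

V = {x: 0.0 for x in X}
K = 200
for _ in range(K):
    V = bellman(V)
avg = V[0] / K          # V_k(x)/k recovers the optimal average cost
```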

Shifted Bellman Equation and operators
In the literature, different constructive approaches for computing storage functions are described, above all the classical constructions of the available storage and the required supply, which go back to [10] and are easily adapted to the discrete-time case (see, e.g., [16,19] for the available storage). For this reason, a possible, but ultimately unsatisfactory, way to approach an infinite horizon optimal control problem would be according to the following steps. Step 1: compute (or estimate) the optimal average cost V_avg. Step 2: shift the stage cost accordingly, ℓ̃(x, u) := ℓ(x, u) − V_avg. Step 3: compute a storage function λ for the shifted supply ℓ̃, and iterate the Bellman operator with terminal penalty ψ = −λ. This procedure is non-ideal for several reasons: first of all, computation of the optimal average cost involves a limiting operation, and therefore typically only approximate values for V_avg can ever be achieved. However, using approximate values in the iteration of the Bellman operator yields optimal costs diverging over an infinite horizon, to −∞ or +∞ depending on whether the optimal average cost has been over- or underestimated. In addition, Step 3 is bound to fail whenever the average optimal cost V_avg has been overestimated (in other words, a storage function might exist only for ℓ̃(x, u) = ℓ(x, u) − c where c ≤ V_avg). The goal of this section is to propose operators, the min-shifted and max-shifted Bellman operators, whose iteration converges to the optimal infinite horizon cost and, at the same time, yields the optimal average cost as a by-product.
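The divergence caused by an inexact estimate of the average cost is visible already on the simplest possible instance, a single fixed state with unit stage cost (our own toy example): the Bellman operator for the shifted cost ℓ − c reduces to V ↦ (1 − c) + V, which drifts linearly unless c is exact.

```python
# Single state, single input: f(x,u) = x, l = 1, true average cost = 1.
def iterate_shifted(c, k=100):
    V = 0.0
    for _ in range(k):
        V = (1.0 - c) + V      # Bellman operator for the shifted cost l - c
    return V

drift_under = iterate_shifted(0.9)   # underestimate: iterates drift to +infinity
drift_over = iterate_shifted(1.1)    # overestimate:  iterates drift to -infinity
exact = iterate_shifted(1.0)         # exact shift: fixed point V = 0
```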
To this end, we need additional notation. Given continuous functions ψ₁ : X → R and ψ₂ : X → R, we define a shift c(ψ₁, ψ₂), which fulfills c(ψ, ψ + c) = −c, together with a distance notion d(ψ₁, ψ₂). Notice that d(ψ₁ + c₁, ψ₂ + c₂) = d(ψ₁, ψ₂) for all c₁, c₂ ∈ R. Moreover, d(ψ₁, ψ₂) = 0 if and only if ψ₁ and ψ₂ coincide up to the addition of a constant; in fact, an equivalent alternative definition of d(ψ₁, ψ₂) can be given in terms of the infimal uniform distance over all translations. Recall the Bellman operator T : C(X) → C(X) previously introduced in (2.13).

Definition 3.1 Define the min-shifted Bellman operator T̂ : C(X) → C(X) as:

T̂ψ := Tψ + c(ψ, Tψ).

Similarly, we may consider the following operator.
Definition 3.2 Define the max-shifted Bellman operator Ť : C(X) → C(X) as:

Ťψ := Tψ + c̄(ψ, Tψ),

with c̄ the max-shift counterpart of c. It is straightforward to see that, for all k ∈ N, the iterates T̂^k ψ and T^k ψ differ only by an additive constant; opposite inequalities hold in the case of the Ť operator, and along similar lines an analogous inequality can be shown by induction for the Ť operator.

Proposition 3.3 A function ψ̂ ∈ C(X) fulfills T̂ψ̂ = ψ̂ if and only if it solves a shifted Bellman Equation Tψ̂ = ψ̂ + c for some c ∈ R.

Proof. Assume that ψ̂ fulfills the shifted Bellman Equation (3.6). Then, direct computation shows:

T̂ψ̂ = Tψ̂ + c(ψ̂, Tψ̂) = ψ̂ + c + c(ψ̂, ψ̂ + c) = ψ̂,

where the last equality follows since by definition c(ψ̂, ψ̂ + c) = −c. Conversely, assume T̂ψ̂ = ψ̂. Hence, the following inequality holds:

Tψ̂ ≥ ψ̂ + c*, where c* := min_{x∈X} (Tψ̂(x) − ψ̂(x)).

We claim that more is true, namely:

Tψ̂ = ψ̂ + c*. (3.8)

Assume by contradiction:

min_{x∈X} (Tψ̂(x) − ψ̂(x)) < max_{x∈X} (Tψ̂(x) − ψ̂(x)),

where the min exists by continuity of ψ̂, Tψ̂ and compactness of X. By inequality (3.7) we also know a corresponding bound for the max. Taking a convex combination of the two previous inequalities yields a contradiction. Hence, (3.8) holds, and ψ̂ is a solution of a shifted Bellman Equation.
A similar proof applies to the Ť operator.
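As a hedged numerical sketch: the exact definitions of c(·,·) and c̄(·,·) above are not reproduced here, but one realization consistent with the min-/max-shift terminology is to remove, at every step, the minimum (respectively the maximum) of the increment Tψ − ψ. On a toy two-state problem (our own construction), the removed shift then converges to the optimal average cost while the iterates converge to a fixed point of a shifted Bellman Equation.

```python
# Assumed realization (illustration only): That(psi) = T psi - min_x (T psi - psi)(x).
X = [0, 1]
f = lambda x, u: u
ell = lambda x, u: 1.0 + abs(x - u)     # optimal average cost is 1

def bellman(V):
    return {x: min(ell(x, u) + V[f(x, u)] for u in X) for x in X}

def min_shifted(V):
    TV = bellman(V)
    shift = min(TV[x] - V[x] for x in X)   # shift removed at each step
    return {x: TV[x] - shift for x in X}, shift

V = {x: 0.0 for x in X}
for _ in range(50):
    V, shift = min_shifted(V)
# at convergence, T V = V + shift: a shifted Bellman Equation
```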

Properties of the T, T̂ and Ť operators
Throughout this section we recall some useful properties of the T operator and additionally provide original derivations for the properties of the T̂ and Ť operators. Some of the properties listed below are well known and can be found in [3]:
• Monotonicity: ψ₁ ≤ ψ₂ implies Tψ₁ ≤ Tψ₂;
• Translation invariance: T(ψ + c) = Tψ + c for any constant c ∈ R;
• Minimum commutativity, for a finite index set K: T(min_{k∈K} ψ_k) = min_{k∈K} Tψ_k. To see the last one, notice that the minimum over the input u and the minimum over the index k can be exchanged;
• Concavity: for all α ∈ [0, 1] and any ψ₁, ψ₂ it holds: T(αψ₁ + (1 − α)ψ₂) ≥ αTψ₁ + (1 − α)Tψ₂. To see this, notice that the minimum of a sum is bounded from below by the sum of the minima;
• Max-super-commutativity: the inequality T(max{ψ₁, ψ₂}) ≥ max{Tψ₁, Tψ₂} holds, and by induction, for any finite set K: T(max_{k∈K} ψ_k) ≥ max_{k∈K} Tψ_k;
• Non-expansiveness: monotonicity and shift-invariance can be exploited to show the following inequality, expressing (incremental) non-expansiveness of the T operator: ‖Tψ₁ − Tψ₂‖_∞ ≤ ‖ψ₁ − ψ₂‖_∞.
Next we derive some useful properties of the T̂ and Ť operators. Notice that for all c₁, c₂ ∈ R the following holds: c(ψ₁ + c₁, ψ₂ + c₂) = c(ψ₁, ψ₂) + c₁ − c₂. Hence the following translation invariance can be seen: T̂(ψ + c) = T̂ψ + c for all c ∈ R. In fact, T̂(ψ + c) = T(ψ + c) + c(ψ + c, T(ψ + c)) = Tψ + c + c(ψ + c, Tψ + c) = Tψ + c(ψ, Tψ) + c = T̂ψ + c. The same property holds for Ť. The next proposition states that all solutions of a shifted Bellman Equation share the same shift value.

Proposition 4.1 Let ψ₁ and ψ₂ be solutions of shifted Bellman Equations with shifts c₁ and c₂, respectively. Then c₁ = c₂.

Proof. See Appendix B.1.
We show later, by means of an example, that while the shift is uniquely defined for all solutions of the shifted Bellman Equation, it is not true in general that d(ψ₁, ψ₂) = 0, i.e. there may be multiple solutions of the shifted Bellman Equation, even after taking into account translation invariance. In the remainder of this section, we describe a situation in which the solution of the shifted Bellman Equation is unique, up to the addition of a constant. Again, a dissipativity inequality plays a role, but now a stronger one than (2.9). For an equilibrium (x_e, u_e) we call the system strictly dissipative if there exist a storage function λ : X → R, bounded from below, and α ∈ K such that

λ(f(x, u)) ≤ λ(x) + ℓ(x, u) − ℓ(x_e, u_e) − α(‖x − x_e‖), ∀ (x, u) ∈ Z. (4.1)

We note that a positive definite stage cost, i.e., an ℓ satisfying ℓ(x, u) ≥ α(‖x − x_e‖) for all (x, u) ∈ Z and ℓ(x_e, u_e) = 0, satisfies the inequality (4.1) for λ ≡ 0. For this kind of stage cost, the following proposition holds.

Proposition 4.2 Assume ℓ(x, u) ≥ α(‖x − x_e‖) for all (x, u) ∈ Z and ℓ(x_e, u_e) = 0. Then any two continuous solutions of the shifted Bellman Equation that are bounded from below coincide up to the addition of a constant.

Proof. Let ψ₁ and ψ₂ be two continuous solutions of the shifted Bellman Equation (3.6) that are bounded from below. By adding suitable constants, we can assume that ψ₁(x_e) = ψ₂(x_e) = 0. From (2.13) we obtain that

ψ_i(x_e) + c = Tψ_i(x_e) ≤ ℓ(x_e, u_e) + ψ_i(f(x_e, u_e)) = ψ_i(x_e),

implying c ≤ 0.
For each x ∈ X, let u*_i(x) ∈ U(x) be a control that realizes the minimum in the Bellman operator (2.13) for ψ = ψ_i, i = 1, 2. Such a u*_i(x) exists because ℓ, f, and ψ_i are continuous and U(x) is compact. Denote by x*_i(k) the resulting closed-loop trajectory with x*_i(0) = x. Then from the shifted Bellman Equation we obtain that

ψ_i(x*_i(k)) + c = ℓ(x*_i(k), u*_i(x*_i(k))) + ψ_i(x*_i(k + 1)) ≥ α(‖x*_i(k) − x_e‖) + ψ_i(x*_i(k + 1)),

implying, using c ≤ 0,

ψ_i(x) ≥ Σ_{k=0}^{K−1} α(‖x*_i(k) − x_e‖) + ψ_i(x*_i(K)).

Since ψ_i is bounded from below in X, this sum must converge, implying that α(‖x*_i(k) − x_e‖) → 0 and thus x*_i(k) → x_e as k → ∞. Since ψ_i(x_e) = 0 and ψ_i is continuous, we also obtain ψ_j(x*_i(k)) → 0 as k → ∞ for i = 1, 2 and j = 1, 2. Now pick an arbitrary x ∈ X. We show that for each ε > 0 and for both choices i = 1, j = 2 and i = 2, j = 1 the inequality ψ_j(x) ≤ ψ_i(x) + ε holds, which shows ψ₁(x) = ψ₂(x) and thus the assertion.
To this end, consider the sequence x*_i(k). Since u*_i(x) is in general suboptimal for ψ_j, the shifted Bellman Equation yields ψ_j(x*_i(k)) + c ≤ ℓ(x*_i(k), u*_i(x*_i(k))) + ψ_j(x*_i(k + 1)), while equality holds for ψ_i. Subtracting and iterating this inequality we thus obtain

ψ_j(x) − ψ_i(x) ≤ ψ_j(x*_i(k)) − ψ_i(x*_i(k)) → 0 as k → ∞,

which proves the claim. Now for a strictly dissipative system satisfying (4.1) we consider the "rotated" stage cost

ℓ̃(x, u) := ℓ(x, u) − ℓ(x_e, u_e) + λ(x) − λ(f(x, u))

and observe that it satisfies the conditions on ℓ from Proposition 4.2. The corresponding Bellman operator defined by

T̃ψ(x) := min_{u∈U(x)} { ℓ̃(x, u) + ψ(f(x, u)) }

satisfies the following property.
Lemma 4.3 For all ψ ∈ C(X) the identity T̃(ψ + λ) = Tψ + λ − ℓ(x_e, u_e) holds. Particularly, if ψ is a solution of the shifted Bellman Equation for T and some c, then ψ̃ := ψ + λ is a solution of the shifted Bellman Equation for T̃ with shift c − ℓ(x_e, u_e).

Proof. For all x ∈ X we have that

T̃(ψ + λ)(x) = min_{u∈U(x)} { ℓ(x, u) − ℓ(x_e, u_e) + λ(x) − λ(f(x, u)) + ψ(f(x, u)) + λ(f(x, u)) } = Tψ(x) + λ(x) − ℓ(x_e, u_e).

This proves the first statement. Now, if ψ is a solution of the shifted Bellman Equation for T, then

T̃ψ̃ = Tψ + λ − ℓ(x_e, u_e) = ψ + c + λ − ℓ(x_e, u_e) = ψ̃ + c − ℓ(x_e, u_e),

i.e. ψ̃ is a solution of the shifted Bellman Equation for T̃.

Corollary 4.4 Consider a strictly dissipative system with continuous storage function λ bounded from below. Then any two continuous solutions of the shifted Bellman Equation that are bounded from below coincide up to the addition of a constant.

Proof. Let ψ₁ and ψ₂ be two solutions of the shifted Bellman Equation satisfying the assumption. Then ψ̃_i = ψ_i + λ, i = 1, 2, satisfy the assumptions of Proposition 4.2 for T̃, since λ is continuous and bounded from below. Hence, applying Proposition 4.2 to T̃ yields that ψ̃₁ = ψ₁ + λ and ψ̃₂ = ψ₂ + λ coincide up to the addition of a constant, implying the same for ψ₁ and ψ₂.
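A numeric sanity check of the rotated-cost construction (our own toy problem, not the paper's example; the formula ℓ̃ = ℓ − ℓ(x_e, u_e) + λ(x) − λ(f(x, u)) is our reading of the rotation above): the rotated cost is nonnegative and vanishes at the equilibrium.

```python
# Toy strictly dissipative problem (our assumption): f(x,u) = u on X = [-1,1],
# l(x,u) = x + u^2, equilibrium (xe,ue) = (-1/2,-1/2), storage lambda(x) = -x.
# The rotated cost then simplifies to (u + 1/2)^2 >= 0.
f = lambda x, u: u
ell = lambda x, u: x + u * u
lam = lambda x: -x
xe = ue = -0.5

def ell_rot(x, u):
    # l_rot(x,u) = l(x,u) - l(xe,ue) + lambda(x) - lambda(f(x,u))
    return ell(x, u) - ell(xe, ue) + lam(x) - lam(f(x, u))

grid = [i / 10 for i in range(-10, 11)]
min_rot = min(ell_rot(x, u) for x in grid for u in grid)   # attained at u = -1/2
```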
We note that non-strict dissipativity is not enough to obtain this uniqueness result up to additions of constants, as the example in Subsection 7.2.1 shows.

Convergence analysis under equicontinuity
In order to prove convergence of the T̂ and Ť iterations to a fixed point of the shifted Bellman Equation, we restrict the dynamics to fulfill suitable equicontinuity assumptions. Moreover, we provide sufficient conditions, in the form of controllability assumptions, which lead to the needed equicontinuity properties for both the iterations T^k ψ and T̂^k ψ.
In order to have convergence guarantees for a sequence of functions, the following notion of equicontinuity is adopted.
Definition 5.1 A sequence of functions {ψ_k}_{k=0}^{+∞}, ψ_k : X → R, is said to be equicontinuous if there exists a function γ ∈ K_∞ such that:

|ψ_k(x₁) − ψ_k(x₂)| ≤ γ(‖x₁ − x₂‖), ∀ k ∈ N, ∀ x₁, x₂ ∈ X.

To carry out our analysis, we will need the following assumption.
Assumption 5.2 The sequence {T^k ψ}_{k=0}^{+∞} is equicontinuous. The following lemma shows that this assumption immediately carries over to T̂^k ψ.
Lemma 5.3 Let Assumption 5.2 hold. Then the sequence {T̂^k ψ}_{k=0}^{+∞} is equicontinuous as well.

The lemma is a simple consequence of formula (3.4). In particular, equicontinuity holds with the same function γ, i.e. |T̂^k ψ(x₁) − T̂^k ψ(x₂)| ≤ γ(‖x₁ − x₂‖). Our main convergence results under equicontinuity are now stated in the following two theorems.

Theorem 5.4 Let ψ ∈ C(X) be such that T^k ψ fulfills Assumption 5.2. Then, if a continuous fixed point of the shifted Bellman Equation exists, the sequence T̂^k ψ converges uniformly to one such fixed point.
Proof. Consider the sequence [T̂^k ψ]_n. By Lemma A.5 this sequence is bounded. Moreover, by Lemma 5.3 it is equicontinuous. Hence, by the Arzelà-Ascoli Theorem, it admits a non-empty set of accumulation points (with respect to the uniform topology), ω(ψ). Moreover, each accumulation point in ω(ψ) is continuous and fulfills the same continuity inequality (5.1). The sequence W(T̂^k ψ), with W(ψ) := d(ψ, Tψ), is non-increasing and bounded from below by 0. In addition, W is continuous in the topology of uniform convergence. Hence, the limit lim_{k→+∞} W(T̂^k ψ) exists, and we denote it by W̄. Because of continuity of W and uniform convergence to the limit points we also have W(ψ̄) = W̄ for all ψ̄ ∈ ω(ψ). Notice that ω(ψ) is invariant with respect to T̂. Hence, for any ψ̄ ∈ ω(ψ) and any k ∈ N we have W(T̂^k ψ̄) = W̄. By the combined inequalities (A.9) and (A.8) we see that W(T̂^k ψ̄) can be constant only provided min_{x∈X} [T̂^k ψ̄(x) − T T̂^k ψ̄(x)] and max_{x∈X} [T̂^k ψ̄(x) − T T̂^k ψ̄(x)] are constant with respect to k. By Corollary A.24, the sequence T̂^k ψ̄ is bounded and converges monotonically to an upper semi-continuous limit. Notice that, by invariance of ω(ψ) and the fact that all elements of ω(ψ) fulfill inequality (5.1), equicontinuity of T̂^k ψ̄ follows. Hence the limit ψ_∞(x) := lim_{k→+∞} T̂^k ψ̄(x) not only exists (as previously established), but is also continuous and, by Dini's Theorem, convergence is uniform in X. By continuity of the T̂ operator with respect to uniform convergence, ψ_∞(x) is a fixed point of the shifted Bellman Equation (cf. Lemma A.22) and 0 = d(ψ_∞, T ψ_∞) = d(ψ̄, Tψ̄). This shows that any element of ω(ψ) is an equilibrium of the shifted Bellman Equation. We only need to show that ω(ψ) is a singleton. This follows because of Lemma A.6. Indeed, the distance to any element ψ̄ of ω(ψ) is non-increasing along the iteration T̂^k ψ. Since such distance converges to 0 along some subsequence T̂^{k_n} ψ, it converges to 0 along the sequence T̂^k ψ itself.
Due to the lack of an analogue of formula (3.4) for the Ť operator, there is no simple way of proving a version of Lemma 5.3 for Ť^k ψ. As a consequence, the analogue of Theorem 5.4 for Ť is stated by directly assuming equicontinuity of Ť^k ψ.

Theorem 5.5 Let ψ ∈ C(X) be such that the sequence {Ť^k ψ}_{k=0}^{+∞} is equicontinuous. Then, if a continuous fixed point of the shifted Bellman Equation exists, the sequence Ť^k ψ converges uniformly to one such fixed point.

Proof. Consider the sequence [Ť^k ψ]_n. This sequence is bounded, where boundedness follows by Lemma A.7. Moreover, by assumption, it is equicontinuous. Hence, by the Arzelà-Ascoli Theorem, it admits a non-empty set of limit points (with respect to the uniform topology), ω(ψ). Note that each limit point in ω(ψ) is continuous and fulfills the same continuity inequality (5.2). The sequence W(Ť^k ψ), with W(ψ) := d(ψ, Tψ), is non-increasing and bounded from below by 0. In addition, W is continuous in the topology of uniform convergence. Hence, the limit lim_{k→+∞} W(Ť^k ψ) exists, and we denote it by W̄. Because of continuity of W and uniform convergence to the limit points we also have W(ψ̄) = W̄ for all ψ̄ ∈ ω(ψ). Notice that ω(ψ) is invariant with respect to Ť. Hence, for any ψ̄ ∈ ω(ψ) and any k ∈ N we have W(Ť^k ψ̄) = W̄. By the combined inequalities (A.12) and (A.13) we see that W(Ť^k ψ̄) can be constant only provided min_{x∈X} [Ť^k ψ̄(x) − T Ť^k ψ̄(x)] and max_{x∈X} [Ť^k ψ̄(x) − T Ť^k ψ̄(x)] are constant with respect to k. By Corollary A.26, the sequence Ť^k ψ̄ is bounded and converges monotonically to a lower semi-continuous limit. Notice that, by invariance of ω(ψ) and the fact that all elements of ω(ψ) fulfill inequality (5.2), equicontinuity of Ť^k ψ̄ follows; hence the limit ψ_∞(x) := lim_{k→+∞} Ť^k ψ̄(x) not only exists (as previously established), but is also continuous and, by Dini's Theorem, convergence is uniform in X. By continuity of the Ť operator with respect to uniform convergence, ψ_∞(x) is a fixed point of the shifted Bellman Equation and 0 = d(ψ_∞, T ψ_∞) = d(ψ̄, Tψ̄). This shows that any element of ω(ψ) is an equilibrium of the shifted Bellman Equation. We only need to show that ω(ψ) is a singleton. This follows because of Lemma A.7.
Indeed, the distance to any element ψ̄ of ω(ψ) is non-increasing along the iteration Ť^k ψ. Since such distance converges to 0 along some subsequence Ť^{k_n} ψ, it converges to 0 along the sequence Ť^k ψ itself.
In the remainder of this section we derive a sufficient condition for Assumption 5.2, based on a controllability condition.

Definition 5.6 Given a system as in (2.1) and the associated state and input constraint sets X and U(x), we say that the system fulfills Uniform Incremental Continuous Controllability if there exist N ∈ N and a class K_∞ function δ such that, for all x₁, x₂ ∈ X and for all u₁ ∈ U_N(x₁), there exists u₂ ∈ U_N(x₂) with φ(N, x₁, u₁) = φ(N, x₂, u₂), and in addition:

‖u₂(t) − u₁(t)‖ ≤ δ(‖x₁ − x₂‖), ∀ t ∈ {0, . . . , N − 1}.

A milder controllability assumption can be formulated by considering continuity with respect to the cost alone, rather than the control input. To this end, let J_N(x, u), for x ∈ X and u ∈ U_N(x), denote the following:

J_N(x, u) := Σ_{t=0}^{N−1} ℓ(φ(t, x, u), u(t)).

Definition 5.7 Given a system as in (2.1) and the associated state and input constraint sets X and U(x), we say that the system fulfills Uniform Incremental Controllability Continuous in Cost if there exist N ∈ N and a class K_∞ function δ such that, for all x₁, x₂ ∈ X and for all u₁ ∈ U_N(x₁), there exists u₂ ∈ U_N(x₂) with φ(N, x₁, u₁) = φ(N, x₂, u₂), and in addition:

J_N(x₂, u₂) ≤ J_N(x₁, u₁) + δ(‖x₁ − x₂‖).

Remark 5.8 Notice that Uniform Incremental Continuous Controllability implies Uniform Incremental Controllability Continuous in Cost. This is because the considered stage-cost function and the dynamics are both continuous, and moreover the cost is considered only over a finite interval of length N. The converse implication is not true in general.
The following proposition now shows that Uniform Incremental Controllability Continuous in Cost implies the equicontinuity in Assumption 5.2 required in Theorem 5.4.

Proposition 5.9 Assume that system (2.1) fulfills the controllability assumption in Definition 5.7. Then, for any continuous function ψ : X → R, the sequence {T^k ψ}_{k=0}^{+∞} is equicontinuous, i.e., Assumption 5.2 is fulfilled.
Proof. Consider any k ∈ N and arbitrary x₁, x₂ ∈ X. Let u*₁ ∈ U_{k+N}(x₁) be any optimal control sequence for the optimal control problem with terminal penalty function ψ and horizon k + N, with initial condition x₁. Then, from the optimality principle:

T^{k+N} ψ(x₁) = J_N(x₁, u*₁) + T^k ψ(φ(N, x₁, u*₁)). (5.3)

Let now u₂ be as in Definition 5.7. Clearly, applying u₂ is, in general, suboptimal from initial condition x₂. Hence, the inequality below holds:

T^{k+N} ψ(x₂) ≤ J_N(x₂, u₂) + T^k ψ(φ(N, x₂, u₂)). (5.4)

Combining equations (5.3) and (5.4) yields:

T^{k+N} ψ(x₂) − T^{k+N} ψ(x₁) ≤ J_N(x₂, u₂) − J_N(x₁, u*₁) ≤ δ(‖x₁ − x₂‖),

where the first inequality follows because φ(N, x₁, u*₁) = φ(N, x₂, u₂), and the last inequality from Definition 5.7. Symmetric inequalities can be obtained swapping x₁ and x₂, yielding |T^{k+N} ψ(x₂) − T^{k+N} ψ(x₁)| ≤ δ(‖x₁ − x₂‖). This shows that equicontinuity holds on the tail of the sequence T^k ψ. However, {T^k ψ}_{k=0}^{N−1} is a finite family of continuous functions defined over a compact set (thus also fulfilling an equicontinuity property), and therefore equicontinuity of the whole sequence follows.
Unfortunately, due to the lack of a counterpart of Lemma 5.3, we currently do not have a controllability condition ensuring the equicontinuity needed in Theorem 5.5 for the Ť operator.

Convergence analysis without continuity
In this section we provide a convergence result for the iteration using the T̂ operator without assuming any continuity. This is possible if we assume a dissipativity condition and start the iteration from the negative of the storage function. The result can thus be seen as an extension of Proposition 2.2 to the shifted Bellman Equation with nontrivial shift c ≠ 0.
We first state a small auxiliary lemma, in which for any function ψ : X → R we define the normalization

ψ_n := ψ − min_{x∈X} ψ(x).

We note that ψ_n ≥ 0 and min_{x∈X} ψ_n(x) = 0, as well as (ψ + c)_n = ψ_n for all c ∈ R.
Lemma 6.1 For any ψ : X → R it holds that (T̂ψ)_n = (Tψ)_n.

Proof. We have that T̂ψ = Tψ + c(ψ, Tψ). This implies the assertion since (Tψ + c)_n = (Tψ)_n for all c ∈ R. A similar computation works for Ť in place of T̂.
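Assuming the normalization ψ_n := ψ − min_x ψ (our reading of the definition above), the operation and its two stated properties can be sketched in a few lines over a finite grid:

```python
def normalize(psi):
    """psi_n := psi - min psi, so that min psi_n = 0 (finite-grid version)."""
    m = min(psi.values())
    return {x: v - m for x, v in psi.items()}

psi = {0: 3.0, 1: 5.0, 2: 4.0}
psi_n = normalize(psi)
shifted_n = normalize({x: v + 7.0 for x, v in psi.items()})  # (psi + c)_n = psi_n
```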
We now first consider the case where ℓ ≥ 0. To this end, we make the following assumption.
Assumption 6.2 There exists a nonempty set N ⊂ X such that for any ψ : X → R with ψ ≥ 0 and ψ| N ≡ 0 we have that T ψ| N ≡ 0.
Next we prove (b) for k + 1. Using (6.1) as well as the min commutativity and the translation invariance of T we obtain Now using the induction assumption for (b) and the monotonicity of T we obtain T ψ k ≥ ψ k and T T ψ k ≥ T ψ k , implying, using (6.1) once more This shows (b) for k + 1. From the induction assumption (a) and (b) and monotonicity of T we obtain which shows (a) for k + 1.
Finally, for showing (d), we use that the induction assumption for (b) yields c(ψ_k, Tψ_k) ≤ 0 and Tψ_k ≥ ψ_k; together with (6.1) we obtain the assertion.

Proposition 6.4 Assume ℓ ≥ 0, let Assumption 6.2 hold and assume that V_∞^{ψ₀} is finite for ψ₀ ≡ 0. Then the sequence of functions ψ_k = (T̂^k ψ₀)_n, k ∈ N, converges to V_∞^{ψ₀}, i.e., in particular to a solution of the Bellman Equation.
Proof. From Lemma 6.3 it follows that ψ_k is increasing and bounded from above by V_∞^{ψ₀}. Hence, it converges to some limit function ψ_∞ ≤ V_∞^{ψ₀}. Now from Tψ_k ≥ ψ_k we obtain a lower bound on the iterates, and moreover Tψ_k ≥ ½(ψ_k(x) + Tψ_k(x)). Inserting these inequalities into (6.1), and using T(ψ₁/2 + ψ₂/2) ≥ (Tψ₁)/2 + (Tψ₂)/2, by induction yields a general formula as k → ∞. Combining this with T^q ψ₀ ≥ T^p ψ₀ ≥ 0 for q ≥ p ≥ 0, we obtain that for each C ∈ (0, 1) and p ∈ N there is k_{C,p} ∈ N such that the corresponding bound holds for all k ≥ k_{C,p}. This implies a lower bound on ψ_∞ for any C ∈ (0, 1). Since C can be chosen arbitrarily close to 1, this implies ψ_∞ ≥ V_∞^{ψ₀}, which finishes the proof.

Now we extend our results to dissipative stage costs. The dissipativity inequality here is similar to (2.9), where we explicitly include a shift of the cost function by c in the inequality.

Assumption 6.5 There exist a continuous storage function λ : X → R and a value c ∈ R such that

λ(f(x, u)) ≤ λ(x) + ℓ(x, u) − c, ∀ (x, u) ∈ Z.

For such a function λ, similar to the rotation in Section 4 we define the rotated cost

ℓ̃(x, u) := ℓ(x, u) − c + λ(x) − λ(f(x, u)) (6.3)

and the corresponding operators T̃ and ˆT̃ (the Bellman and min-shifted Bellman operators for the cost ℓ̃). The next lemma extends Lemma 4.3.
Lemma 6.6 For any continuous function λ : X → R and for all k ∈ N the identities

T̃^k(ψ + λ) = T^k ψ + λ − kc and ˆT̃^k(ψ + λ) = T̂^k ψ + λ

hold.

Proof. The first identity follows with an analogous proof as for Lemma 4.3, followed by induction over k. For the second identity we compute, using that c(·,·) depends only on the difference of its arguments,

ˆT̃(ψ + λ) = T̃(ψ + λ) + c(ψ + λ, T̃(ψ + λ)) = Tψ + λ − c + c(ψ, Tψ) + c = T̂ψ + λ.

From this, the statement for ˆT̃^k follows by induction over k.
Assumption 6.7 There exists a nonempty set N ⊂ X such that for any ψ : X → R with ψ ≥ −λ and ψ|_N ≡ −λ|_N we have that Tψ|_N ≡ (c − λ)|_N.

Somewhat similar to Assumption 6.2, for dissipative optimal control problems Assumption 6.7 holds with N = {x_e} for an equilibrium (x_e, u_e) with ℓ(x_e, u_e) = c. This is because dissipativity implies ℓ̃ ≥ 0, and ℓ(x_e, u_e) = c implies ℓ̃(x_e, u_e) = 0. Together this yields for all u ∈ U(x_e) that

ℓ(x_e, u) + ψ(f(x_e, u)) ≥ ℓ(x_e, u) − λ(f(x_e, u)) = ℓ̃(x_e, u) + c − λ(x_e) ≥ c − λ(x_e),

while for u = u_e we get ℓ(x_e, u_e) + ψ(f(x_e, u_e)) = c − λ(x_e), implying that this is the minimum and hence Tψ(x_e) = c − λ(x_e). The situation just described in particular occurs for strictly dissipative problems, cf. (4.1).
Theorem 6.8 Assume that the optimal control problem is dissipative in the sense of Assumption 6.5, that Assumption 6.7 holds, and that there is M > 0 with T^k(ψ_0) ≤ M + ck for all k ∈ N and ψ_0 = −λ. Then the sequence of functions ψ_k = (T^k ψ_0)_n, k ∈ N, converges to a solution of the shifted Bellman Equation.
Proof. The assumptions together with Lemma 6.6 imply that the operator T̂ corresponding to the cost ℓ̃ from (6.3) satisfies all assumptions of Proposition 6.4. Hence, for ψ̃_0 ≡ 0 the sequence ψ̃_k = (T̂^k ψ̃_0)_n converges to a solution ψ̃_∞ of the Bellman Equation for ℓ̃, i.e., T̂ ψ̃_∞ = ψ̃_∞. Because of Lemma 6.6, and using that (ψ + φ)_n = (ψ_n + φ)_n, we obtain the corresponding identity for ψ_k. From this the assertion follows, again using Lemma 6.6. This finishes the proof.

Examples and Counterexamples
In this section we illustrate the performance of the iterations proposed and discussed in this paper with various examples.

Comparison of solution methods
The examples in Section 7.1 are meant to illustrate different approaches to the formulation and solution of infinite horizon optimal control problems using dynamic programming. In particular, they emphasize the need for a terminal penalty function and highlight the benefits of using the T̂ and Ť operators for their solution.

Need for terminal penalty function
We consider the following scalar linear system, with state x taking values in X = [−2, 2] and input constraints U(x) = [−2 + x, 2 + x]. The stage cost is piecewise linear. Notice that the state-dependent part of the cost has two local minima, at x = −1 and x = +1. Moreover, for u = 0 solutions are 2-periodic and fulfill x(t) = (−1)^t x(0). It is possible to show that the optimal average cost is 0, achieved by the solution x(t) = (−1)^t corresponding to u(t) = 0. We show that using ψ = 0 does not lead to a convergent sequence of cost-to-go functions; see Fig. 7.1. In particular, T^k ψ converges after 2 iterations to a period-2 oscillation between two distinct piecewise linear functions. Accordingly, the optimal state feedback (which is bang-bang) does not converge, and will differ at least in some regions of the state space depending on whether a horizon of odd or even length is considered. In order to obtain meaningful infinite horizon costs and feedback policies we need to use a suitable penalty function for the final state, in particular ψ = −λ, where λ is a storage function. For the considered example one can show that the displayed function λ_1 is a storage function. Fig. 7.2 (left) shows that the iteration initialized with ψ = −λ_1 converges; notice that the cost converges monotonically in 3 steps to its infinite horizon value. It is well known that storage functions need not be unique; for instance, the displayed function λ_2 is another storage function. Our results show that any storage function can be used in order to define a suitable infinite horizon cost, provided the latter is finite. We show in Fig. 7.2 (right) how choosing a different penalty function ψ = −λ_2 still leads, for this particular example, to the same infinite horizon cost, with convergence in just one time step.

Solution with use ofT operator
We consider below the same system and constraints as in the previous example. Rather than applying ad hoc considerations to figure out the optimal average performance (which in this case is −7/2) and correspondingly shifting the cost in order to recast the problem into its previous version with optimal average 0, we directly apply the operator T̂ to an arbitrary initialization ψ(x) = 0. We show in Fig. 7.3 the resulting non-increasing sequence of functions T̂^k ψ and the corresponding limit, which is a solution of the shifted Bellman Equation. The value of the applied shift c(T̂^k ψ, T T̂^k ψ) is displayed in Fig. 7.4. Notice that the shifts converge to 7/2, which is indeed the positive translation needed in order to compensate for the optimal infinite horizon average performance of −7/2. To highlight the power of the T̂ iteration, which simultaneously adjusts the shift and the asymptotic cost, we show in Fig. 7.5 its evolution for a different initialisation ψ(x) = −sin(x).
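Because the explicit formulas of this example did not survive into the text, the following sketch is purely illustrative: it assumes dynamics f(x, u) = −x + u on X = [−2, 2] with U(x) = [−2 + x, 2 + x], a hypothetical stage cost ℓ(x, u) = min{|x − 1|, |x + 1|} + |u| − 7/2 (chosen so that the optimal average cost is −7/2, as in the text), and a plausible form of the shifted operator, T̂ψ = min{ψ, Tψ + c(ψ, Tψ)}, with c(ψ, Tψ) a convex combination (here α = 1/2) of the min and max of ψ − Tψ:

```python
import numpy as np

# State grid for X = [-2, 2]; with f(x, u) = -x + u and U(x) = [-2 + x, 2 + x],
# the successor state w = -x + u ranges over all of X, so we minimize over w.
X = np.linspace(-2.0, 2.0, 81)

def stage_cost(x, w):
    # Hypothetical stage cost with optimal average -7/2, attained on the
    # 2-cycle x(t) = (-1)^t with u = 0 (i.e. w = -x).
    u = w + x
    return np.minimum(np.abs(x - 1), np.abs(x + 1)) + np.abs(u) - 3.5

def T(psi):
    # Bellman operator on the grid: (T psi)(x) = min_w [ell(x, w) + psi(w)].
    return np.array([np.min(stage_cost(x, X) + psi) for x in X])

def shift(psi, Tpsi, alpha=0.5):
    # Assumed shift: a convex combination of min and max of psi - T psi.
    g = psi - Tpsi
    return alpha * g.min() + (1 - alpha) * g.max()

def That(psi):
    # Assumed form of the shifted operator: T̂ psi = min{psi, T psi + c}.
    Tpsi = T(psi)
    return np.minimum(psi, Tpsi + shift(psi, Tpsi))

psi = np.zeros_like(X)
for _ in range(200):
    psi = That(psi)
c_final = shift(psi, T(psi))
print(c_final)  # under these assumptions, the shift settles at 7/2
```

Under these assumptions the computed shift settles at 7/2 while the iterates stabilize, mirroring the behaviour reported in Figs. 7.3 and 7.4.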

Solution withŤ operator
We provide next numerical evidence of convergence using the Ť operator in Fig. 7.6 (left). It is also interesting to remark that both the T̂ and Ť operators are robust with respect to the definition of the shift term c(ψ, T ψ): any strict convex combination (α ∈ (0, 1)) yields convergence, although at possibly different speed. To illustrate this we show the iteration corresponding to α = 3/4 in Fig. 7.6 (right).

Non uniqueness of optimal solutions
The following examples illustrate non-uniqueness phenomena arising when dealing with infinite horizon control problems. In particular, they emphasize non-uniqueness of the fixed points of the Bellman Equation and/or of the associated optimal feedback policies.

Example with multiple solutions of the Bellman Equation
Consider the scalar linear system displayed, along with the state constraint X = [−2, 2] and input constraints U(x) = [−2 + x, 2 + x]. We consider a piecewise linear stage cost, defined for some constant ε which will need to be sufficiently small. Any function ψ(x) = α|x| + εx/2 is a solution of the (shifted) Bellman Equation as long as 0 ≤ α < 1 − ε. In fact, the minimization in the Bellman operator involves the expression |u| + α|−x + u| + εu. We notice that if 0 ≤ α < 1 − ε then the optimal value is achieved for u = 0, since the slope of |u| dominates the slopes of the other terms. In particular, substituting u = 0 yields T ψ(x) = α|x| + εx/2. Hence there are infinitely many (even continuous) solutions to the shifted Bellman Equation (3.6) (although the associated optimal feedback policies happen to coincide). We remark that, because of Theorem 4.4, this implies that the problem is not strictly dissipative.

Example with multiple optimal feedback policies
We consider the following scalar linear system, along with the state constraint X = [−1, 1] and input constraints U(x). We consider a piecewise linear stage cost. Notice that, for each given x ∈ X, u = 0 minimizes the stage cost and makes x an equilibrium of the system. Hence, maximizing |x| so as to minimize ℓ, the optimal average performance is achieved for the equilibrium solutions x = ±1, provided a zero input is applied. Consider the following terminal penalty functions. As seen in Fig. 7.7, the functions ψ_1 and ψ_2 assign different terminal costs to the two optimal equilibria: ψ_1 favours −1, with 0 terminal cost, while ψ_2 favours +1.
Similarly, one can show that u*_2(x) = 1 − x achieves the optimum for ψ_2 and that ψ_2 is a solution of the Bellman Equation. Notice that the function displayed below is also a legitimate choice of terminal penalty function; in fact, it is the infimum element of Ψ, and is therefore the terminal penalty function that corresponds to the cheapest infinite horizon transient cost. As shown in Proposition 4.1, feedback policies corresponding to different fixed points of the shifted Bellman Equation share the same infinite horizon average cost. Notice, in addition, that for any constants c_1 and c_2, the function displayed below is a fixed point of the shifted Bellman Equation.
In fact, in this case, it can be shown that every fixed point of the shifted Bellman Equation is of this form. This result is likely to admit an extension to more general control set-ups.

Regularity of fixed-points of Bellman Equation
The following examples are meant to illustrate potential discontinuity and unboundedness issues of the fixed points of the (shifted) Bellman Equation.

Example with lower semi-continuous solution of the Bellman Equation
Consider the following bilinear scalar system, with state taking values in X = [−2, 2] and the input constraints displayed. Let the stage cost be piecewise linear, defined according to ℓ(x, u) = max{0, x} + |u|.
Notice that for u = 0 every point is an equilibrium. Hence, simply letting u = 0 whenever the initial condition is ≤ 0 achieves the minimum average cost. If the initial condition is positive, the best control action is u = −1. Indeed, an input u ≤ −1 is needed in order to leave the set of positive states and enter the negative semi-axis, where the optimal average performance can be achieved. Hence, the best choice, given the penalty |u| on inputs, is u = −1. Moreover, waiting to apply such a control action does not pay off, as the same cost will need to be incurred at some point in the future in order to switch to negative states. The function displayed below is a lower semi-continuous solution of the associated Bellman Equation, achieved by the corresponding control policy. We show in Fig. 7.8 how the iterations of the operators T̂ and Ť behave when initialised from ψ(x) = 0. It is worth pointing out that, while both sequences seem to asymptotically approximate the correct 'shape' of the infinite horizon cost, the theory confirms that T̂^k ψ(x) cannot be bounded, since a bounded pointwise limit is known to be at least upper semi-continuous, which is not the case for the fixed point in the considered example.
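Reading the example as x⁺ = x(1 + u) with u ∈ [−1, 1] and ℓ(x, u) = max{0, x} + |u| (an assumption, since the displayed formulas were lost), a candidate lower semi-continuous solution is ψ̄(x) = 0 for x ≤ 0 and ψ̄(x) = x + 1 for x > 0: from x > 0, pay |u| = 1 once to jump to the origin, plus the stage cost x incurred while doing so. Since ψ̄ is available in closed form, the Bellman Equation can be checked on a grid without interpolation error:

```python
import numpy as np

# Assumed reading of the example: dynamics x+ = x*(1 + u), u in [-1, 1],
# stage cost ell(x, u) = max{0, x} + |u|.
def f(x, u):
    return x * (1.0 + u)

def ell(x, u):
    return max(0.0, x) + abs(u)

def psi_bar(x):
    # Candidate lower semi-continuous fixed point.
    return 0.0 if x <= 0.0 else x + 1.0

U = np.linspace(-1.0, 1.0, 201)   # includes u = -1 and u = 0 exactly
Xg = np.linspace(-2.0, 2.0, 161)

def T_psi_bar(x):
    # Bellman operator applied to the closed-form candidate.
    return min(ell(x, u) + psi_bar(f(x, u)) for u in U)

residual = max(abs(T_psi_bar(x) - psi_bar(x)) for x in Xg)
print(residual)  # ~0: psi_bar solves the Bellman Equation with zero shift
```

The minimizer is u = 0 on the negative semi-axis and u = −1 for positive states, matching the policy discussed in the text.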

Example with unbounded infinite horizon cost
Consider the following bilinear scalar system, with state x ∈ [0, 1] =: X and u ∈ [1/2, 1], and the stage cost displayed. We claim that the optimal average cost is 0. In fact, the control sequence u(t) = 1/2 for t = 0, ..., K − 1 and u(t) = 1 for t ≥ K yields x(K) = x(0)/2^K and x(t) = x(K) for t ≥ K. Notice that ℓ(x(t), u(t)) = |x(0)|/2^K ≤ 1/2^K for all t ≥ K. Hence the average cost can be made less than or equal to 1/2^K for any positive integer K, and this, together with the inequality ℓ(x, u) ≥ 0, proves that the optimal average cost is 0. We show next that the optimal infinite horizon cost is unbounded.
For the infinite horizon cost to be bounded we need to find an input such that x(k) → 0 as k → +∞. Hence, the input needs to fulfill ∏_{t=0}^{k−1} u(t) → 0. Therefore, for the cost to be bounded we need ∑_{t=0}^{k−1} log(u(t)) → −∞ as k → +∞. However, on the interval [1/2, 1], concavity of the log function yields log(u) ≥ 2 log(2)(u − 1). Using this inequality shows that boundedness of the cost would require ∑_{t=0}^{k−1} (1 − u(t)) → +∞ as k → +∞. This, however, contradicts boundedness of the cost, as ℓ(x, u) ≥ 1 − u.
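A hypothetical reading consistent with the bounds used above is x⁺ = u·x with u ∈ [1/2, 1] and ℓ(x, u) = x + (1 − u). For the simple family of inputs that steer down at u = 1/2 for K steps and then hold at u = 1, the best achievable horizon-N cost grows roughly like (log N)/2, illustrating the unboundedness claim:

```python
# Hypothetical reading of the example: x+ = u*x with x(0) = 1, u in [1/2, 1],
# stage cost ell(x, u) = x + (1 - u). Strategy family: steer down with
# u = 1/2 for K steps (so x(K) = 2**-K), then hold with u = 1 (ell = x(K)).
def cost(N, K):
    descent = sum(2.0**-t + 0.5 for t in range(K))  # K steps of u = 1/2
    hold = (N - K) * 2.0**-K                        # remaining steps at u = 1
    return descent + hold

def best(N):
    # K beyond ~60 contributes nothing at double precision.
    return min(cost(N, K) for K in range(0, min(N, 60)))

for N in (100, 10_000):
    print(N, best(N))
# The optimum over K keeps growing with the horizon N: no input achieves a
# bounded infinite horizon cost, matching the argument in the text.
```

The trade-off is explicit: holding early keeps paying x(K) per step, while descending longer pays 1 − u per step, so the optimal K scales like log N and the total cost diverges.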
It is worth pointing out that the optimal steady state for the considered example is x_s = 0 with u_s = 1. This steady state, however, is not reachable in finite time. Notice also that this is trivially a dissipative system with storage function λ(x) = 0, due to the non-negativity of the cost. As a consequence, no bounded fixed point of the shifted Bellman Equation exists.

Example with continuous and discontinuous fixed points
Consider the autonomous nonlinear system displayed; the function shown is also a fixed point. Consider next an arbitrary continuous increasing initialisation ψ of the T̂ and Ť maps. It can be seen that T ψ is also increasing, as f(x) is increasing on the interval [−1, 1]. As a consequence, T̂ ψ and Ť ψ are also increasing. Moreover, T ψ(0) = ψ(0) and T ψ(±1) = ψ(±1).
By induction, T^k ψ(x) is then increasing with respect to x for all k, and so is Ť^k ψ(x). It can be shown that for ψ(x) = x it holds that c(T̂^k ψ, T T̂^k ψ) = 0 for all k. In particular, T̂^k ψ converges to the function displayed. Numerical simulations indeed confirm this claim; see Fig. 7.9. This shows that, even if T̂ (or Ť) admits continuous fixed points, the iteration T̂^k ψ does not necessarily converge to a fixed point of the Bellman Equation.

Example with upper semi-continuous fixed point
We slightly modify the previous example to include a scalar control input and induce an upper semi-continuous fixed point. Consider the nonlinear system displayed. The optimal average performance is 0, achieved for u(·) = 1. The function ψ̄ defined below is a fixed point: to see this, notice the displayed relations, assuming x ≠ 0; for x = 0, it is easy to verify T ψ̄(0) = 0. We show in Fig. 7.10 the iteration converging to ψ̄. Notice that, despite ψ̄ being upper semi-continuous, not admitting a minimum in [0, 1], and the discontinuity point x = 0 being reachable from all states in X within a single step, the minimum in the definition of the operator T ψ̄ is still achieved.

Complex optimal regime of operation
We consider examples where the optimal average performance is not achieved at steady state, but by more exotic types of behaviour. It is worth pointing out that working with a terminal penalty function allows us to treat such examples without the need for an a priori known terminal absorbing state or terminal absorbing set. Moreover, the optimal regime of operation does not entail a constant (or zero) optimal stage cost at steady state.

Example with chaotic optimal regime
Consider the scalar nonlinear system displayed, with scalar state x ∈ X := [0, 1] and input u ∈ U := [0, 4], and the stage cost given in the text. Notice that ℓ(x(k), u(k)) = x(k)² − x(k + 1)² + |u(k) − 18/5|. Therefore, along arbitrary solutions the stage costs telescope and, computing asymptotic time averages, we see that the optimal average performance is 0, achieved, for instance, by any input u(k) converging to 18/5. Notice that for u = 18/5 the considered dynamical system is known to have chaotic solutions. Moreover, u(k) ≡ 18/5 is potentially an optimal infinite horizon control policy. This policy corresponds to the fixed point ψ̄(x) = x² of the Bellman Equation. The numerical solution using the T̂ operator is shown in Fig. 7.11, starting from two distinct initializations, ψ(x) = 0 and ψ(x) = sin(4x). The optimal average performance is correctly estimated to be 0, and T̂^k ψ converges to a shifted version of x² in both cases. The numerical solution using the Ť operator is slightly different and is shown in Fig. 7.12. While it is hard to write an explicit analytic expression for the limiting function, due to the presence of somewhat unexpected spikes, we believe that the numerical results hint at the presence of multiple solutions of the corresponding Bellman Equation. These solutions match x² on most of the interval [0, 1] but appear to allow for piecewise linear spikes that might correspond to transient costs in regions not visited by the chaotic attractor. It seems more plausible that these are true solutions rather than artifacts of the numerical approximation. The optimal average performance is identified with very good precision in both cases; in particular, for the T̂ iteration the error is lower than 10⁻¹⁶. See Fig. 7.13 for the shift sequence obtained with the Ť operator when ψ(x) ≡ 0.

We consider next the following two-dimensional linear system, with state x ∈ X := [−1, 1]², input u ∈ U(x), and the stage cost displayed. Notice that this cost is not positive definite.
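For the chaotic example above, assuming the dynamics are the controlled logistic map x⁺ = u·x(1 − x) (an assumption, since the displayed system was lost), the telescoping structure ℓ(x, u) + ψ̄(f(x, u)) = x² + |u − 18/5| makes the fixed point ψ̄(x) = x² easy to verify numerically:

```python
import numpy as np

# Assumed dynamics: controlled logistic map x+ = u*x*(1-x), X = [0,1], U = [0,4];
# stage cost ell(x,u) = x**2 - (u*x*(1-x))**2 + |u - 18/5| as in the text.
Xg = np.linspace(0.0, 1.0, 101)
Ug = np.linspace(0.0, 4.0, 41)   # includes u = 3.6 = 18/5 (up to rounding)

def T(psi_fun):
    # Bellman operator evaluated through the closed-form candidate psi_fun,
    # so no interpolation error enters the check.
    vals = np.empty_like(Xg)
    for i, x in enumerate(Xg):
        succ = Ug * x * (1.0 - x)
        cost = x**2 - succ**2 + np.abs(Ug - 18.0 / 5.0)
        vals[i] = np.min(cost + psi_fun(succ))
    return vals

residual = np.max(np.abs(T(lambda s: s**2) - Xg**2))
print(residual)  # ~0: psi_bar(x) = x**2 solves the Bellman Equation, shift 0
```

Since ψ̄(f(x, u)) = f(x, u)² cancels the −f(x, u)² term of the stage cost, the minimization reduces to min_u |u − 18/5|, attained at u = 18/5 regardless of x.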
In particular, the optimal average performance can be expected to be negative: the zero solution is feasible with zero input, yielding 0 average cost, but the stage cost can be made negative for some values of x_1 ≠ 0. The zero-input responses of the system are (feasible) period-4 oscillations. Moreover, the system is controllable, which guarantees an optimal average performance independent of the initial condition (regardless of the adopted stage cost ℓ(x, u)). We show in Fig. 7.14 a fixed point of the shifted Bellman Equation.
The iterations resulting from the T̂ and Ť operators, starting from ψ_0 ≡ 0, are shown in Fig. 7.15.

Inefficiency of exponential discounting factors
We end our example section with a discounted optimal control problem, which shows that ensuring well-posedness of infinite horizon optimal control problems by means of discounting can have unwanted side effects, making the proposed approach via the shifted Bellman Equation an attractive alternative. To this end, we consider a scalar infinite horizon linear quadratic optimal control problem with exponential discounting. The system's dynamics are as displayed, with x and u taking values in R, and the stage cost is quadratic. Since this choice will not give rise to bounded costs over an infinite horizon, we use a discounting factor γ ∈ (0, 1). The optimal infinite horizon cost fulfills the corresponding discounted Bellman Equation.
It is possible to show that this equation admits a quadratic solution, whose coefficients α, β and δ fulfill the displayed conditions. The optimal feedback is affine in x, and it globally asymptotically stabilizes a unique equilibrium x_e(γ). Notice that the optimal average performance is achieved at equilibrium, for x = 1/2 and u = 1/2, which yields V_avg = (1/2)² + (1/2)² = 1/2. On the other hand, the equilibrium x_e(γ) only approaches the value 1/2 as γ → 1 (see Fig. 7.16). This shows that the long-run average performance achieved by introducing a discounting factor is in general suboptimal. Moreover, the discounting factor introduces a non-existent trade-off between optimising transient costs and steady-state (average) costs, which persists for γ arbitrarily close to 1. This trade-off can be avoided by the approach pursued in this paper. On the other hand, any feedback u = k(x) (for instance affine, u = k_1 x + k_2) which stabilizes the equilibrium 1/2 clearly achieves optimal average performance (and is therefore optimal with respect to the cost functional J_avg), but at the same time it is not necessarily optimal from the point of view of transient costs. We refer to [23, 24] for more examples of this kind and an in-depth study of the stability properties of discounted optimal equilibria.
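The explicit dynamics and cost of this example were not preserved in the text; as a hypothetical stand-in consistent with the quoted numbers (optimal steady state x = u = 1/2 with V_avg = 1/2), assume x⁺ = 2x − u and ℓ(x, u) = (x − 1)² + u². Iterating the quadratic value-function ansatz V(x) = αx² + βx + δ then exhibits the discussed phenomenon, with x_e(γ) approaching 1/2 only as γ → 1:

```python
# Hypothetical stand-in: x+ = 2x - u, ell(x,u) = (x-1)**2 + u**2, discounted
# value iteration with quadratic ansatz V(x) = a*x**2 + b*x + d.
def equilibrium(gamma, iters=5000):
    a = b = d = 0.0
    for _ in range(iters):
        # V'(x) = min_u [(x-1)**2 + u**2 + gamma*V(2x - u)]; the minimizer is
        # u*(x) = gamma*(4*a*x + b) / (2*(1 + gamma*a)).
        denom = 1.0 + gamma * a
        a_new = 1.0 + 4.0 * gamma * a - (4.0 * gamma * a) ** 2 / (4.0 * denom)
        b_new = -2.0 + 2.0 * gamma * b - (4 * gamma * a) * (gamma * b) / (2.0 * denom)
        d_new = 1.0 + gamma * d - (gamma * b) ** 2 / (4.0 * denom)
        a, b, d = a_new, b_new, d_new
    # Closed loop: x+ = 2x - u*(x) = c1*x + c0; solve x_e = c1*x_e + c0.
    c1 = 2.0 - 2.0 * gamma * a / (1.0 + gamma * a)
    c0 = -gamma * b / (2.0 * (1.0 + gamma * a))
    return c0 / (1.0 - c1)

for g in (0.9, 0.99, 0.999):
    print(g, equilibrium(g))
# x_e(gamma) sits away from the average-optimal equilibrium 1/2 and only
# approaches it as gamma -> 1, illustrating the discounting-induced trade-off.
```

The steady-state cost (x_e − 1)² + x_e² evaluated at these equilibria exceeds 1/2 for every γ < 1, which is precisely the persistent average-cost suboptimality discussed above.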

Conclusions and outlook
Two novel recursion operators were proposed for the simultaneous computation of value functions and of the minimal asymptotic average cost in discrete-time infinite horizon optimal control problems. The recursive formulas can be readily applied when the asymptotic average cost is independent of the initial condition, a situation referred to as the ergodic case in [21]. The approach renders dynamic programming techniques for infinite horizon control problems invariant with respect to additive constants on the stage cost, as is naturally the case in the finite horizon setting. The recursions converge, under fairly relaxed technical assumptions, to fixed points of a shifted Bellman Equation, whose shift value is not determined a priori but is computed asymptotically alongside the value function. The approach removes the need for absorbing states and zero-cost conditions on the absorbing sets, which have often hindered the applicability of such techniques, as well as the need for discounting factors, which introduce unnecessary trade-offs between transient cost and asymptotic average performance. While the approach is developed for deterministic systems only, its extension to stochastic settings appears of potential interest. Finally, this work may serve as a first step towards the more general question of a shift-invariant approach to infinite horizon optimal control problems in the non-ergodic case, [21, 25].

A Appendix: Technical results
In order to analyse the convergence properties of the newly introduced operators Ť and T̂, it is useful to explore inequalities involving the max and min operators applied to finite families of functions. The next two lemmas provide such tools.
Lemma A.1 Let ψ_i ∈ C(X) for i = 1, ..., N. Then the stated inequality holds.
Proof. Let x* ∈ X be such that the stated minimum is attained for some ī ∈ {1, ..., N}. By monotonicity of the min operator, combining the resulting inequality with the previous equality yields the claim.
The following lemma provides a similar bound for the min operator.
Proof. Let x* ∈ X be such that the stated maximum is attained for some ī ∈ {1, ..., N}. Combining the inequalities obtained by monotonicity of the max operator implies the claim.
Existence of fixed points of the shifted Bellman Equation can be used to establish useful upper and lower bounds on the rate of growth of the iterates T^k ψ for any initial condition ψ ∈ C(X). This is stated in the following lemma.
Lemma A.3 Assume there exists a continuous solution ψ̄ of the shifted Bellman Equation, viz. T ψ̄ = ψ̄ + c, for some c ∈ R. Then, for any positive integer k and any function ψ ∈ C(X), the stated bounds hold.
Proof. The first inequality follows by exploiting monotonicity of the min operator; the second inequality can be proved along similar lines.
A direct consequence of Lemma A.3 is the stated limit. Moreover, we can state the following corollary:
Corollary A.4 Assume T ψ̄ = ψ̄ + c for some c ∈ R. Then, for any ψ ∈ C(X), the stated limit holds.
Proof. The result follows by dividing both sides of the inequalities in Lemma A.3 by k and taking the limit as k → +∞.
Notice that, by construction, if the sequence T̂^k ψ is bounded then it converges to an upper semi-continuous function. Analogously, if Ť^k ψ is bounded it converges to a lower semi-continuous function.
Lemma A.5 Assume T ψ̄ = ψ̄ + c for some c ∈ R. Then the iterates T̂^k ψ fulfill the stated bound.
Proof. Notice that, by the min-commutativity property, a simple induction argument shows that T̂^k ψ(x) = min_{h∈{0,...,k}} [T^h ψ(x) + c_h] for suitable values c_h ∈ R with c_0 = 0. By Lemma A.2, cancelling out the constant terms and exploiting Lemma A.3 yields the stated inequality, which completes the proof of the lemma.
Our subsequent analysis will rely on a combination of monotonicity and Lyapunov-based arguments. To this end it is useful to show that the T̂ and Ť operators yield non-increasing iterations with respect to suitable Lyapunov functionals. Exploiting Lemma A.2 yields the following:
Lemma A.6 Assume that there exists a continuous solution of the shifted Bellman Equation, viz. T ψ̄ = ψ̄ + c for some c ∈ R, and define the Lyapunov functional d as stated. Then, for any continuous ψ, the stated inequality holds.
Proof. Let ψ ∈ C(X) be arbitrary. The first inequality follows by Lemma A.2, and the second follows because d(T ψ, T ψ̄) ≤ d(ψ, ψ̄).
An analogous statement holds for the Ť operator, with the same bound for any continuous ψ.
Proof. The inequality can be derived similarly, where the first inequality follows by Lemma A.1, and the second follows because d(T ψ, T ψ̄) ≤ d(ψ, ψ̄).
An alternative Lyapunov functional W for the operator T̂ can be defined as in (A.5). The following lemma proves that this functional is non-increasing along iterations of T̂(·).
Lemma A.8 Consider the function W(ψ) defined in (A.5). For any real-valued continuous function ψ : X → R the stated inequality holds.
Proof. Consider the displayed inequalities. In addition, by definition of T̂ ψ, a further inequality holds; by monotonicity and translation invariance, applying the T operator to all sides of the former inequality yields the corresponding bound. We are now ready to estimate W(T̂ ψ) by combining inequalities (A.6) and (A.7).
Remark A.9 The same argument used to prove Lemma A.8 can be used to prove the corresponding decoupled inequalities for the max and min terms separately.
Lemma A.10 Regardless of whether the sequence of functions T̂^k ψ(x) converges, the real-valued sequence of applied shifts c(T̂^k ψ, T T̂^k ψ) is bounded and convergent, viz. there exists ĉ_∞ ∈ R such that lim_{k→+∞} c(T̂^k ψ, T T̂^k ψ) = ĉ_∞.
Proof. By induction and Remark A.9, the real-valued sequence max_{x∈X} [T̂^k ψ(x) − T T̂^k ψ(x)] is monotonically non-increasing (and bounded from below by min_{x∈X} [T̂^k ψ(x) − T T̂^k ψ(x)]). Similarly, min_{x∈X} [T̂^k ψ(x) − T T̂^k ψ(x)] is monotonically non-decreasing (and bounded from above by max_{x∈X} [T̂^k ψ(x) − T T̂^k ψ(x)]). Hence, both sequences admit a limit. By definition of c(T̂^k ψ, T T̂^k ψ), the claim follows, which completes the proof of the lemma.
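Assuming, for illustration, that T̂ takes the form T̂ψ = min{ψ, Tψ + c(ψ, Tψ)} with c(ψ, Tψ) a convex combination of the min and max of ψ − Tψ (both assumptions, as is the small grid example below with f(x, u) = −x + u and a hypothetical stage cost), the two monotone gap sequences used in this proof can be observed numerically:

```python
import numpy as np

# Hypothetical grid example: f(x,u) = -x + u on X = [-2,2], successor w ranging
# over X, stage cost ell = min(|x-1|, |x+1|) + |w + x| - 3.5, and the assumed
# shifted operator T̂ psi = min{psi, T psi + c(psi, T psi)}.
X = np.linspace(-2.0, 2.0, 81)

def T(psi):
    return np.array([np.min(np.minimum(abs(x - 1), abs(x + 1))
                            + np.abs(X + x) - 3.5 + psi) for x in X])

psi = np.sin(3.0 * X)            # arbitrary continuous initialisation
mins, maxs = [], []
for _ in range(80):
    Tpsi = T(psi)
    g = psi - Tpsi
    mins.append(g.min())
    maxs.append(g.max())
    psi = np.minimum(psi, Tpsi + 0.5 * (g.min() + g.max()))
# As in Remark A.9 / Lemma A.10: the max-gap sequence is non-increasing and
# the min-gap sequence is non-decreasing, trapping the shifts in between.
print(maxs[0], maxs[-1], mins[0], mins[-1])
```

The shift c(T̂^k ψ, T T̂^k ψ), being a convex combination of the two gap sequences, is squeezed between two monotone bounded sequences, which is exactly the mechanism of the proof above.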
We turn next to establishing similar inequalities for the Ť operator.
Lemma A.11 Consider the function W defined in (A.5). For any real-valued continuous function ψ : X → R the stated inequality holds.
Proof. Consider the displayed inequalities. In addition, by definition of Ť ψ, a further inequality holds; by monotonicity and translation invariance, applying the T operator to all sides of the former inequality yields the corresponding bound. We are now ready to bound W(Ť ψ) from above by combining inequalities (A.10) and (A.11).
Remark A.12 The same argument used to prove Lemma A.11 can also be used to prove the corresponding decoupled inequalities.
A proof similar to that of Lemma A.10 allows us to conclude the following result:
Lemma A.13 The sequence c(Ť^k ψ, T Ť^k ψ) is bounded and convergent, viz. there exists č_∞ ∈ R such that lim_{k→+∞} c(Ť^k ψ, T Ť^k ψ) = č_∞.
It is important to relate the values of ĉ_∞ and č_∞ to the optimal average infinite horizon cost V_avg. The following result shows that −ĉ_∞ is always an upper bound on the optimal average cost.
Lemma A.14 Assume that a fixed point of the shifted Bellman Equation exists, viz. T ψ̄ = ψ̄ + c for some c ∈ R. Then, for any ψ ∈ C(X), inequality (A.14) holds.
Proof. We argue by contradiction. Assume that c + ĉ_∞ > 0. Then there exist ε > 0 and Q ∈ N such that c + c(T̂^k ψ, T T̂^k ψ) ≥ ε > 0 (A.15) for all k ≥ Q. Moreover, there exists N ∈ N such that estimate (A.16) holds for any x ∈ X. We claim that, under such assumptions, T̂^k ψ(x) converges to a fixed point within a finite number of iterations. In fact, the displayed relations hold for any m ≥ N. Hence T̂^{Q+N} ψ(x) = lim_{k→+∞} T̂^k ψ(x), where convergence occurs in a finite number of steps (uniformly over x ∈ X). Therefore, T̂^{Q+N} ψ(x) is a (continuous) fixed point of the T̂ operator, and by virtue of Proposition 3.4 it is a solution of the shifted Bellman Equation for c = −c(T̂^{Q+N} ψ, T T̂^{Q+N} ψ). This implies c + ĉ_∞ = 0, which is a contradiction.
Whenever the sequence T̂^k ψ is pointwise convergent, one can show that the converse inequality also holds, and therefore −ĉ_∞ equals the optimal average performance. The next lemma (Lemma A.15) is instrumental in deriving this result.
Proof. Remark that the function Ĵ is upper semi-continuous, but not necessarily lower semi-continuous; hence its minimum might, a priori, not be well defined. One inequality follows by monotonicity of the minimum operator. To show the converse inequality, denote by u_n any sequence in U such that Ĵ(u_n) − 2^{−n} ≤ inf_{u∈U} Ĵ(u). Clearly, for any n, there exists k_n > n such that J_{k_n}(u_n) ≤ Ĵ(u_n) + 2^{−n}.
Overall, we obtain the displayed chain of inequalities. Letting n go to infinity then yields the claim, completing the proof of the lemma.
It is sometimes useful to consider the extension of the operator T to functions ψ that are bounded from below (and not necessarily continuous). To this end, if ψ : X → R is bounded from below, we denote by T ψ the function T ψ(x) = inf_{u∈U(x)} (ℓ(x, u) + ψ(f(x, u))).
Proof. To prove the lemma, notice that the displayed chain of equalities holds, where the last equality follows by applying Lemma A.15 to the sequence of x-parameterized functions J_k(x, u) := ℓ(x, u) + T̂^k ψ(f(x, u)).
We are now ready to prove the converse inequality.
Lemma A.17 Assume that the sequence T̂^k ψ(x) is pointwise convergent to some bounded function ψ̂(x). If a fixed point ψ̄ of the shifted Bellman Equation exists, viz. T ψ̄ = ψ̄ + c for some c ∈ R, then 0 ≤ c + ĉ_∞.
In particular, for any continuous ψ ≥ ψ̄, the displayed inequality holds; dividing both sides by k and letting k tend to infinity yields the claim. A similar analysis can be carried out for the iteration Ť^k ψ and the corresponding limiting value č_∞ of the shift. As a matter of fact, not all results extend along the same lines, due to the lack of formula (3.4). We first state the analogue of Lemma A.15.
If, in addition, the limit ψ̌(x) is continuous, then it is a fixed point of a shifted Bellman Equation.
Hence, by Corollary A.19, letting k → +∞ in the right-hand side of the latter inequality yields the claim. In addition, if ψ̌ ∈ C(X) then, by Dini's theorem, convergence is uniform and ψ̌ is a fixed point of Ť, by continuity of the Ť operator in the topology of uniform convergence.
We are now ready to state the analogue of Lemma A.17.
Lemma A.21 Assume that the sequence Ť^k ψ(x) is pointwise convergent to some bounded function ψ̌(x). If a fixed point ψ̄ of the shifted Bellman Equation exists, viz. T ψ̄ = ψ̄ + c for some c ∈ R, then 0 ≥ c + č_∞.
Dividing both sides of the previous inequality by k and letting k tend to infinity yields: 0 ≥ c +č ∞ .
A stronger claim can be obtained when the T̂^k ψ and Ť^k ψ sequences admit a continuous limit.
Proof. By construction, T̂^k ψ is monotone non-increasing with respect to k. Hence, by Dini's Theorem, convergence to ψ̂ is uniform. The result follows by continuity of the T(·) and c(·, ·) operators with respect to the topology of uniform convergence.
The convergence properties of the T̂^k ψ and Ť^k ψ sequences will be established through a combination of LaSalle-style and monotonicity-based arguments. The following lemmas are crucial to understanding the implications of certain Lyapunov functionals being constant along iterations of the T̂ and Ť maps.
Proof. Notice that inequality (A.9) holds and can be derived from the inequalities T T̂ ψ(x) ≤ T ψ(x) and T̂ ψ(x) ≥ T ψ(x) + c(ψ, T ψ) − d(ψ, T ψ). If (A.9) is an equality, both of these inequalities must hold with equality at any x_m ∈ arg min_{x∈X} [T̂ ψ(x) − T T̂ ψ(x)]. This shows that x_m ∈ arg min_{x∈X} [ψ(x) − T ψ(x)]. Since x_m was arbitrary, the inclusion of the arg min sets follows, which concludes the proof of the lemma.