On the continuity of the projection mapping from strategic measures to occupation measures in absorbing Markov decision processes

In this paper, we prove the following assertion for an absorbing Markov decision process (MDP) with a given initial distribution, where the MDP model is also assumed to be semi-continuous: the continuity of the projection mapping from the space of strategic measures to the space of occupation measures, both endowed with their weak topologies, is equivalent to the MDP model being uniformly absorbing. An example demonstrates, among other scenarios of interest, that for an absorbing (but not uniformly absorbing) semi-continuous MDP with a given initial distribution, the space of occupation measures can fail to be compact in the weak topology.


Introduction
In this paper, we consider a Markov decision process (MDP) with a Borel state space X and a Borel action space A, both endowed with their Borel σ-algebras. If there is an isolated absorbing state, say 0, in the state space, then the MDP model is called absorbing for a given initial distribution P_0 if under each strategy, the expected hitting time to 0 is finite.
In terms of occupation measures, understood as the total state-action frequencies on (X \ {0}) × A, endowed with the product σ-algebra, an MDP is absorbing for the given initial distribution if the occupation measure of each strategy is a finite measure. Occupation measures are important for the study of optimal control problems of MDPs with total cost criteria because the performance measure can be written as an integral of the cost function with respect to the occupation measure. This turns the original MDP problem into a static optimization problem in the space D of occupation measures. Since for absorbing MDP models D contains only finite measures, it is natural, as we do in this paper, to endow it with the weak topology, which is metrizable.
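To make the finiteness condition concrete, the following sketch (a toy illustration, not taken from the paper: the two-state chain and the parameter q are our own) computes the total mass of the occupation measure, i.e., the expected time spent outside the absorbing state, for a model with a single strategy.

```python
# Toy absorbing MDP: states {0, 1}, 0 absorbing, one action; from state 1 the
# process stays with probability q and is absorbed otherwise.  The occupation
# measure charges only state 1, with total mass sum_{t>=0} P(X_t = 1).
def occupation_mass(q: float, horizon: int = 10_000) -> float:
    """Truncated series sum_{t=0}^{horizon-1} P(X_t = 1) for the initial state 1."""
    total, p_alive = 0.0, 1.0
    for _ in range(horizon):
        total += p_alive   # P(X_t = 1) = q**t
        p_alive *= q       # survive one more step
    return total

# Closed form: 1 / (1 - q), which also equals the expected hitting time of 0;
# the model is absorbing exactly when this mass is finite.
print(occupation_mass(0.5))  # ≈ 2.0
```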
The wonderful and insightful paper [9], with many new ideas, intended to develop a rich theory for absorbing MDPs, in particular, for the occupation measures in such MDP models. A key property was used there, which asserts that if the MDP model is absorbing and semi-continuous, then the space of occupation measures is compact with respect to the weak topology, see [9, Lemma 4.7]. Here an MDP model is called semi-continuous if the action space A is compact, and the transition kernel p(dy|x, a) is either setwise continuous in a ∈ A for each x ∈ X, or continuous with respect to the weak topology in (x, a) ∈ X × A.
In order to prove this result, [9] exploited the following facts: for semi-continuous MDP models, the space of strategic measures is compact in the weak topology, as established in [17], and the projection mapping O, carrying the strategic measure of a strategy to the occupation measure of the same strategy, is continuous.
In the present paper, by means of an example, see Example 2 below, we demonstrate that the aforementioned assertion in [9, Lemma 4.7] is inaccurate. This is due to the fact that, in general, the projection mapping O may fail to be continuous for absorbing semi-continuous MDP models, see Theorem 1.
Our second contribution is as follows. We show that for an absorbing MDP model, provided that it is semi-continuous, the projection mapping O is continuous if and only if the MDP model is uniformly absorbing. The latter requires that, for the given initial distribution, the series defining the expected hitting time to 0 converge uniformly with respect to all strategies. In fact, the sufficiency part follows from the same reasoning as in the proof of [9, Lemma 4.7], by avoiding the minor error therein. This characterization of the continuity of the projection mapping O shows that the gap in [9] can be naturally closed if one further requires the MDP model there to be uniformly absorbing.
To the best of our knowledge, this definition of uniformly absorbing MDPs for a given initial distribution seems to have first appeared in [10]. That paper focused on non-atomic MDP models, and the equivalence of this property to the continuity of the projection mapping O was not discussed there.
A trivial example of uniformly absorbing MDP models is given by those discounted MDP models with a nonnegative discount factor strictly smaller than 1.
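This can be checked directly: viewing the discounted model as an absorbing one in which the process is sent to a costless cemetery 0 with probability 1 − β at every step (β ∈ [0, 1) being the discount factor), the tails in the uniform absorption condition admit a strategy-free geometric bound. The display below is a sketch of this standard argument.

```latex
% Why a discount factor beta < 1 makes the model uniformly absorbing:
% under every strategy pi, surviving t steps has probability at most beta^t, so
\[
  \sup_{\pi \in \Delta_{\mathrm{All}}}
  \sum_{t=n}^{\infty} \mathbb{P}^{\pi}_{P_0}\bigl(X_t \neq 0\bigr)
  \;\le\; \sum_{t=n}^{\infty} \beta^{t}
  \;=\; \frac{\beta^{n}}{1-\beta}
  \;\longrightarrow\; 0
  \quad \text{as } n \to \infty .
\]
```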
If the initial distribution is concentrated on a single state, and the state space is countable or finite, endowed with the discrete topology, then the MDP model, assumed to be semi-continuous, is uniformly absorbing if it possesses a uniform Lyapunov function, see Definition 6 below. MDP models with a uniform Lyapunov function were studied intensively in, e.g., [1, 5, 12]. In particular, such an MDP model is uniformly absorbing for every initial state, see Proposition 1. However, if one considers a fixed initial distribution, then the existence of a uniform Lyapunov function does not imply that the model is uniformly absorbing or even absorbing. This is demonstrated in Example 1, and is sometimes overlooked. Similar models for MDPs with Borel state and action spaces were considered in, e.g., [13] and [11, Chapter 9], which assumed that the total value of occupation measures is bounded or w-bounded over all initial states and strategies, and focused on the optimality equation.
Let us mention that, apart from providing an absorbing semi-continuous MDP model whose space of occupation measures is not compact, Example 2 incidentally demonstrates several other scenarios of interest. E.g., it also demonstrates that an absorbing MDP model for a fixed initial state may not possess a uniform Lyapunov function, and that some known solvability conditions are essential.
The rest of this paper is organized as follows. We describe the MDP model in Section 2 and recall some known facts in Section 3. The main results are presented in Section 4. The main example, i.e., Example 2, is formulated in Section 5, where we also discuss it in the context of optimal control problems of MDPs. The proofs of all the statements (except the known ones) are in Section 6. Finally, the paper is concluded in Section 7.

Model description
Before describing the MDP model, let us fix some notation and conventions used throughout this paper. If a space Y is discrete (with the discrete topology) and µ is a measure on B(Y), then, for singletons, we use the notation µ(y), not µ({y}). The integral of a function c(•) with respect to a measure µ is written as ∫_Y c(y) dµ(y) or ∫_Y c(y) µ(dy). In R = (−∞, ∞), unless stated otherwise, the usual Euclidean topology is fixed. δ_a(dx) is the Dirac measure concentrated at the point a, provided that the singleton {a} is measurable, and I{•} is the indicator function.
We fix the following primitives of a Markov decision process (MDP) model.
• X and A are the state and action spaces, both assumed to be nonempty topological Borel spaces. Here a topological Borel space is a Borel subset of some completely metrizable separable space. We endow X and A with their Borel σ-algebras B(X) and B(A), respectively.
• The transition probability p(dy|x, a) is a (measurable) stochastic kernel on B(X) given X × A.
Given the collection {X, A, p}, each strategy (see Definition 1 below) and each initial distribution on X, one can construct a probability space, and define the controlled (state) process {X_t}_{t=0}^∞ and the controlling (action) process {A_t}_{t=1}^∞ thereon. Derman in his classic book [6, p.4] termed the bivariate process {(X_t, A_{t+1})}_{t=0}^∞ a Markov decision process. This is why we refer to {X, A, p} as the MDP model. The aforementioned construction is described next.
The space of trajectories (or, say, histories) is the countable product H := X × (A × X)^∞. The generic notation for an element of H is ω = (x_0, a_1, x_1, . . .) ∈ H. We endow H with the product σ-algebra, which is also the Borel σ-algebra on it. All the random variables, like X_t and A_{t+1}, are just the coordinate mappings defined on H: X_t(ω) := x_t and A_{t+1}(ω) := a_{t+1} for t = 0, 1, . . ..
Definition 1 A control strategy, or simply a strategy, is a sequence π = {π_t}_{t=1}^∞ of stochastic kernels π_t(da|x_0, a_1, x_1, . . ., a_{t−1}, x_{t−1}) on B(A) given the history up to the time t − 1. If π_t(da|x_0, a_1, x_1, . . ., a_{t−1}, x_{t−1}) = π^m_t(da|x_{t−1}) for some stochastic kernels π^m_t on B(A) given X, then the strategy is called Markov. If, for some stochastic kernel π^s on B(A) given X, π_t(da|x_0, a_1, x_1, . . ., a_{t−1}, x_{t−1}) = π^s(da|x_{t−1}) for any t = 1, 2, . . ., then the strategy π = {π_t}_{t=1}^∞ is called stationary, and is identified with and denoted as π^s. If a stationary strategy π^s takes the form π^s(da|x) = δ_{φ(x)}(da) for some measurable mapping φ from X to A, where δ_{φ(x)}(da) denotes the Dirac measure concentrated on the singleton {φ(x)}, then the strategy π^s is called deterministic stationary, and is identified with and denoted by the underlying measurable mapping φ.
The set of all strategies is denoted as ∆_All, and the set of Markov strategies is denoted as ∆_Markov.
Let the initial distribution P_0 on (X, B(X)) be given. If a strategy π is also fixed, then the strategic measure on (H, B(H)), constructed in the standard way using the Ionescu-Tulcea Theorem, is denoted as P^π_{P_0}. It is the unique probability measure on (H, B(H)) such that P^π_{P_0}(X_0 ∈ dx) = P_0(dx), P^π_{P_0}(A_t ∈ da|X_0, A_1, . . ., X_{t−1}) = π_t(da|X_0, A_1, . . ., X_{t−1}) and P^π_{P_0}(X_t ∈ dy|X_0, A_1, . . ., X_{t−1}, A_t) = p(dy|X_{t−1}, A_t) for each t = 1, 2, . . .. The corresponding mathematical expectation is denoted as E^π_{P_0}. If the initial distribution P_0(dx) = δ_{x_0}(dx) is degenerate, we use the notations P^π_{x_0} and E^π_{x_0}. We denote by P := {P^π_{P_0} : π ∈ ∆_All} the space of all strategic measures with the given initial distribution P_0. An important class of MDP models is the absorbing model defined as follows.
Definition 2 The MDP is called absorbing (at 0) for the given initial distribution P_0 if there is an isolated state, say 0, in X such that 0 is absorbing, i.e., p({0}|0, a) ≡ 1, and E^π_{P_0}[T_0] < ∞ for each strategy π ∈ ∆_All, where T_0 := inf{t ≥ 1 : X_t = 0} denotes the hitting time to the state 0 by the controlled process. As usual, inf ∅ := ∞.
Verbally, an MDP model is absorbing (at 0) for the initial distribution P_0 if under each strategy, the expected hitting time to the isolated absorbing state 0 is finite. The above definition of an absorbing MDP model for the initial distribution P_0 is the same as the one given by Feinberg and Rothblum in [9, p.7]. Given that the state 0 is absorbing, this definition also coincides with the one in [1, Definition 7.1]. In general, in [1, Definition 7.1], for the model to be absorbing for P_0, the state 0 was not required to be absorbing itself, and it was required that the expected return time to the state 0 be finite under each strategy, the return time being defined in the same way as T_0. Thus, for an absorbing MDP model (at state 0) for the initial distribution P_0, for each strategy π, the series Σ_{t=0}^∞ P^π_{P_0}(X_t ≠ 0) converges. If we require the convergence of the above series to be uniform with respect to all strategies π ∈ ∆_All, then the resulting model will be called uniformly absorbing, formulated in the next definition.
Definition 3 An absorbing (at 0) MDP model for the initial distribution P_0 is called uniformly absorbing for P_0 if

lim_{n→∞} sup_{π∈∆_All} E^π_{P_0}[Σ_{t=n}^∞ I{X_t ≠ 0}] = 0. (2)

This definition was given in [10, Definition 3.6]. Since the state 0 was required to be absorbing in Definition 2, the requirement in Definition 3 is the same as lim_{n→∞} sup_{π∈∆_All} Σ_{t=n}^∞ P^π_{P_0}(X_t ≠ 0) = 0, in view of the equality E^π_{P_0}[Σ_{t=n}^∞ I{X_t ≠ 0}] = Σ_{t=n}^∞ P^π_{P_0}(X_t ≠ 0). This equality, when specialized to n = 0, allows us to formulate the definition of absorbing MDP models in terms of the finiteness of the occupation measures, defined as follows.
Definition 4 Consider an MDP model {X, A, p} with 0 ∈ X being an isolated absorbing state, i.e., p({0}|0, a) ≡ 1. The occupation measure η^π_{P_0} of a strategy π for the given initial distribution P_0 is the [0, ∞]-valued measure on ((X \ {0}) × A, B((X \ {0}) × A)) defined by the formula

η^π_{P_0}(Γ) := Σ_{t=0}^∞ P^π_{P_0}((X_t, A_{t+1}) ∈ Γ), Γ ∈ B((X \ {0}) × A).

If the initial distribution is concentrated on a singleton, say x_0, then we write η^π_{x_0} for the occupation measure of a strategy π. Let D := {η^π_{P_0} : π ∈ ∆_All} be the space of all occupation measures for the initial distribution P_0. Now we see that an MDP model with an absorbing state 0 is absorbing (at 0) for the initial distribution P_0 if and only if η^π_{P_0}((X \ {0}) × A) < ∞ for each π ∈ ∆_All. For brevity, we make the next definition.
Definition 5 An MDP model {X, A, p} is called semi-continuous (S) if the following two conditions are satisfied: (a) The action space A is a compact topological Borel space.
(b) For each x ∈ X, the function ∫_X u(y) p(dy|x, a) is continuous in a ∈ A for every bounded measurable function u(•).
An MDP model {X, A, p} is called semi-continuous (W) if the following two conditions are satisfied: (c) The action space A is a compact topological Borel space.
(d) The function ∫_X u(y) p(dy|x, a) is continuous in (x, a) ∈ X × A for every bounded continuous function u(•).
An MDP model {X, A, p} is called semi-continuous if it is either semi-continuous (S) or semi-continuous (W). Conditions (b) and (d) correspond to the setwise and weak continuity conditions imposed on the transition probability in [17, Section 5]. This is why we termed the models semi-continuous (S) and semi-continuous (W) in the above. If X is countable or finite, and is endowed with the discrete topology, then semi-continuity (S) and semi-continuity (W) mean the same. Admittedly, the term "semi-continuous MDP model" has other meanings in the literature. Nevertheless, in the present paper, its meaning is unambiguous.

Some relevant facts
In this section we present some facts about MDP models and optimal control problems of MDPs.

MDP with a uniform Lyapunov function
An important class of semi-continuous MDP models with a countable or finite state space that are uniformly absorbing (at 0) for a given initial state x 0 is given by those that admit a uniform Lyapunov function, defined as follows.
Definition 6 Consider a semi-continuous MDP model with a countable or finite state space X (endowed with the discrete topology) with the isolated absorbing state 0. A [1, ∞)-valued function µ(•) on X is said to be a uniform Lyapunov function if the following conditions are satisfied: (a) For each x ∈ X and a ∈ A, µ(x) ≥ 1 + Σ_{y∈X\{0}} p(y|x, a)µ(y).
(b) For each x ∈ X, the mapping a ∈ A → Σ_{y∈X\{0}} p(y|x, a)µ(y) is continuous.
(c) For each x ∈ X and each deterministic stationary strategy φ, lim_{t→∞} E^φ_x[µ(X_t) I{τ_0 > t}] = 0, where we recall that τ_0 := min{t ≥ 1 : X_t = 0}. The above definition of a uniform Lyapunov function was taken from [1, Definition 7.4], see also [5, Definition 4.2]. In both of these two references, this definition is ascribed to [12]. Many characterizations and consequences of a semi-continuous MDP model with a uniform Lyapunov function can be found in [1, 5], among which is the following one, whose proof can be found on p.107 of [1].
Proposition 1 Consider a semi-continuous MDP model with a countable or finite state space X (endowed with the discrete topology) with the isolated absorbing state 0. If there exists a uniform Lyapunov function µ(•), then the MDP model is uniformly absorbing (at 0) for each initial state x 0 .
For a semi-continuous MDP model with a countable or finite state space X (endowed with the discrete topology) with the isolated absorbing state 0, if there exists a uniform Lyapunov function µ(•), it can happen that for some initial distribution P_0, the MDP model is not absorbing (at 0) for P_0. Sometimes, this is overlooked. We present an example to illustrate this.
Example 1 Consider an MDP model {X, A, p} with X = {0, 1, . . .}, endowed with the discrete topology, and A being a singleton, so that we will omit the argument a ∈ A everywhere in this example, and p(0|x) = 1/2^x = 1 − p(x|x) for all x ∈ {1, 2, . . .} and p(0|0) = 1. There is only one strategy, say π, in this model. Then for each x_0 ∈ {1, 2, . . .}, E^π_{x_0}[T_0] = 2^{x_0} < ∞. Hence, this MDP model is absorbing, in fact uniformly absorbing (at 0), for every initial state x_0. It has a uniform Lyapunov function given by µ(0) = 1 and µ(x) = 2^x for x ≥ 1. Indeed, (a) and (b) in Definition 6 can be checked to be satisfied by µ(•), and for (c), note for each x ≥ 1 that E^π_x[µ(X_t) I{τ_0 > t}] = 2^x (1 − 1/2^x)^t → 0 as t → ∞, and the convergence also takes place for x = 0. Now if we take the initial distribution P_0 as the geometric distribution on {1, 2, . . .} with parameter 2/5, then E^π_{P_0}[T_0] = Σ_{x=1}^∞ (2/5)(3/5)^{x−1} 2^x = ∞, so that the MDP model is not absorbing (at 0) for P_0.

The main results of this paper concern the continuity of the projection mapping O from P to D. Therefore, we endow D and P with suitable topologies, described as follows.
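The computations in Example 1 can be checked numerically. The sketch below uses only quantities from the example (absorption probability 2^{-x} at state x, hence an expected hitting time of 2^x, and the geometric initial distribution with parameter 2/5); the truncation levels are arbitrary.

```python
# Example 1, numerically: each fixed initial state is absorbed in finite
# expected time, but mixing over a geometric(2/5) initial distribution gives an
# infinite expected hitting time.
def expected_hitting_time(x: int) -> float:
    # From state x the process is absorbed with probability 2**-x per step,
    # so the hitting time is geometric with mean 2**x.
    return 2.0 ** x

def mixture_partial_sum(n: int) -> float:
    # sum_{x=1}^{n} P_0(x) * E_x[T_0] with P_0(x) = (2/5) * (3/5)**(x - 1)
    return sum((2 / 5) * (3 / 5) ** (x - 1) * expected_hitting_time(x)
               for x in range(1, n + 1))

# Each term equals (4/5) * (6/5)**(x - 1), so the partial sums diverge:
print([round(mixture_partial_sum(n), 2) for n in (5, 20, 80)])
```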
On the space P (for the fixed initial distribution P_0), one can consider the so-called ws∞-topology of Schäl, introduced in [17], see p.359 therein. This is the coarsest topology on P with respect to which, for each integer 0 ≤ T < ∞ and each bounded measurable function f(h_T) = f(x_0, a_1, x_1, . . ., a_T) continuous in (a_1, a_2, . . ., a_T) under arbitrarily fixed x_0, x_1, . . ., x_{T−1} ∈ X, the mapping P ∈ P → ∫_{H_T} f(h_T) P_T(dh_T) is continuous. Here H_0 = X, and H_T = (X × A)^T × X, and P_T denotes the marginal of P ∈ P on H_T, T ≥ 0 being an integer.
The ws∞-topology works particularly well with semi-continuous (S) MDP models; one of the main results in [17] is the following compactness result. For semi-continuous (W) MDP models, the weak topology on P is more convenient. What actually happens is that for semi-continuous (S) MDP models, on P, the ws∞-topology is the same as the weak topology, see Proposition 3 below.
Proposition 2 Consider the MDP model {X, A, p} with some initial distribution P_0. If the MDP model is semi-continuous (S), then P endowed with the ws∞-topology is compact, whereas if the MDP model is semi-continuous (W), then P endowed with the weak topology is compact.
Nowak further studied the ws∞-topology on P in [15], and proved the following useful result.
Proposition 3 Suppose that the MDP model {X, A, p} is semi-continuous (S). Let some initial distribution P_0 be given. Then on P, the ws∞-topology coincides with the weak topology. Consequently, the space P endowed with the ws∞-topology is metrizable, and compact in the weak topology, in view of Proposition 2.
In view of Propositions 2 and 3, we will always consider the weak topology on P when we discuss semi-continuous MDP models, for which P is always compact.

Facts about optimal control problems
We now add one more element to the MDP model: the cost function c(•). In general, this is a measurable function from (X × A, B(X × A)) to [−∞, ∞].
Define for each strategy π and initial distribution P_0

v^π(P_0) := E^π_{P_0}[Σ_{t=0}^∞ c^+(X_t, A_{t+1})] − E^π_{P_0}[Σ_{t=0}^∞ c^−(X_t, A_{t+1})],

where c^+(x, a) := max{c(x, a), 0}, c^−(x, a) := max{−c(x, a), 0}, and we accept that ∞ − ∞ := ∞. When P_0(dx) = δ_{x_0}(dx) for some x_0 ∈ X, then we write v^π(x_0) instead of v^π(P_0). For future reference, let

v*(P_0) := inf_{π∈∆_All} v^π(P_0). (4)

A strategy π* is called optimal for P_0 if it solves the following optimal control problem:

Minimize v^π(P_0) over π ∈ ∆_All, (5)

i.e., v^{π*}(P_0) = v*(P_0). In the next statement, sufficient conditions for the existence of an optimal strategy for P_0 are given. See also [7] and the references therein.
Proposition 4 Suppose that the MDP model is semi-continuous (S), and c(•) is bounded below and (−∞, ∞]-valued on X × A such that c(x, a) is lower semicontinuous in a ∈ A for each x ∈ X. If conditions (6) and (7) are satisfied for the given initial distribution P_0, then there exists an optimal strategy for P_0. The same assertion holds if the MDP model is semi-continuous (W), and c(•) is a bounded below, (−∞, ∞]-valued, lower semicontinuous function on X × A. Proof. See [17]. In [17], condition (7) is called (C), and condition (6) is referred to as the "General Assumption" (GA). □
For an absorbing (at 0) MDP model for P_0, we will take c(•) such that c(0, a) ≡ 0 at the absorbing isolated point 0. Then 0 will be referred to as the costless cemetery. If Condition (GA) is satisfied, then the optimal control problem (5) can be rewritten in terms of occupation measures in the following way: Minimize ∫_{(X\{0})×A} c(y, a) η(dy × da) over η ∈ D.
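The reduction of the control problem to a static optimization over occupation measures can be made concrete on a finite toy model (the numbers below are invented for illustration and are not the paper's Example 2). For a finite absorbing model it suffices to enumerate the deterministic stationary strategies, solve the balance equation η(y) = P_0(y) + Σ_x η(x) p(y|x, φ(x)) on the transient states for each of them, and compare the integrals of the cost against the resulting occupation measures.

```python
# Toy illustration: states {0, 1, 2} with costless cemetery 0, actions {'a','b'}.
from itertools import product

P = {  # p(y | x, a) restricted to the transient states; remaining mass goes to 0
    (1, 'a'): {1: 0.5, 2: 0.25}, (1, 'b'): {2: 0.5},
    (2, 'a'): {1: 0.25},         (2, 'b'): {2: 0.5},
}
C = {(1, 'a'): 2.0, (1, 'b'): 1.0, (2, 'a'): 1.0, (2, 'b'): 3.0}  # cost c(x, a)
P0 = {1: 1.0, 2: 0.0}  # initial distribution on the transient states

def occupation_measure(phi):
    # Solve eta = P0 + eta Q (row-vector form) on {1, 2} by Cramer's rule.
    q = [[P[(x, phi[x])].get(y, 0.0) for y in (1, 2)] for x in (1, 2)]
    a, b = 1.0 - q[0][0], -q[0][1]
    c, d = -q[1][0], 1.0 - q[1][1]
    det = a * d - b * c
    return {1: (P0[1] * d - P0[2] * c) / det,
            2: (P0[2] * a - P0[1] * b) / det}

def total_cost(phi):
    # Integral of the cost function against the occupation measure of phi.
    eta = occupation_measure(phi)
    return sum(eta[x] * C[(x, phi[x])] for x in (1, 2))

# Enumerate the four deterministic stationary strategies and minimize.
best = min(({1: a1, 2: a2} for a1, a2 in product('ab', repeat=2)), key=total_cost)
print(best, round(total_cost(best), 4))  # -> {1: 'b', 2: 'a'} 1.7143
```

Here the 2 × 2 system is solved by Cramer's rule; for larger models one would set up the corresponding linear program over D instead.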

Main results
In the wonderful and insightful paper [9], a rich theory for absorbing MDP models was developed. In that paper, the following assertion was claimed, see the proof of [9, Lemma 4.7]: • If the MDP model {X, A, p} is semi-continuous and absorbing (at 0) for the given initial distribution P_0, P is endowed with the weak topology (see Proposition 3 and the paragraph below it), and D is endowed with the weak topology generated by bounded continuous functions on (X \ {0}) × A, then the projection mapping O defined in (3) is continuous.
If the above claim were true, then by Propositions 2 and 3, as well as the fact D = O(P), it would follow that D is compact, provided that the MDP model {X, A, p} is semi-continuous and absorbing (at 0) for P_0. This assertion was formulated as [9, Lemma 4.7], and was used several times therein.
Our first main result shows, by means of an example, that the above claims regarding the continuity of O and the compactness of D are false.
Theorem 1 There is an MDP model {X, A, p} such that the following assertions hold for a fixed initial distribution P_0: (a) The MDP model is semi-continuous. Consequently, the space of all strategic measures P for the initial distribution P_0 is compact in the weak topology.
(b) The MDP model is absorbing (at 0) for P 0 .
(c) The MDP model is not uniformly absorbing (at 0) for P 0 .
(d) D is not compact with respect to the weak topology.
(e) The mapping O defined by (3) is not continuous between P and D, both of which are endowed with their weak topologies.
(f) The MDP model does not possess a uniform Lyapunov function.
The proof of this theorem is given in Subsection 6.1.
The continuity of the projection mapping O defined by (3) from P (endowed with the weak topology) to D (endowed with the weak topology) played an important role in the reasoning in [9]. Theorem 1 shows that this continuity is not guaranteed when the MDP model is merely semi-continuous and absorbing (at 0) for P_0, so we next investigate when it holds. Our second main result asserts that for a semi-continuous MDP model, provided that it is absorbing for P_0, the aforementioned continuity of the projection mapping O holds if and only if the MDP model is uniformly absorbing for P_0.
Theorem 2 Consider a semi-continuous MDP model with the fixed initial distribution P_0. Suppose that the MDP model is absorbing (at 0) for P_0. Then the projection mapping O defined by (3) from P (endowed with the weak topology) to D (endowed with the weak topology) is continuous if and only if the MDP model is uniformly absorbing (at 0) for P_0.
The proof is given in Subsection 6.2.

Example
Theorem 1 will be proved by means of the following MDP model.

Example 2
The elements of the MDP model {X, A, p} are as follows: X = {0, 1, 2, . . .}, endowed with the discrete topology; A = {1, 2}; and, with p_x := 1/2^x, p(x|x, 1) = 1 − p_x, p(0|x, 1) = p_x, p(x + 1|x, 2) = p(0|x, 2) = 1/2 for each x ∈ {1, 2, . . .}, and p({0}|0, a) ≡ 1. The initial distribution P_0 is concentrated on the state 1.
The transition diagram of the MDP model in Example 2 is given in Figure 1.
The MDP model in Example 2 is explored in the proof of Theorem 1. Incidentally, it can also be used to demonstrate that condition (7) is important for the solvability, see the next proposition.
Proposition 5 Consider the MDP model in Example 2 with the given initial state 1 and the cost function c(x, a) ≡ −1 for x ∈ {1, 2, . . .} and c(0, a) ≡ 0. Then the following assertions hold:
(a) There is no optimal strategy for the given P_0.
(b) Condition (7) is not satisfied, whereas all the other conditions in Proposition 4 are satisfied.
The proof of this proposition is given in Subsection 6.3.

Proof of the statements
In this section we provide the detailed proofs of Theorem 1, Theorem 2 and Proposition 5.

Proof of Theorem 1
Proof of Theorem 1. Throughout this proof we consider the MDP model in Example 2.
According to [2, Propositions 9.8 and 9.10], w*(•) is the minimal nonnegative solution to the Bellman equation

w(0) = 0, w(x) = 1 + max_{a∈A} Σ_{y∈X\{0}} p(y|x, a)w(y), x ∈ {1, 2, . . .}. (9)

We claim that w*(0) = 0 and

w*(x) ≤ 2 + 2^x < ∞, x ∈ {1, 2, . . .}. (10)

To see this, it suffices to note that the function v(•) on {0, 1, . . .} given by v(x) := 2 + 2^x for x ≥ 1 and v(0) = 0 solves the Bellman equation (9), and this is true because v(0) = 0 trivially satisfies the first equality in (9), whereas for x ∈ {1, 2, . . .}, the second equality in (9) can be verified by a direct computation. Therefore, w*(0) = 0 and w*(x) ≤ 2 + 2^x for x ∈ {1, 2, . . .}. Thus, (b) is proved. Incidentally, from calculations similar to those in the proof of (c), we will actually see that w*(x) = 2 + 2^x for each x ≥ 1. This equality will be used in the proof of a subsequent statement; we formulate it as Lemma 1 below and prove it there. Nevertheless, for the purpose here, the validity of the inequality ≤ is sufficient.
(c) We have seen in part (b) that this MDP model is absorbing (at 0) for the given initial distribution P_0 concentrated on {1}. We now verify that it is not uniformly absorbing (at 0) for P_0.
To show this, consider the deterministic stationary strategies

φ_n(x) := 2 for 1 ≤ x ≤ n, and φ_n(x) := 1 for x ≥ n + 1, n ∈ {0, 1, . . .}. (11)

With a slight abuse of notation, let us denote by η^π_1 the marginal on X \ {0} = {1, 2, . . .} of the occupation measure of the strategy π for the initial distribution P_0.
For each n ∈ {0, 1, . . .}, η^{φ_n}_1 is given by

η^{φ_n}_1(x) = 1/2^{x−1} for 1 ≤ x ≤ n; η^{φ_n}_1(n + 1) = 2; η^{φ_n}_1(x) = 0 for x > n + 1. (12)

Indeed, under the deterministic stationary strategy φ_n, any state x < n + 1 is reached with probability 1/2^{x−1}, and given that it is reached, the controlled process spends exactly one time unit in it; any state x > n + 1 is never reached. This justifies the first and the third equalities in (12). For the state x = n + 1, note that 1/2^n is the probability that the state n + 1 is ever reached (necessarily at the step t = n; the state n + 1 cannot appear at the steps t < n). After that, X_{n+t} = n + 1 is realized with probability (1 − p_{n+1})^t, leading to η^{φ_n}_1(n + 1) = (1/2^n) Σ_{t=0}^∞ (1 − p_{n+1})^t = (1/2^n)(1/p_{n+1}) = 2. Consequently, for each n ≥ 0, sup_{π∈∆_All} E^π_{P_0}[Σ_{t=n}^∞ I{X_t ≠ 0}] ≥ E^{φ_n}_{P_0}[Σ_{t=n}^∞ I{X_t ≠ 0}] ≥ 2, which does not converge to 0 as n → ∞. In view of (2), it follows that this MDP model is not uniformly absorbing at 0 for the given P_0.
(d) Now we fix the weak topology on D, as described in Section 3. Accordingly, in this proof, the notions of compactness and convergence of sequences in D are understood with respect to it, and this will not be signified repeatedly. The target here is to show that the space D is not compact. This is equivalent to showing that D is not sequentially compact, because the weak topology on D is metrizable, as mentioned in Section 3. Consequently, it suffices to show that the set {η^{φ_n}_1, n = 0, 1, 2, . . .} of occupation measures on B((X \ {0}) × A) has no accumulation points in D, where the deterministic stationary strategies φ_n are defined by (11) in the proof of (c).
(e) As mentioned in Section 4, if the projection mapping O defined by (3) were continuous from P endowed with the weak topology to D endowed with the weak topology, then D would be compact, because P is compact by Proposition 2. Since D is not compact, as shown in (d), O is not continuous.
Alternatively, in the proof of (d), we have seen that O(P^{φ_n}_1) = η^{φ_n}_1 does not converge to O(P^φ_1) = η^φ_1 in the weak topology, see (17) and (18). This shows the claimed discontinuity of the mapping O directly.
(f) Since this MDP model is not uniformly absorbing (at 0) for the initial state x_0 = 1, it follows from Proposition 1 that there cannot be uniform Lyapunov functions. In fact, conditions (a) and (b) in Definition 6 can be satisfied, but any function satisfying them violates condition (c) in Definition 6. We demonstrate this fact explicitly as follows. First, the function defined by µ(0) = 1 and µ(x) = 2 + 2^x for x ≥ 1 satisfies condition (a) in Definition 6 because 1 + Σ_{y=1}^∞ µ(y)p(y|0, a) ≡ 1 = µ(0), and for each x ≥ 1, the corresponding inequality can be verified by a direct computation. Condition (b) in Definition 6 is trivially satisfied.
Next, let us show that, for each function µ(•) satisfying condition (a) in Definition 6, condition (c) therein is violated. Indeed, condition (a) in Definition 6 implies that µ(x) ≥ 1/p_x = 2^x for each x ≥ 1. Now, for the deterministic stationary strategy φ(x) ≡ 2, we have for each x ≥ 1 that E^φ_x[µ(X_t) I{T_0 > t}] = (1/2^t) µ(x + t) ≥ (1/2^t) 2^{x+t} = 2^x, where 1/2^t = P^φ_x(T_0 > t) and, if T_0 > t, then X_t = x + t with probability 1. Since the above expression does not converge to 0 as t → ∞, we see that condition (c) in Definition 6 is not satisfied by µ(•). □

Proof of Theorem 2
Proof of Theorem 2. Suppose the MDP model is semi-continuous. First, we prove the 'if' part. This is done by mimicking the reasoning in the proof of [9, Lemma 4.7], avoiding the minor inaccuracy therein. Suppose P^{π_i}_{P_0} → P^π_{P_0} as i → ∞ in the weak topology, and fix an arbitrary ε > 0 and an arbitrary bounded continuous function g(•) on X × A such that g(0, a) ≡ 0. Let ḡ := sup_{(x,a)∈X×A} |g(x, a)|. Let N ≥ 0 be such that ḡ sup_{π∈∆_All} Σ_{t=N}^∞ P^π_{P_0}(X_t ≠ 0) < ε; such an N exists because the MDP model is uniformly absorbing (at 0) for P_0, see (2). Splitting the integrals of g against the occupation measures at the time N, bounding the tails by the previous display, and using the weak convergence of the strategic measures on the first N coordinates, one obtains lim sup_{i→∞} |∫ g dη^{π_i}_{P_0} − ∫ g dη^π_{P_0}| ≤ 2ε; since ε > 0 and g(•) were arbitrary, the 'if' part follows.

Next, we prove the 'only if' part. Suppose the MDP model is not uniformly absorbing (at 0) for P_0. Then there exist a constant ε > 0, a sequence of strategies {π_i} ⊆ ∆_All, and integers n_i ↑ ∞ such that E^{π_i}_{P_0}[Σ_{t=n_i}^∞ I{X_t ≠ 0}] ≥ ε for each i. Since P is compact and metrizable (Propositions 2 and 3), one can extract a subsequence {π_{i_j}}_{j=1}^∞ (with n_{i_j} ↑ ∞) such that P^{π_{i_j}}_{P_0} → P^π_{P_0} in the weak topology for some strategy π. We will show that the sequence {η^{π_{i_j}}_{P_0}}_{j=1}^∞ does not converge to η^π_{P_0} in the weak topology. For the constant ε > 0 fixed above, we choose N > 0 such that Σ_{t=N}^∞ P^π_{P_0}(X_t ≠ 0) < ε/4. This can be done because the MDP is absorbing (at 0) for P_0. After that, choose K > 0 such that, if j > K, then |Σ_{t=0}^{N−1} P^{π_{i_j}}_{P_0}(X_t ≠ 0) − Σ_{t=0}^{N−1} P^π_{P_0}(X_t ≠ 0)| < ε/4. This can be done because P^{π_{i_j}}_{P_0} → P^π_{P_0} in the weak topology. Recall that 0 is an isolated state so that I{x ≠ 0} is a bounded continuous function on X. Now, for j > K with n_{i_j} > N, i.e., for all big values of j, we have the following relations for the bounded continuous function g(x, a) ≡ 1 on (X \ {0}) × A:

∫ g dη^{π_{i_j}}_{P_0} = Σ_{t=0}^{N−1} P^{π_{i_j}}_{P_0}(X_t ≠ 0) + Σ_{t=N}^∞ P^{π_{i_j}}_{P_0}(X_t ≠ 0) ≥ Σ_{t=0}^{N−1} P^π_{P_0}(X_t ≠ 0) − ε/4 + ε ≥ ∫ g dη^π_{P_0} − ε/4 − ε/4 + ε = ∫ g dη^π_{P_0} + ε/2.

Thus, the sequence {η^{π_{i_j}}_{P_0}}_{j=1}^∞ does not converge to η^π_{P_0} in the weak topology, and the mapping O defined by (3) is not continuous. □

Proof of Proposition 5
In this subsection, we prove Proposition 5. Firstly, we present a lemma.
Then considerations similar to those for (12) result in, with a slight abuse of notation,

η^{φ_n}_x(y) = 1/2^{y−x} for x ≤ y ≤ n; η^{φ_n}_x(n + 1) = 2^x; η^{φ_n}_x(y) = 0 otherwise,

for the marginal on X \ {0} of the occupation measure of the strategy φ_n for the initial state x. Now η^{φ_n}_x(X \ {0}) = Σ_{y=x}^n 1/2^{y−x} + 2^x = 2 − 1/2^{n−x} + 2^x, and hence w*(x) ≥ sup_{n≥x} η^{φ_n}_x(X \ {0}) = 2 + 2^x, where we use the definition of the function w*(•), see (8).
On the other hand, it was shown in the proof of (b) of Theorem 1 that w*(x) ≤ 2 + 2^x for each x ≥ 1, see (10). Combining the previous two inequalities yields w*(x) = 2 + 2^x for all x ≥ 1. Thus, the statement is proved.
On the other hand, the previous relations imply that v^π(1) > v*(1), which contradicts the optimality of π for the initial state 1. Consequently, there are no optimal strategies for problem (5) in the MDP model with the given initial state 1 and the given cost function c(•).
(b) The last assertion of this part is trivially true. In particular, by Lemma 1, sup_{π∈∆_All} E^π_1[Σ_{n=0}^∞ c^−(X_n, A_{n+1})] = w*(1) = 4 < ∞, where the first equality is by (8). Thus, it follows from (a) and Proposition 4 that condition (7) cannot be satisfied.

Now, inf_{π∈∆_All} v^π(1) = −w*(1) = −4 > −∞, where the inequality holds because c(•) ≤ 0, and the equalities hold by an argument similar to that for (13). □

Remark 1 A strategy π is called uniformly optimal if v^π(x) = v*(x) for all x ∈ X. Consider an MDP model with a cost function c(•) such that sup_{π∈∆_All} E^π_x[Σ_{n=0}^∞ c^−(X_n, A_{n+1})] < ∞ and v*(x) ∈ (−∞, ∞) for all x ∈ X, where v*(•) is defined by (4). Then, according to [18, Theorem 2.2], a deterministic stationary strategy φ is uniformly optimal if and only if

v*(x) = c(x, φ(x)) + ∫_X v*(y) p(dy|x, φ(x)) for all x ∈ X, and lim_{n→∞} E^φ_x[v*(X_n)] = 0 for all x ∈ X. (20)

The two equalities in (20) are called the Dubins-Savage conditions. Now, consider the optimal control problem (5) for the MDP model in Example 2 with the cost function c(x, a) ≡ −1 for x ∈ {1, 2, . . .} and c(0, a) ≡ 0. We have seen from Proposition 5(a) that there is no uniformly optimal strategy. In particular, φ(x) ≡ 2 is not uniformly optimal. Let us verify this fact again by checking the Dubins-Savage conditions. They are necessary and sufficient for the uniform optimality of φ because c(•) is (−∞, 0]-valued and v*(•) is finite-valued. We observe that φ actually satisfies the first equality in (20). Nevertheless, the second equality in (20) is violated because lim_{n→∞} E^φ_1[v*(X_n)] = lim_{n→∞} (1/2^n) v*(n + 1) ≠ 0. Here 1/2^n is the probability that X_n = n + 1.
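The violation of the second Dubins-Savage condition can be checked numerically. The snippet below is a sketch that assumes the dynamics suggested by the proof of Theorem 1 (under φ(x) ≡ 2 the process moves from x to x + 1 with probability 1/2 and is absorbed otherwise) together with the value v*(x) = −2 − 2^x obtained from Lemma 1 with c ≡ −1; these identifications are our reading of the example, not formulas stated explicitly in this section.

```python
# Checking the Dubins-Savage conditions for phi(x) = 2 in Example 2, under the
# ASSUMED dynamics p(x+1 | x, 2) = p(0 | x, 2) = 1/2 and the assumed value
# v*(x) = -2 - 2**x for x >= 1, v*(0) = 0.
def v_star(x: int) -> float:
    return 0.0 if x == 0 else -2.0 - 2.0 ** x

def dubins_savage_lhs(x: int) -> float:
    # c(x, 2) + sum_y v*(y) p(y | x, 2) under the assumed dynamics
    return -1.0 + 0.5 * v_star(0) + 0.5 * v_star(x + 1)

def tail_term(x: int, n: int) -> float:
    # E^phi_x[v*(X_n)] = (1 / 2**n) * v*(x + n), since v*(0) = 0
    return v_star(x + n) / 2.0 ** n

# The first equality in (20) holds:
print(all(abs(dubins_savage_lhs(x) - v_star(x)) < 1e-9 for x in range(1, 10)))
# ... but the tail converges to -2**x rather than 0, violating the second one:
print([round(tail_term(1, n), 6) for n in (5, 20, 50)])
```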

Conclusion
In conclusion, we showed that for a semi-continuous absorbing MDP model with a fixed initial distribution, the continuity of the projection mapping O from the space of strategic measures P to the space of occupation measures D, both endowed with their weak topologies, is equivalent to the MDP model being uniformly absorbing. The necessity of the uniform absorption is demonstrated by an example. Provided that the absorbing MDP model is semi-continuous, the continuity of O is a sufficient condition for the compactness of D in the weak topology. Whether it is also necessary is an interesting open problem for future studies. Finally, we mention that if the MDP model is not absorbing, then D contains measures that are not (totally) finite. In this case, a different topology was introduced on D in, e.g., [8, 16]. In that topology, the projection mapping O is continuous by definition.

Projection mapping and the topologies on D and P
Consider an absorbing (at 0) MDP model for the initial distribution P_0. Let the projection mapping O from P to D be defined by

O(P^π_{P_0}) := η^π_{P_0}, π ∈ ∆_All. (3)

In particular, the performance measure can be written as the integral ∫_{(X\{0})×A} c(y, a) η(dy × da) of the cost function against the occupation measure. This is one of the main reasons for studying D in the literature of MDPs.

Figure 1: Graphical representation of the MDP.