Markov decision processes with quasi-hyperbolic discounting

We study Markov decision processes with Borel state spaces under quasi-hyperbolic discounting. This type of discounting nicely models human behaviour, which is time-inconsistent in the long run. The decision maker has preferences changing in time. Therefore, the standard approach based on the Bellman optimality principle fails. Within a dynamic game-theoretic framework, we prove the existence of randomised stationary Markov perfect equilibria for a large class of Markov decision processes with transitions having a density function. We also show that randomisation can be restricted to two actions in every state of the process. Moreover, we prove that under some conditions, this equilibrium can be replaced by a deterministic one. For models with countable state spaces, we establish the existence of deterministic Markov perfect equilibria. Many examples are given to illustrate our results, including a portfolio selection model with quasi-hyperbolic discounting.


Introduction
The discounted utility approach in dynamic decision making has been used since the beginning of modern economic theory; see e.g. Samuelson [59]. It is based on the assumption that the discount rate is constant over time. In that way, it is possible to compare outcomes occurring at different times by discounting future utility by some constant factor. A decision maker using high discount rates exhibits more impatience than one with low discount rates. It should be noted, however, that there is growing evidence that standard (geometric) discounting is not adequate in many real-life situations; see e.g. Ainslie [2]. When discounting is non-standard, the decision maker becomes time-inconsistent, that is, a policy chosen as optimal at the beginning of the decision process is no longer optimal if it is considered as a policy in the process from some later point in time onwards. It is said that the decision maker possesses changing time preferences or that his utilities change over time. For example, consider a consumption/saving problem in discrete time. Suppose that the decision maker plans to save a lot tomorrow, but as tomorrow comes, he reconsiders his previous decision and saves little. Consumption becomes more important and he grows impatient. In other words, he may revise his plans later on. This shows that the consumption/saving problem cannot be solved via the usual dynamic programming methods.
The idea of quasi-hyperbolic discounting used to capture the case of utilities changing in time can also be described as follows. Suppose that u_t is a utility (or reward) to be received in period t ≥ 1. Then the total utility (reward) collected from period t onwards is

U_t := u_t + α(β u_{t+1} + β² u_{t+2} + · · · ),   (1.1)

where α > 0 is called the short-run discount factor and β ∈ (0, 1) is called the long-run discount factor. If α = 1, then (1.1) reduces to the standard discounted utility. If the utility (1.1) were time-consistent, then we would have U_t = u_t + αβ U_{t+1} for all t.
That is the case when α = 1. To observe that the Bellman principle may not be the right tool for constructing optimal policies in models with utilities changing over time, the reader is referred to simple examples in e.g. [15, 38] and Sect. 4 in this paper. Dynamically inconsistent behaviour was first formalised by Strotz [67]. Further works by Pollak [56], Phelps and Pollak [55], Peleg and Yaari [54] and others on this issue suggest that policies optimal in some sense for the decision maker in models with quasi-hyperbolic discounting can be constructed as Nash equilibria in a sequential game played by different temporal selves. Each player (self) acts only once and takes into account both his instantaneous utility (reward) and a sequence of utilities (received by the players in subsequent periods) discounted by the given coefficient β. Within such a framework, the most commonly used solution concept is that of subgame perfect equilibrium in Markov strategies. Phelps and Pollak [55] considered a deterministic model of economic growth and, using discounting with α and β as in (1.1), they introduced a multigenerational game. A generation in their game formulation is a self in the model mentioned above.
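The failure of the recursion U_t = u_t + αβ U_{t+1} for α ≠ 1 is easy to check numerically. The following sketch (with an arbitrary constant reward stream and invented parameter values) truncates the infinite sum in (1.1):

```python
# Quasi-hyperbolic discounting: U_t = u_t + alpha * sum_{k>=1} beta^k * u_{t+k}.
# The infinite sum is truncated at a horizon long enough for beta^k to be negligible.

def qh_utility(rewards, t, alpha, beta):
    """Total utility from period t onwards under quasi-hyperbolic discounting.
    `rewards` is indexed from period 1, so rewards[t-1] is u_t."""
    u_t = rewards[t - 1]
    tail = sum(beta ** k * rewards[t - 1 + k] for k in range(1, len(rewards) - t + 1))
    return u_t + alpha * tail

rewards = [1.0] * 200          # constant reward stream u_t = 1
alpha, beta = 0.6, 0.9

U1 = qh_utility(rewards, 1, alpha, beta)
U2 = qh_utility(rewards, 2, alpha, beta)

# The Bellman-type recursion U_t = u_t + alpha*beta*U_{t+1} fails for alpha != 1 ...
print(abs(U1 - (rewards[0] + alpha * beta * U2)))   # noticeably nonzero
# ... but holds (up to truncation error) for alpha = 1, i.e., standard discounting.
V1 = qh_utility(rewards, 1, 1.0, beta)
V2 = qh_utility(rewards, 2, 1.0, beta)
print(abs(V1 - (rewards[0] + beta * V2)))           # ~ 0
```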
Nowadays, dynamic inconsistency plays an increasingly important role in many fields. For instance, we wish to mention the papers of Balbus et al. [6] or Harris and Laibson [27] that deal with consumption/investment problems with a one-dimensional state space. Moreover, Barro [10], Ekeland and Pirvu [24], Haurie [28], Laibson [39] considered interesting applications of time-inconsistency to neoclassical growth theory, portfolio management, global climate change problems and macroeconomic theory, respectively. The reader is also referred to other works studying various related control problems for models with a general state space; see Björk and Murgoci [15], Björk et al. [14], Christensen and Lindensjö [20], Jaśkiewicz and Nowak [35] or Nowak [51].
A seminal paper of Shapley [63] on discounted zero-sum stochastic games is a first study of Markov decision processes over an infinite time horizon. Alj and Haurie [4] extended the finite state space model of Shapley to quasi-hyperbolic discounting. They used an intergenerational dynamic game formulation of Phelps and Pollak [55] and proved that any finite horizon game has an equilibrium in Markovian strategies and each infinite horizon game has a stationary Markov perfect equilibrium. The former result is based on a dynamic-programming-like algorithm and the latter is proved using a fixed point argument. The stochastic variants of the intergenerational game related to that of Alj and Haurie [4] with a Borel state space and compact metric action spaces were further examined in Jaśkiewicz and Nowak [35] and Nowak [51]. For instance, Jaśkiewicz and Nowak [35] studied a model in which generations are risk-averse and obtained a stationary Markov perfect equilibrium in pure strategies making use of the Dvoretzky-Wald-Wolfowitz theorem. This result, however, is valid for transitions which are convex combinations of finitely many atomless measures on the state space with coefficients that depend on the state-action pairs. Although, as already mentioned, time-inconsistent preferences in various control models were recently studied by Björk and Murgoci [15], Björk et al. [14], Christensen and Lindensjö [20], these papers, in contrast to our present work and works of Alj and Haurie [4], Jaśkiewicz and Nowak [35], Nowak [51], examine neither stationary Markov perfect equilibria nor fixed points of best-response mappings.
Markov decision processes have many applications to economic dynamics, finance, insurance or monetary economics. The reader is referred to the books of Bäuerle and Rieder [11, Chap. 9], Stachurski [65], Stokey et al. [66, Chaps. 10 and 13], where prominent and representative examples are given. In the present paper, we consider Markov decision processes with a Borel state space and quasi-hyperbolic discounting and the Markov perfect equilibrium as a basic solution concept. Our contribution is four-fold. First, we show that there exists a stationary Markov perfect equilibrium if the transition probability is norm-continuous in actions and has a density function. This result (Theorem 3.2) can be regarded as an improvement and an extension of the basic theorem in Nowak [51], where an additional condition on the transition probability density is imposed. Furthermore, it turns out that the obtained stationary Markov perfect equilibrium can be supported in every state on at most two points from the action set (Theorem 3.4). Second, assuming in addition that the transitions are atomless and the Borel σ-field on the state space has no so-called conditional atoms with respect to the σ-field generated by the transition density functions, we apply Theorem 3.4 and a result of Dynkin and Evstigneev [23] to prove the existence of a deterministic stationary Markov perfect equilibrium (Theorem 3.5). Our third contribution establishes the existence of a deterministic Markov perfect equilibrium in decision processes with countably many states (Theorem 5.2). This result is subsequently used for Markov decision processes with Borel state spaces to obtain ε-equilibria by an approximation technique (Theorem 6.2). In Sect. 5, we provide an example of a Markov decision process with two states for which a deterministic stationary Markov perfect equilibrium does not exist, but there exists a deterministic non-stationary one.
It is interesting to see that a randomised stationary Markov perfect equilibrium in this example can be dominated in terms of expected utilities (rewards) by a more sophisticated (in some sense) deterministic equilibrium.
Our main results for Markov decision processes with a continuum of states have certain implications for consumption/investment models with i.i.d. shocks. They complement the results obtained by Balbus et al. [6] and Harris and Laibson [27] for such models with atomless transitions. In Sect. 4, we discuss many examples arising from economic theory, macroeconomics or monetary economics. They highlight many issues in the area of quasi-hyperbolic discounting in dynamic decision processes, including some open problems. We also present a closed-form solution to a portfolio selection model originally studied with geometric discounting by Samuelson [60] (see Example 4.1 and Remark 4.2).
The paper is organised as follows. In Sect. 2, we describe our model and define the notion of a stationary Markov perfect equilibrium. In Sect. 3, we state our main results with many comments on the main ideas behind them. Their formal proofs are postponed to Sect. 7. Section 4 is devoted to examples of stationary Markov perfect equilibria, some comments on the literature and open problems. In Sect. 5, we study deterministic Markov perfect equilibria in decision models with countably many states and show that a deterministic Markov perfect equilibrium need not exist even if the state space is finite. In Sect. 6, making use of an approximation of the original Markov decision process by models with countably many states, we establish the existence of ε-equilibria. Finally, Sect. 8 contains some concluding remarks.

The model and main solution concept
First we give some basic definitions and facts used in the description of our model. Let N be the set of all positive integers, R be the set of all real numbers and R_+ = [0, ∞). A Borel space, say X, is a nonempty Borel subset of a complete separable metric space. Let B(X) denote the σ-field of all Borel subsets of X and Pr(X) the space of all probability measures on B(X), endowed with the topology of weak convergence. This is the coarsest topology for which the functionals p → ∫_X η dp are continuous for every bounded continuous function η : X → R.
A Borel transition probability from X to a Borel space Z is by definition a function γ : B(Z) × X → [0, 1] such that γ (B, ·) is Borel-measurable on X for each B ∈ B(Z) and γ (·, x) ∈ Pr(Z) for each x ∈ X. We write γ (B|x) for γ (B, x). It is well known that any Borel transition probability from X to Z can be viewed as a Borel mapping from X to Pr(Z); see Bertsekas and Shreve [13,Chap. 7].
Let S and A be Borel spaces and K a Borel subset of S × A. Moreover, assume that for each s ∈ S, the section

A(s) := {a ∈ A : (s, a) ∈ K}   (2.1)

is nonempty. We consider a Markov decision process characterised by the following objects: (i) S is a Borel state space.
(ii) A is a Borel action space and K ∈ B(S × A) is the constraint set for the decision maker. The set A(s) defined in (2.1) is a nonempty σ-compact set of actions available in state s ∈ S. (Our main results are stated for models with compact action spaces, but in some examples, we only assume that the sets A(s) are σ-compact.) (iii) u : K → R is a Borel instantaneous utility or reward function that is bounded from above.
(iv) q is a Borel transition probability from K to S.
Let Φ be the set of all Borel transition probabilities φ from S to A such that φ(A(s)|s) = 1 for each s ∈ S. Every φ ∈ Φ can be viewed as a Borel mapping from S to Pr(A) (denoted also by φ) by setting φ(s)(·) := φ(·|s). Let F be the set of all Borel selectors of the correspondence s → A(s); by Brown and Purves [17, Theorem 1], the set F is nonempty. In the decision model with quasi-hyperbolic discounting, we envision an individual decision maker as a sequence of autonomous temporal selves. The selves are indexed by period numbers t ∈ T := N. For each state s_t ∈ S at the beginning of the t-th period, self t chooses an action a_t ∈ A(s_t) according to some probability distribution over the set A(s_t). A strategy for self t is a Borel transition probability ϕ_t ∈ Φ. Then φ̄ = (ϕ_1, ϕ_2, . . .) with ϕ_t ∈ Φ for every t ∈ T is called a strategy profile of the selves or a Markov strategy of the decision maker. If ϕ_t = φ for some φ ∈ Φ and all t ∈ T, then such a strategy of the decision maker is called stationary or a stationary strategy profile of the selves. We often identify a constant sequence φ̄ = (φ, φ, . . .) with φ ∈ Φ.
By the Ionescu-Tulcea theorem (Neveu [48, Proposition V.1.1]), for any s_t ∈ S and φ̄ = (φ_1, φ_2, . . .) ∈ Φ^∞, there exists a unique probability measure P^φ̄_{s_t} on the space (S × A)^∞ of all sequences of state-action pairs (starting at s_t) endowed with the product Borel σ-field. The symbol E^φ̄_{s_t} denotes the corresponding expectation operator.
The expected reward for self t is defined as

R_t(φ̄)(s_t) := E^φ̄_{s_t}[ u(s_t, a_t) + α Σ_{k=1}^∞ β^k u(s_{t+k}, a_{t+k}) ].   (2.2)

This definition explains why β ∈ (0, 1) is a long-run discount factor and α > 0 is a short-run discount factor. Notice that the discount factor applied between periods t and t + 1 is αβ, and for α < 1, it is lower than the one used between consecutive dates in the future. This fact leads to time-inconsistency in preferences. Such preferences have been observed in many cases and individuals' behaviours; see for instance Krusell et al. [37], Krusell and Smith [38], Laibson [39], Phelps and Pollak [55] or Strotz [67]. Their axiomatic characterisation can be found in Montiel Olea and Strzalecki [47].
In Sects. 2-4, we consider solutions for Markov decision processes in stationary strategies. Therefore, we simplify the notation as follows. Suppose that the selves are going to use a stationary strategy profile φ̄ = (φ, φ, . . .) identified with φ ∈ Φ. We use E^φ_{s_t} for E^φ̄_{s_t}. Moreover, R_t(φ̄)(s_t) defined in (2.2) is equal to R(φ)(s) given by

R(φ)(s) := E^φ_s[ u(s_1, a_1) + α Σ_{k=1}^∞ β^k u(s_{1+k}, a_{1+k}) ]   (2.3)

with s = s_t = s_1. In order to write this reward in a friendlier way, we define

J_β(φ)(s) := E^φ_s[ Σ_{k=1}^∞ β^{k−1} u(s_k, a_k) ].

Let ν ∈ Pr(A(s)). We introduce the notations

u(s, ν) := ∫_{A(s)} u(s, a) ν(da)  and  q(·|s, ν) := ∫_{A(s)} q(·|s, a) ν(da).

Moreover, for any s ∈ S and φ ∈ Φ, we put u(s, φ) := u(s, φ(s)) and q(·|s, φ) := q(·|s, φ(s)). Observe that now (2.3) takes the form

R(φ)(s) = u(s, φ) + αβ ∫_S J_β(φ)(s′) q(ds′|s, φ).

Furthermore, for any s ∈ S and ν ∈ Pr(A(s)), we define

P(s, ν, φ) := u(s, ν) + αβ ∫_S J_β(φ)(s′) q(ds′|s, ν).

For any a ∈ A(s), let us define P(s, a, φ) := P(s, δ_a, φ), where δ_a is the Dirac measure at the point a. Assume that self t chooses a randomised action ν ∈ Pr(A(s)) in state s = s_t. If all following selves are going to employ a strategy φ, then P(s, ν, φ) is the expected utility of self t in state s = s_t.

Definition 2.1 A stationary Markov perfect equilibrium is a φ* ∈ Φ such that for every s ∈ S, we have

sup_{ν ∈ Pr(A(s))} P(s, ν, φ*) = P(s, φ*(s), φ*) = R(φ*)(s).
A stationary Markov perfect equilibrium φ* is called deterministic if φ* ∈ F. One can imagine that every self is a short-lived player in a non-cooperative game and acts only once. Such an interpretation is given by Alj and Haurie [4], Phelps and Pollak [55]. The payoff function of self t ∈ T is given by (2.3). Then a stationary Markov perfect equilibrium is a constant sequence (φ*, φ*, . . .) (identified with φ* ∈ Φ) being a symmetric Nash equilibrium in this game. From Definition 2.1, it follows that this equilibrium is subgame perfect; see Osborne [53, Chap. 5.4]. The term Markov perfect equilibrium was introduced by Maskin and Tirole [45]. The strategies in a Markov perfect equilibrium have the Markov property of the lack of memory, meaning that each player's mixed action can be conditioned only on the state of the game. Moreover, the state can only encode payoff-relevant information. The strategy for the decision maker built from a Markov perfect equilibrium in the game is time-consistent, that is, no self (as time goes on) has an incentive to change his best response to equilibrium strategies of the following selves.
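For intuition, Definition 2.1 is easy to check directly in a model with finitely many states and actions: for a pure stationary policy f, the value J_β(f) solves a linear system, and f is an equilibrium if f(s) maximises P(s, ·, f) in every state. A rough numerical sketch with invented two-state data (for α = 1, the β-optimal policy is always such an equilibrium; for α < 1, a pure equilibrium may fail to exist, cf. Sect. 5):

```python
# Finite-state sketch of Definition 2.1 (hypothetical two-state, two-action data).
# Given a pure stationary policy f, J_beta(f) solves
#   J(s) = u(s, f(s)) + beta * sum_{s'} q(s'|s, f(s)) * J(s'),
# and f is a stationary Markov perfect equilibrium if, in every state s, f(s)
# maximises P(s, a, f) = u(s, a) + alpha*beta * sum_{s'} q(s'|s, a) * J(s').
# (P(s, ., f) is linear in mixed actions, so pure maximisers suffice for the sup.)
import itertools

S, A = range(2), range(2)
u = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 2.0}      # invented rewards
q = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],                  # invented transitions
     (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}

def J_beta(f, beta, n_iter=2000):
    """Long-run value of policy f by value iteration (avoids a linear solver)."""
    J = [0.0, 0.0]
    for _ in range(n_iter):
        J = [u[s, f[s]] + beta * sum(q[s, f[s]][t] * J[t] for t in S) for s in S]
    return J

def equilibria(alpha, beta):
    found = []
    for f in itertools.product(A, repeat=len(S)):
        J = J_beta(f, beta)
        P = lambda s, a: u[s, a] + alpha * beta * sum(q[s, a][t] * J[t] for t in S)
        if all(P(s, f[s]) >= max(P(s, a) for a in A) - 1e-9 for s in S):
            found.append(f)
    return found

print(equilibria(1.0, 0.9))   # alpha = 1: the beta-optimal policy is an equilibrium
print(equilibria(0.5, 0.9))   # alpha < 1: pure equilibria may or may not exist
```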

Existence of stationary Markov perfect equilibria
In this section, we state our main results on stationary equilibria and give many comments. The proofs are postponed to Sect. 7.

Basic assumptions and three equilibrium theorems
In order to formulate our results, we need the following additional assumptions. Let p be any probability distribution on S such that p(s) > 0 for all s ∈ S. If we take into account that ρ(s, a, s′) = q(s′|s, a)/p(s′), then by Scheffé's lemma, (C3.3) implies (C3.2) and we conclude the following fact. A related result for intergenerational games with finite state and action spaces was given in Alj and Haurie [4]. Our second main result allows us to simplify the form of the equilibrium above. For the existence of a deterministic equilibrium, we need some additional assumptions.
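The role of Scheffé's lemma here is that pointwise convergence of probability densities, all integrating to one, already yields convergence in the L¹-norm, which is the norm-continuity in actions required of the transitions. A small numerical illustration with an invented family of densities on [0, 1]:

```python
# Scheffe's lemma in action: if rho(., a_n) -> rho(., a_0) pointwise and all are
# probability densities, then the L1 distance int |rho(., a_n) - rho(., a_0)| -> 0.
# We illustrate with the (invented) family rho(x, a) = (1 + a) * x**a on [0, 1],
# continuous in the action parameter a, using a midpoint Riemann sum.

def rho(x, a):
    return (1.0 + a) * x ** a          # integrates to 1 over [0, 1] for a > -1

def l1_dist(a, a0, n=100000):
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]           # midpoint rule
    return sum(abs(rho(x, a) - rho(x, a0)) for x in xs) * h

for a in [1.0, 0.6, 0.51, 0.501]:
    print(a, l1_dist(a, 0.5))          # shrinks to 0 as a -> 0.5
```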
Let μ be an atomless probability measure on B(S). Let G be a sub-σ-field of B(S). Following He and Sun [29], we say that D ∈ B(S) is a G-atom or a conditional atom under μ if μ(D) > 0 and for any D_1 ∈ B(S), there exists a set D_2 ∈ G such that μ((D_1 ∩ D) △ (D_2 ∩ D)) = 0. Intuitively, this means that given the realisation of an event D, the σ-fields G and B(S) carry essentially the same information. The definition of a G-atom was used by Dynkin and Evstigneev [23] in their studies of the conditional expectation of correspondences. As noted by He and Sun [29], the definitions of a G-atom in their paper and in the works of Dynkin and Evstigneev [23] as well as Jacobs coincide; see [30, Example 1]. In the second case, let us consider a Borel-measurable partition (B_j)_{j∈N} of S, i.e., B_j ∈ B(S) for each j ∈ N, S = ∪_{j∈N} B_j and B_i ∩ B_j = ∅ for i ≠ j. By G we denote the σ-field generated by this partition. Let μ ∈ Pr(S) be atomless. Then B(S) has no G-atoms under μ. Usually, the notion of conditional atoms is applied to stochastic dynamic decision models or games with the product state space; see for instance Duggan [21] or He and Sun [29, 30]. The following example shows that (C3.4) is not implied by (C3.2).
Remark 3.8 The assertion of Theorem 3.4 cannot be strengthened, that is, a deterministic stationary Markov perfect equilibrium need not exist. This is shown in Example 5.6 in Sect. 5. Let φ* ∈ Φ be a stationary Markov perfect equilibrium. Assume that the support of φ*(·|s) is a connected subset of A(s) for each s ∈ S. The function a → P(s, a, φ*) is continuous and hence has the Darboux property on the support of φ*(·|s). (Recall that the Darboux theorem says that every continuous function f : X → R on a compact connected space X has the property that for any x_1, x_2 ∈ X with f(x_1) ≠ f(x_2) and any y between f(x_1) and f(x_2), there exists x ∈ X such that f(x) = y.) This implies that for each s ∈ S, there exists some a_s ∈ A(s) such that P(s, a_s, φ*) = R(φ*)(s). A simple modification of the proof of Theorem 3.4, using the above conclusion from Darboux's theorem, yields the existence of a deterministic stationary Markov perfect equilibrium. However, to check the mentioned connectedness condition, one has to know φ*.

Remark 3.9
The existence of deterministic stationary Markov perfect equilibria was proved for some classes of models with one-dimensional state and action spaces and atomless transitions by Harris and Laibson [27] and Balbus et al. [6]. The methods of proof used there do not work in models with many commodities (more general state space). Theorem 3.5 provides some sufficient conditions for the existence of deterministic equilibria in models with a general state space. It is inspired by the approach of He and Sun [29], who dealt with Nash equilibria in standard nonzero-sum discounted stochastic games with general state spaces. However, we emphasise that He and Sun [29] do not prove the existence of deterministic Markov perfect equilibria. Therefore, Theorem 3.5 is new.

Remark 3.10
Uniqueness of a stationary Markov perfect equilibrium can be proved only for specific models. Namely, Balbus et al. [8] showed that the stochastic optimal growth model with quasi-hyperbolic discounting with the state space S = [0, ∞) and concave transition and reward functions admits a unique solution. Within our framework, we may have to deal with multiple equilibria; see Example 5.6 in Sect. 5. It is rather well known that even in simple economic models, we may encounter multiple deterministic equilibria; see Krusell and Smith [38], Vieille and Weibull [69].

Remark 3.11
Björk and Murgoci [15] analyse time-inconsistent stochastic Markov models in discrete time with finite and infinite horizons. Their approach embraces quasi-hyperbolic discounting as a special case. However, their objective is to provide an extension of the standard Bellman equation in the form of a system of nonlinear equations. In other words, assuming that an equilibrium point exists, they show that the corresponding equilibrium function must satisfy a system of nonlinear equations. They do not provide a proof of the existence of an equilibrium point. In Sect. 6, they even stress that "these issues [existence and uniqueness of equilibrium] are in fact quite complicated".

Some comments on the proofs and possible extensions
In the proof of Theorem 3.2, we consider a best-response correspondence defined on the quotient space Φ_p of p-equivalence classes of functions in Φ, endowed with the weak-star topology. The space Φ_p is compact and convex, and the correspondence has closed and convex values. The existence of a stationary Markov perfect equilibrium then follows from a standard fixed point argument. Note that F is not a convex subset of Φ. Therefore, a similar method does not work for determining deterministic equilibria.
If φ̃ ∈ Φ is an equilibrium established in Theorem 3.2, then it is easy to see that for any bounded Borel function η : A(s) → R, there exist a_1, a_2 in the support of the probability measure φ̃(·|s) and some ϑ ∈ [0, 1] such that

∫_{A(s)} η(a) φ̃(da|s) = ϑ η(a_1) + (1 − ϑ) η(a_2).

Using this observation, we can get Borel mappings f, g ∈ F and a Borel function ϑ : S → [0, 1] such that

φ*(·|s) = ϑ(s) δ_{f(s)}(·) + (1 − ϑ(s)) δ_{g(s)}(·)

for all s ∈ S. These facts imply that φ* is a stationary Markov perfect equilibrium from the assertion of Theorem 3.4.
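This two-point reduction can be seen in miniature for a measure with finite support (the data below are invented): the mean of η under the measure is always a mixture of two values of η attained on the support.

```python
# Two-point reduction in miniature: for a probability measure phi on a finite
# action set and any function eta, the mean of eta can be written as a mixture
# theta*eta(a1) + (1 - theta)*eta(a2) with a1, a2 in the support of phi.
support = [0.0, 0.3, 0.7, 1.0]                      # invented support points
weights = [0.1, 0.4, 0.3, 0.2]                      # invented probabilities
eta = lambda a: (a - 0.4) ** 2                      # any bounded function

mean = sum(w * eta(a) for a, w in zip(support, weights))
a1 = min(support, key=eta)                          # eta(a1) <= mean
a2 = max(support, key=eta)                          # eta(a2) >= mean
theta = 1.0 if eta(a1) == eta(a2) else (eta(a2) - mean) / (eta(a2) - eta(a1))
print(theta, theta * eta(a1) + (1 - theta) * eta(a2))   # second number == mean
```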
If the transition probability q is a convex combination of finitely many atomless probability measures on S with coefficients depending on (s, a) ∈ K, then a deterministic stationary equilibrium can be obtained by applying Theorem 3.4 and the elimination of randomisation method based on a version of Lyapunov's theorem [41] given by Dvoretzky et al. [22]. (This method was used by Jaśkiewicz and Nowak [35] in the study of some special cases of the model from the present paper.) In our case, we deal with an infinite family of atomless measures q(ds′|s, a) = ρ(s, a, s′)p(ds′) indexed by (s, a) ∈ K, and Lyapunov's theorem is not true for infinitely many measures; see Lyapunov [42]. Therefore, the existence of a deterministic stationary Markov perfect equilibrium under assumptions (C3.1) and (C3.2) with an atomless measure p is problematic. The additional assumption in Theorem 3.5 on the lack of G-atoms allows the elimination of randomisation in an equilibrium obtained in Theorem 3.4 (purification of φ*) thanks to an extension of Lyapunov's theorem given by Dynkin and Evstigneev [23].

Remark 3.12
Discounted dynamic programming problems with an unbounded reward function u are usually studied using a so-called "bounding" or "weighted" Borel-measurable function ω : S → [1, ∞) satisfying the following conditions: for all (s, a) ∈ K, |u(s, a)| ≤ ω(s) and ∫_S ω(s′) q(ds′|s, a) ≤ β̂ ω(s) for some β̂ > 0 with β̂β < 1. For details, the reader is referred to Hernández-Lerma and Lasserre [32, Sect. 8.3] or to Wessels [71]. It is quite easy to see that the n-stage expected discounted utility is then well defined and bounded in absolute value by a constant multiple of ω. If we assume in addition that for any s ∈ S and a_n → a_0 in A(s) as n → ∞, we have

lim_{n→∞} ∫_S |ρ(s, a_n, s′) − ρ(s, a_0, s′)| ω(s′) p(ds′) = 0,

then the main results given in this section can be extended to the more general case with an unbounded utility u. The proofs given in Sect. 7 need only some very simple adaptation. In this way, we can apply our results to examples where having u unbounded is very natural.

Examples and an overview of selected literature
In this section, we give a number of examples with several comments. Some of them lie in our theoretical framework from Sects. 2 and 3.
In other examples (taken from the literature), our assumptions (e.g. compactness of the action spaces) are not satisfied. However, the examples have solutions in closed form that show the difference between Markov perfect equilibria in models with quasi-hyperbolic discounting and solutions obtained in models with standard discounting via a dynamic programming principle.
In 1969, Samuelson [60] published a seminal paper on portfolio selection and stochastic dynamic programming. He considered a finite-horizon model with a power utility function. His paper inspired many researchers to develop a modern theory of portfolio selection in discrete and continuous time; see Bäuerle and Rieder [11,Chap. 4], Bobryk and Stettner [16], Merton [46], Shreve and Soner [64] and the references cited therein. For instance, Bobryk and Stettner [16] extended the optimal portfolio selection model of Samuelson [60] to an infinite horizon with standard discounting and completely solved the cases with power and logarithmic utilities (see [16,Proposition 1]). Below we provide a solution for power utility with quasi-hyperbolic discounting.
Example 4.1 We start with the portfolio selection problem of Samuelson [60] that can be viewed as a Markov decision process with S = R_+ and A(s) = [0, s] × [0, 1]. Consider a financial market consisting of a risky asset and a risk-free asset. Assume that there are two investment possibilities: a stock with a random rate of return ε_t in period t ∈ T and a bank account with a constant (riskless) rate of return r. Assume that (ε_t) is a sequence of i.i.d. random variables taking values in the interval [−1, ∞) and 1 + ε_t has a probability distribution μ_ε ∈ Pr(S). Moreover, r ≤ E[ε_t] < ∞ for every t ∈ T. Denote by s_t and a_t the capital (wealth) and the consumption, respectively, in period t ∈ T. Clearly, a_t ∈ [0, s_t]. The remaining value s_t − a_t is invested. Let w_t ∈ [0, 1] be the portfolio weight on the risky asset in period t ∈ T. We obviously have (a_t, w_t) ∈ A(s_t). Starting with an initial wealth s_1 ∈ R_+, we have the recursive formula

s_{t+1} = (s_t − a_t)( w_t(1 + ε_t) + (1 − w_t)(1 + r) ), t ∈ T.

Then the transition probability q is given by

q(D|s, (a, w)) = ∫_S 1_D( (s − a)( w y + (1 − w)(1 + r) ) ) μ_ε(dy), D ∈ B(S),

with 1_D the indicator function of the set D. The consumer has a reward (utility) function u that measures his satisfaction of consumption, that is, u(s, (a, w)) = u(a) for all (s, (a, w)) ∈ K. We assume that u(a) = a^σ with σ ∈ (0, 1). Define

M(w) := ∫_S ( w y + (1 − w)(1 + r) )^σ μ_ε(dy), w ∈ [0, 1].

Let w* ∈ [0, 1] be such that M(w*) = max_{w ∈ [0,1]} M(w). Assume that βM(w*) < 1. We are going to show that a deterministic stationary equilibrium can be found in the class F_0 of strategies of the form f(s) = (cs, w) with c ∈ (0, 1] and w ∈ [0, 1] (linear consumption functions and constant portfolio weights). If all selves after the current one use f(s) = (cs, w*), then E^f_s[s_k^σ] = s^σ (1 − c)^{σ(k−1)} M(w*)^{k−1} for every k ∈ N. Hence,

J_β(f)(s) = c^σ s^σ / ( 1 − β(1 − c)^σ M(w*) ).
Suppose that all future generations are going to use f*(s) = (cs, w*). Then the current self t faces the optimisation problem

max_{(a, w) ∈ A(s)} ( a^σ + αβ M(w)(s − a)^σ c^σ / ( 1 − β(1 − c)^σ M(w*) ) ).

The maximising portfolio weight is w*, and the first-order condition in a shows that the best response is again linear in s. A fixed point c* of the resulting best-response map, i.e., a solution of

(1 − c)^{1−σ} = βM(w*)( α + (1 − α)(1 − c) ),

exists in (0, 1). Then

f*(s) = (c*s, w*)

is a deterministic stationary Markov perfect equilibrium and the expected payoff to each self t is

R(f*)(s) = (c*)^σ s^σ ( 1 + αβ(1 − c*)^σ M(w*) / ( 1 − β(1 − c*)^σ M(w*) ) ).

If α = 1, then (c_1 s, w*) with c_1 = 1 − (βM(w*))^{1/(1−σ)} is a solution to the dynamic programming portfolio problem with standard discounting; see Bobryk and Stettner [16, Proposition 1]. Clearly, we conclude from the above discussion that a decision maker who faces standard discounting consumes less than one with quasi-hyperbolic discounting (c_1 < c* for α < 1). At the same time, he is willing to increase his investments.
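A numerical sketch of this portfolio example (all numbers invented): with the linear-policy ansatz f(s) = (cs, w*) and M(w) = E[(w(1 + ε) + (1 − w)(1 + r))^σ], one can check that the equilibrium fraction c* solves (1 − c)^{1−σ} = βM(w*)(α + (1 − α)(1 − c)), find it by bisection, and then verify on a grid that the current self cannot gain by deviating from a = c*s.

```python
# Numerical sketch of the power-utility portfolio problem: u(a) = a**sigma,
# i.i.d. gross returns, linear consumption c*s and constant weight w*.
# Parameter values and the discrete return distribution are invented.
sigma, alpha, beta, r = 0.5, 0.7, 0.9, 0.02
returns = [(0.9, 0.5), (1.25, 0.5)]       # pairs (1 + eps, probability)

def M(w):  # M(w) = E[(w*(1+eps) + (1-w)*(1+r))**sigma]
    return sum(p * (w * y + (1 - w) * (1 + r)) ** sigma for y, p in returns)

w_star = max((i / 1000 for i in range(1001)), key=M)   # maximise M on a grid
Ms = M(w_star)

# Equilibrium consumption fraction: with lam = 1 - c, solve
#   lam**(1 - sigma) = beta * Ms * (alpha + (1 - alpha) * lam)   by bisection.
lo, hi = 1e-12, 1.0
for _ in range(200):
    lam = (lo + hi) / 2
    if lam ** (1 - sigma) < beta * Ms * (alpha + (1 - alpha) * lam):
        lo = lam
    else:
        hi = lam
c_star = 1 - lam

# Check Definition 2.1 on a grid: given future selves use (c*s, w*), no
# consumption level a in [0, s] improves on a = c*s; the continuation value is
# J_beta(f*)(s) = (c*)**sigma * s**sigma / (1 - beta*(1 - c*)**sigma * Ms).
s, K = 1.0, c_star ** sigma / (1 - beta * (1 - c_star) ** sigma * Ms)
def P(a):  # current self's payoff from consuming a, with future selves on f*
    return a ** sigma + alpha * beta * Ms * (s - a) ** sigma * K
best = max(P(a) for a in [i / 10000 * s for i in range(10001)])
print(c_star, best - P(c_star * s))       # deviation gain ~ 0
```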
The reason that the portfolio selection problem with quasi-hyperbolic discounting in the above example is solvable is that the instantaneous utility function u has the property u(a_1 a_2) = u(a_1)u(a_2) for all a_1, a_2 ≥ 0. Example 4.1 also has a simple analytical solution for the logarithmic utility case, i.e., when u(a) = ln a. Then u(a_1 a_2) = u(a_1) + u(a_2) for a_1, a_2 > 0. For details, see Björk and Murgoci [15, Proposition 8.3].

Remark 4.2
The existence of a stationary Markov perfect equilibrium in the above portfolio selection model with quasi-hyperbolic discounting is an open problem if u is a general continuous and concave function. The methods used here and in Björk and Murgoci [15] are not adequate. The class F 0 ⊆ F is too small, because both consumption and portfolio weights may depend on the state variable. If u is bounded or the conditions described in Remark 3.12 are satisfied and (C3.2) holds, then Theorem 3.4 implies the existence of a simple (randomised) stationary Markov perfect equilibrium. However, to ensure that (C3.2) holds, we assume that the probability distribution μ ε of the random variables 1 + ε t has a continuous density function g with respect to some p ∈ Pr(S) and A(s) = [0, a(s)] × [w(s), 1] for all s ∈ S, where a and w are Borel functions on S such that 0 ≤ a(s) < s and 0 < w(s) ≤ 1 for each s > 0. The portfolio weight w must be in [w(s), 1] for any s > 0. This definition of A(s) says that there is an upper limit for consumption for any positive stock capital s. Note that taking a = s by any self would stop the process forever. On the other hand, choosing w = 0 in some period would remove the risky asset from the portfolio. We now show how to verify condition (C3.2).
Let s, s′ ∈ S. Since g is continuous and (a, w) ∈ [0, a(s)] × [w(s), 1], we obtain an explicit formula for the density function ρ(s, (a, w), s′). The set Ã on which this formula has to be treated separately is closed in A(s). The function (a, w) → ρ(s, (a, w), s′) is continuous on Ã and also on A(s) \ Ã. Suppose now that (a_n, w_n) ∈ A(s) \ Ã for all n ∈ N and (a_n, w_n) → (a_0, w_0) ∈ Ã as n → ∞. By the continuity of the function g, it follows that lim_{n→∞} ρ(s, (a_n, w_n), s′) ≥ ρ(s, (a_0, w_0), s′). This is sufficient to conclude that the function (a, w) → ρ(s, (a, w), s′) is lower semicontinuous on A(s). By Remark 3.1, condition (C3.2) is satisfied.
Below we give two examples from the literature where a deterministic stationary Markov perfect equilibrium can be calculated analytically. We also discuss modifications which look difficult to solve and present some possible applications of our main results on randomised and deterministic equilibria. Other examples of deterministic stationary Markov perfect equilibria in closed form can be found in Barro [10], Chatterjee and Eyigungor [19], Krusell et al. [37], Krusell and Smith [38], Laibson [39] and Luttmer and Mariotti [40].

Example 4.3 We consider a stochastic optimal growth model, known also as a consumption/saving model, with the state space S = R_+ × R_+. An element of S is denoted by (k, z), where k is capital and z is an exogenous variable. If (k_t, z_t) ∈ S is the state at date t and a_t is the amount consumed by the decision maker (consumer), then the next state evolves according to the equations

k_{t+1} = (1 − d)k_t + z_t p(k_t) − a_t, z_{t+1} = π(z_t, ε_t)

for every t ∈ T and
- d ∈ [0, 1] is the depreciation rate,
- p : R_+ → R_+ is a concave and increasing production function,
- π : R_+ × [κ_1, κ_2] → R_+ is the law of motion of an exogenous variable with 0 < κ_1 < κ_2,
- (ε_t) is a sequence of i.i.d. random variables which have the common distribution m ∈ Pr([κ_1, κ_2]),
- (k_1, z_1) ∈ S is a given initial state.
The satisfaction of the consumer is measured by his utility function u : R_+ → R and depends only on the consumed part, i.e., u(s, a) = u(a) for every (s, a) ∈ K. Observe that the transition probability q takes the form

q(D|(k, z), a) = ∫_{[κ_1, κ_2]} 1_D( (1 − d)k + z p(k) − a, π(z, ξ) ) m(dξ), D ∈ B(S),

for every (k, z) ∈ S and a ∈ A(k, z). Assume that all future selves are going to use a stationary strategy φ ∈ Φ. Then the current self faces the optimisation problem, which is independent of period t,

max_{a ∈ A(k,z)} ( u(a) + αβ ∫_S J_β(φ)(k′, z′) q(d(k′, z′)|(k, z), a) )

for (k, z) ∈ S. The function J_β(φ) satisfies the equation

J_β(φ)(k, z) = u((k, z), φ(k, z)) + β ∫_S J_β(φ)(k′, z′) q(d(k′, z′)|(k, z), φ(k, z))

for every (k, z) ∈ S. The existence of a stationary Markov perfect equilibrium in the above example is an open problem. However, as noted by Maliar and Maliar [44], such a model with specific utility and probability functions admits a closed-form solution. Assume now that u(a) = ln a, d = 1, p(k) = k^σ with σ ∈ (0, 1) and π(z, ξ) = ξ. This model is similar to the one studied in Stokey et al. [66]. In this case, if all future selves apply f ∈ F, the current self solves

max_{a ∈ [0, zk^σ]} ( ln a + αβ ∫_{[κ_1, κ_2]} J_β(f)(zk^σ − a, ξ) m(dξ) ),

where (k, z) ∈ S and the function J_β(f) satisfies the equation

J_β(f)(k, z) = ln f(k, z) + β ∫_{[κ_1, κ_2]} J_β(f)(zk^σ − f(k, z), ξ) m(dξ).   (4.4)

For a justification of (4.4), see for instance Hernández-Lerma and Lasserre [31, Sect. 4.2]. It is not difficult to find that the consumption strategy

f*_α(k, z) = ((1 − σβ)/(1 − σβ(1 − α))) z k^σ

is a deterministic stationary Markov perfect equilibrium. The form of f*_α(k, z) does not depend on the probability distribution m. The phenomenon that the optimal consumption strategy is identical for stochastic and deterministic transitions in the above model was also discovered for standard discounting; see Acemoglu [1, Example 17.1]. More precisely, for geometric (standard) discounting, the optimal stationary strategy is

f*_1(k, z) = (1 − σβ) z k^σ.

This means that the decision maker who uses quasi-hyperbolic discounting plans to save less for the future at every stage when compared to the model with standard discounting. This follows from the fact that such a decision maker in period t is represented by self t, who pays less attention to all future selves by taking into account the discount factor αβ.
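The closed-form solution of this log-utility model can be checked by a short computation: the continuation value of any policy consuming a fixed fraction of output y = zk^σ is B ln y + const with B = 1/(1 − σβ), so the current self's best response is itself a fixed fraction of output, independent of the fraction used by future selves, and one arrives at the equilibrium fraction (1 − σβ)/(1 − σβ(1 − α)). A numerical sketch with invented parameter values:

```python
# Log-utility growth model with d = 1, p(k) = k**sigma and i.i.d. shocks: if
# future selves consume a fraction lam of output y = z*k**sigma, the long-run
# value is J(y) = B*ln(y) + const with B = 1/(1 - sigma*beta). The fixed point
# of the best-response map should equal (1-sigma*beta)/(1-sigma*beta*(1-alpha)).
sigma, alpha, beta = 0.3, 0.6, 0.95      # invented parameters

B = 1.0 / (1.0 - sigma * beta)           # coefficient of ln(y) in J_beta

def best_response(lam_future):
    # maximise ln(a) + alpha*beta*B*sigma*ln(y - a) for y = 1; the maximiser is
    # a = 1/(1 + alpha*beta*sigma*B), independent of lam_future (log utility).
    return 1.0 / (1.0 + alpha * beta * sigma * B)

lam = 0.5
for _ in range(50):                      # fixed-point iteration (converges at once)
    lam = best_response(lam)

closed_form = (1 - sigma * beta) / (1 - sigma * beta * (1 - alpha))
print(lam, closed_form)                  # the two fractions agree
```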
The function J_β(f*) is complicated and depends on the logarithmic moments of the random variable ε_t. However, if m = δ_1 (the deterministic case) and z_1 = 1, then as noted by Maliar and Maliar [44], we obtain an explicit expression for every k ∈ R+. Example 4.4 We now consider a consumption model with asset holdings whose dynamics hold for every t ∈ T, where
– w is the wage per unit of labour,
– r is the riskless rate of return on asset holdings,
– υ ∈ [0, 1],
– (ε_t) is a sequence of i.i.d. random variables with the normal distribution N(0, σ²).
Caballero [18] considered the above model with standard discounting in the framework of monetary economics and noticed that it can be solved analytically if the utility function of the consumer is exponential. Next, Maliar and Maliar [43] provided a closed-form solution for quasi-hyperbolic discounting. In fact, Caballero [18] analysed the more general case in which (ε_t) is an ARMA process. As in Maliar and Maliar [43], we assume that the parameter θ is the individual's risk coefficient and reflects his risk attitude. Observe that the transition probability q has the form for (b, ) ∈ S and a ∈ A. Here, g is the density function of the normal distribution N(0, σ²). Assume that all future selves apply f ∈ F. Then the current self faces the maximisation problem where (b, ) ∈ S. According to Maliar and Maliar [43, Proposition 1], the deterministic stationary Markov perfect equilibrium is of the form given there. Obviously, the function f̂* is affine in the variables b and . From the above formula for f̂*, it follows that a decision maker with discount factors α < 1 and β ∈ (0, 1) has the same consumption strategy as a consumer in the standard discounted decision model with short-run discount factor α̂ = 1 and long-run discount factor β̂ = β(1 + αr)/(1 + r) < β. That is because β̂(1 + r) = β(1 + αr). We also observe that larger values of α and/or β imply a lower amount of consumption. Examples 4.3 and 4.4 require some comments. First, we do not claim that the derived equilibria are unique. It very often happens that besides a smooth equilibrium, there may exist equilibria with discontinuous strategies. This fact was reported, among others, by Krusell and Smith [38]. Moreover, Chatterjee and Eyigungor [19] proved that in a dynamic consumer model with constant relative risk aversion preferences, equilibrium strategies must be discontinuous if the decision maker's net wealth cannot fall below a strictly positive value.
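The discount-factor equivalence noted in Example 4.4 amounts to a one-line identity and can be checked directly; the parameter values below are hypothetical.

```python
# Check: beta_hat = beta*(1 + alpha*r)/(1 + r) satisfies
# beta_hat*(1 + r) = beta*(1 + alpha*r), and beta_hat < beta whenever alpha < 1.
beta, r = 0.9, 0.05                     # hypothetical parameter values
for alpha in (0.2, 0.5, 0.8):
    beta_hat = beta * (1 + alpha * r) / (1 + r)
    assert abs(beta_hat * (1 + r) - beta * (1 + alpha * r)) < 1e-12
    assert beta_hat < beta              # alpha < 1 implies beta_hat < beta
    print(alpha, round(beta_hat, 6))
```
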
Second, the numerical computation of a stationary Markov perfect equilibrium is difficult. Certain numerical methods based on the first order condition and the Euler equation are analysed by Maliar and Maliar [44]. Other specific numerical examples are provided in Chatterjee and Eyigungor [19].
In Example 4.4 and also in the model of Caballero [18], the state and action spaces are unbounded. This actually simplifies the calculations. Additional assumptions such as nonnegativity and/or boundedness of the variables may lead to discontinuous equilibria. For some states, the solutions may lie on the boundary of the constraint sets.
The next example is a modification of the previous one and examines a model with compact action spaces and nonnegative state variables. This modification, although natural, impedes finding a tractable solution. However, within such a framework, Theorem 3.4 is in force.
for every t ∈ T. Moreover, (ξ_{t+1}) and (ζ_{t+1}) are sequences of nonnegative i.i.d. random variables having continuous densities g_1 and g_2, respectively, with respect to Lebesgue measure on R+. It is also assumed that ξ_{t+1} and ζ_{t+1} are independent for each t ∈ T. Recall that the payoff function is u(s, a) = u(a) = −(1/θ) exp(−θa) with a ∈ A(b, ), s = (b, ) ∈ S and the individual's risk coefficient θ > 0. The transition probability q now has the form given above. In order to apply Theorem 3.4, it is sufficient to see that for every s = (b, ) ∈ S and a ∈ A(s), q(·|s, a) is absolutely continuous with respect to some probability measure p on B(S) and that condition (C3.2) is satisfied. We first determine the density function x ↦ ρ_1(s, a, x) of the random variable w + (1 + r)b − a + ξ_{t+1}. From the continuity of g_1, it follows that it has the stated form. Let s = (b, ) and x be fixed. Assume that a_n → a_0 in A(s) as n → ∞. If Ã = ∅, then ρ_1(s, a_n, x) = 0 → ρ_1(s, a_0, x) = 0. Suppose now that Ã ≠ ∅. If a_0 ∉ Ã, then lim inf_{n→∞} ρ_1(s, a_n, x) ≥ ρ_1(s, a_0, x) = 0. If a_0 ∈ Ã, then a_n ∈ Ã for all sufficiently large n, and therefore lim_{n→∞} ρ_1(s, a_n, x) = ρ_1(s, a_0, x) by the continuity of g_1. We have shown that the function a ↦ ρ_1(s, a, x) is lower semicontinuous on A(s) for each s ∈ S and x ≥ 0.
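The lower semicontinuity argument can be illustrated numerically. The sketch below uses the exponential density as a hypothetical concrete choice of g_1 and hypothetical values of w, r and b; the density of w + (1 + r)b − a + ξ is g_1(x − w − (1 + r)b + a) where this argument is positive, and 0 otherwise.

```python
import math

# Lower semicontinuity of a -> rho1(s, a, x), checked numerically.
# Hypothetical data: g1(x) = exp(-x) (exponential density on R+),
# w = 1.0, r = 0.05, b = 2.0.
w, r, b = 1.0, 0.05, 2.0

def rho1(a, x):
    shift = x - (w + (1 + r) * b - a)
    return math.exp(-shift) if shift > 0 else 0.0   # zero outside the support

# Verify rho1(a0, x) <= liminf rho1(a_n, x) along sequences a_n -> a0,
# including the boundary point a0 where x = w + (1+r)*b - a0.
x = 2.0
a_boundary = w + (1 + r) * b - x          # = 1.1 with the data above
for a0 in (0.5, a_boundary, 1.5):
    approach = [rho1(a0 + d, x) for d in (-1e-3, -1e-6, 1e-6, 1e-3)]
    assert rho1(a0, x) <= min(approach) + 1e-3
print("lower semicontinuity check passed")
```

At the boundary point, approaching from outside Ã gives the value 0 while ρ_1 jumps up inside Ã, which is exactly the lower (rather than full) semicontinuity used in the text.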
Let p_1 be the probability measure on R+ with density x ↦ 1/(1 + x)². Let us define ρ as above. Then for each set D ∈ B(S), this implies that (x, z) ↦ ρ((b, ), a, (x, z)) is a density of q(·|(b, ), a) with respect to the probability measure p_1 ⊗ p_1 on S. Obviously, ρ((b, ), a, (x, z)) is lower semicontinuous in a ∈ A(b, ). By Remark 3.1, condition (C3.2) is satisfied. The instantaneous utility function u is bounded, and (C3.1) holds as well. As a consequence, Theorem 3.4 applies. The existence of a deterministic stationary Markov perfect equilibrium in this model is an open problem.

Example 4.6
Assume that there is a single good (also called a renewable resource) that can be used in each period for consumption or productive investment. The set of all resource stocks is S = R+. Self t observes the current stock s_t ∈ S and chooses a_t ∈ A(s_t) := [0, s_t] for consumption. The remaining part y_t = s_t − a_t is left as an investment for future selves. The next self's inheritance or endowment is determined by a transition probability q_0 from S to S (a stochastic production function) which depends on y_t ∈ A(s_t) ⊆ S, i.e., q(·|s_t, a_t) = q_0(·|y_t). We assume that π : R+ → R+ is a continuous increasing function with π(0) = 0 and (ε_t) is a sequence of i.i.d. nonnegative random variables whose common distribution has no atoms. In Harris and Laibson [27], the function π is linear and the probability distribution of ε_t has a twice differentiable density function with respect to Lebesgue measure on S. Moreover, Harris and Laibson [27] and Balbus et al. [6] assume that u(s, a) = u(a) for all (s, a) ∈ K, i.e., the instantaneous utility function depends only on the consumption in state s ∈ S; moreover, u is increasing and strictly concave. Using an additional assumption on the relative risk aversion coefficient of u and some other technical conditions, Harris and Laibson [27] establish the existence of a deterministic stationary Markov perfect equilibrium in the class of functions with locally bounded variation. They also study a stochastic version of the Euler equation associated with this model. Balbus et al. [6] consider a more general case: they only assume that q_0 is atomless and weakly continuous in investment. Under these assumptions, they show that there exists a deterministic stationary Markov perfect equilibrium in some special class of strategies. The current self's optimisation problem is not concave in general, even if all future selves use a Lipschitz-continuous strategy f ∈ F. The best reply (if one exists) is very often a discontinuous function. Therefore, Balbus et al. [6] and Harris and Laibson [27] consider a class of discontinuous strategies. The assumption that the transitions are atomless is very helpful. However, the techniques used in the above papers do not work for consumption/investment models with multidimensional state spaces (many commodities). The problem is also unsolved if we allow the transition probabilities to possess some atoms. In particular, the existence of a stationary Markov perfect equilibrium is an open problem if the transitions are deterministic.
If the state and action spaces are one-dimensional and the transition probability is a convex combination of finitely many probability measures on the state space with coefficients depending on the state-action pairs, then under some stochastic dominance condition, one can prove the existence of a deterministic stationary Markov perfect equilibrium in the class of Lipschitz-continuous strategies with Lipschitz constant one; see Balbus et al. [5,Sect. 3.2]. Below, we give an example of such a model for which it is possible to find a solution in closed form.
It turns out that in this example, there exists a deterministic stationary Markov perfect equilibrium in the class of linear functions. Consider the subclass F_0 ⊆ F such that f ∈ F_0 if f(s) = cs for some constant c ∈ [0, 1] and all s ∈ S. Using the above equations and assuming that the equilibrium f* is in F_0 and that J_β(f*)(s) = Cs^σ for s ∈ S and some constant C > 0, we obtain an equation for C. Clearly, a deterministic stationary Markov perfect equilibrium is f*(s) = c̄s. For instance, if β = 0.9 and α = σ = 0.5, then C = 1.17851 and c̄ = 0.888889. If on the other hand β = 0.9, α = 1 and σ = 0.5, then C = 1.25 and c̄ = 0.64. Similarly as in Example 4.3, a decision maker with quasi-hyperbolic discounting saves less for the future and prefers to consume more in each period than a decision maker with standard discounting. Moreover, we note that the greater β is, the smaller the amount that is consumed. If y^σ is replaced by a concave increasing function η(y) with range η(S) ⊆ [0, 1] and if u is strictly concave and increasing, then from Balbus et al. [5, Theorem 2], we know that a Lipschitz equilibrium exists in the example with the transition function of this form, but the calculations look very difficult.
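The displayed equation for C is not reproduced above, but the reported constants can be recovered by solving an analogous fixed-point system. The sketch below assumes a concrete specification in the spirit of Balbus et al. [5, Sect. 3.2] — u(a) = a^σ on S = [0, 1] and transition (s − a)^σ μ + (1 − (s − a)^σ)δ_0 with μ uniform on [0, 1] — which is an assumption on our part, not taken verbatim from the text.

```python
# Sketch: reproducing the constants C and c̄ reported in the text.
# Under the assumed specification, E_mu[s**sigma] = 1/(1+sigma), and the
# equilibrium pair (C, c) solves
#   c = 1 / (1 + (alpha*beta*C*E)**(1/(1-sigma)))    (first-order condition)
#   C = c**sigma + beta*(1-c)**sigma * C * E         (equation for J_beta(f*))

def equilibrium_constants(alpha, beta, sigma, iterations=500):
    E = 1.0 / (1.0 + sigma)          # E_mu[s**sigma] for mu uniform on [0, 1]
    C = 1.0
    for _ in range(iterations):      # simple fixed-point iteration
        c = 1.0 / (1.0 + (alpha * beta * C * E) ** (1.0 / (1.0 - sigma)))
        C = c ** sigma / (1.0 - beta * (1.0 - c) ** sigma * E)
    return C, c

print(equilibrium_constants(0.5, 0.9, 0.5))   # approx (1.17851, 0.888889)
print(equilibrium_constants(1.0, 0.9, 0.5))   # approx (1.25, 0.64)
```

The second call corresponds to standard discounting (α = 1) and consumes the smaller share c̄ = 0.64, in line with the comparison made in the text.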
We close this section with an application of Theorem 3.5.

Example 4.9
We now consider a model with a transition probability whose conditional density functions with respect to some atomless probability measure generate a σ-field G such that the original σ-field on the state space has no G-atoms. Usually, for such examples, the state space is represented as S := Z × Y, where Z and Y are complete separable metric spaces with their Borel σ-fields B(Z) and B(Y), respectively. The space S is endowed with the product σ-field. Consider a controller (decision maker) of a certain production process, whose state s ∈ S consists of two coordinates s = (z, y), where z ∈ Z is a capital stock and y ∈ Y is a noise component determining, for instance, specific technological shocks. Assume that in each period, the controller needs to make a decision a = (a_1, . . . , a_m) ∈ A on the intensities of m different production processes. Here, A is a compact subset of R^m_+ and A(s) = A for every s ∈ S. Given the current state (z, y) and an action profile a, the transition law q is determined as follows:
– q_Z(·|s, a) denotes the marginal of q(·|s, a) on Z; additionally, q_Z(·|s, a) is absolutely continuous with respect to some κ ∈ Pr(Z) for every (s, a) ∈ S × A, and the corresponding Radon-Nikodým derivative ρ(s, a, ·) is such that a ↦ ρ(s, a, z′) is continuous on A for every s ∈ S, z′ ∈ Z;
– λ ∈ Pr(Y) is atomless.
Hence, s = (z, y), where z is influenced by the action of the controller and y is a technological shock. Define p := κ ⊗ λ. Let u : S × A → R be a bounded one-period reward function. The controller wishes to find an equilibrium for the infinite-horizon problem with quasi-hyperbolic discounting. From Theorem 3.5, we conclude that there exists a deterministic stationary Markov perfect equilibrium for that problem. For further comments and possible structures of the reward functions, we refer the reader to Duggan [21] and references therein.
It should be mentioned that similar sets of states and transition laws were already considered in the area of standard stochastic games in the context of existence of randomised stationary Nash equilibria; see Duggan [21] and He and Sun [29,30].

Existence of deterministic non-stationary Markov perfect equilibria
In this section, we assume that S is a countable set. We shall prove that in a model with countably many states satisfying assumptions (C3.1) and (C3.3), there exists a deterministic Markov perfect equilibrium.
The expected utility of self t is defined as E[ u(s_t, a_t) + αβ Σ_{τ=t+1}^∞ β^{τ−t−1} u(s_τ, a_τ) ].

Introducing the notation
we obtain that

Furthermore, for any s ∈ S and a ∈ A(s), we set the quantities as above. From this definition, it follows that a deterministic Markov perfect equilibrium is subgame perfect. Then we observe that for any t ∈ T and s = s_{t+1} ∈ S, we have β^{n−t} q_{f_{t+1}} ⋯ q_{f_n}(u_{f_{n+1}})(s), and we can assume without loss of generality that (b_n) converges to some b_0 ∈ A(s). Then Royden [58, Proposition 18] yields that ξ_n(s) → 0 as n → ∞. Hence the proof is complete.
The set F of all selectors of the correspondence s ↦ A(s) (see Sect. 2) can be viewed as the product space ∏_{s∈S} A(s). Then F with the product topology is, by Tychonoff's theorem, a compact metric space, and so is F∞.

Lemma 5.4 Under assumptions (C3.1) and (C3.3), the mapping is continuous on F∞ for all s ∈ S and t ∈ T.
Remark 5.5 (a) Theorem 5.2 is new, and in its proof we apply a backward induction method similar to that used in standard dynamic programming (see for instance Hernández-Lerma and Lasserre [31, Sect. 3.2] or Puterman [57, Sect. 4.5]) or in finite-horizon models with quasi-hyperbolic discounting (see Alj and Haurie [4], Bernheim and Ray [12] or Goldman [26]). In our setup, this method determines a sequence of strategies in F∞ which need not be convergent and may have many accumulation points. In Example 5.6 below, we present two accumulation points that give different deterministic Markov perfect equilibria.
(b) It is worth noticing that the existence proof of Theorem 5.2 is not based on any fixed point argument. A similar "iterative method" gives in Balbus et al. [7] a nonstationary deterministic Markov perfect equilibrium in a model with quasi-hyperbolic discounting and one-dimensional state and action spaces satisfying some additional conditions similar to that in Balbus et al. [6] and Harris and Laibson [27]. The deterministic equilibrium obtained in [7] belongs to the intersection of a decreasing family of closed sets of strategies for the decision maker. The methods used in this section and in [7] have many limitations in the sense that they work only under some special conditions on the primitive data of the model.
(c) Theorem 5.2 can be extended (with minor changes in the proof) to the unbounded reward case discussed in Remark 3.12.
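The backward induction described in part (a) of the remark above can be sketched for a toy finite model (all numbers below are hypothetical, not taken from the paper): the behaviour of later selves is summarised by a continuation value computed with the long-run factor β, while each current self best-responds using the factor αβ.

```python
# Backward induction for a finite-horizon truncation (toy model, made-up data).
# States S = {0, 1}, actions A = {0, 1}; u[s][a] is the reward and
# q[s][a] the transition distribution over next states.
u = [[1.0, 0.0], [0.5, 2.0]]
q = [[[0.7, 0.3], [0.2, 0.8]],       # q[0][a]: next-state distribution
     [[0.4, 0.6], [0.9, 0.1]]]       # q[1][a]
alpha, beta, horizon = 0.7, 0.9, 200

V = [0.0, 0.0]                        # continuation value of the later selves
strategies = []                       # built backwards: f_n, f_{n-1}, ..., f_1
for _ in range(horizon):
    f, V_new = [], []
    for s in (0, 1):
        # the current self maximises u + alpha*beta*E[V] ...
        best = max((0, 1), key=lambda a: u[s][a] + alpha * beta *
                   sum(q[s][a][s2] * V[s2] for s2 in (0, 1)))
        f.append(best)
        # ... but the value passed backwards is discounted with beta only
        V_new.append(u[s][best] + beta *
                     sum(q[s][best][s2] * V[s2] for s2 in (0, 1)))
    V = V_new
    strategies.append(f)

strategies.reverse()                  # strategies[t] is the rule of self t+1
print(strategies[0], V)
```

As the remark notes, the resulting sequence of strategies need not converge; different accumulation points can yield different deterministic Markov perfect equilibria.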

An example with a finite state space
As noted in Sect. 3, a stationary Markov perfect equilibrium is a fixed point of a best-response correspondence defined on a compact convex set. The set F of deterministic strategies is not convex. Thus an argument based on fixed point theorems is difficult to apply. This suggests that a deterministic stationary Markov perfect equilibrium need not exist even in simple models. Below we provide an example of a Markov decision process with finite state and action spaces in which a deterministic stationary Markov perfect equilibrium does not exist. This example also exhibits two different deterministic Markov perfect equilibria arising as two accumulation points of a sequence of strategies as in the proof of Theorem 5.2. Let f(1) = a and g(1) = b. For simplicity, we apply standard matrix/vector notation. Any function w : S → R is written as the column vector (w(1), w(2))^T. Transition probabilities are given by stochastic matrices. For any strategy φ, the function u(·, φ(·)) is denoted by the column payoff vector U_φ. By Q_φ, we denote the transition probability matrix induced by φ.
The discounted expected reward vector over the infinite horizon under the profile fg is Σ_{n=0}^∞ β^{2n} (Q_f Q_g)^n (U_f + β Q_f U_g), where (Q_f Q_g)^0 = I. Suppose that the selves following self t are going to employ the strategy profile fg. Then the rewards for self t using f or g are as follows.
In state s = 1, it is better for self t to use g. In state s = 2, the rewards are the same. Assuming that the selves following self t are going to apply gf, we obtain the corresponding rewards for self t.
In this setup, it is better for self t to use the strategy f in state s = 1. From these calculations, we conclude that both profiles fg and gf are deterministic Markov perfect equilibria. It is interesting to note that in both cases, the Markov equilibria fg and gf give higher rewards than the stationary randomised equilibrium obtained above.
Moreover, if the Markov decision process starts in state s = 1, the equilibrium profile fg is more advantageous for the decision maker since 170/9 > 2843/153. For the initial state s = 2, the profile gf is better since 289/9 > 4753/153.
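The comparisons above amount to checking inequalities between rational numbers. Reading the reported values as the rewards of the profiles fg and gf in states 1 and 2 (our interpretation of the elided displays), they can be verified exactly:

```python
from fractions import Fraction as F

# Equilibrium rewards reported in the text, interpreted as
# reward-by-initial-state for the profiles fg and gf.
fg = {1: F(170, 9), 2: F(4753, 153)}
gf = {1: F(2843, 153), 2: F(289, 9)}

assert fg[1] > gf[1]     # from state 1, the profile fg is better
assert gf[2] > fg[2]     # from state 2, the profile gf is better
print(float(fg[1]), float(gf[1]), float(gf[2]), float(fg[2]))
```

Exact rational arithmetic avoids any rounding issues in such comparisons (170/9 ≈ 18.889 versus 2843/153 ≈ 18.582, and 289/9 ≈ 32.111 versus 4753/153 ≈ 31.065).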

Approximate deterministic Markov perfect equilibria in Borel state space models
It is well known that if p is atomless, then the set of all p-equivalence classes of mappings in F is dense in p; see Warga [70, Theorem IV.3.10]. Therefore, the weak-star limit in p of a sequence of deterministic strategies may be a randomised strategy; see Elliott et al. [25, Example 3.16]. This implies that the approach taken in the proof of Theorem 5.2 for a countable state space cannot be extended to a model with a Borel set of states. However, similarly as in stochastic games (see Nowak [50] and Whitt [72]), one can approximate a Markov decision process on a Borel state space by processes with countably many states. In other words, for a Borel state space, deterministic Markov perfect equilibria in the approximating model can be used to obtain deterministic Markov ε-equilibria in the original model: a profile (f_t)_{t∈T} ∈ F∞ such that for every s ∈ S and t ∈ T, we have sup_{a∈A} P_t(s, a, f̄_{t+1}) ≤ P_t(s, f_t(s), f̄_{t+1}) + ε.
Let C(A) denote the Banach space of all continuous functions on A endowed with the supremum norm ‖·‖_c. In this section, we study the Borel state space decision model, denoted by M, satisfying the following assumption: (C6.1) A is a compact metric space, A(s) = A for all s ∈ S, (C3.1) holds, and the transition q has a Borel density function ρ : S × A × S → R with respect to p satisfying (C3.4) and such that ρ(s, ·, s′) ∈ C(A) for all s, s′ ∈ S. Theorem 6.2 below is new for decision models with quasi-hyperbolic discounting. It is based on Theorem 5.2 and modified arguments from the works of Nowak [50] and Whitt [72] on stochastic games. The result cannot be obtained by an approximation of the original model by models with finite horizons. The approximating density ρ_j is defined for B ∈ B(S) and s ∈ S_j, and the reward function ũ is ũ(s, a) = u_j(a) for s ∈ S_j and a ∈ A. We denote the Markov decision process with u_j and ρ_j satisfying (6.1) by M_δ. Let f̄ = (f_1, f_2, . . .) be an arbitrary sequence in F∞. We define the corresponding reward functions in M_δ as follows. For s ∈ S_j, j ∈ N_0, and t ∈ T, we put the analogous quantities; here, Ẽ_f̄ denotes the corresponding expectation operator in M_δ. This fact and condition (6.1) imply an estimate for every s ∈ S. Consequently, for every t ∈ T, we obtain a bound which means that for any f ∈ F, the analogous inequality holds; here, P̃ denotes the analogue of P in M_δ, for s ∈ S_j, j ∈ N_0. Observe that the constant on the right-hand side of (6.2) is independent of t.
Clearly, the approximating model M_δ induces a Markov decision process with countable state space N_0 and transition probability q̃(k|j, a) = ∫_{S_k} ρ_j(a, s′) p(ds′), which is continuous on A for every j, k ∈ N_0. This countable state space model will also be denoted by M_δ. Let F̃ be the space of piecewise constant functions f : S → A defined as follows: a function f̃ belongs to F̃ if for each j ∈ N_0, there exists m_j ∈ A such that f̃(s) = m_j for all s ∈ S_j. Clearly, F̃ ⊆ F. Hence, a deterministic Markov strategy for the decision maker in M_δ is a sequence (f_t) ∈ F̃∞.
Choose δ > 0 sufficiently small. From Theorem 5.2, we conclude that there exists a deterministic Markov perfect equilibrium ḡ = (g_t) ∈ F̃∞ in the model M_δ. We claim that ḡ is an ε-equilibrium in the model M. Fix an arbitrary function f ∈ F. From (6.2) and the definition of ḡ, it follows that the required inequality holds for every t ∈ T and s ∈ S. This proves our claim.

Proofs of Theorems 3.2, 3.4 and 3.5
Let ϕ be a stationary strategy and v : S → R a bounded Borel function. We define the operators q_ϕ(v) and u_ϕ in the natural way. Let p be the space of p-equivalence classes of functions in the strategy set. The elements of p are called Young measures. The space p is endowed with the weak-star topology. Since B(S) is countably generated, p is metrisable. Moreover, since every set A(s) is compact, p is a compact convex subset of a locally convex linear topological space. For a detailed discussion of these issues, see Balder [9, Theorem 1].

Proof Take any h ∈ L¹(S, p). The second term on the right-hand side in (7.2) converges to zero since ϕ_n →* ϕ_0 as n → ∞. The fact that M_n(s) → 0 for every s ∈ S as n → ∞ follows from Nowak and Raghavan [52, proof of Lemma 7]. For the sake of completeness, we provide a short argument here. For any n ∈ N, we can find a_n ∈ A(s) that attains the maximum in (7.3). Without loss of generality, we can assume that a_n → a_0 ∈ A(s) as n → ∞. The first term on the right-hand side in (7.4) converges to zero as well.

Proof Note that the series in (7.1) converges uniformly on × S. Obviously, r_{ϕ_k} →* r_{ϕ_0} in L∞(S, p) as k → ∞. By induction and Lemma 7.1, for each n ∈ N, q^n_{ϕ_k}(u_{ϕ_k}) →* q^n_{ϕ_0}(u_{ϕ_0}) in L∞(S, p) as k → ∞. Thus the lemma follows.

For any ϕ ∈ p, we define the correspondence ϕ ↦ BR_p(ϕ) := {ψ ∈ p : ψ(s) ∈ arg max_{ν∈Pr(A(s))} P(s, ν, ϕ) p-a.e.}.

Proof Due to measurable selection theorems (see Brown and Purves [17]), we have that BR_p(ϕ) ≠ ∅ for each ϕ ∈ p. Suppose that ϕ_n →* ϕ_0 in p as n → ∞. Assume that φ_n ∈ BR_p(ϕ_n) for all n ∈ N. Since p is compact metric, we can assume without loss of generality that φ_n →* φ_0 ∈ p as n → ∞. By Lemma 7.2, J_β(ϕ_n) →* J_β(ϕ_0) in L∞(S, p) as n → ∞.
Using the arguments from the proof of Lemma 7.1, we can show the analogous convergence. Since φ_n ∈ BR_p(ϕ_n) for all n ∈ N, we conclude the claim from (7.6) and (7.7). The closed-valued correspondence is weakly measurable because for any open set U ⊆ A, the set of all s ∈ S with (s) ∩ U ≠ ∅ is precisely {s ∈ S : φ̄(U|s) > 0} and belongs to B(S). Since A(s) is compact for each s ∈ S, (s) is compact for each s ∈ S as well. Therefore, the correspondence has a Borel graph (see Himmelberg [33]). Consider the set of all (s, a_1, a_2, ) ∈ S × A × A × [0, 1] such that a_1, a_2 ∈ (s) = supp(φ̄(·|s)) and the stated conditions hold. Clearly, this set is Borel. Moreover, equality with P(s, φ*(s), φ*) holds; see (7.9). From (7.9), we conclude that φ* is a stationary Markov perfect equilibrium with the required property that the support of every measure φ*(·|s) contains at most two points. This completes the proof.
Given a Borel function Y : S → R such that S |Y (s)|μ(ds) < ∞, we denote by E[Y |G] a version of the conditional expectation of Y with respect to the σ -field G.

Remark 7.6
In order to adapt the proofs in this section to the unbounded utility case discussed in Remark 3.12, one has to replace L∞(S, p) by the space of classes of functions v : S → R such that s ↦ v(s)/ω(s) is p-essentially bounded.

Concluding remarks
In this paper, we have studied a fairly general class of time-inconsistent Markov decision processes with a Borel state space. Using quasi-hyperbolic discounting and the game-theoretic formulation as for instance in Balbus et al. [6], Harris and Laibson [27], Peleg and Yaari [54], Phelps and Pollak [55] or Pollak [56], we have established the existence of a stationary Markov perfect equilibrium in models with transitions having a density function. In order to obtain a stationary equilibrium, we have used a fixed point argument. More importantly, we have shown that a stationary Markov equilibrium may be simplified in the sense that all selves can randomise their choices over at most two pure actions in each state. The existence of a deterministic stationary equilibrium requires some additional assumptions on an atomless transition probability. The dynamic-programming-like algorithm used in Sect. 5 for Markov decision processes with countably many states produces a sequence (f_n) of strategies having a subsequence converging to a deterministic Markov perfect equilibrium. The sequence (f_n) itself need not be convergent (see Example 5.6). This non-stationary equilibrium may have interesting properties: namely, it can dominate (in the sense of expected utilities) the randomised stationary one. We have also shown by a suitable approximation that ε-equilibria in deterministic Markovian strategies exist in some models with a Borel state space. We should like to emphasise that an analysis of optimality (equilibria) in dynamic decision models under quasi-hyperbolic discounting cannot be done using the Bellman optimality principle. An extensive discussion of this issue can be found in Björk and Murgoci [15], Krusell and Smith [38], Maliar and Maliar [43,44]. The examples given in this paper also confirm this statement. Therefore, we apply game-theoretic tools and a fixed point theorem.
However, as noted by Maliar and Maliar [44], numerical calculations of stationary Markov perfect equilibria are complicated even in simple cases where closed-form (analytical) solutions are already known. The question of existence of deterministic equilibria in different types of models with a general state space remains open. Here, we have solved this problem for some important subclasses of decision processes.
As indicated earlier, studying Markov perfect equilibria in Markov decision processes with quasi-hyperbolic discounting has some relevance to macroeconomics, portfolio management or finance. We wish to point out in conclusion that Theorems 3.4 and 3.5 for Markov decision processes with a continuum of states extend and complete the results obtained by Balbus et al. [6] and Harris and Laibson [27] for consumption/investment models with atomless transitions. In Theorem 3.4, the transitions may have some atoms. Our results can be applied to various Markov decision processes with a multidimensional state space.