Existence of Nash equilibria in stochastic games of resource extraction with risk-sensitive players

We consider a two-person stochastic game of resource extraction. It is assumed that players have identical preferences. The novelty lies in the fact that each player is equipped with the same risk coefficient and calculates his discounted utility over the infinite time horizon recursively by applying the entropic risk measure parametrized by this risk coefficient. Under two alternative sets of assumptions, we prove the existence of a symmetric stationary Markov perfect equilibrium.


Introduction
A common assumption used in Markov decision processes as well as stochastic games is that the decision makers have preferences represented by an overall utility parametrized by an expectation operator with respect to the current information. More precisely, if $u$ is an instantaneous utility of the agent and $c_t$ is the consumption level in period $t$, then the discounted lifetime utility $V_t$ from period $t$ onwards is defined in a recursive way as

$$V_t = u(c_t) + \beta E_t V_{t+1},$$

where β ∈ (0, 1) is a discount factor and $E_t$ is the expectation operator with respect to the information in period $t$. However, taking only an expectation of $V_{t+1}$ means that the agent is risk neutral to future discounted utility. In real life, this assumption is very often violated. For example, in an optimal growth model, the agent may have a higher risk aversion, which generates precautionary saving. Therefore, we propose to equip the agent with a constant absolute risk aversion coefficient, say γ > 0, and assume that he/she uses the entropic risk measure, known also as the certainty equivalent for the exponential function. In other words, the lifetime utility of the agent is now defined as

$$\tilde{V}_t = u(c_t) - \frac{\beta}{\gamma} \ln E_t \exp\big(-\gamma \tilde{V}_{t+1}\big). \tag{1}$$

Within this framework, the agent is risk averse in future utility $\tilde{V}_{t+1}$, in addition to being risk averse in future consumption levels $c_{t+1}, c_{t+2}, \ldots$. The latter risk attitude is reflected in the concave function $u$ of the agent. According to the properties of the entropic risk measure listed in Sect. 3, the agent takes into account not only the expected value of the future lifetime utility, but also all further moments with appropriate weights (see also Section 3.4 in Bäuerle and Rieder (2011)). These preferences have drawn the attention of many authors. For instance, Hansen and Sargent (1995) applied them to a linear quadratic Gaussian control model, and Weil (1993) used them to examine precautionary savings and the permanent income hypothesis. Moreover, these preferences found applications in the problems of Pareto optimal allocations (Anderson 2005) as well as in the study of Markov decision processes (Asienkiewicz and Jaśkiewicz 2017) and in a one-sector optimal growth model with an unbounded felicity function (Bäuerle and Jaśkiewicz 2018). As argued by Hansen and Sargent (1995), the preferences in (1) are also attractive because they can be viewed as robustness preferences. In this context, γ denotes the degree of robustness of the agent. This fact is a consequence of the robust representation of the entropic risk measure via the relative entropy as a penalty function, see Chapter 4 in Föllmer and Schied (2001).

Author contacts: Łukasz Balbus (l.balbus@wmie.uz.zgora.pl), Hubert Asienkiewicz (h.asienkiewicz@wmie.uz.zgora.pl).
In this paper, we study a strategic version of the discrete-time one-sector optimal growth model. Specifically, we deal with two players who own a natural resource and consume a certain amount of the available stock in each time period. We assume that each player possesses the same risk coefficient γ and the same felicity function. Moreover, each player defines, using the aforementioned risk measure, his non-expected discounted utility. Our objective is to prove the existence of a symmetric Nash equilibrium in non-randomized strategies. In their seminal paper, Levhari and Mirman (1980) studied such a strategic optimal growth model with the same logarithmic felicity functions for each agent and the deterministic Cobb-Douglas production function. Their model was extended in Sundaram (1989) to arbitrary production and felicity functions. Further generalizations to stochastic production functions were reported in Majumdar and Sundaram (1991), Dutta and Sundaram (1992) and Jaśkiewicz and Nowak (2018a). Other models of capital accumulation or resource extraction games with risk-neutral agents can be found in Jaśkiewicz and Nowak (2018b) and Balbus et al. (2016). Moreover, it is worth mentioning that there exist some iterative procedures (under special conditions) for finding Nash equilibria in such games, which were developed in Balbus and Nowak (2004) and Szajowski (2006). Finally, we wish to stress that games with risk-sensitive players have already been examined in the literature, but not with non-expected discounted payoffs; see, for instance, Bäuerle and Rieder (2017), Jaśkiewicz and Nowak (2014), Klompstra (2000) and the references cited therein. Namely, Bäuerle and Rieder (2017) dealt with zero-sum stochastic games, where the players take the expectation of the exponential function of accumulated discounted payoffs. Such an approach leads to a non-stationary model. Klompstra (2000), on the other hand, studied Nash equilibria for a two-person non-zero-sum game with a quadratic-exponential cost criterion, whilst in Jaśkiewicz and Nowak (2014) the authors treated intergenerational models with risk-sensitive generations. In addition, Başar (1999) dealt with risk-sensitive players playing a differential game. Therefore, to the best of our knowledge, this work is the first to study recursive utilities in dynamic games.
To show the existence of an equilibrium, we need to impose some conditions on the felicity function and the transition probabilities. Our assumptions are borrowed from Balbus et al. (2015a) and Jaśkiewicz and Nowak (2018a). Namely, we present two alternative sets of conditions. We assume either non-atomic transition probabilities or transition probabilities that allow atoms and embrace the purely deterministic case. These assumptions allow us to prove the existence of an equilibrium in the class of stationary Markov strategies.
The paper is organized as follows. Section 2 is devoted to a model description. In Sect. 3, we carefully define a non-expected discounted utility in the infinite time horizon. The assumptions and the main result are formulated in Sect. 4, whereas Sect. 5 contains the proof. Examples are placed in Sect. 6.

The model
Put $\mathbb{R}_+ = [0, +\infty)$. Consider a two-person stochastic game with the following objects:
(i) $S = \mathbb{R}_+$ is the state space, i.e., the space of available resource stocks;
(ii) $A_i(s) = [0, s]$ is the space of actions available for player $i \in \{1, 2\}$ when the current resource stock is $s \in S$;
(iii) $u_i : S \times S \times S \to \mathbb{R}_+$ is a felicity function for player $i \in \{1, 2\}$; we assume that for every $s \in S$, $a \in A_1(s)$ and $b \in A_2(s)$, $u_1(s, a, b) = u(a)$ and $u_2(s, a, b) = u(b)$, where $u : S \to \mathbb{R}_+$ is a temporal utility for both agents; note that the utility of player 1 depends only on his/her own consumption; the same remark applies to player 2;
(iv) $q(\cdot \mid s - a - b)$ is a Borel measurable transition probability on $S$ for the given feasible pair of actions $(a, b) \in A_1(s) \times A_2(s)$, $a + b \le s$, and the current resource stock $s \in S$;
(v) we define $D(s) := \{(a, b) \in A_1(s) \times A_2(s) : a + b \le s\}$, the set of feasible pairs of actions in state $s \in S$;
(vi) γ > 0 is a risk coefficient;
(vii) β ∈ (0, 1) is a discount factor.
We assume that $u(s) \le d$ for every $s \in S$ and some constant $d > 0$. In each period, both agents observe the state $s \in S$ and simultaneously choose their actions $(a, b) \in A_1(s) \times A_2(s)$, provided that the actions are feasible, i.e., $(a, b) \in D(s)$. Immediately, player 1 enjoys the utility $u(a)$, whereas player 2 enjoys $u(b)$. The next state of the game $s'$ has the distribution $q(\cdot \mid s - a - b)$. If the pair of actions $(a, b)$ is infeasible in state $s$, then the players choose their actions again. Therefore, we restrict our attention only to strategies generating feasible action pairs during the play. Next, we define a history of the game as follows:

$$h_1 = s_1, \qquad h_t = (s_1, a_1, b_1, \ldots, s_{t-1}, a_{t-1}, b_{t-1}, s_t) \quad \text{for } t \ge 2,$$

where $s_k \in S$ for all $k = 1, \ldots, t$ and $a_k + b_k \le s_k$ for all $k = 1, \ldots, t-1$. Let $H_t$ be the set of all histories up to the $t$th step. We endow $H_t$ with the natural product topology. We shall consider only pure strategies.
Definition 1 A strategy π for player 1 is a sequence $(\pi_t)_{t=1}^{\infty}$ such that each $\pi_t$ is a Borel measurable mapping from the history space $H_t$ to the space of actions available to player 1. The set of all strategies for player 1 is denoted by Π. Similarly, we define a strategy σ for player 2 and denote the set of all his/her strategies by Σ.
Furthermore, we introduce the following set of functions:

$$F_i := \{\varphi : S \to S \;:\; \varphi \text{ is Borel measurable and } \varphi(s) \in A_i(s) \text{ for all } s \in S\}, \quad i = 1, 2.$$

Definition 2 A stationary Markov strategy for player 1 is a sequence $(\pi_t)_{t=1}^{\infty}$ such that $\pi_t = \varphi$ for all $t \in \mathbb{N}$ and some $\varphi \in F_1$. Analogously, we define a stationary Markov strategy for player 2 as a sequence $(\sigma_t)_{t=1}^{\infty}$ such that $\sigma_t = \hat{\varphi}$ for all $t \in \mathbb{N}$ and some $\hat{\varphi} \in F_2$. Further, we shall identify a stationary Markov strategy with the element of the sequence.
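The course of play under stationary Markov strategies can be illustrated with a short simulation. This is only a minimal sketch: the kernel `kernel` (next stock equal to $2\sqrt{y}$ or $\sqrt{y}$ with equal probability), the consumption rules $s \mapsto s/4$ and the function names are hypothetical choices made for illustration and do not come from the paper. Infeasible pairs are rejected with an error, in line with the restriction to strategies that generate feasible action pairs.

```python
import random

def simulate(s0, phi1, phi2, kernel, steps, rng):
    """Simulate a play under stationary strategies; returns (s, a, b) triples."""
    path, s = [], s0
    for _ in range(steps):
        a, b = phi1(s), phi2(s)
        # Feasibility: a in [0, s], b in [0, s] and a + b <= s, i.e. (a, b) in D(s).
        if not (0.0 <= a <= s and 0.0 <= b <= s and a + b <= s):
            raise ValueError("infeasible action pair")
        path.append((s, a, b))
        # The next stock is drawn from q(. | s - a - b).
        s = kernel(s - a - b, rng)
    return path

def kernel(y, rng):
    """Hypothetical q(.|y): next stock is 2*sqrt(y) or sqrt(y), equally likely."""
    return rng.choice([2.0, 1.0]) * y ** 0.5

# Both players consume a quarter of the stock (an assumed symmetric rule).
path = simulate(4.0, lambda s: s / 4, lambda s: s / 4, kernel, 10, random.Random(0))
```

Each entry of `path` records the observed stock and the two consumption levels chosen in that period.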

Non-expected β-discounted utility function
In this section, we define the non-expected utilities for the players. We assume that each player is equipped with the risk coefficient γ > 0. Before giving a formal definition of the discounted utility in the infinite time horizon for each player, we introduce the notion of the entropic risk measure. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $X \in L^{\infty}(\Omega, \mathcal{F}, P)$ be a random payoff. Then the entropic risk measure is defined as follows:

$$\rho(X) = -\frac{1}{\gamma} \ln \int_{\Omega} e^{-\gamma X(\omega)} P(d\omega). \tag{2}$$

Let $X$ and $Y$ be random variables from $L^{\infty}(\Omega, \mathcal{F}, P)$. Then $\rho(\cdot)$ satisfies the following properties:
(P1) monotonicity: if $X \le Y$, then $\rho(X) \le \rho(Y)$;
(P2) translation invariance: $\rho(X + k) = \rho(X) + k$ for all $k \in \mathbb{R}$;
(P3) $\rho(X) \le E X$, the consequence of Jensen's inequality.
Using the Taylor expansion for the exponential and logarithmic functions, for γ sufficiently close to 0, we obtain the following approximation:

$$\rho(X) \approx E X - \frac{\gamma}{2} \operatorname{Var} X. \tag{3}$$

It means that the risk-sensitive player, when calculating his random payoff, takes into account not only the expected value of this random payoff but also its variance.
Formula (2) is also known in the literature as a certainty equivalent of the exponential function (Weil 1993). For further properties of ρ, the reader is referred to Föllmer and Schied (2001).
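The entropic risk measure and its small-γ behaviour can be checked numerically. The sketch below assumes a finite-support payoff; the particular values, probabilities and γ are illustrative choices, not taken from the paper. It compares the certainty equivalent with the mean-variance expression $E X - \tfrac{\gamma}{2}\operatorname{Var} X$.

```python
import math

def entropic_risk(values, probs, gamma):
    """rho(X) = -(1/gamma) * ln E[exp(-gamma * X)] for a finite-support X."""
    m = sum(p * math.exp(-gamma * x) for x, p in zip(values, probs))
    return -math.log(m) / gamma

def mean_var(values, probs):
    """Mean and variance of a finite-support random variable."""
    mean = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    return mean, var

# Illustrative payoff distribution and a small risk coefficient (assumptions).
values = [0.0, 1.0, 3.0]
probs = [0.2, 0.5, 0.3]
gamma = 0.01

rho = entropic_risk(values, probs, gamma)
mean, var = mean_var(values, probs)
approx = mean - 0.5 * gamma * var   # mean-variance approximation for small gamma
```

For small γ the two quantities `rho` and `approx` are close, and `rho` never exceeds `mean`, which is the Jensen-type bound.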
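By properties (P1) and (P3), for every random payoff $X \in L^{\infty}(\Omega, \mathcal{F}, P)$ with $X \ge 0$ we have that

$$0 \le \rho(X) \le E X. \tag{4}$$

Next, for any $v_{t+1} \in B(H_{t+1})$, we define the operator $L^i_{\pi_t, \sigma_t}$ for player $i$ as follows:

$$L^i_{\pi_t, \sigma_t} v_{t+1}(h_t) = u_i\big(s_t, \pi_t(h_t), \sigma_t(h_t)\big) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma v_{t+1}(h_t, \pi_t(h_t), \sigma_t(h_t), s')} \, q\big(ds' \mid s_t - \pi_t(h_t) - \sigma_t(h_t)\big).$$

Further, we define an $N$-stage total discounted utility for player $i$ by

$$U^i_N(s, \pi, \sigma) = \big(L^i_{\pi_1, \sigma_1} \circ \cdots \circ L^i_{\pi_N, \sigma_N}\big) \mathbf{0}(s),$$

where $\mathbf{0}$ is a function that assigns 0 to any argument. For instance, for player 1 and stage 2 we have

$$U^1_2(s, \pi, \sigma) = u(\pi_1(s)) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma u(\pi_2(s, \pi_1(s), \sigma_1(s), s'))} \, q\big(ds' \mid s - \pi_1(s) - \sigma_1(s)\big).$$

Similarly, we can define $U^2_2(s, \pi, \sigma)$ for player 2. From the monotonicity of ρ, the sequence $\big(U^i_N(s, \pi, \sigma)\big)_{N \in \mathbb{N}}$ is non-decreasing and bounded from below by 0 for every $s \in S$ and $(\pi, \sigma) \in \Pi \times \Sigma$. Moreover, by properties (P1)-(P3) it follows that

$$U^i_N(s, \pi, \sigma) \le \frac{d}{1 - \beta} \tag{5}$$

for all $s \in S$ and $(\pi, \sigma) \in \Pi \times \Sigma$. The reader is referred to Asienkiewicz and Jaśkiewicz (2017), where (5) and further details are proved. Hence, $\lim_{N \to \infty} U^i_N(s, \pi, \sigma)$ exists and we denote this limit by $U^i(s, \pi, \sigma)$. By the aforementioned discussion, it follows that each player is careful about his future unknown continuation function. Therefore, at every stage he uses the entropic risk measure, parametrized by his risk coefficient γ, to calculate the discounted utility in the infinite time horizon.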
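For stationary strategies, the N-stage utilities can be computed by repeatedly applying one and the same operator to the zero function, which makes the monotonicity and boundedness of the sequence easy to inspect. The following is a minimal sketch under loud assumptions: an integer stock grid $\{0, \ldots, K\}$, the felicity $u(c) = d(1 - e^{-c})$, a common consumption rule `phi` and a two-atom kernel `next_states` are all hypothetical and serve only as an illustration.

```python
import math

BETA, GAMMA, D = 0.9, 0.5, 1.0   # discount factor, risk coefficient, bound on u
K = 20                            # hypothetical integer stock grid {0, ..., K}

def u(c):
    """Bounded felicity with 0 <= u <= D (illustrative choice)."""
    return D * (1.0 - math.exp(-c))

def phi(s):
    """Common stationary consumption rule; 2 * phi(s) <= s keeps pairs feasible."""
    return s // 4

def next_states(y):
    """Hypothetical kernel q(.|y): two equally likely atoms."""
    return [(y, 0.5), (min(K, 2 * y), 0.5)]

def rho(v, y):
    """Entropic risk measure of v(next state) under q(.|y)."""
    m = sum(p * math.exp(-GAMMA * v[z]) for z, p in next_states(y))
    return -math.log(m) / GAMMA

def stage_utilities(n_stages):
    """U_0 = 0 and U_{N+1}(s) = u(phi(s)) + BETA * rho(U_N | s - 2 * phi(s))."""
    v = [0.0] * (K + 1)
    out = [v]
    for _ in range(n_stages):
        v = [u(phi(s)) + BETA * rho(v, s - 2 * phi(s)) for s in range(K + 1)]
        out.append(v)
    return out

U = stage_utilities(200)   # U[N][s] is the N-stage utility from stock s
```

In this toy setting the lists `U[N]` increase with `N` and stay below `D / (1 - BETA)`, so the limit as `N` grows exists pointwise.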

Existence of symmetric stationary Nash equilibria
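Definition 3 A pair of strategies $(\pi^*, \sigma^*) \in \Pi \times \Sigma$ is a Nash equilibrium if

$$U^1(s, \pi^*, \sigma^*) \ge U^1(s, \pi, \sigma^*)$$

for each $s \in S$ and any $\pi \in \Pi$ such that $(\pi, \sigma^*)$ is feasible, and

$$U^2(s, \pi^*, \sigma^*) \ge U^2(s, \pi^*, \sigma)$$

for each $s \in S$ and any $\sigma \in \Sigma$ such that $(\pi^*, \sigma)$ is feasible.

Definition 4 A Stationary Markov Perfect Equilibrium (SMPE) is a Nash equilibrium consisting of stationary Markov strategies.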
The purpose of this section is to find a symmetric stationary pure Nash equilibrium in an appropriate class of strategies. Therefore, we define the subset of $F_i$ as follows:

$$F^0_i := \{\varphi \in F_i \;:\; \text{the function } \phi(s) := s/2 - \varphi(s) \text{ is non-decreasing and continuous from the right}\}.$$

The definition of the sets $F^0_i$ ($i = 1, 2$) given in Jaśkiewicz and Nowak (2018a) on p. 243 should be the same as above. More precisely, the function $\phi(s) := s - \varphi(s)$ in Jaśkiewicz and Nowak (2018a) must be replaced by $\phi(s) := s/2 - \varphi(s)$.
We shall need the following assumptions imposed on the felicity function.
Assumption 1 (Felicity function) Function u is increasing, bounded, strictly concave and continuous at s = 0.
We also propose two alternative sets of assumptions for the transition probability.

Remark 1
The predecessors of our work on symmetric dynamic games of resource extraction are Dutta and Sundaram (1992) and Jaśkiewicz and Nowak (2018a). The common feature of these works is that the authors deal with standard discounted expected payoffs or utilities for the players. This in turn implies that the players care only about the expected value of the future random payoffs. In other words, when calculating the discounted expected utility in the infinite time horizon, the players take into account only the expectation of the continuation function. In our approach, we allow the agents to be risk averse towards future random payoffs in the sense that according to (3) the players care not only about the expectation but also about the variance of the continuation function. Therefore, they evaluate the discounted utility in a recursive way by using the entropic risk measure (or the exponential certainty equivalent) parametrized by the risk coefficient. As in Dutta and Sundaram (1992), a felicity function is bounded (in contrast to Jaśkiewicz and Nowak (2018a)) and as in Jaśkiewicz and Nowak (2018a) the resource stock takes values in [0, +∞) [in contrast to Dutta and Sundaram (1992)].
Our assumptions imposed on the model are borrowed from Balbus et al. (2015a) and Jaśkiewicz and Nowak (2018a). More precisely, Assumption 3 coincides with Assumption (A) in Jaśkiewicz and Nowak (2018a). However, Assumption 2, analogous to the one in Balbus et al. (2015a), is slightly stronger than Assumptions (B1)-(B3) in Jaśkiewicz and Nowak (2018a). This is because the risk measure ρ used in evaluating the discounted utility is not additive in the sense that, in general, $\rho(X + Y) \ne \rho(X) + \rho(Y)$ for random payoffs $X$ and $Y$. Therefore, the transition probability $q$ cannot be a convex combination of stochastic kernels with coefficients depending on the investment amount as in Jaśkiewicz and Nowak (2018a).
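The failure of additivity is easy to see on a two-point example. The sketch below takes $Y = X$ (perfectly dependent payoffs); the distribution and the value γ = 1 are illustrative assumptions only.

```python
import math

def entropic_risk(values, probs, gamma):
    """rho(X) = -(1/gamma) * ln E[exp(-gamma * X)] for a finite-support X."""
    m = sum(p * math.exp(-gamma * x) for x, p in zip(values, probs))
    return -math.log(m) / gamma

gamma = 1.0
probs = [0.5, 0.5]
x = [0.0, 2.0]                       # payoff X on two equally likely outcomes
y = [0.0, 2.0]                       # payoff Y = X on the same outcomes
x_plus_y = [a + b for a, b in zip(x, y)]

lhs = entropic_risk(x_plus_y, probs, gamma)                           # rho(X + Y)
rhs = entropic_risk(x, probs, gamma) + entropic_risk(y, probs, gamma)  # rho(X) + rho(Y)
```

Here `lhs` is strictly smaller than `rhs`: doubling a risky payoff is penalised more heavily than evaluating the two copies separately.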
On the other hand, our result can also be viewed as an extension of the optimization problem (one-player case), studied in Asienkiewicz and Jaśkiewicz (2017) and Bäuerle and Jaśkiewicz (2018), to a strategic version of a one-sector optimal growth model. In contrast to Bäuerle and Jaśkiewicz (2018), we examine, as mentioned above, a model with bounded felicity functions. A crucial role in the study of the unbounded case is played by the fact that both investment and consumption functions are non-decreasing. Here, this property does not hold, since the unique solution $V_\varphi$ to the Bellman equation in Lemma 5 depends on the consumption strategy $\varphi$ of the other player.

Proof of Theorem 1
The methods of proving Theorem 1 resemble the ones used in Jaśkiewicz and Nowak (2018a). However, most of the preceding results must be formulated in terms of the entropic risk measure. Moreover, for the sake of completeness and clarity, we decided to provide all lemmas with their proofs.
Let $X$ be the vector space of all right-continuous functions on $S$ with bounded variation on every interval $[0, n]$, $n \in \mathbb{N}$. We endow $X$ with the topology of weak convergence. Recall that a sequence $(\eta_t)_{t=1}^{\infty}$ converges weakly to $\eta \in X$ if and only if $\eta_t(s) \to \eta(s)$ as $t \to \infty$ at any continuity point $s \in S$ of $\eta$. The weak convergence of $(\eta_t)_{t=1}^{\infty}$ to $\eta$ is denoted by $\eta_t \xrightarrow{w} \eta$. Let $X^*$ be the set of all non-decreasing functions $\eta \in X$ such that $0 \le \eta(s) \le \frac{d}{1-\beta}$ for all $s \in S$. Note that each $\eta \in X^*$ is upper semicontinuous. Furthermore, we notice that 0 is a continuity point of every function $\eta \in X^*$. By Proposition 1 in Jaśkiewicz and Nowak (2018a), $X^*$ is sequentially compact in $X$. Moreover, Proposition 2 in Jaśkiewicz and Nowak (2018a) yields that $F^0_i$ is also a convex and sequentially compact subset of $X$ when endowed with the topology of weak convergence. Now we start with a sequence of preliminary lemmas.
Lemma 1 Assume that $f_n \xrightarrow{w} f$ in $X^*$ and $y_n \to y$ in $S$ as $n \to \infty$. Then $f(y) \ge \limsup_{n \to \infty} f_n(y_n)$.
Proof Let $y_0 > y$ be a continuity point of $f$. Then there exists $N \in \mathbb{N}$ such that $y_n < y_0$ for all $n > N$. Since each $f_n$ is non-decreasing, $f_n(y_n) \le f_n(y_0)$ for $n > N$ and finally $\limsup_{n \to \infty} f_n(y_n) \le \limsup_{n \to \infty} f_n(y_0) = f(y_0)$. Since $y_0$ can be chosen arbitrarily close to $y$ and $f$ is continuous from the right, we deduce that $\limsup_{n \to \infty} f_n(y_n) \le f(y)$.
Lemma 2 Let Assumptions 2 or 3 hold. Assume that f n w − → f in X * and y n → y in S, n → ∞. Then we have We have that The first inequality follows from property (P1) and Lemma 3.2 in Serfozo (1966), whereas the second one is a consequence of Lemma 1 and (P1). Thus, the result follows.
Lemma 3 Let Assumption 3 hold. Assume that f ∈ X * and y n → y in S as n → ∞.
Then we obtain The function f * is lower semicontinuous. Furthermore, f * (z) = f (z) for any continuity point z ∈ S of f . Recall that 0 is a continuity point of f . Hence, f * (0) = f (0). By Assumption 3(i), we have that By Lemma 3.2 in Serfozo (1966), we obtain Combining (6) and (7) with Lemma 2, we infer that Thus, the result follows.
Lemma 4 Let Assumption 2 hold. Assume that y n y in S as n → ∞ and f ∈ X * . Then it follows Proof By Assumption 2(i), we infer that Hence, the above inequality and Lemma 2 yield These inequalities finish the proof.
Let $\varphi \in F^0_2$ and let $\Pi(\varphi)$ be the set of all strategies π for player 1 for which the pair $(\pi, \varphi)$ is feasible. We are now ready to formulate our next lemma.
for each s ∈ S. Let either Assumptions 1 and 2 or Assumptions 1 and 3 be satisfied. Then there exists a unique function V φ ∈ X * such that for all s ∈ S. Moreover, Proof For any V ∈ X * , define the operator T as follows: Observe that since s → s − φ(s) is upper semicontinuous and u is increasing and continuous, it follows that the function (s, y) → u(s −φ(s)− y) is upper semicontinuous. Moreover, by Lemma 2 y → − β γ ln S e −γ V (z) q(dz|y) is also upper semicontinuous. Hence, by Proposition D.5 in Hernández- Lerma and Lasserre (1996), the function T V is upper semicontinuous. This fact and (4) yield that T : X * → X * . We have to prove that T is contractive. Assume that V 1 , V 2 ∈ X * . By properties (P1) and (P2) for each s ∈ S, we have Changing the roles of V 1 and V 2 we get By the Banach fixed point theorem, there exists a unique function V φ ∈ X * such that for every feasible consumption a for agent 1 (that means a + φ(s) ≤ s for every s ∈ S). Proceeding along similar lines as in Asienkiewicz and Jaśkiewicz (2017) (see formula (3.6)) we obtain by iteration that for every N ∈ N and π ∈ Π(φ) Letting N tend to infinity, we have that V φ (s) ≥ U 1 (s, π, φ) for any π ∈ Π(φ) and s ∈ S. Hence, From Proposition D.5 in Hernández- Lerma and Lasserre (1996), there exists ψ ∈ F 1 such that Again, by iteration of this equation and making use of properties (P1)-(P3), we obtain that for every s ∈ S Letting N go to infinity, we have for all s ∈ S and, consequently, Inequalities (8) and (9) imply that For any s ∈ S we set g(φ)(s) := max A φ (s).
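The contraction argument for the best-response operator can be illustrated on a discretised version of the problem. The sketch below assumes that the operator has the form $TV(s) = \max_{y \in [0, s - \varphi(s)]} \{u(s - \varphi(s) - y) - \tfrac{\beta}{\gamma} \ln \int_S e^{-\gamma V(z)} q(dz \mid y)\}$, as suggested by the two upper semicontinuous functions discussed in the proof. The integer grid, the felicity $u(c) = d(1 - e^{-c})$, the opponent's rule `phi` and the two-atom kernel `next_states` are hypothetical and chosen only for illustration.

```python
import math

BETA, GAMMA, D, K = 0.9, 0.5, 1.0, 20   # illustrative parameters and grid size

def u(c):
    """Bounded, increasing, strictly concave felicity (illustrative choice)."""
    return D * (1.0 - math.exp(-c))

def phi(s):
    """Opponent's stationary consumption (hypothetical), phi(s) <= s / 2."""
    return s // 4

def next_states(y):
    """Hypothetical kernel q(.|y): two equally likely atoms."""
    return [(y, 0.5), (min(K, 2 * y), 0.5)]

def continuation(v, y):
    """-(BETA / GAMMA) * ln of the q(.|y)-integral of exp(-GAMMA * v)."""
    m = sum(p * math.exp(-GAMMA * v[z]) for z, p in next_states(y))
    return -(BETA / GAMMA) * math.log(m)

def bellman(v):
    """Apply T on the grid; also return the largest maximiser for each stock."""
    tv, g = [], []
    for s in range(K + 1):
        x = s - phi(s)                      # stock left after the opponent consumes
        vals = [u(x - y) + continuation(v, y) for y in range(x + 1)]
        best = max(vals)
        tv.append(best)
        # Largest maximiser, mirroring g(phi)(s) := max A_phi(s).
        g.append(max(y for y, val in enumerate(vals) if val == best))
    return tv, g

def sup_dist(a, b):
    """Supremum distance between two functions on the grid."""
    return max(abs(p - q) for p, q in zip(a, b))

# One application of T to two different starting functions.
v1 = [0.0] * (K + 1)
v2 = [D / (1 - BETA) * s / K for s in range(K + 1)]
tv1, _ = bellman(v1)
tv2, _ = bellman(v2)

# Successive approximations towards the fixed point.
v = v1
for _ in range(300):
    v, g = bellman(v)
```

In this toy model `sup_dist(tv1, tv2)` does not exceed `BETA * sup_dist(v1, v2)`, the iterates settle at a fixed point of `bellman`, and the largest maximiser `g` is non-decreasing in the stock.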
Proof Suppose that s → A φ (s) is not ascending. This means that there exist s 1 < s 2 and y 1 ∈ A φ (s 1 ), y 2 ∈ A φ (s 2 ) such that y 1 > y 2 . Observe that the set L := {(s, y) : s ∈ S, y ∈ Φ(s)} is a lattice with the usual component-wise order on R 2 . Consequently, the points (s 1 , y 2 ) and (s 2 , y 1 ) belong to L. From Assumption 1, u is strictly concave. From the proof of Lemma 2 in Nowak (2006) and the fact that Thus, we have a contradiction.
Lemma 7 Let ψ be any selector of the correspondence s → A φ (s), i.e., ψ(s) ∈ A φ (s) for all s ∈ S. If ψ is continuous at s 0 , then A φ (s 0 ) is a singleton.
Proof Clearly, A φ (0) is a singleton. Assume that s 0 > 0 and y 1 , y 2 are elements of A φ (s 0 ) such that y 1 < y 2 . Since s → A φ (s) is ascending, we conclude that But ψ is continuous at s 0 ∈ S. Thus, we have a contradiction.

Lemma 8 The function g(φ) is a unique non-decreasing and continuous from the right selector of the correspondence s → A φ (s).
Proof From Lemma 6 the function $g(\varphi)$ is non-decreasing. Moreover, we observe that the graph of the correspondence $s \to A_\varphi(s)$ is closed from the right. Indeed, take $s_n \downarrow s$ and $y_n \in A_\varphi(s_n)$. From Lemma 6 it follows that $(y_n)$ is non-increasing; let $y_n$ converge to some $y$. Lemma 3 (under Assumption 3) or Lemma 4 (under Assumption 2) and Assumption 1 imply that $y \in A_\varphi(s)$. Therefore, $g(\varphi)$ is continuous from the right. Hence, $g(\varphi)$ is an upper semicontinuous selector of the correspondence $s \to A_\varphi(s)$. The uniqueness follows from Lemma 7.
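Proof of Theorem 1 Define the operator $L$ as follows:

$$L\varphi(s) := \frac{s - g(\varphi)(s)}{2}$$

for $s \in S$ and $\varphi \in F^0_2$. Lemma 8 implies that $L\varphi \in F^0_1$. Hence, $L : F^0_2 \to F^0_1$. We have to show that the operator $L$ is continuous when $F^0_1$ and $F^0_2$ are equipped with the topology of weak convergence. Suppose that $\varphi_n \xrightarrow{w} \varphi$ as $n \to \infty$. From the fact that the set $X^*$ is sequentially compact in $X$, we infer that there exists a subsequence of $(V_{\varphi_n})_{n=1}^{\infty}$ converging to some $V$ in $X^*$. Without loss of generality we may assume that $V_n := V_{\varphi_n} \xrightarrow{w} V$ in $X^*$ as $n \to \infty$. Analogously, we may assume that $\psi_n := g(\varphi_n) \xrightarrow{w} \psi$ in $F^0_1$. Thus, for each $n \in \mathbb{N}$, we obtain from Lemma 5 that

$$V_n(s) = u\big(s - \varphi_n(s) - \psi_n(s)\big) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma V_n(z)} \, q\big(dz \mid \psi_n(s)\big), \quad s \in S.$$

Let $S_1 \subset S$ be the set of all continuity points of the functions $V$, $\varphi$ and $\psi$. For any $s \in S_1$ we get $V_n(s) \to V(s)$, $\varphi_n(s) \to \varphi(s)$ and $\psi_n(s) \to \psi(s)$. Using Assumption 1, Lemma 2 and the last display, we obtain that

$$V(s) \le u\big(s - \varphi(s) - \psi(s)\big) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma V(z)} \, q\big(dz \mid \psi(s)\big). \tag{10}$$

Let $s \notin S_1$. Since $S_1$ is dense in $S$ and the functions $V$, $\psi$ and $\varphi$ are continuous from the right, we may choose a sequence $(s_m)_{m=1}^{\infty}$ in $S_1$ such that $s_m \downarrow s$ as $m \to \infty$. Therefore, we get

$$V(s_m) \le u\big(s_m - \varphi(s_m) - \psi(s_m)\big) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma V(z)} \, q\big(dz \mid \psi(s_m)\big).$$

From Lemma 2 and letting $m \to \infty$ we conclude that (10) holds for all $s \in S$. On the other hand, for any $n \in \mathbb{N}$, $y \in [0, s - \varphi_n(s)]$ and $s \in S$, by Lemma 5 we have

$$V_n(s) \ge u\big(s - \varphi_n(s) - y\big) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma V_n(z)} \, q(dz \mid y).$$

Now we define the following sets:
- $S_d$ is the countable set of discontinuity points of the function $V$;
- $S_2$ is the set of all continuity points of the functions $V$ and $\varphi$;
- $S_3$ is the set of all $y \in S$ such that $q(S_d \mid y) = 0$.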
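Recall that $0 \notin S_d$. Clearly, the set $S_2$ is dense in $S$. The set $S_3$ is also dense in $S$ and contains the state 0. These two facts follow from either Assumption 2(iii) or Assumption 3(i). Choose any $s \in S_2 \cap S_+$ and $y \in S_3 \cap [0, s - \varphi(s))$. Then there exists some $N \in \mathbb{N}$ such that $y \in [0, s - \varphi_n(s)]$ for all $n > N$. Hence, we have the following inequality:

$$V_n(s) \ge u\big(s - \varphi_n(s) - y\big) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma V_n(z)} \, q(dz \mid y), \quad n > N.$$

By the dominated convergence theorem and the fact that $y \in S_3$, we obtain

$$\lim_{n \to \infty} \int_S e^{-\gamma V_n(z)} \, q(dz \mid y) = \int_S e^{-\gamma V(z)} \, q(dz \mid y).$$

Thus, we can conclude that

$$V(s) \ge u\big(s - \varphi(s) - y\big) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma V(z)} \, q(dz \mid y) \tag{11}$$

for $y \in [0, s - \varphi(s)) \cap S_3$ and $s \in S_2 \cap S_+$. Let us consider $s_0 \in S$ and $y_0 \in [0, s_0 - \varphi(s_0)]$. Now we choose two sequences $(s_m)_{m=1}^{\infty}$ and $(y_m)_{m=1}^{\infty}$ such that $s_m \downarrow s_0$, $y_m \downarrow y_0$ as $m \to \infty$ and $s_m \in S_2 \cap S_+$, $y_m \in S_3 \cap [0, s_m - \varphi(s_m))$ for all $m \in \mathbb{N}$. Obviously, $s_m - \varphi(s_m) \ge s_0 - \varphi(s_0)$. Therefore, by (11), we obtain

$$V(s_m) \ge u\big(s_m - \varphi(s_m) - y_m\big) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma V(z)} \, q(dz \mid y_m).$$

Letting $m \to \infty$ and making use of Lemma 3 in case of Assumption 3 or Lemma 4 in case of Assumption 2, the continuity of $u$, and the continuity from the right of the functions $V$ and $s \to s - \varphi(s)$, we infer that inequality (11) holds for $s_0 \in S$ and $y_0 \in [0, s_0 - \varphi(s_0)]$. Finally, inequalities (10) and (11) yield that

$$V(s) = \max_{y \in [0, s - \varphi(s)]} \Big\{ u\big(s - \varphi(s) - y\big) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma V(z)} \, q(dz \mid y) \Big\} = u\big(s - \varphi(s) - \psi(s)\big) - \frac{\beta}{\gamma} \ln \int_S e^{-\gamma V(z)} \, q\big(dz \mid \psi(s)\big)$$

for all $s \in S$. Since $\psi$ is non-decreasing and upper semicontinuous, it follows by Lemma 8 that $g(\varphi) = \psi$. Thus, the operator $L$ is continuous. By the Schauder-Tychonoff fixed point theorem (see Corollary 17.56 in Aliprantis and Border (2006)), there exists $\varphi^* \in F^0_2$ such that $L\varphi^* = \varphi^*$. This implies that $(\varphi^*, \varphi^*) \in F^0_1 \times F^0_2$ is a symmetric SMPE.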

Examples
In this section, we provide two examples satisfying our assumptions. Further examples can be found in Balbus et al. (2015a, b), Brock and Mirman (1972) and Jaśkiewicz and Nowak (2018b).
Observe that $q$ satisfies Assumption 2 (i) and (ii). To prove that Assumption 2 (iii) is satisfied, observe that $Z_s = \{y \in S : f_j(y) = s \text{ for some } j = 1, \ldots, m\}$ for each $s \in S$. Hence, the cardinality of $Z_s$ is at most $m$. As a result, $q$ obeys Assumption 2. Here, we may assume that the utility function for both players has the following form:

$$u(c) = \begin{cases} \sqrt{c}, & c \in [0, 1), \\ \dfrac{3}{2} - \dfrac{1}{1 + c^2}, & c \ge 1. \end{cases}$$

Clearly, $u$ is increasing, bounded by $3/2$, strictly concave and continuous at 0. As a result, Assumption 1 is satisfied.
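The stated properties of this utility function can be checked numerically on a grid. The sketch below reads the second branch as $\tfrac{3}{2} - \tfrac{1}{1 + c^2}$; the grid $[0, 10]$ with step 0.01 is an arbitrary choice, and a grid check is of course only a sanity test, not a proof.

```python
import math

def u(c):
    """Example felicity: sqrt(c) on [0, 1), 3/2 - 1/(1 + c^2) on [1, inf)."""
    return math.sqrt(c) if c < 1.0 else 1.5 - 1.0 / (1.0 + c * c)

grid = [0.01 * k for k in range(0, 1001)]      # c in [0, 10], step 0.01
vals = [u(c) for c in grid]

# Strictly increasing along the grid.
increasing = all(a < b for a, b in zip(vals, vals[1:]))
# Strict concavity on an evenly spaced grid: second differences are negative.
concave = all(vals[i - 1] - 2 * vals[i] + vals[i + 1] < 0
              for i in range(1, len(vals) - 1))
# Non-negative and bounded above by 3/2.
bounded = all(0.0 <= v < 1.5 for v in vals)
```

The two branches meet at $c = 1$ with the same value and the same slope, which is why the second differences stay negative across the junction.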