Abstract
We deal with discounted ARAT stochastic games on a Borel state space with finite action spaces and nonatomic transition probabilities. We prove the existence of pure Nash equilibria in stationary almost Markov strategies that depend only on the current and previous state of the game. Our proof is based on an existence theorem for correlated equilibria in stochastic games and some results on the integrals of set-valued mappings with respect to a probability measure depending on a parameter.
1 Introduction
An integral analysis of set-valued mappings plays an important role in the study of various equilibria in nonzero-sum stochastic games with general state spaces; see Mertens and Parthasarathy (1991, 2003), Nowak and Raghavan (1992), Jaśkiewicz and Nowak (2005) and Barelli and Duggan (2014). In particular, certain results on integrals with respect to a parametrized measure, proved by Artstein (1989) or Mertens (2003), are significant in showing the existence of Nash equilibria in general models of games in the class of randomised history-dependent or semi-Markov strategies; see Mertens and Parthasarathy (1991, 2003), Barelli and Duggan (2014). In this paper, we focus on a special class of nonzero-sum stochastic games with a Borel state space, finite action spaces and an additive reward and transition structure, called ARAT games in the sequel. We are interested in Nash equilibria among pure stationary almost Markov strategies that depend, in each period of the game, on the current and previous state of the game. This class of strategies is a subclass of the stationary semi-Markov strategies considered in Barelli and Duggan (2014). More precisely, Barelli and Duggan (2014) showed the existence of randomised Nash equilibria in the class of strategies that may depend on the current state, the previous state and the previous actions. A stationary Markov Nash equilibrium consists of strategies that depend only on the current state. For convenience, such strategies will also be called stationary. This term is common and has been used in a number of papers on stochastic games.
ARAT games with Borel state and finite action spaces were first studied by Himmelberg et al. (1976), who showed the existence of stationary Nash equilibria for \(p\)-almost all initial states. Their result was strengthened by Parthasarathy (1982), who obtained stationary Nash equilibria for all initial states. Pure stationary Markov Nash equilibria may not exist in ARAT stochastic games; see Example 3.1 (a game with 4 states) in Raghavan et al. (1985) or Example 4 (a game with 2 states) in Nowak (2006). The existence of pure stationary \(\epsilon \)-equilibria in ARAT games was proved by Nowak (1987) under the assumption that the transition probabilities are nonatomic and the discounted payoffs are “averaged” with respect to a nonatomic distribution of the initial state. Markovian \(\epsilon \)-equilibria in pure strategies for ARAT games can be shown to exist by using backward induction in the finite horizon game; see Rieder (1979). Thuijsman and Raghavan (1997) (for a finite state space) and Küenle (1999) (for a Borel state space and compact metric action spaces) established the existence of nonstationary history-dependent pure Nash equilibria in ARAT games. Their proofs are based on the well-known idea of threats, used frequently in repeated games. Stochastic games with average payoffs, additive transitions and finite state and action spaces were studied by Flesch et al. (2007), who gave some results on \(\epsilon \)-equilibria in the class of randomised strategies.
In this paper, we study pure strategies in ARAT games; these strategies are special cases of the stationary semi-Markov ones introduced recently by Barelli and Duggan (2014). Assuming that the action spaces are finite and the transition probabilities are dominated by a nonatomic probability measure on a Borel state space, we prove the existence of pure Nash equilibria in stationary almost Markov strategies that depend only on the current and previous state of the game. As in Barelli and Duggan (2014), our proof combines arguments used by Nowak and Raghavan (1992) to study correlated equilibria with a measurable selection theorem for parametrized set-valued integrals due to Mertens (2003). Our main result on pure Nash equilibria in the above-mentioned class of strategies contributes to the literature on ARAT games. This result, however, cannot be extended to ARAT stochastic games with transition probabilities involving atoms. We give an example with two states in which pure stationary almost Markov Nash equilibria do not exist. This shows that our assumption of nonatomic transitions is indeed essential.
2 The model and main result
We consider a two-person nonzero-sum discounted stochastic game \(G\) with additive rewards and additive transitions (an ARAT game for short) for which:
(i)
\((S,\mathcal{B})\) is a nonempty Borel state space with its Borel \(\sigma \)-algebra \(\mathcal{B}.\)
(ii)
\(A=\{1,2,\ldots ,n_1\}\) and \(B=\{1,2,\ldots ,n_2\}\) are action spaces for players 1 and 2, respectively.
(iii)
\(A(s)\subset A,\) \(B(s)\subset B\) are nonempty sets of actions available to player 1 and 2 in state \(s\in S.\) Assume that the set-valued mappings \(s \mapsto A(s)\) and \(s \mapsto B(s)\) are lower measurable. Define
$$\begin{aligned} D=\{(s,a,b): \;s\in S,\ a\in A(s),\ b\in B(s)\}. \end{aligned}$$
Then \(D\) is a Borel subset of \(S\times A\times B.\)
(iv)
Let \(u_n: S \times A \mapsto \mathbb {R}\) and \(w_n: S\times B \mapsto \mathbb {R}\) be bounded Borel measurable functions for \(n=1,2\). The reward (or payoff) function for player \(n=1,2\) is given by
$$\begin{aligned} r_n(s,a,b) =u_n(s,a) +w_n(s,b),\ \text{ where }\ (s,a,b)\in D. \end{aligned}$$
(v)
\(q:D\times \mathcal{B}\mapsto [0,1]\) is a transition probability such that
$$\begin{aligned} q(\cdot |s,a,b) = q_1(\cdot |s,a) + q_2(\cdot |s,b)\ \text{ for } \text{ each }\ (s,a,b)\in D, \end{aligned}$$where \(q_1\) and \(q_2\) are Borel measurable subtransition probabilities. We assume that there exists a nonatomic probability measure \(\mu \) on \((S,\mathcal{B})\) such that \(q(\cdot |s,a,b)\ll \mu \) for all \((s,a,b)\in D.\)
(vi)
\(\beta \in (0,1)\) is a discount factor.
The above components describe a discrete-time dynamic game in which each period \(t\in \mathbb {N}\) begins with a state \(s_t\in S\); after observing \(s_t,\) the players simultaneously choose their actions \(a_t\in A(s_t),\) \(b_t\in B(s_t)\) and obtain rewards \(r_1(s_t,a_t,b_t)\) and \(r_2(s_t,a_t,b_t).\) A new state \(s_{t+1}\) is realised from the distribution \(q(\cdot | s_t,a_t,b_t)\) and a new period begins, with rewards discounted by \(\beta .\) The game is played with the past history \(h_t=(s_1,a_1,b_1,\ldots ,a_{t-1},b_{t-1},s_t)\) as common knowledge for both players, where \(s_k\) is the state in the \(k\)-th period of the game and \(a_k\in A(s_k)\), \(b_k\in B(s_k)\) are the actions taken by the players at period \(k=1,\ldots ,t,\) \(t\in \mathbb {N}.\) In this paper, we are interested in Nash equilibria in pure strategies; therefore, we do not define randomised strategies. A pure strategy for player \(1\) (\(2\)) is a sequence \(\pi =(\pi _t)\) (\(\sigma =(\sigma _t)\)) of Borel measurable mappings, where each \(\pi _t\) (\(\sigma _t\)) associates with each given history \(h_t\) an action \(a_t\in A(s_t)\) (\(b_t\in B(s_t)\)).
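For concreteness, one period of such a game can be sketched in code. All numerical data below (two states, two actions per player, the tables `u1`, `w1`, `q1`, `q2`) are hypothetical and serve only to illustrate the additive reward and transition structure in (iv) and (v); they are not taken from this paper.

```python
import random

# Hypothetical two-state, two-action ARAT game (all numbers invented).
# u1(s, a): player 1's own reward term; w1(s, b): the term contributed
# by player 2's action. q1, q2: the two subtransition parts, each giving
# its share of the probability mass placed on state 0.
u1 = {(s, a): s + a for s in (0, 1) for a in (0, 1)}
w1 = {(s, b): -b for s in (0, 1) for b in (0, 1)}
q1 = {(s, a): (0.3, 0.2)[a] for s in (0, 1) for a in (0, 1)}
q2 = {(s, b): (0.4, 0.1)[b] for s in (0, 1) for b in (0, 1)}

def step(s, a, b, rng):
    """One period: additive reward r1 = u1 + w1, additive transition q = q1 + q2."""
    r1 = u1[(s, a)] + w1[(s, b)]
    p0 = q1[(s, a)] + q2[(s, b)]   # total probability that the next state is 0
    s_next = 0 if rng.random() < p0 else 1
    return r1, s_next

rng = random.Random(0)
r1, s_next = step(0, 1, 0, rng)    # state 0, player 1 plays 1, player 2 plays 0
```

Note how neither player alone determines the transition: each contributes an additive piece, which is exactly what the ARAT structure requires.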
Let \(F_1\) (\(F_2\)) be the set of all Borel measurable functions \(f: S\times S \mapsto A\) (\(g: S\times S \mapsto B\)) such that \(f(s,s') \in A(s')\) (\(g(s,s') \in B(s')\)) for each \(s,s'\in S.\) A pure stationary almost Markov strategy \(\pi \) for player 1 is such that for some \(f\in F_1\) we have \(\pi _1(s_1)= f(s_1,s_1)\) for all \(s_1\in S\) and \(\pi _t(h_t)=f(s_{t-1},s_t)\) for every \(h_t\) and \(t\ge 2.\) In other words, a pure stationary almost Markov strategy for player 1 may only depend on the previous and current state for any \(t\ge 2.\) We will identify any pure stationary almost Markov strategy for player 1 with \(f\in F_1.\) Similarly, we define pure stationary almost Markov strategies for player 2 and identify them with Borel measurable mappings \(g\in F_2.\) We would like to indicate that \(F_1\) and \(F_2\) are special cases of the classes of stationary semi-Markov strategies considered in Barelli and Duggan (2014), where dependence of \(\pi _t\) (or \(\sigma _t\)) on the current state, previous state and previous actions is assumed. Furthermore, a pure stationary almost Markov strategy is called Markov, if it is independent of the previous state. The set of all strategies is denoted by \(\varPi \) for player 1 and by \(\varSigma \) for player 2.
For any strategies \(\pi \in \varPi \) and \(\sigma \in \varSigma \) we define the expected discounted payoff or reward function for player \(n\):
$$\begin{aligned} J_n(s,\pi ,\sigma )= E_s^{\pi \sigma }\left[ \sum _{t=1}^{\infty }\beta ^{t-1}r_n(s_t,a_t,b_t)\right] , \end{aligned}$$
where \(E_s^{\pi \sigma }\) is the expectation operator corresponding to the unique probability measure \(P_s^{\pi \sigma }\) defined on the space of all feasible infinite histories of the process starting in state \(s=s_1\in S\) and induced by the transition probability \(q\) and strategies \(\pi \) and \(\sigma .\)
A pair of strategies \((\pi ^*,\sigma ^*)\in \varPi \times \varSigma \) is called a Nash equilibrium, if
$$\begin{aligned} J_1(s,\pi ^*,\sigma ^*)\ge J_1(s,\pi ,\sigma ^*)\ \text{ for } \text{ all }\ \pi \in \varPi ,\ s\in S, \end{aligned}$$
and
$$\begin{aligned} J_2(s,\pi ^*,\sigma ^*)\ge J_2(s,\pi ^*,\sigma )\ \text{ for } \text{ all }\ \sigma \in \varSigma ,\ s\in S. \end{aligned}$$
In the sequel, we shall refer to the game \(\tilde{G}\) with the state space \(S\times S\) and action spaces \(\tilde{A}(s,s')= A(s'),\) \(\tilde{B}(s,s')= B(s')\) for all \((s,s')\in S\times S.\) The reward functions in game \(\tilde{G}\), denoted by \(\tilde{r}_n,\) are defined as follows:
$$\begin{aligned} \tilde{r}_n((s,s'),a,b)= r_n(s',a,b),\quad n=1,2, \end{aligned}$$
for all \((s,s')\in S\times S,\) \(a\in A(s'),\) \(b\in B(s').\)
The transition probability \(\tilde{q}\) in game \(\tilde{G}\) is defined as follows:
$$\begin{aligned} \tilde{q}(C_1\times C_2|(s,s'),a,b)= \delta _{s'}(C_1)q(C_2|s',a,b) \end{aligned}$$
for all \((s,s')\in S\times S,\) \(a\in A(s'),\) \(b \in B(s')\) and \(C_1,C_2\in \mathcal{B}.\) Here \(\delta _{s'}\) denotes the Dirac measure concentrated at \(s'.\) Hence, \(\tilde{q}(\{s'\}\times C_2|(s,s'),a,b) :=q(C_2|s',a,b).\)
Pure strategies are defined in game \(\tilde{G}\) in an obvious manner. Note that each strategy \(f\in F_1\) or \(g\in F_2\) in game \(G\) is stationary Markov in game \(\tilde{G}.\) The discounted payoff for player \(n=1,2\) is denoted by \(\tilde{J}_n((s,s'),\pi ,\sigma ).\) Note that if \(s=s',\) then
$$\begin{aligned} \tilde{J}_n((s,s),\pi ,\sigma )= J_n(s,\pi ,\sigma ),\quad n=1,2. \end{aligned}$$ (1)
We can now formulate our main result.
Theorem
Every ARAT game \(G\) satisfying assumptions \((i)\)–\((vi)\) has a Nash equilibrium in pure stationary almost Markov strategies.
Proof
Let \(B(S\times S)\) be the space of all bounded Borel measurable real-valued functions on \(S\times S.\) For any \(s\in S\) and \(v=(v_1,v_2),\) where \(v_1,v_2 \in B(S\times S)\), we consider a static game \(\varGamma _v(s)\) where the payoff to player \(n=1,2\) is given by
$$\begin{aligned} U^v_n(s,a,b)= r_n(s,a,b)+\beta \int _Sv_n(s,s')q(ds'|s,a,b), \end{aligned}$$ (2)
where \(a\in A(s),\ b\in B(s). \) We shall also consider the payoff functions of the form (2) where \(v_1\) and \(v_2\) depend only on \(s'\in S.\) Let \(N_v(s)\) be the set of all pure Nash equilibria in the game \(\varGamma _v(s).\) Under the ARAT assumption \(N_v(s)\not = \emptyset .\) Indeed, since the rewards and transitions are additive, each player's payoff in \(\varGamma _v(s)\) separates into a term depending only on his own action and a term independent of it; hence \((a_0,b_0)\in N_v(s) \) if and only if
$$\begin{aligned} u_1(s,a_0)+\beta \int _Sv_1(s,s')q_1(ds'|s,a_0)= \max _{a\in A(s)}\left[ u_1(s,a)+\beta \int _Sv_1(s,s')q_1(ds'|s,a)\right] \end{aligned}$$
and
$$\begin{aligned} w_2(s,b_0)+\beta \int _Sv_2(s,s')q_2(ds'|s,b_0)= \max _{b\in B(s)}\left[ w_2(s,b)+\beta \int _Sv_2(s,s')q_2(ds'|s,b)\right] . \end{aligned}$$
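Because the rewards and transitions enter additively, each player's best reply in the static game does not depend on the opponent's action, so a pure equilibrium always exists: each player simply maximizes his own additive term. A minimal sketch, with randomly generated (hypothetical) additive payoffs, checks this by brute force:

```python
import random

rng = random.Random(1)
n1, n2 = 3, 4
# Additive static game: player n's payoff at (a, b) is u[n][a] + w[n][b].
# All payoff values here are random placeholders, purely for illustration.
u = [[rng.uniform(-1, 1) for _ in range(n1)] for _ in range(2)]
w = [[rng.uniform(-1, 1) for _ in range(n2)] for _ in range(2)]

def payoff(n, a, b):
    return u[n][a] + w[n][b]

# Player 1's best reply maximizes u[0][a]; the term w[0][b] is a constant
# from his point of view. Symmetrically, player 2 maximizes w[1][b].
a0 = max(range(n1), key=lambda a: u[0][a])
b0 = max(range(n2), key=lambda b: w[1][b])

# (a0, b0) is a pure Nash equilibrium: no unilateral deviation improves.
assert all(payoff(0, a0, b0) >= payoff(0, a, b0) for a in range(n1))
assert all(payoff(1, a0, b0) >= payoff(1, a0, b) for b in range(n2))
```

The two independent maximizations mirror the two displayed conditions characterizing \(N_v(s)\): the opponent's action shifts the payoff by a constant and never changes the argmax.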
By \(P_v(s)\) [\(coP_v(s)\)] we denote the set of all payoff vectors [convex combinations of payoff vectors] corresponding to equilibria in \(N_v(s).\) Let \(B(S)\) be the space of all bounded Borel measurable real-valued functions on \(S.\) A simple adaptation of the arguments given in Nowak and Raghavan (1992) (see also page 35 in Jaśkiewicz and Nowak (2005)) yields the existence of some \(w^*=(w^*_1,w^*_2)\) with \(w^*_n \in B(S)\) for \(n=1,2\) such that
$$\begin{aligned} w^*(s)\in coP_{w^*}(s)\ \text{ for } \text{ all }\ s\in S. \end{aligned}$$ (3)
Assume that \(A\times B\) is endowed with the lexicographic order and every \(A(s)\times B(s)\) is given the induced order. Write \(A\times B= \{p_1,p_2,\ldots ,p_d\}\) with \(d=n_1n_2.\) Define
$$\begin{aligned} q_0(C|s,p):= q(C|s,a,b)\ \text{ if }\ p=(a,b)\in A(s)\times B(s),\quad q_0(C|s,p):= \mu (C)\ \text{ otherwise, } \end{aligned}$$
where \(C\in \mathcal{B}.\) Define the \(d\)-dimensional stochastic kernel
$$\begin{aligned} K(C|s):= (q_0(C|s,p_1),\ldots ,q_0(C|s,p_d)),\quad C\in \mathcal{B},\ s\in S. \end{aligned}$$
Let \(w_0^*(s,p):= \int _Sw^*(s')q_0(ds'|s,p)\), \(p\in A\times B.\) Put
$$\begin{aligned} \overline{w}^*_0(s):= (w_0^*(s,p_1),\ldots ,w_0^*(s,p_d)),\quad s\in S. \end{aligned}$$
The set-valued mappings \(s\mapsto P_{w^*}(s)\) and \(s\mapsto co P_{w^*}(s)\) are lower measurable (see Lemma 6 in Nowak and Raghavan (1992)). Let \(\varPsi \) be the graph of the mapping \(s\mapsto \int _SP_{w^*}(s')K(ds'|s),\) i.e.,
$$\begin{aligned} \varPsi =\left\{ (s,y):\ s\in S,\ y\in \int _SP_{w^*}(s')K(ds'|s)\right\} . \end{aligned}$$
By part 3 in Theorem 2 of Mertens (2003), there exists a Borel measurable mapping \(\phi : \varPsi \times S \mapsto \mathbb {R}^2\) such that \(\phi (s,y,s')\in P_{w^*}(s')\) for all \((s,y)\in \varPsi ,\) \(s'\in S\) and
$$\begin{aligned} \int _S\phi (s,y,s')K(ds'|s)= y\ \text{ for } \text{ all }\ (s,y)\in \varPsi . \end{aligned}$$
From Lyapunov’s theorem (see Corollary 18.1.10 in Klein and Thompson (1984)), it follows that \(\int _S P_{w^*}(s')K(ds'|s)=\int _S co P_{w^*}(s')K(ds'|s)\). Hence, \(\overline{w}^*_0(s)\in \int _SP_{w^*}(s')K(ds'|s)\) for each \(s\in S.\) Put
$$\begin{aligned} v^*(s,s'):= \phi (s,\overline{w}^*_0(s),s'),\quad s,s'\in S. \end{aligned}$$
Clearly \(v^*\in B(S\times S).\) By (3), we have that
$$\begin{aligned} \int _Sv^*(s,s')q_0(ds'|s,p)= w^*_0(s,p)\ \text{ for } \text{ all }\ s\in S,\ p\in A\times B. \end{aligned}$$
This fact implies that
$$\begin{aligned} \int _Sv_n^*(s,s')q(ds'|s,a,b)= \int _Sw_n^*(s')q(ds'|s,a,b)\ \text{ for } \text{ all }\ (s,a,b)\in D,\ n=1,2. \end{aligned}$$ (4)
Put \(v^*=(v_1^*,v_2^*)\) and let \(s^-\) denote the previous state. Since \(v^*(s,s') \in P_{w^*}(s')\) for every \(s,s' \in S\) or equivalently, \(v^*(s^-,s) \in P_{w^*}(s)\) for every \(s^-,s \in S,\) by Filippov’s implicit function theorem (see Theorem 18.17 in Aliprantis and Border (2006) or Lemma 4 in Nowak and Raghavan (1992)), there exists a pair \((f^*,g^*)\in F_1\times F_2\) such that
$$\begin{aligned} v_1^*(s^-,s)= U^{w^*}_1(s,f^*(s^-,s),g^*(s^-,s))= \max _{a\in A(s)}U^{w^*}_1(s,a,g^*(s^-,s)) \end{aligned}$$ (5)
and
$$\begin{aligned} v_2^*(s^-,s)= U^{w^*}_2(s,f^*(s^-,s),g^*(s^-,s))= \max _{b\in B(s)}U^{w^*}_2(s,f^*(s^-,s),b) \end{aligned}$$ (6)
for all \(s^-,s\in S.\) By (4), we have \(U^{w^*}_n(s,a,b) = U^{v^*}_n(s,a,b)\) for each \((s,a,b)\in D\) and \(n=1,2.\) Thus, from (5) and (6), we conclude that
$$\begin{aligned} v_1^*(s^-,s)= \max _{a\in A(s)}\left[ r_1(s,a,g^*(s^-,s))+\beta \int _Sv_1^*(s,s')q(ds'|s,a,g^*(s^-,s))\right] , \end{aligned}$$
with the maximum attained at \(a=f^*(s^-,s),\) and
$$\begin{aligned} v_2^*(s^-,s)= \max _{b\in B(s)}\left[ r_2(s,f^*(s^-,s),b)+\beta \int _Sv_2^*(s,s')q(ds'|s,f^*(s^-,s),b)\right] , \end{aligned}$$
with the maximum attained at \(b=g^*(s^-,s),\)
for every \(s^-,s \in S.\) We have obtained two Bellman equations (for players 1 and 2) in the game \(\tilde{G}.\) By standard dynamic programming arguments (see Blackwell (1965)), these equations imply that
$$\begin{aligned} v_1^*(s^-,s)= \tilde{J}_1((s^-,s),f^*,g^*)= \sup _{\pi \in \varPi }\tilde{J}_1((s^-,s),\pi ,g^*) \end{aligned}$$ (7)
and
$$\begin{aligned} v_2^*(s^-,s)= \tilde{J}_2((s^-,s),f^*,g^*)= \sup _{\sigma \in \varSigma }\tilde{J}_2((s^-,s),f^*,\sigma ) \end{aligned}$$ (8)
for all \(s^-,s \in S.\) Putting \(s^-=s\) in (7), (8) and using (1), we obtain
$$\begin{aligned} J_1(s,f^*,g^*)= \sup _{\pi \in \varPi }J_1(s,\pi ,g^*)\quad \text{ and }\quad J_2(s,f^*,g^*)= \sup _{\sigma \in \varSigma }J_2(s,f^*,\sigma ) \end{aligned}$$
for each \(s\in S.\) These equations and standard dynamic programming arguments (see Blackwell (1965)) imply that \((f^*,g^*)\) is a Nash equilibrium in the class of all strategies of the players. \(\square \)
Remark 1
From the proof, it follows that \((f^*,g^*)\) is subgame perfect in the sense of Selten (1975). Pure stationary Markov Nash equilibria can be shown to exist in the same manner as in Theorem 2 in Nowak (2006), if the transition probability is a convex combination of finitely many nonatomic measures on \(S.\) The existence of pure stationary Markov Nash equilibria under the assumptions made in this paper (i.e. with nonatomic additive transitions and additive rewards) is an open problem. Studying ARAT stochastic games, Küenle (1999) did not assume that the transition probability is dominated by any probability measure. He obtained a pure Nash equilibrium \((\pi ^*,\sigma ^*)\) in which each \(\pi ^*_t\) and \(\sigma ^*_t\) depends on the entire history \(h_t.\) We assume the dominance of \(q\) with respect to a nonatomic measure \(\mu \) and obtain an equilibrium in the simplest possible class of strategies. Our result cannot be extended to the case where \(\mu \) has atoms, as the example in the next section illustrates. A survey of existing results on randomised Nash equilibria in stochastic games without the ARAT structure can be found in Nowak (2003), Jaśkiewicz and Nowak (2005) and Barelli and Duggan (2014). A result related to our theorem, on randomised equilibria in games with finite, state-independent action sets, is mentioned on page 147 in Mertens and Parthasarathy (1991).
Remark 2
The Nash equilibrium strategy for each player considered in this paper is called “stationary”, since it is determined by a single function independent of calendar time. The term “almost Markov”, on the other hand, refers to the property that a strategy may depend not only on the current state (at any stage \(t\ge 2\)), but also on the previous state. The fact that our equilibrium strategies depend on the current and previous state follows from applying a parametrised version of Lyapunov’s theorem given by Mertens (2003).
3 A counterexample
Below we give an example of an ARAT stochastic game with finite state and action spaces having no pure stationary almost Markov Nash equilibrium. Since the state space is finite, the transition probabilities necessarily have atoms, so the example shows that the nonatomicity assumption in our theorem cannot be dropped.
Let \(S=\{1,2\}\), \(A(1)=B(2)=\{1,2\}\), \(A(2)=B(1)=\{1\}.\) Assume that \(r_1(1,a,1)=0\) for \(a\in A(1)\) and \(r_1(2,1,b)= 6\) if \(b=1,\) and \(r_1(2,1,b)= -6\) if \(b=2.\) Let
and
We assume that the transition probability in state \(s=1\) is controlled by player \(1\) and in state \(s=2\) is controlled by player 2. We define
and \(q(2|s,a,b)=1-q(1|s,a,b)\) for each \(s\in S\), \(a\in A(s),\) \(b\in B(s).\)
Let \(\tilde{S}=\{s_1,s_2,s_3,s_4\}\) where \(s_1=(1,1)\), \(s_2=(1,2),\) \(s_3= (2,1)\), \(s_4=(2,2).\) Note that any pure stationary almost Markov strategy for player \(1\) can be defined as \(f_{ij}(s_1) = i,\) \(f_{ij}(s_2) = j,\) and \(f_{ij}(s_3) = f_{ij}(s_4) = 1.\) Thus, player 1 has four pure stationary almost Markov strategies. A pure stationary almost Markov strategy for player 2 is denoted by \(g_{ij},\) where \(g_{ij}(s_3) = i,\) \(g_{ij}(s_4) = j,\) and \(g_{ij}(s_1) = g_{ij}(s_2) = 1.\) In order to compute the discounted expected rewards to the players for any pair \((f_{ij},g_{kl})\) of strategies we consider an auxiliary game \(\tilde{G}\) (defined in Sect. 2), in which \(f_{ij}\) and \(g_{kl}\) are pure stationary strategies (or pure stationary Markov strategies). For computational purposes, it is convenient to use the standard matrix notation that is common in the finite state space case. By \(Q(f_{ij},g_{kl})\) we denote the transition probability matrix induced by \(\tilde{q}\) and strategies \(f_{ij},\) \(g_{kl}.\) We assume that the rows and columns of \(Q(f_{ij},g_{kl})\) are labeled by \(s_1, s_2, s_3\) and \(s_4.\) Let
$$\begin{aligned} r_n(f_{ij},g_{kl}):= \big (\tilde{r}_n(s_m,f_{ij}(s_m),g_{kl}(s_m))\big )_{m=1}^{4} \end{aligned}$$
be the vector of rewards of player \(n=1,2\) in the auxiliary game, induced by strategies \(f_{ij}\) and \(g_{kl}.\) By \(\tilde{J}_n(s_m,f_{ij},g_{kl})\), we denote the discounted expected payoff to player \(n\) in game \(\tilde{G}\). Note that
for each strategy pair \((f_{ij},g_{kl}).\) For any player \(n\), define
The standard formula yields that
$$\begin{aligned} \big (\tilde{J}_n(s_m,f_{ij},g_{kl})\big )_{m=1}^{4}= [I-\beta Q(f_{ij},g_{kl})]^{-1}r_n(f_{ij},g_{kl}). \end{aligned}$$
If \([I-\beta Q(f_{ij},g_{kl})]^{-1}_m\) is the \(m\)-th row of the matrix \([I-\beta Q(f_{ij},g_{kl})]^{-1},\) then we have
$$\begin{aligned} \tilde{J}_n(s_1,f_{ij},g_{kl})= [I-\beta Q(f_{ij},g_{kl})]^{-1}_1r_n(f_{ij},g_{kl}) \end{aligned}$$ (9)
and
$$\begin{aligned} \tilde{J}_n(s_4,f_{ij},g_{kl})= [I-\beta Q(f_{ij},g_{kl})]^{-1}_4r_n(f_{ij},g_{kl}). \end{aligned}$$ (10)
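The payoff computation above amounts to solving \(J = r_n + \beta Q J\), i.e. \(J=[I-\beta Q]^{-1}r_n\). A hedged numerical sketch in pure Python illustrates this via value iteration; the transition matrix `Q` and reward vector `r` below are hypothetical placeholders (only the absorbing last row, reward \(-6\) and \(\beta =3/4\) echo quantities appearing in this section), not the actual data of the example:

```python
# Discounted payoff J solves J = r + beta * Q J, i.e. J = (I - beta*Q)^{-1} r.
# Q and r are invented 4-state data for illustration only.
beta = 0.75
Q = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]   # last state is absorbing
r = [0.0, 0.0, 6.0, -6.0]

def discounted_value(Q, r, beta, iters=2000):
    """Value iteration: repeatedly apply J <- r + beta * Q J."""
    J = [0.0] * len(r)
    for _ in range(iters):
        J = [r[m] + beta * sum(Q[m][k] * J[k] for k in range(len(r)))
             for m in range(len(r))]
    return J

J = discounted_value(Q, r, beta)
# Fixed-point check: J_m == r_m + beta * (Q J)_m for every row m.
for m in range(4):
    assert abs(J[m] - (r[m] + beta * sum(Q[m][k] * J[k] for k in range(4)))) < 1e-9
```

For the absorbing state the formula collapses to \(J = -6/(1-\beta ) = -24\), which the iteration reproduces; solving the linear system directly with the matrix inverse, as in (9) and (10), gives the same vector.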
For an illustration assume that \(\beta =3/4\) and consider the pair \((f_{21},g_{12}).\) Then
and
Using these data and (9)–(10), we can easily obtain that
and
We can similarly compute the discounted expected payoffs to the players for each pair \((f_{ij},g_{kl})\) of strategies and consider two bimatrix games corresponding to states \(1\) and \(2\), respectively. The rows (columns) of the matrices given below are labeled by \(f_{11}, f_{12}, f_{21}, f_{22}\) (\(g_{11}, g_{12}, g_{21}, g_{22}\)). The payoff matrices in state \(s=1\) are:
Observe that this bimatrix game has no pure Nash equilibrium. In state \(s=2\), the payoff matrices are:
This bimatrix game has no pure Nash equilibrium either.
We conclude this section by pointing out that the above game has a randomised stationary Markov Nash equilibrium \((f^*,g^*)\) where \(f^*(1) =(\frac{1}{2},\frac{1}{2})\) is a mixed strategy for player 1 in \(s=1\) and \(g^*(2)=(\frac{5}{8},\frac{3}{8})\) is a mixed strategy for player 2 in state \(s=2.\)
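Mixed equilibria of the kind quoted above are pinned down by indifference conditions: each player mixes so that the opponent is indifferent between his pure actions. A sketch of this computation for a generic 2x2 bimatrix game without pure equilibria follows; it is illustrated on matching pennies, a hypothetical stand-in, since the payoff matrices of the game above are not reproduced here.

```python
# Mixed equilibrium of a 2x2 bimatrix game via indifference conditions.
# Matching pennies (which, like the game above, has no pure Nash
# equilibrium); A[a][b], B[a][b] are the two players' payoffs.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]

def mixed_2x2(A, B):
    # Player 1 plays row 0 with probability p chosen to make player 2
    # indifferent: p*B[0][0] + (1-p)*B[1][0] == p*B[0][1] + (1-p)*B[1][1].
    p = (B[1][1] - B[1][0]) / (B[0][0] - B[1][0] - B[0][1] + B[1][1])
    # Player 2 plays column 0 with probability q making player 1 indifferent:
    # q*A[0][0] + (1-q)*A[0][1] == q*A[1][0] + (1-q)*A[1][1].
    q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    return p, q

p, q = mixed_2x2(A, B)
assert (p, q) == (0.5, 0.5)   # matching pennies: both players mix 50/50
```

Applying the same indifference conditions to the two bimatrix games of this example yields the probabilities \((\frac{1}{2},\frac{1}{2})\) and \((\frac{5}{8},\frac{3}{8})\) stated above.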
References
Aliprantis C, Border K (2006) Infinite dimensional analysis: a hitchhiker’s guide. Springer, New York
Artstein Z (1989) Parametrized integration of multifunctions with applications to control and optimization. SIAM J Control Optim 27:1369–1380
Barelli P, Duggan J (2014) A note on semi-Markov perfect equilibria in discounted stochastic games. J Econ Theory 151:596–604
Blackwell D (1965) Discounted dynamic programming. Ann Math Stat 36:226–235
Flesch J, Thuijsman F, Vrieze OJ (2007) Stochastic games with additive transitions. Eur J Oper Res 179:483–497
Himmelberg CJ, Parthasarathy T, Raghavan TES, Van Vleck FS (1976) Existence of p-equilibrium and optimal stationary strategies in stochastic games. Proc Am Math Soc 60:245–251
Jaśkiewicz A, Nowak AS (2005) Nonzero-sum semi-Markov games with the expected average payoffs. Math Methods Oper Res 62:23–40
Klein E, Thompson AC (1984) Theory of correspondences. Wiley, New York
Küenle HU (1999) Equilibrium strategies in stochastic games with additive cost and transition structure and Borel state and action spaces. Int Game Theory Rev 1:131–147
Mertens JF (2003) A measurable “measurable choice” theorem. In: Neyman A, Sorin S (eds) Stochastic games and applications. Kluwer, Dordrecht, pp 107–130
Mertens JF, Parthasarathy T (1991) Nonzero-sum stochastic games. In: Raghavan et al (eds) Stochastic games and related topics. Kluwer, Dordrecht, pp 145–148
Mertens JF, Parthasarathy T (2003) Equilibria for discounted stochastic games. In: Neyman A, Sorin S (eds) Stochastic games and applications. Kluwer, Dordrecht, pp 131–172
Nowak AS (1987) Nonrandomized strategy equilibria in noncooperative stochastic games with additive transition and reward structure. J Optim Theory Appl 52:429–441
Nowak AS (2003) N-person stochastic games: extensions of the finite state space case and correlation. In: Neyman A, Sorin S (eds) Stochastic games and applications. Kluwer, Dordrecht, pp 93–106
Nowak AS (2006) Remarks on sensitive equilibria in stochastic games with additive reward and transition structure. Math Methods Oper Res 64:481–494
Nowak AS, Raghavan TES (1992) Existence of stationary correlated equilibria with symmetric information for discounted stochastic games. Math Oper Res 17:519–526
Parthasarathy T (1982) Existence of equilibrium stationary strategies in discounted stochastic games. Sankhya Ser A 44:114–127
Raghavan TES, Tijs SH, Vrieze OJ (1985) On stochastic games with additive reward and transition structure. J Optim Theory Appl 47:451–464
Rieder U (1979) Equilibrium plans for non-zero sum Markov games. In: Moeschlin O, Pallaschke D (eds) Game theory and related topics. North-Holland, Amsterdam, pp 91–102
Selten R (1975) Re-examination of the perfectness concept for equilibrium points in extensive games. Int J Game Theory 4:25–55
Thuijsman F, Raghavan TES (1997) Perfect information stochastic games and related classes. Int J Game Theory 26:403–408
The authors gratefully acknowledge the financial support of the National Science Center under grant DEC-2011/03/B/ST1/00325.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Jaśkiewicz, A., Nowak, A.S. On pure stationary almost Markov Nash equilibria in nonzero-sum ARAT stochastic games. Math Meth Oper Res 81, 169–179 (2015). https://doi.org/10.1007/s00186-014-0491-8