Weighted-average stochastic games with constant payoff

Abstract

In a zero-sum stochastic game, at each stage, two opponents make decisions which determine a stage reward and the law of the state of nature at the next stage; Player 1 seeks to maximize, and Player 2 to minimize, the weighted average of the stage rewards. In this paper we solve the constant-payoff conjecture formulated by Sorin, Venel and Vigeral in 2010 for two classes of stochastic games with weighted-average rewards: (1) absorbing games, a well-known class of stochastic games where the state changes at most once during the game, and (2) smooth stochastic games, a newly introduced class of stochastic games where the state evolves smoothly under optimal play.

Introduction

Model Stochastic games were introduced by  Shapley (1953) in order to model a repeated interaction between two opponent players in a changing environment. The game proceeds in stages. At each stage \(m\in \mathbb N\) of the game, players play a zero-sum game that depends on a state variable. Formally, knowing the current state \(k_m\), Player 1 chooses an action \(i_m\) and Player 2 chooses an action \(j_m\). Their choices occur independently and have two consequences: first, they produce a stage reward \(g(k_m,i_m,j_m)\) which is observed by the players and, second, they determine the law \(q(k_m,i_m,j_m)\) of the next period’s state \(k_{m+1}\). Thus, the sequence of states follows a Markov chain controlled by the actions of both players. To any sequence of nonnegative weights \(\theta =(\theta _m)\) and any initial state k corresponds the \(\theta\)-weighted average stochastic game which is one where Player 1 maximizes the expectation of \(\sum \nolimits _{m \ge 1} \theta _m g(k_m,i_m,j_m)\), given that \(k_1=k\), while Player 2 minimizes the same amount. A crucial aspect in this model is that the current state is commonly observed by the players at every stage. Another one is stationarity: the transition and stage reward functions do not change over time.

A \(\theta\)-weighted average stochastic game is thus described by a tuple \((K,I,J,g,q,k,\theta )\) where K is a set of states, I and J are the sets of actions of both players, \(g:K\times I\times J\rightarrow \mathbb R\) is the reward function, \(q:K\times I\times J\rightarrow \Delta (K)\) is the transition function, k is the initial state and \(\theta\) is a sequence of nonnegative weights so that \(\sum _{m\ge 1}\theta _m=1\). Like in Shapley’s seminal paper  (1953), we assume throughout this paper that K, I, J are finite sets, and identify the set K with \(\{1,\dots ,n\}\).

Discounted and finitely repeated stochastic games The cases where, for all \(m\in \mathbb N\), one has \(\theta _m=\lambda (1-\lambda )^{m-1}\) for some \(\lambda \in (0,1)\) or \(\theta _m=\frac{1}{T} \mathbb {1}_{\{m\le T\}}\) for some \(T\in \mathbb N\) are referred to as \(\lambda\)-discounted stochastic games and T-stage repeated stochastic games, respectively.

An example: the “Big Match” Introduced by Gillette (1957), the Big Match is the most famous stochastic game. The state space is \(K=\{k,0^*,1^*\}\), where states \(0^*\) and \(1^*\) are absorbing with payoff 0 and 1 respectively, and the action sets are \(I=\{T,B\}\) and \(J=\{L,R\}\). That is:

$$\begin{aligned} \forall (i,j)\in I\times J,\quad {\left\{ \begin{array}{ll} q(0^*| 0^*,i,j)=q(1^*| 1^*,i,j)=1,\\ g(0^*,i,j)=0\quad \text { and } \quad g(1^*,i,j)=1\,.\end{array}\right. } \end{aligned}$$

The game with initial state k, the non-absorbing state, can be represented as follows:

$$\begin{array}{c|cc} & L & R\\ \hline T & 1^* & 0^*\\ B & 0 & 1 \end{array}$$

As long as Player 1 plays action B, the state remains the same, and the stage rewards are 0 or 1 depending on Player 2’s action. On the contrary, when Player 1 plays T, the state moves to an absorbing state, either \(1^*\) or \(0^*\) depending on Player 2’s action, where the future stage rewards are fixed once and for all, i.e.:

$$\begin{aligned} {\left\{ \begin{array}{ll} q(k|k, B, L)=q(k| k, B, R)=q(1^* | k, T, L)= q(0^*| k, T, R)=1,\\ g(k,T,L)=g(k,B,R)=1, \quad \text {and}\quad g(k,T,R)=g(k,B,L)=0\,.\end{array}\right. } \end{aligned}$$

An asymptotically optimal strategy for Player 1. The \(\lambda\)-discounted Big Match has a value \(v_\lambda ^k=\frac{1}{2}\) and it is optimal for Player 1 to play the action T with the same probability \(\frac{\lambda }{1+\lambda }\) at every stage. Similarly, the T-stage game has a value \(v_T^k=\frac{1}{2}\). The optimal strategy of Player 1 consists in playing, at every stage \(1\le m\le T\), the action T with probability \(\frac{1}{T-m+2}\). More generally, the value \(v^k_\theta\) exists for every sequence of weights \(\theta\) and is equal to \(\frac{1}{2}\). An optimal strategy for Player 2 consists in playing actions L and R with equal probability at every stage. An optimal strategy for Player 1 can be obtained recursively, using the so-called Shapley operator, which relies on the stationarity of the model. However, an even simpler strategy exists, which works well when \(\Vert \theta \Vert :=\max _m \theta _m\) is small enough: at stage \(m\ge 1\), play action T with probability \(\frac{\lambda _m}{1+\lambda _m}\), where \(\lambda _m=\frac{\theta _m}{\sum _{m'\ge m}\theta _{m'}}\) is the relative weight of the current stage with respect to the remaining weights of the game. Indeed, for every \(\varepsilon >0\), there exists \(\delta >0\) so that for any \(\theta\) satisfying \(\Vert \theta \Vert \le \delta\), this strategy guarantees at least \(v^k_\theta -\varepsilon\).
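
For concreteness, the relative weights \(\lambda _m\) are easy to compute; the following minimal Python sketch (our illustration, not part of the paper's argument) evaluates \(\lambda _m=\theta _m/\sum _{m'\ge m}\theta _{m'}\) for the discounted and the T-stage weights, together with the induced Big Match probability \(\lambda _m/(1+\lambda _m)\), which reduces to \(\lambda /(1+\lambda )\) and \(1/(T-m+2)\) respectively, as stated above.

```python
import numpy as np

def relative_weights(theta):
    """lambda_m = theta_m / sum_{m' >= m} theta_{m'}, the relative weight of stage m."""
    tails = np.cumsum(theta[::-1])[::-1]       # tails[m-1] = sum_{m' >= m} theta_{m'}
    return theta / tails

# lambda-discounted weights, truncated once the tail is negligible
lam, M = 0.01, 5000
disc = lam * (1 - lam) ** np.arange(M)
print(relative_weights(disc)[:3])              # approximately [0.01, 0.01, 0.01]

# T-stage weights: theta_m = 1/T for m = 1, ..., T
T = 10
unif = np.full(T, 1.0 / T)
lam_m = relative_weights(unif)                 # equals 1/(T - m + 1) at stage m
print(lam_m / (1 + lam_m))                     # probability of the absorbing action: 1/(T - m + 2)
print(1.0 / (T - np.arange(1, T + 1) + 2))     # same values, from the closed form
```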

The constant-payoff property. Consider the T-stage game again, and suppose that both players use their optimal strategies. What is the average reward after M stages? In fact, the average reward over the first M stages is very close to \(v^k_T\) as long as M and T are large. This property is remarkable, as it means a constant flow of rewards for the players, under optimal play. More precisely, and more generally, the constant-payoff property holds if, for any \(\varepsilon >0\), there exists \(\delta >0\) so that for any \(\theta\) satisfying \(\Vert \theta \Vert \le \delta\), the cumulated rewards over the first M stages are equal to \((\sum _{m=1}^M \theta _m) v^k_\theta\) up to an error of at most \(\varepsilon\).
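
The following simulation sketch (ours; the strategies and payoffs are those of the Big Match described above) gives a feel for this property: in the T-stage Big Match, with Player 1 playing the absorbing action with probability \(1/(T-m+2)\) and Player 2 mixing L and R evenly, the expected cumulated reward over the first M stages should be close to \((M/T)\cdot \frac{1}{2}\).

```python
import numpy as np

rng = np.random.default_rng(0)

def cumulated_reward(T, M, runs=5000):
    """Monte-Carlo estimate of the cumulated reward (weights 1/T) over the first M stages."""
    total = 0.0
    for _ in range(runs):
        absorbed = None                                   # payoff of the absorbing state, if reached
        cum = 0.0
        for m in range(1, M + 1):
            if absorbed is None:
                top = rng.random() < 1.0 / (T - m + 2)    # Player 1's absorbing action
                left = rng.random() < 0.5                 # Player 2 mixes L and R evenly
                if top:
                    absorbed = 1.0 if left else 0.0       # absorb in 1* or 0*
                    cum += absorbed / T
                else:
                    cum += (0.0 if left else 1.0) / T     # g(k,B,L)=0, g(k,B,R)=1
            else:
                cum += absorbed / T
        total += cum
    return total / runs

T = 200
for M in (50, 100, 150, 200):
    print(M, round(cumulated_reward(T, M), 3), round(0.5 * M / T, 3))
```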

Selected past results Every stochastic game \((K,I,J,g,q,k,\theta )\) has a value, denoted by \(v_\theta ^k\). Although only stated for the discounted case, this result follows from Shapley (1953). Further, both players have optimal strategies that depend, at each stage, only on the current state and the current stage, and the dependence on the stage is not needed in the discounted case. The Big Match was solved by Blackwell and Ferguson (1968), who considered the T-stage stochastic game, where T is sufficiently large. In the sequel, \(v_\lambda ^k\) and \(v^k_T\) refer, respectively, to the value of a \(\lambda\)-discounted and a T-stage stochastic game. Bewley and Kohlberg (1976) proved the convergence of \(v_\lambda ^k\) as \(\lambda\) goes to 0 and the convergence of \(v_T^k\) as T goes to infinity, and the equality of the limits. Further, Bewley and Kohlberg (1976) proved that the map \(\lambda \mapsto v_\lambda\) admits a Puiseux series expansion near 0, and the existence of an optimal strategy profile with the same property (see Sect. 2.1.2 below). Mertens and Neyman (1981, 1982) proved the existence of the uniform value \(v^k\): for every \(\varepsilon >0\), Player 1 can ensure that the average reward is at least \(v^k-\varepsilon\) in any T-stage stochastic game with T large enough, and similarly Player 2 can ensure that the average reward is at most \(v^k+\varepsilon\) in all such games. Neyman and Sorin (2010) studied stochastic games with a random number of stages, and proved that their values converge to \(v^k\) as the expected number of stages tends to \(+\infty\), under a monotonicity condition. Ziliotto (2016) proved that the values of \(\theta\)-weighted stochastic games \(v_\theta ^k\) converge to \(v^k\) as \(\Vert \theta \Vert :=\max _{m\ge 1}\theta _m\) goes to 0, provided that \(\sum _{m\ge 1} |\theta _{m +1}^p - \theta _m^p |\) converges to zero for some \(p > 0\). The value \(v^k\) of a stochastic game was recently characterized by Attia and Oliu-Barton (2019) as the unique solution of a single equation.

Notation In the sequel, \(\mathbb E^k_{\sigma ^\theta ,\tau ^\theta }\) denotes the expectation under the unique law induced by \((\sigma ^\theta ,\tau ^\theta )\) and the initial state k, as explained below in Sect. 2.1. Further, quantities \(d_\theta\), indexed by sequences \(\theta\), are said to have a limit d as \(\Vert \theta \Vert\) goes to 0 if for every \(\varepsilon >0\) there exists \(\delta >0\) so that for all \(\theta\) satisfying \(\Vert \theta \Vert \le \delta\) one has \(|d_\theta -d |\le \varepsilon\).

The constant-payoff property The constant-payoff property was first noted by Sorin, Venel and Vigeral (2010) in the framework of single decision-maker problems, and conjectured to hold in finite two-player zero-sum stochastic games (henceforth stochastic games). Their conjecture can be stated as follows.

Conjecture 1.1

(Sorin, Venel and Vigeral  (2010)) Let \((K,I,J,g,q,k,\theta )\) be a family of stochastic games indexed by a sequence of weights \(\theta\). Then, there exists a family of strategy profiles \((\sigma ^\theta ,\tau ^\theta )\) indexed by \(\theta\) so that the following two properties hold.

  • The strategy profile \((\sigma ^\theta ,\tau ^\theta )\) is asymptotically optimal, i.e. \(\sigma ^\theta\) guarantees that the \(\theta\)-weighted reward is at least \(v^k-\varepsilon _1(\theta )\), and \(\tau ^\theta\) guarantees that the \(\theta\)-weighted reward is at most \(v^k+\varepsilon _2(\theta )\), for some error functions satisfying \(\lim _{\Vert \theta \Vert \rightarrow 0}\varepsilon _1(\theta )=\lim _{\Vert \theta \Vert \rightarrow 0}\varepsilon _2(\theta )=0\).

  • The strategy profile \((\sigma ^\theta ,\tau ^\theta )\) induces a constant average reward throughout the game. Formally, for any family of integers \((M_\theta )\), indexed by \(\theta\), satisfying \(\lim _{\Vert \theta \Vert \rightarrow 0} \sum _{m=1}^{M_\theta }\theta _m>0\), one has

    $$\begin{aligned} \lim _{\Vert \theta \Vert \rightarrow 0} \frac{\mathbb E^k_{\sigma ^\theta ,\tau ^\theta }\left[ \sum _{m=1}^{M_\theta } \theta _m g(k_m,i_m,j_m)\right] }{\sum _{m=1}^{M_\theta } \theta _m}= v^k\,.\end{aligned}$$
    (1.1)

The constant-payoff property was established for discounted absorbing games by Sorin and Vigeral (2020), that is, for the well-known class of stochastic games introduced by Kohlberg (1974) where the state changes at most once. It was then established for all discounted stochastic games by Oliu-Barton and Ziliotto (2021). So far, there is no such result for general weighted-average stochastic games.

Main result In this paper we solve Conjecture 1.1 for two classes of general weighted-average stochastic games: the well-studied class of absorbing games, and the newly introduced class of smooth stochastic games. (A precise statement of our results is condensed in Theorem 2.7 below.) Before we introduce the latter, let us point out that when both players use stationary strategies (that is, ones that depend only on the current state) the state follows a Markov chain. Further, optimal stationary strategies exist for both players in \(\lambda\)-discounted stochastic games.

Absorbing games A stochastic game \((K,I,J,g,q,k)\) is absorbing if, for every \(\ell \ne k\) and every \((i,j)\in I\times J\) one has \(q(\ell \,|\, \ell ,i,j)=1\). Absorbing games are thus stochastic games where the state changes at most once. These games have been extensively studied since the term was coined by Kohlberg  (1974). Without loss of generality, we assume that states \(\ell \ne k\) are non-strategic, i.e. there exists \(g^\ell \in \mathbb R\) so that \(g(\ell ,i,j)=g^\ell\) for all \((i,j)\in I\times J\).

Smooth stochastic games A stochastic game \((K,I,J,g,q,k)\) is smooth if there exists a family of stationary strategy profiles \((x_\lambda ,y_\lambda )_{\lambda \in (0,1)}\), so that \((x_\lambda ,y_\lambda )\) is an optimal strategy profile in the \(\lambda\)-discounted case, and if the corresponding family of stochastic matrices \((Q_\lambda )_{\lambda \in (0,1)}\) satisfies \(\lim _{\lambda \rightarrow 0} \frac{Q_\lambda -{\text {Id}}}{\lambda }=A\), for some matrix \(A\in \mathbb R^{n\times n}\).

Comments

  • The Big Match is a prominent example of an absorbing game. Further, this game is also a smooth stochastic game. Indeed, for any discount rate \(\lambda \in (0,1)\), any stationary strategy profile \((x_\lambda ,y_\lambda )\) satisfying \(x_\lambda ^k(T)=\frac{\lambda }{1+\lambda }\) and \(y_\lambda ^k(L)=\frac{1}{2}\) is optimal in the \(\lambda\)-discounted game, and induces a Markov chain over \(\{k,0^*,1^*\}\) whose transition matrix is given by

    $$\begin{aligned} Q_\lambda =\begin{pmatrix} \frac{1}{1+\lambda } &{} \frac{\lambda }{2(1+\lambda )}&{} \frac{\lambda }{2(1+\lambda )} \\ 0 &{} 1 &{} 0\\ 0 &{} 0 &{} 1 \end{pmatrix}. \end{aligned}$$

    Clearly, this matrix satisfies the condition below (a numerical sanity check of this limit is sketched right after this list):

    $$\begin{aligned} \lim _{\lambda \rightarrow 0} \frac{Q_\lambda -{\text {Id}}}{\lambda }=A:=\begin{pmatrix} -1&{} \frac{1}{2}&{} \frac{1}{2} \\ 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 \end{pmatrix}. \end{aligned}$$
  • Although never formally introduced before, smooth stochastic games appear naturally in the asymptotic study of discounted stochastic games, as those where the players are restricted to use stationary strategies which are power series in \(\lambda\), i.e. \(x_\lambda =x_0+\sum _{m\ge 1}\lambda ^m \alpha _m\) and \(y_\lambda =y_0+\sum _{m\ge 1}\lambda ^m \beta _m\). By  Laraki (2010), both players have \(\varepsilon\)-optimal strategies of this form in absorbing games, for all \(\varepsilon >0\). This result can easily be extended to irreversible games, a broader class of stochastic games where states that are left can never be reached again. Further, smooth stochastic games are related to the continuous-time stochastic games of  Neyman (2017), where the state variable is a continuous-time Markov chain controlled by both players. Similarly, the state process of a smooth stochastic game converges to a continuous-time Markov chain when both players play optimally, and as the weights tend to 0 (see Corollary 4.4 below).
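
As announced, here is a small numerical sanity check (our own illustration, in Python with numpy) of the smoothness condition on the Big Match: it builds \(Q_\lambda\) for decreasing values of \(\lambda\) and verifies that \((Q_\lambda -{\text {Id}})/\lambda\) approaches the matrix A displayed in the first comment above.

```python
import numpy as np

A = np.array([[-1.0, 0.5, 0.5],
              [ 0.0, 0.0, 0.0],
              [ 0.0, 0.0, 0.0]])

def Q(lam):
    """Big Match transition matrix under (x_lam, y_lam), states ordered (k, 0*, 1*)."""
    stay = 1.0 / (1.0 + lam)
    absorb = lam / (2.0 * (1.0 + lam))
    return np.array([[stay, absorb, absorb],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

for lam in (1e-1, 1e-2, 1e-3):
    print(lam, np.max(np.abs((Q(lam) - np.eye(3)) / lam - A)))   # tends to 0 with lam
```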

Stochastic games

In the sequel \((K,I,J,g,q,k,\theta )\) denotes a fixed \(\theta\)-weighted average stochastic game with initial state k. In order to state our results formally, we start by recalling some definitions.

Strategies, payoff function and value

The sequence \((k_1,i_1,j_1,...,k_m,i_m,j_m,...)\) generated along the game is called a play. The set of plays is \((K\times I\times J)^{\mathbb N}\).

Definition 2.1

  • A strategy for a player assigns a mixed action to each possible sequence of past observations. Formally, a strategy for Player 1 is a collection of maps \(\sigma =(\sigma _m)_{m \ge 1}\), where \(\sigma _m:(K \times I \times J)^{m-1} \times K \rightarrow \Delta (I)\). Similarly, a strategy for Player 2 is a collection of maps \(\tau =(\tau _m)_{m \ge 1}\), where \(\tau _m:(K \times I \times J)^{m-1} \times K \rightarrow \Delta (J)\).

  • A stationary strategy is one that plays according to the current state only. Formally, a stationary strategy for Player 1 is a mapping \(x:K\rightarrow \Delta (I)\), and a stationary strategy for Player 2 is a mapping \(y:K\rightarrow \Delta (J)\). (Note the use of different letters for stationary strategies.)

Notation The sets of strategies for Players 1 and 2 are denoted by \(\Sigma\) and \(\mathcal {T}\), respectively, and the sets of stationary strategies by \(\Delta (I)^n\) and \(\Delta (J)^n\).

For any pair \((\sigma ,\tau ) \in \Sigma \times \mathcal {T}\) we denote by \(\mathbb P^{k}_{\sigma ,\tau }\) the unique probability measure on the set of plays \((K\times I\times J)^\mathbb N\) induced by \((\sigma ,\tau )\), \(k_1=k\) and q. Note that the dependence on the transition function q is omitted. This probability is well-defined by the Kolmogorov extension theorem, and the expectation with respect to the probability \(\mathbb P^k_{\sigma ,\tau }\) is denoted by \(\mathbb E^k_{\sigma ,\tau }\).

The payoff function For each \((\sigma ,\tau )\in \Sigma \times \mathcal {T}\) the payoff function is defined by

$$\begin{aligned}\gamma _\theta ^k(\sigma ,\tau ):=\mathbb E_{\sigma ,\tau }^k\left[ \sum \nolimits _{m\ge 1}\theta _m g(k_m,i_m,j_m)\right] \,. \end{aligned}$$

The value The game has a value, denoted by \(v^k_\theta\). That is

$$\begin{aligned} v_\theta ^k=\sup _{\sigma \in \Sigma }\inf _{\tau \in \mathcal {T}} \gamma _\theta ^k(\sigma ,\tau )= \inf _{\tau \in \mathcal {T}}\sup _{\sigma \in \Sigma } \gamma _\theta ^k(\sigma ,\tau )\,.\end{aligned}$$

This result can be attributed to  Shapley (1953).

Stationarity in the discounted case By stationarity, to every stationary strategy profile \((x,y)\in \Delta (I)^n\times \Delta (J)^n\) corresponds a Markov chain \((k_m)_{m\ge 1}\) with transition matrix \(Q_{xy}\in \mathbb R^{n\times n}\) and payoff vector \(g_{xy}\in \mathbb R^n\):

  • \(Q_{xy}(k,\ell ):=\sum _{(i,j)\in I\times J}x(k,i)y(k,j)q(\ell |k,i,j)\), for all \((k,\ell )\in K^2\).

  • \(g_{xy}(k):=\sum _{(i,j)\in I\times J}x(k,i)y(k,j)g(k,i,j)\), for all \(k\in K\).

For all \((k,\ell )\in K^2\) and \(m\ge 1\) the following equalities trivially hold:

$$\begin{aligned} \mathbb P^k_{x,y}(k_m=\ell )=Q_{xy}^{m-1}(k,\ell )\quad \text {and}\quad \mathbb E^k_{x,y}[g(k_m,i_m,j_m)]=(Q_{xy}^{m-1}g_{xy})(k)\,.\end{aligned}$$
(2.1)
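
The objects \(Q_{xy}\) and \(g_{xy}\) are straightforward to assemble; the sketch below (our illustration, on randomly generated game data made up purely for the example) computes them and checks the first identity in (2.1) by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, nI, nJ = 3, 2, 2                      # toy sizes (made up for this illustration)

g = rng.random((n, nI, nJ))              # g[k, i, j]
q = rng.random((n, nI, nJ, n))
q /= q.sum(axis=-1, keepdims=True)       # q[k, i, j, :] is a distribution over next states

x = rng.random((n, nI)); x /= x.sum(axis=-1, keepdims=True)   # stationary strategy of Player 1
y = rng.random((n, nJ)); y /= y.sum(axis=-1, keepdims=True)   # stationary strategy of Player 2

# Q_xy(k, l) = sum_{i,j} x(k,i) y(k,j) q(l | k,i,j),  g_xy(k) = sum_{i,j} x(k,i) y(k,j) g(k,i,j)
Q_xy = np.einsum('ki,kj,kijl->kl', x, y, q)
g_xy = np.einsum('ki,kj,kij->k', x, y, g)

# Check (2.1): P(k_m = l | k_1 = k) = Q_xy^{m-1}(k, l)
k0, m = 0, 4
print(np.linalg.matrix_power(Q_xy, m - 1)[k0])
counts = np.zeros(n)
for _ in range(20000):
    k = k0
    for _ in range(m - 1):
        i = rng.choice(nI, p=x[k]); j = rng.choice(nJ, p=y[k])
        k = rng.choice(n, p=q[k, i, j])
    counts[k] += 1
print(counts / 20000)                    # empirical distribution of k_m, close to the line above
```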

Optimal and asymptotically optimal strategies

Definition 2.2

An optimal strategy of Player 1 is an element \(\sigma ^*\in \Sigma\) so that, for all \(\tau \in \mathcal {T}\),

$$\begin{aligned}\gamma _\theta ^k(\sigma ^*,\tau )\ge v_\theta ^k\,.\end{aligned}$$

An optimal strategy of Player 2 is defined in a similar way. That is, it is an element \(\tau ^* \in \mathcal {T}\) so that, for all \(\sigma \in \Sigma\), one has \(\gamma _\theta ^k(\sigma ,\tau ^*)\le v_\theta ^k\).

Definition 2.3

An asymptotically optimal strategy for Player 1 is a family of strategies \((\sigma _\theta )\) indexed by \(\theta\) such that, for all \(\varepsilon >0\), there exists \(\delta >0\) so that, for all sequences of weights \(\theta\) satisfying \(\Vert \theta \Vert \le \delta\),

$$\begin{aligned}\gamma _\theta ^k(\sigma _\theta ,\tau )\ge v_\theta ^k-\varepsilon \qquad \forall \tau \in \mathcal {T}\,. \end{aligned}$$

Asymptotically optimal strategies for Player 2 are defined in a symmetric way.

Puiseux strategies

A real map \(f:(a,b) \rightarrow \mathbb R\) is a Puiseux series if either \(f\equiv 0\) or there exists \(m_0\in \mathbb Z\), \(N\in \mathbb N\) and a real sequence \((a_m)_{m\ge m_0}\) so that \(a_{m_0}\ne 0\) and

$$\begin{aligned}f(\lambda )=\sum _{m\ge m_0} a_m \lambda ^{m/N}\qquad \forall \lambda \in (a,b). \end{aligned}$$

A function \(f:(0,1] \rightarrow \mathbb R\) admits a Puiseux expansion at 0 if there exists \(\lambda _0\) so that f is a Puiseux series on \((0,\lambda _0)\).

Definition 2.4

A Puiseux strategy profile is a family of stationary strategy profiles \((x_\lambda ,y_\lambda )_{\lambda \in (0,1]}\) so that for all \((k,i,j)\in K\times I\times J\) the mappings \(\lambda \mapsto x_\lambda (k,i)\) and \(\lambda \mapsto y_\lambda (k,j)\) admit Puiseux expansions near 0. An optimal Puiseux strategy profile is one where, in addition, \(x_\lambda\) and \(y_\lambda\) are optimal strategies for \(\lambda\) sufficiently small.

Fact Bewley and Kohlberg  (1976) proved that every finite stochastic game admits an optimal Puiseux strategy profile.

A concrete family of asymptotically optimal strategies

Let \((x_\lambda ,y_\lambda )\) be a fixed optimal Puiseux strategy profile of the game. For each sequence of nonnegative weights \(\theta =(\theta _m)_{m\in \mathbb N}\), and all \(m\in \mathbb N\), define the pair of strategies \((\sigma ^\theta ,\tau ^\theta )\) by setting

$$\begin{aligned} (\sigma ^\theta _m,\tau ^\theta _m):=(x_{\lambda ^\theta _m},y_{\lambda ^\theta _m}),\quad \text {where} \ \lambda ^\theta _m=\frac{\theta _m}{\sum _{m' \ge m} \ \theta _{m'}}\,.\end{aligned}$$
(2.2)

The family \((\sigma ^\theta ,\tau ^\theta )\), indexed by any possible sequence of weights \(\theta\), is asymptotically optimal by Ziliotto  (2016). It is thus a good candidate for tackling the constant-payoff conjecture. In the sequel, the superscript notation \((\sigma ^\theta ,\tau ^\theta )\) refers to this concrete family of strategy profiles, while the subscript notation \((\sigma _\theta ,\tau _\theta )\) refers to an arbitrary family of strategy profiles.

Remark 2.5

If \(\theta\) is a \(\lambda\)-discounted sequence of weights, i.e. \(\theta _m=\lambda (1-\lambda )^{m-1}\) for all \(m\in \mathbb N\) for some \(\lambda \in (0,1)\), then \(\lambda ^\theta _m=\lambda\) for all \(m\in \mathbb N\). In this case, \((\sigma ^\theta _m,\tau ^\theta _m)=(x_\lambda ,y_\lambda )\) for all \(m\in \mathbb N\). This property makes the discounted case central in the study of all weights.

Notation If \(\theta\) is the \(\lambda\)-discounted sequence of weights for some \(\lambda \in (0,1)\), we write \(\varphi (\lambda ,t)\) for the clock function and \(\gamma _\lambda ^k(\sigma ,\tau ;t)\) for the cumulated payoff at time \(t\in [0,1]\), both defined below.

The cumulated payoffs

For each sequence of nonnegative weights \(\theta\) so that \(\sum _{m\ge 1}\theta _m=1\) we introduce the clock function \(\varphi (\theta ,\, \cdot \,):[0,1]\rightarrow \mathbb N\cup \{+\infty \}\) by setting

$$\begin{aligned}\varphi (\theta ,t):=\inf \{M\ge 1,\ \sum \nolimits _{m=1}^M \theta _m\ge t\}\qquad \forall t \in [0,1]\,. \end{aligned}$$

Note that \(\varphi (\theta ,0)=1\) for all \(\theta\) and \(\varphi (\theta ,1)=\sup \{m\ge 1, \ \theta _m>0\}\). Hence, \(\varphi (\theta ,1)=+\infty\) if and only if \(\theta\) has infinite support. We now introduce the cumulated payoff at time t. For any pair of strategies \((\sigma ,\tau )\in \Sigma \times \mathcal {T}\),

$$\begin{aligned}\gamma _\theta ^k(\sigma ,\tau ;t):=\mathbb E_{\sigma ,\tau }^k\left[ \sum \nolimits _{m= 1}^{\varphi (\theta ,t)} \theta _m g(k_m,i_m,j_m)\right] \,. \end{aligned}$$

The case \(t=1\) corresponds to the expectation of the (total) \(\theta\)-weighted average of the stage rewards. For simplicity, we use the notation \(\gamma _\theta ^k(\sigma ,\tau )\) in this case.
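
The clock function is easy to compute in practice; the sketch below (ours) implements \(\varphi (\theta ,t)\) and checks numerically that, for the \(\lambda\)-discounted weights, \(\lambda \varphi (\lambda ,t)\) approaches \(-\ln (1-t)\), a fact used repeatedly in the proofs below.

```python
import numpy as np

def clock(theta, t):
    """phi(theta, t): first stage M with theta_1 + ... + theta_M >= t (stages are 1-indexed)."""
    cum = np.cumsum(theta)
    idx = np.searchsorted(cum, t)        # smallest index with cum[idx] >= t
    return int(idx) + 1                  # convert to a 1-indexed stage

for lam in (1e-2, 1e-3, 1e-4):
    M = int(50 / lam)                    # truncate the discounted weights far in the tail
    theta = lam * (1 - lam) ** np.arange(M)
    for t in (0.3, 0.6, 0.9):
        print(lam, t, lam * clock(theta, t), -np.log(1 - t))
```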

Discounted case If \(\theta\) is the \(\lambda\)-discounted sequence of weights, and (xy) is a stationary strategy profile, then using the expressions in (2.1) yields the following expression:

$$\begin{aligned}\gamma _\lambda ^k(x,y)=\sum \nolimits _{m\ge 1} \lambda (1-\lambda )^{m-1} (Q_{xy}^{m-1}g_{xy})(k)\,. \end{aligned}$$
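
Under a stationary profile the series above can also be summed in closed form, since \(\sum _{m\ge 1}\lambda (1-\lambda )^{m-1}Q^{m-1}=\lambda ({\text {Id}}-(1-\lambda )Q)^{-1}\) for any stochastic matrix Q (a standard identity). The snippet below (our sketch, on a made-up pair \(Q_{xy}\), \(g_{xy}\)) compares the truncated series with this closed form.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 4, 0.05
Q = rng.random((n, n)); Q /= Q.sum(axis=1, keepdims=True)   # a stochastic matrix Q_xy (made up)
g = rng.random(n)                                            # a payoff vector g_xy (made up)

# Truncated series  sum_m lam (1-lam)^{m-1} (Q^{m-1} g)
series = np.zeros(n)
power = np.eye(n)
for m in range(1, 2000):
    series += lam * (1 - lam) ** (m - 1) * power @ g
    power = power @ Q

# Closed form lam * (Id - (1-lam) Q)^{-1} g   (resolvent identity)
closed = lam * np.linalg.solve(np.eye(n) - (1 - lam) * Q, g)
print(np.max(np.abs(series - closed)))                       # ~0
```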

Remark 2.6

The term “time t” is an artificial notion, that should not be confused with the stage number. It refers to the fraction of the game that has been played. For instance, for a game with T stages (each of which has weight \(\frac{1}{T}\)), the cumulated payoff at time t is the reward obtained in stages \(1, 2,\dots , \lceil t T\rceil\).

Main result

Theorem 2.7

Any stochastic game \((K,I,J,g,q,k)\) that is either absorbing or smooth satisfies the constant-payoff property. More precisely, for these games the family of asymptotically optimal strategies defined in (2.2) satisfies

$$\begin{aligned}\lim _{\Vert \theta \Vert \rightarrow 0}\gamma _\theta ^k(\sigma ^\theta ,\tau ^\theta ;t)=t v^k \qquad \forall t\in [0,1]\,.\end{aligned}$$

This result is established separately for absorbing and smooth stochastic games. The proof proceeds as follows.

  • First, the constant-payoff property holds in the discounted case. That is, there exists an optimal Puiseux strategy profile \((x_\lambda ,y_\lambda )\) so that

    $$\begin{aligned}\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)=t v^k\qquad \forall t\in [0,1]\,. \end{aligned}$$
  • Second, for the strategy profile \((\sigma ^\theta ,\tau ^\theta )\), the limit cumulated payoff \(\lim _{\Vert \theta \Vert \rightarrow 0}\gamma _\theta ^k(\sigma ^\theta ,\tau ^\theta ;t)\) exists for all \(t\in [0,1]\). Notably, this implies

    $$\begin{aligned} \lim _{\Vert \theta \Vert \rightarrow 0}\gamma _\theta ^k(\sigma ^\theta ,\tau ^\theta ;t)=\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)\qquad \forall t\in [0,1], \end{aligned}$$

    as the discounted weights \((\lambda (1-\lambda )^{m-1})_m\) are a particular family of weights.

The first step has recently been extended to all stochastic games by Oliu-Barton and Ziliotto  (2021). For the sake of completeness, we have preferred to prove this result here. Our approach is particularly suited for absorbing and smooth stochastic games. The second step was never achieved before, and its extension to all stochastic games is the next challenge. To do so, however, several difficulties need to be overcome (see the discussion at the end of Sect. 4).

Outline of the paper

We start by considering the class of absorbing games in Sect. 3. First, we establish the constant-payoff property in the discounted case (Sect. 3.1), and then extend it to the case of a general sequence of weights (Sect. 3.2). Section 4 is devoted to the class of smooth stochastic games: the discounted case is treated in Sect. 4.1, and the general case in Sect. 4.2.

The constant-payoff property in absorbing games

Throughout this section, we consider a fixed optimal Puiseux strategy profile \((x_\lambda ,y_\lambda )\) of the absorbing game \((K,I,J,g,q,k)\). We denote the corresponding family of stochastic matrices by \((Q_\lambda )\), and set \(g_0:=\lim _{\lambda \rightarrow 0}g(\,\cdot \,,x_\lambda ,y_\lambda )\in \mathbb R^n\).

The discounted case

We start by proving the following result.

Proposition 3.1

Suppose that the stochastic game \((K,I,J,g,q,k)\) is absorbing. Then,

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)=t v^k\qquad \forall t\in [0,1]\,. \end{aligned}$$

This result can be found in Sorin and Vigeral  (2020), and is also a particular case of Oliu-Barton and Ziliotto  (2021). For this reason, and also because of its intrinsic interest, we provide below a new proof which is specific to the class of absorbing games.

We start with an elementary probabilistic result, and an immediate consequence.

Lemma 3.2

For each \(\lambda \in (0,1]\), let \((X^\lambda _m)_{m\ge 1}\) be a Markov chain with transition matrix \(Q_\lambda \in \mathbb R^{n\times n}\). Suppose that \(\lambda \mapsto Q_\lambda (k,\ell )\) admits a Puiseux expansion at 0 for all \(\ell \ne k\), so in particular the limits \(p_{k\ell }:=\lim _{\lambda \rightarrow 0} \frac{Q_\lambda (k,\ell )}{\sum _{\ell '\ne k}Q_\lambda (k,\ell ')}\in [0,1]\) exist (with the convention \(\frac{0}{0}=1\)). Further, suppose that \(Q_{\lambda }(\ell ,\ell )=1\) for all \(\ell \ne k\) and all \(\lambda\) sufficiently small (so all states other than k are absorbing states). Then, there exist \(c\ge 0\) and \(e\ge 0\) so that \(\sum _{\ell \ne k} Q_\lambda (k,\ell )=c\lambda ^e+o(\lambda ^e)\) as \(\lambda \rightarrow 0\), and for all \(t\in (0,1]\)

$$\begin{aligned} \lim _{\lambda \rightarrow 0} \mathbb P(X^\lambda _{\varphi (\lambda ,t)}=\ell \,|\, X^\lambda _1=k)={\left\{ \begin{array}{ll}p_{k\ell } &{} \text { if } c>0 \text { and } e<1\\ (1-(1-t)^c) p_{k\ell } &{} \text { if } c>0 \text { and } e=1\\ 0 &{} \text { if } c=0 \text { or } e>1\,.\end{array}\right. } \end{aligned}$$

Proof

By assumption, there exists \(\lambda _0>0\) so that the map \(\lambda \mapsto \sum _{\ell \ne k} Q_\lambda (k,\ell )\) is a Puiseux series on \((0,\lambda _0)\). By boundedness, this map can be expressed as \(\sum _{m\ge 0} a_m \lambda ^{e_m}\), where \((a_m)\) is a real sequence and \((e_m)\) is a strictly increasing nonnegative sequence. Let \(m_0:=\inf \{m\ge 0, a_m\ne 0\}\). Then, either \(m_0=\infty\), in which case \(\sum _{\ell \ne k} Q_\lambda (k,\ell )\) is equal to 0 on \((0,\lambda _0)\), or \(m_0<+\infty\) and \(a_{m_0} \lambda ^{e_{m_0}}\) is the leading term of the Puiseux series on \((0,\lambda _0)\). Note that \(a_{m_0}>0\) in this case, since \(a_{m_0}<0\) would yield a negative expression for \(\sum _{m\ge 0} a_m \lambda ^{e_m}=\sum _{\ell \ne k} Q_\lambda (k,\ell )\) as \(\lambda\) goes to 0. Hence,

$$\begin{aligned}(c,e):={\left\{ \begin{array}{ll} (a_{m_0}, e_{m_0}) &{} \text { if }m_0<+\infty \\ (0,0) &{} \text { if }m_0=+\infty \end{array}\right. } \end{aligned}$$

satisfies \((c,e)\ge 0\) and \(\sum _{\ell \ne k} Q_\lambda (k,\ell )=c\lambda ^e+o(\lambda ^e)\) as \(\lambda \rightarrow 0\). All states except k being absorbing, one can easily compute \(\lim _{\lambda \rightarrow 0} \mathbb P(X^\lambda _{\varphi (\lambda ,t)}=k\,|\, X^\lambda _1=k)\). First, \(Q_\lambda (k,k)=1- c\lambda ^e+o(\lambda ^e)\) as \(\lambda \rightarrow 0\) by complementarity. Second, \(\lim _{\lambda \rightarrow 0} \lambda \varphi (\lambda ,t)=-\ln (1-t)\) for all \(t\in [0,1)\). Hence,

$$\begin{aligned} \begin{aligned} \lim _{\lambda \rightarrow 0} \mathbb P(X^\lambda _{\varphi (\lambda ,t)}=k\,|\, X^\lambda _1=k)= & {} \lim _{\lambda \rightarrow 0} \left( 1- c\lambda ^e+o(\lambda ^e)\right) ^{\varphi (\lambda ,t)-1}\\= & {} {\left\{ \begin{array}{ll}0 &{} \text { if } c>0 \text { and } e<1\\ (1-t)^c &{} \text { if } c>0 \text { and } e=1\\ 1 &{} \text { if } c=0 \text { or } e>1\,.\end{array}\right. } \end{aligned} \end{aligned}$$
(3.1)

On the other hand, the equality \(\mathbb P(X^\lambda _{m}=\ell \,|\, X^\lambda _{m}\ne k,\ X^\lambda _1=k)= \frac{Q_\lambda (k,\ell )}{\sum _{\ell '\ne k}Q_\lambda (k,\ell ')}\) holds for all \(m\ge 2\), and in particular for \(m=\varphi (\lambda ,t)\) when \(\lambda\) is small and \(t>0\). Thus, \(\lim _{\lambda \rightarrow 0} \mathbb P(X^\lambda _{\varphi (\lambda ,t)}=\ell \,|\, X^\lambda _{\varphi (\lambda ,t)}\ne k,\, X^\lambda _1=k)=p_{k\ell }\). The desired result follows directly from (3.1), since

$$\begin{aligned} \lim _{\lambda \rightarrow 0} \mathbb P(X^\lambda _{\varphi (\lambda ,t)}=\ell \,|\, X^\lambda _1=k)= & {} \lim _{\lambda \rightarrow 0} \mathbb P(X^\lambda _{\varphi (\lambda ,t)}\ne k\,|\, X^\lambda _1=k)\, \mathbb P(X^\lambda _{\varphi (\lambda ,t)}=\ell \,|\, X^\lambda _{\varphi (\lambda ,t)}\ne k,\, X^\lambda _1=k)\\= & {} \left( 1- \lim _{\lambda \rightarrow 0} \mathbb P(X^\lambda _{\varphi (\lambda ,t)}=k\,|\, X^\lambda _1=k)\right) p_{k\ell }\,. \end{aligned}$$

\(\square\)
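
Lemma 3.2 can be illustrated numerically in its critical case; the sketch below (ours) takes a chain that leaves k with probability \(c\lambda ^e\) and splits the absorbed mass according to made-up weights \(p_{k\ell }\), and compares the probability of having reached each absorbing state at stage \(\varphi (\lambda ,t)\) with the limit \((1-(1-t)^c)p_{k\ell }\) of the case \(c>0\), \(e=1\).

```python
import numpy as np

def phi(lam, t):
    """Clock function for lambda-discounted weights: first M with 1-(1-lam)^M >= t."""
    return int(np.ceil(np.log(1 - t) / np.log(1 - lam))) if t > 0 else 1

c, e, p = 2.0, 1.0, (0.3, 0.7)          # absorption probability c*lam^e, split p among two absorbing states
for lam in (1e-2, 1e-3, 1e-4):
    for t in (0.25, 0.5, 0.75):
        stay = (1 - c * lam ** e) ** (phi(lam, t) - 1)        # prob. of not being absorbed yet
        absorbed = [(1 - stay) * pk for pk in p]
        predicted = [(1 - (1 - t) ** c) * pk for pk in p]     # Lemma 3.2, case c>0, e=1
        print(lam, t, np.round(absorbed, 4), np.round(predicted, 4))
```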

The following consequence is straightforward.

Corollary 3.3

Let \((X^\lambda _m)_{m\ge 1}\) be a family of Markov chains with transition matrices \((Q_\lambda )\) satisfying the assumptions of Lemma 3.2, and let (c, e) and \((p_{k\ell })_{\ell \ne k}\) be the same as therein. Then, for all \(t\in [0,1]\) and \(\ell \ne k\),

$$\begin{aligned}\lim _{\lambda \rightarrow 0} \sum \nolimits _{m=1}^{\varphi (\lambda ,t)} \lambda (1-\lambda )^{m-1}Q_\lambda ^{m-1}(k,\ell )= {\left\{ \begin{array}{ll} t p_{k\ell } &{} \text { if } c>0 \text { and } e<1\\ (t-\alpha (c,t))p_{k\ell } &{} \text { if } c>0 \text { and } e=1\\ 0 &{} \text { if } c=0 \text { or } e>1\,.\end{array}\right. } \end{aligned}$$

where \(\alpha (c,t)= \frac{1-(1-t)^{1+c}}{1+c} \in [0,t]\).

Proof

It follows directly from Lemma 3.2 and the fact that all states \(\ell \ne k\) are absorbing. Let us explain how \(\alpha (c,t)\), the expected time spent in k before time t when \(c>0\) and \(e=1\), is obtained. On the one hand, the state k has not been left at time t with probability \((1-t)^c\). On the other, the state k is left between time s and \(s+ds\) with probability \(c(1-s)^{c-1}ds\). These two probabilities follow from Lemma 3.2. Hence, the expected time in state k before time t is given by

$$\begin{aligned}\alpha (c,t)=\int _{0}^t s c ( 1-s)^{c-1} ds+ t (1-t)^c =\frac{1-(1-t)^{1+c}}{1+c}\,. \end{aligned}$$

\(\square\)

Proof of Proposition 3.1

Because the game is absorbing with initial state k, the family of stochastic matrices \((Q_\lambda )\) satisfies the assumptions of Lemma 3.2, so let (c, e) and \((p_{k\ell })_{\ell \ne k}\) be as therein. That is, \(\sum _{\ell \ne k}Q_\lambda (k,\ell )=c\lambda ^e+o(\lambda ^e)\) as \(\lambda \rightarrow 0\), and \(\lim _{\lambda \rightarrow 0}\frac{Q_\lambda (k,\ell )}{\sum _{\ell '\ne k}Q_\lambda (k,\ell ')}=p_{k\ell }\).

We distinguish three cases, depending on the values of c and e.

Case 1 \(c>0\) and \(0\le e<1\). In this case, absorption occurs very soon in the game by Lemma 3.2, and the constant-payoff property trivially holds. On the one hand, \(\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;1)=v^k\) by the optimality of \((x_\lambda ,y_\lambda )\). On the other, for any \(t\in (0,1]\) one has \(\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)= t \sum \nolimits _{\ell \ne k} p_{k\ell } g_0^\ell\) by Corollary 3.3. Hence, \(v^k=\sum \nolimits _{\ell \ne k} p_{k\ell } g_0^\ell\) and \(\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)= tv^k\) for all \(t\in [0,1]\).

Case 2 \(c=0\) or \(e>1\). In this case absorption occurs too late in the game (or never) by Lemma 3.2, and again the constant-payoff property trivially holds. On the one hand, \(\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;1)=v^k\) by the optimality of \((x_\lambda ,y_\lambda )\). On the other, for any \(t\in [0,1]\) one has \(\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)= t g_0^k\) by Corollary 3.3. Hence, \(v^k=g_0^k\), and \(\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)= t v^k\) for all \(t\in [0,1]\).

Case 3 \(c>0\) and \(e=1\). This is the critical case: the total probability of absorption at every stage is \(c\lambda +o(\lambda )\), so that the game absorbs before time \(t\in (0,1)\) with probability \(1-(1-t)^c>0\) by Lemma 3.2. To establish the constant-payoff property, it is enough to prove that the equality \(v^k=g_0^k= \sum \nolimits _{\ell \ne k} p_{k\ell } g_0^\ell\) holds. Indeed, in this case, both the payoff prior to absorption and the payoff after absorption are equal to \(v^k\) in expectation.

We introduce some notation in order to partition the actions sets in a convenient manner.

For all \((i,j)\in I\times J\), let \((c(i),e(i),c'(j), e'(j))\in \mathbb R^4_+\) so that \(x_\lambda ^k(i)=c(i)\lambda ^{e(i)}+o(\lambda ^{e(i)})\) and \(y_\lambda ^k(j)=c'(j)\lambda ^{e'(j)}+o(\lambda ^{e'(j)})\). Suppose, without loss of generality, that \(c(i)=0\) implies \(e(i)=0\), and similarly \(c(j)=0\) implies \(e(j)=0\). Partition the actions sets I and J in four (possibly empty) sets as follows:

$$\begin{aligned}{\left\{ \begin{array}{ll} I_0=\{i\in I\,|\, e(i)=0\}\\ I_*=\{i\in I\,|\, e(i)\in (0,1)\}\\ I_1=\{i\in I\,|\, e(i)=1\}\\ I_+=\{i\in I\,|\, e(i)>1\}\,. \end{array}\right. } \quad {\left\{ \begin{array}{ll} J_0=\{j\in J\,|\, e'(j)=0\}\\ J_*=\{j\in J\,|\, e'(j)\in (0,1)\}\\ J_1=\{j\in J\,|\, e'(j)=1\}\\ J_+=\{j\in J\,|\, e'(j)>1\}\,.\end{array}\right. } \end{aligned}$$

For each \(\ell \ne k\), define the transition rate from k to \(\ell\) by setting \(A^{\ell }:=A^{\ell }_{01}+ A^{\ell }_*+ A^{\ell }_{10}\) where

$$\begin{aligned} A^{\ell }_{01}:= & {} \sum _{(i,j)\in I_0\times J_1} c(i)c'(j)\,q(\ell \,|\,k,i,j),\\ A^{\ell }_*:= & {} \sum _{(i,j)\in I_*\times J_*, \, e(i)+e'(j)=1} c(i)c'(j)\,q(\ell \,|\,k,i,j),\\ A^{\ell }_{10}:= & {} \sum _{(i,j)\in I_1\times J_0} c(i)c'(j)\,q(\ell \,|\,k,i,j)\,. \end{aligned}$$

Let \(A,A_{01}, A_*,A_{10}\) denote the corresponding vectors in \(\mathbb R^{n-1}\). Note that, by definition, these are nonnegative vectors. Also, \(\sum _{\ell \ne k} A^\ell =c>0\) and \(\frac{A^{\ell }}{\sum \nolimits _{\ell '\ne k}A^{\ell '}}=p_{k\ell }\) for all \(\ell \ne k\). The situation can be pictured as follows.

[Figure: partition of the action set \(I\times J\) into the blocks \(I_0,I_*,I_1,I_+\) and \(J_0,J_*,J_1,J_+\), with the blocks generating the transition rates \(A_{01}\), \(A_*\) and \(A_{10}\) shaded.]

The actions in \(I_0\times J_0\) determine the payoff \(g_0^k\), while the shaded areas correspond to the pairs of actions which determine the transitions from k to the set of absorbing states \(K\backslash \{k\}\). The actions in \(I_+\) and \(J_+\) are irrelevant: on the one hand, they do not affect the limit payoff \(g_0^k\); on the other, they induce transition probabilities of lower order, so that the probability that absorption occurs while an action in \(I_+\) or \(J_+\) is played goes to 0 as \(\lambda\) goes to 0. The limit payoff thus satisfies (see also Corollary 3.3) the following relation:

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda )= \frac{g^k_0+\sum _{\ell \ne k} A^{\ell }g_0^\ell }{1+\sum \nolimits _{\ell '\ne k} A^{\ell '}}\,.\end{aligned}$$
(3.2)

Case 3a Suppose by contradiction that \(g^k_0>v^k\) and \(\sum _{\ell \ne k}A^{\ell }_{01}>0\). In this case, Player 2 can deviate from \(y_\lambda\) to a strategy \(\widetilde{y}_\lambda\) which changes the probabilities of playing actions in \(J_1\) to \(c'(j)\lambda ^{1-\varepsilon }\) for a sufficiently small \(\varepsilon\) (say, smaller than all nonzero e(i) and \(e'(j)\)). By doing so, the probability that the state k is left before stage \(t/\lambda\) goes to 1 for any \(t>0\). Consequently,

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,\widetilde{y}_\lambda )=\sum _{\ell \ne k} g_0^\ell \frac{A_{01}^{\ell }}{\sum \nolimits _{\ell '\ne k}A_{01}^{\ell '}}\,.\end{aligned}$$
(3.3)

On the other hand, if Player 1 deviates from \(x_\lambda\) to a strategy \(\widetilde{x}_\lambda\) which sets the probability of all actions outside \(I_0\) to 0, then the transition from k to the set of absorbing states depends only on \(A_{01}\), and one has

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(\widetilde{x}_\lambda ,y_\lambda )=\frac{g_0^k+\sum _{\ell \ne k} A_{01}^{\ell } g_0^\ell }{1+\sum \nolimits _{\ell \ne k}A^{\ell }_{01}}\,.\end{aligned}$$
(3.4)

Yet, the optimality of \((x_\lambda )\) and \((y_\lambda )\) implies that

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(\widetilde{x}_\lambda ,y_\lambda )\le v^k \le \lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,\widetilde{y}_\lambda )\,.\end{aligned}$$
(3.5)

The relations (3.3), (3.4) and (3.5) are not compatible with \(g_0^k>v^k\), a contradiction. Indeed, (3.3) and (3.5) give \(\sum _{\ell \ne k} A_{01}^{\ell } g_0^\ell \ge v^k \sum _{\ell \ne k}A^{\ell }_{01}\); together with \(g_0^k>v^k\), the expression in (3.4) would then be strictly larger than \(v^k\), contradicting the first inequality in (3.5).

Case 3b Suppose by contradiction that \(g^k_0>v^k\) and \(\sum \nolimits _{\ell \ne k}A^{\ell }_{01}=0\). In this case, for the strategy \((\widetilde{x}_\lambda )\) described in the previous case, one has

$$\begin{aligned}\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(\widetilde{x}_\lambda ,y_\lambda )=g_0^k\,. \end{aligned}$$

This contradicts the optimality of \((y_\lambda )\).

Together, cases 3a and 3b imply \(g_0^k\le v^k\). Similarly, reversing the roles of the players one obtains \(g_0^k\ge v^k\). Together with (3.2), the equality \(v^k=g_0^k\) implies \(v^k= \sum \nolimits _{\ell \ne k} p_{k\ell } g_0^\ell\) as well. The result follows from Corollary 3.3, since there exists \(\alpha _t:=\alpha (c,t)\in [0,t]\) so that

$$\begin{aligned}\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)= \alpha _t g_0^k+(t-\alpha _t) \sum \nolimits _{\ell \ne k} p_{k\ell }g_0^\ell \,. \end{aligned}$$

\(\square\)
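
On the Big Match, Proposition 3.1 can also be verified directly; the sketch below (our illustration) evaluates \(\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)\) from the matrices \(Q_\lambda\) and the stage payoff vector \((\frac{1}{2},0,1)\) using (2.1), and the result approaches \(t\cdot \frac{1}{2}\).

```python
import numpy as np

def Q(lam):
    """Big Match transition matrix under (x_lam, y_lam), states ordered (k, 0*, 1*)."""
    stay = 1.0 / (1.0 + lam)
    ab = lam / (2.0 * (1.0 + lam))
    return np.array([[stay, ab, ab], [0, 1, 0], [0, 0, 1]], dtype=float)

g = np.array([0.5, 0.0, 1.0])            # stage payoffs in states (k, 0*, 1*) under (x_lam, y_lam)

def gamma(lam, t):
    """Cumulated discounted payoff gamma_lambda^k(x_lam, y_lam; t), computed from (2.1)."""
    M = int(np.ceil(np.log(1 - t) / np.log(1 - lam)))        # phi(lambda, t)
    Qlam, total, power = Q(lam), 0.0, np.eye(3)
    for m in range(1, M + 1):
        total += lam * (1 - lam) ** (m - 1) * (power @ g)[0]
        power = power @ Qlam
    return total

for lam in (1e-2, 1e-3):
    print([round(gamma(lam, t), 4) for t in (0.25, 0.5, 0.75)])   # close to [0.125, 0.25, 0.375]
```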

The general case

Proposition 3.4

Suppose that \((K,I,J,g,q,k)\) is absorbing. Let \((\sigma ^\theta ,\tau ^\theta )\) be the family of asymptotically optimal strategies defined in (2.2). Then, for all \(t\in [0,1]\), \(\lim _{\Vert \theta \Vert \rightarrow 0}\gamma _\theta ^k(\sigma ^\theta ,\tau ^\theta ;t)\) exists.

We start with two technical lemmas.

Lemma 3.5

Let \(0\le t<1\) and \(e\ge 0\). Then, as \(h>0\) tends to 0:

$$\begin{aligned} \lim _{\Vert \theta \Vert \rightarrow 0} \sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)} (\lambda ^\theta _m)^e = {\left\{ \begin{array}{ll} +\infty &{} \text { if } e<1\\ \frac{h}{1-t}+o(h)&{} \text { if } e=1\\ 0 &{} \text { if } e>1\,. \end{array}\right. } \end{aligned}$$

Proof

For each \(m\ge 1\) set \(t^\theta _m:=\sum _{r=1}^{m}\theta _r\), so that \(\lambda ^\theta _{m+1}= \frac{\theta _{m+1}}{1-t^\theta _m}\). Then, for any m between \(\varphi (\theta ,t)\) and \(\varphi (\theta ,t+h)\) one has \(t\le t^\theta _m\le t+h\), so that

$$\begin{aligned} \left( \frac{\theta _{m+1}}{1-t}\right) ^e\ \le \left( \frac{\theta _{m+1}}{1-t^\theta _m}\right) ^e \le \left( \frac{\theta _{m+1}}{1-t-h}\right) ^e\,.\end{aligned}$$
(3.6)

We distinguish three cases, depending on whether \(e=1\), \(e>1\) or \(e<1\).

Case \(e=1\). Note that \(\lim _{\Vert \theta \Vert \rightarrow 0}\sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)}\theta _{m+1}= h\). Adding the inequalities of (3.6) for all \(\varphi (\theta ,t)\le m\le \varphi (\theta ,t+h)\) and then taking \(\Vert \theta \Vert\) to 0, one obtains

$$\begin{aligned}\frac{h}{1-t} \le \lim _{\Vert \theta \Vert \rightarrow 0} \sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)}\lambda ^\theta _m \le \displaystyle \frac{h}{1-t-h}\,. \end{aligned}$$

Hence, \(\lim _{\Vert \theta \Vert \rightarrow 0} \sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)}\lambda ^\theta _m=\displaystyle \frac{h}{1-t}+o(h)\) as \(h>0\) tends to 0.

Case \(e<1\). In this case \(\theta ^e_{m+1}\ge \Vert \theta \Vert ^{e-1}\theta _{m+1}\). From (3.6) one derives

$$\begin{aligned} \sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)} (\lambda ^\theta _m)^e\ge & {} \Vert \theta \Vert ^{e-1}\sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)}\theta _{m+1}\,. \end{aligned}$$
(3.7)

The result follows, since \(\lim _{\Vert \theta \Vert \rightarrow 0} \Vert \theta \Vert ^{e-1}=+\infty\) and \(\lim _{\Vert \theta \Vert \rightarrow 0} \sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)}\theta _{m+1}=h>0\).

Case \(e>1\). In this case \(\theta ^e_{m+1}\le \Vert \theta \Vert ^{e-1}\theta _{m+1}\), and the result follows from (3.6) like the previous case, since \(\lim _{\Vert \theta \Vert \rightarrow 0} \Vert \theta \Vert ^{e-1}=0\) in this case. \(\square\)
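
The case \(e=1\) of Lemma 3.5 can be checked numerically; the sketch below (ours) uses T-stage weights with large T, sums the relative weights \(\lambda ^\theta _m\) between \(\varphi (\theta ,t)\) and \(\varphi (\theta ,t+h)\), and shows that the ratio of this sum to h approaches \(1/(1-t)\) as h shrinks.

```python
import numpy as np

T = 200000
theta = np.full(T, 1.0 / T)
tails = np.cumsum(theta[::-1])[::-1]
lam = theta / tails                                    # lambda_m^theta = 1/(T - m + 1)
cum = np.cumsum(theta)

def phi(t):
    return int(np.searchsorted(cum, t)) + 1            # clock function, 1-indexed

t = 0.4
for h in (0.05, 0.01, 0.002):
    s = lam[phi(t) - 1: phi(t + h)].sum()              # sum of lambda_m^theta from phi(t) to phi(t+h)
    print(h, round(s / h, 4), round(1 / (1 - t), 4))   # the ratio s/h approaches 1/(1-t) as h -> 0
```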

The following result follows from the convergence of Riemann sums to integrals, and its proof is omitted.

Lemma 3.6

Let \((a^\theta _m)_{m\ge 1}\) be a family of sequences in [0, 1] so that

$$\begin{aligned}\lim _{\Vert \theta \Vert \rightarrow 0}\sup _{s\in [0,1]} | a^\theta _{\varphi (\theta ,s)}-f(s)|=0 \end{aligned}$$

for some function \(f:[0,1]\rightarrow [0,1]\). Then,

$$\begin{aligned}\lim _{\Vert \theta \Vert \rightarrow 0} \sum _{m=1}^{\varphi (\theta ,t)} \theta _m a^\theta _m=\int _0^t f(s)ds\,. \end{aligned}$$

We are now ready to prove Proposition 3.4.

Proof of Proposition 3.4

For each \(m\ge 1\), let \(Q_m^\theta \in \mathbb R^{n\times n}\) be the transition matrix induced by \((\sigma ^\theta ,\tau ^\theta )\) at stage m. By the definition of these strategies, there exist pairs (ce) and \((c_\ell ,e_\ell )_{\ell \ne k}\) of nonnegative numbers so that

$$\begin{aligned} Q_m^\theta (k,\ell )={\left\{ \begin{array}{ll} c_\ell (\lambda ^\theta _m)^{e_\ell }+o ((\lambda ^\theta _m)^{e_\ell })&{} \text { if } \ell \ne k\\ 1- c (\lambda ^\theta _m)^{e }+o((\lambda ^\theta _m)^{e })&{} \text { if } \ell =k\,.\end{array}\right. } \end{aligned}$$

Moreover one can assume without loss of generality that \(e_\ell =+\infty\) whenever \(c_\ell =0\) for all \(\ell \ne k\), so that \(e_\ell \ge e\) for all \(\ell \ne k\). Then, for all \(t\in (0,1)\), the probability of being at k after \(\varphi (\theta ,t)\) stages is given by

$$\begin{aligned} \mathbb P_{\sigma ^\theta ,\tau ^\theta }^k\left( k_{\varphi (\theta ,t)}=k\right) = \prod _{m=1}^{\varphi (\theta ,t)-1}Q_m^\theta (k,k)= \prod _{m=1}^{\varphi (\theta ,t)-1}\left( 1-c (\lambda ^\theta _m)^{e}+ o((\lambda ^\theta _m)^{e })\right) \,. \end{aligned}$$

Taking \(\Vert \theta \Vert\) to 0, and using Lemma 3.5, one obtains

$$\begin{aligned}&p_t(k):=\lim _{\Vert \theta \Vert \rightarrow 0}\mathbb P_{\sigma ^\theta ,\tau ^\theta }^k\left( k_{\varphi (\theta ,t)}=k\right) =\lim _{\Vert \theta \Vert \rightarrow 0} \exp \left( -c \sum _{m=1}^{\varphi (\theta ,t)} (\lambda ^\theta _m)^{e} \right) \nonumber \\&= {\left\{ \begin{array}{ll} 0 &{} \text { if } e<1\\ (1-t)^c &{} \text { if } e=1\\ 1 &{} \text { if } e>1 \end{array}\right. }\qquad \forall t\in (0,1)\,. \end{aligned}$$
(3.8)

That is, the limit exists. Similarly, conditional on reaching an absorbing state at stage \(m+1\), the probability that \(k_{m+1}=\ell \ne k\) is given by

$$\begin{aligned} \mathbb P_{\sigma ^\theta ,\tau ^\theta }^k\left( k_{m+1}=\ell \, | \, k_{m}= k,\ k_{m+1}\ne k\right) = \frac{Q_m^\theta (k,\ell )}{\sum _{\ell '\ne k} Q_m^\theta (k,\ell ')}= \frac{c_\ell (\lambda ^\theta _m)^{e_\ell }+ o ((\lambda ^\theta _m)^{e_\ell })}{c(\lambda ^\theta _m)^{e}+o((\lambda ^\theta _m)^{e})}\qquad \forall \ell \ne k\,.\end{aligned}$$

Therefore, the following limits exist and do not depend on the family of vanishing weights:

$$\begin{aligned} \lim _{\Vert \theta \Vert \rightarrow 0}\mathbb P_{\sigma ^\theta ,\tau ^\theta }^k\left( k_{m+1}=\ell \, | \, k_{m}=k,\ k_{m+1}\ne k\right) =\frac{c_\ell }{c}\mathbb {1}_{\{e_\ell =e\}}\qquad \ell \ne k\,.\end{aligned}$$
(3.9)

Together, (3.8) and (3.9) imply the existence of the following limits

$$\begin{aligned} p_t(\ell ):=\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P_{\sigma ^\theta ,\tau ^\theta }^k\left( k_{\varphi (\theta ,t)}=\ell \right) = \left( 1-p_t(k)\right) \frac{c_\ell }{c}\mathbb {1}_{\{e_\ell =e\}}\qquad \forall \ell \ne k,\ \forall t\in (0,1)\,.\end{aligned}$$
(3.10)

Finally, \(\lim _{\Vert \theta \Vert \rightarrow 0} g(k,\sigma ^\theta ,\tau ^\theta )=g^k_0\in \mathbb R\) exists. Using Lemma 3.6, a computation similar to that of Corollary 3.3 then gives

$$\begin{aligned}\lim _{\Vert \theta \Vert \rightarrow 0} \gamma ^k_\theta (\sigma ^\theta ,\tau ^\theta ; t)=\alpha _t g_0^k + (t-\alpha _t)\sum \nolimits _{\ell \ne k} \frac{c_\ell }{c}\mathbb {1}_{\{e_\ell =e\}} g_0^\ell \qquad \forall t\in [0,1], \end{aligned}$$

where \(\alpha _t:= \int _{0}^t p_s(k)\,ds\) is the limit expected time spent in state k before time t. As none of the quantities involved in this expression depend on the family of vanishing weights, the proof is complete. \(\square\)

NB Together, Propositions 3.1 and 3.4 establish Theorem 2.7 for absorbing games, that is:

$$\begin{aligned}\lim _{\lambda \rightarrow 0}\gamma ^k_\lambda (x_\lambda ,y_\lambda ;t)= t v^k =\lim _{\Vert \theta \Vert \rightarrow 0} \gamma ^k_\theta (\sigma ^\theta ,\tau ^\theta ; t), \qquad \forall t\in [0,1]\,. \end{aligned}$$

To complete the proof of Theorem 2.7, we need to tackle smooth stochastic games now.

The constant-payoff property in smooth stochastic games

In this section we prove that the constant-payoff property holds for smooth stochastic games. That is, the case where there exists a family \((x_\lambda ,y_\lambda )\) of stationary strategy profiles, optimal in the \(\lambda\)-discounted games, whose corresponding family of stochastic matrices \((Q_\lambda )\) satisfies \(\lim _{\lambda \rightarrow 0} \frac{Q_\lambda -{\text {Id}}}{\lambda }=A\) for some real matrix \(A\in \mathbb R^{n\times n}\).

The discounted case

Consider a fixed optimal Puiseux strategy profile \((x_\lambda ,y_\lambda )\), and suppose that \(Q_\lambda\) satisfies \(\lim _{\lambda \rightarrow 0} \frac{Q_\lambda -{\text {Id}}}{\lambda }=A\) for some real matrix \(A\in \mathbb R^{n\times n}\). Let \(g_0:=\lim _{\lambda \rightarrow 0}g(\,\cdot \,,x_\lambda ,y_\lambda )\in \mathbb R^n\) be the corresponding limit payoff vector.

We start with the following result.

Proposition 4.1

Suppose that the stochastic game \((K,I,J,g,q,k)\) is smooth. Then,

$$\begin{aligned}\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)=t v^k\qquad \forall t\in [0,1]\,. \end{aligned}$$

Proof

The proof is similar to the one for the class of absorbing games. By assumption, \(Q_\lambda ={\text {Id}}+ A\lambda + o(\lambda )\). Recall also that \(\lim _{\lambda \rightarrow 0} \lambda \varphi (\lambda ,t)=-\ln (1-t)\) holds for all \(t\in (0,1)\). Consequently,

$$\begin{aligned}\lim _{\lambda \rightarrow 0}Q_\lambda ^{\varphi (\lambda ,t)}=e^{-\ln (1-t)A}\qquad \forall t\in (0,1)\,. \end{aligned}$$

Thus, by Lemma 3.6, \(\lim _{\lambda \rightarrow 0} \sum \nolimits _{m= 1}^{\varphi (\lambda ,t)} \lambda (1-\lambda )^{m-1} Q_\lambda ^{m-1}=\int _{0}^t e^{-\ln (1-s)A}ds\) for all \(t\in [0,1]\). Equivalently, we obtain

$$\begin{aligned}f(t):=\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)=\int _{0}^t (e^{-\ln (1-s)A} g_0)(k)ds\qquad \forall t\in [0,1]\,. \end{aligned}$$

The exact same proof as in Case 3 of Proposition 3.1 can be used to establish that \(Ag_0=0\). That is, first partition the players’ action sets I and J into \(I_0,I_*,I_1,I_+\) and \(J_0,J_*,J_1,J_+\) respectively, and define matrices \(A_{01},A_*,A_{10}\in \mathbb R^{n\times n}\) so that \(A=A_{01}+A_*+A_{10}\). Second, use the deviations introduced therein to prove that one has

$$\begin{aligned}v^k= \frac{\sum _{\ell \ne k} A(k,\ell ) v^\ell }{\sum _{\ell \ne k} A(k,\ell )}\quad \text {and }\quad v^k=g_0^k\qquad \forall k\in K\,. \end{aligned}$$

These equalities imply \(Ag_0=0\). But then, \(f''(t)=\frac{1}{1-t}e^{-\ln (1-t)A} A g_0=0\). Hence, \(f'(t)\) is constant. As \(f(0)=0\) and \(f(1)=v^k\), this implies that \(f(t)=t v^k\) for all t.\(\square\)
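
The key limit \(\lim _{\lambda \rightarrow 0}Q_\lambda ^{\varphi (\lambda ,t)}=e^{-\ln (1-t)A}\) can be checked numerically on the Big Match; the sketch below (ours, using numpy and scipy.linalg.expm) compares the two matrices for decreasing \(\lambda\).

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 0.5, 0.5],
              [ 0.0, 0.0, 0.0],
              [ 0.0, 0.0, 0.0]])

def Q(lam):
    """Big Match transition matrix under the optimal stationary profile (states k, 0*, 1*)."""
    stay = 1.0 / (1.0 + lam)
    ab = lam / (2.0 * (1.0 + lam))
    return np.array([[stay, ab, ab], [0, 1, 0], [0, 0, 1]], dtype=float)

t = 0.5
target = expm(-np.log(1 - t) * A)                       # e^{-ln(1-t) A}
for lam in (1e-1, 1e-2, 1e-3, 1e-4):
    M = int(np.ceil(np.log(1 - t) / np.log(1 - lam)))   # phi(lambda, t)
    print(lam, np.max(np.abs(np.linalg.matrix_power(Q(lam), M) - target)))   # tends to 0
```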

The general case

We now establish Theorem 2.7 for smooth stochastic games. By Proposition 4.1, it is sufficient to prove the following result.

Proposition 4.2

Suppose that \((K,I,J,g,q,k)\) is smooth. Let \((\sigma ^\theta ,\tau ^\theta )\) be the family of asymptotically optimal strategies defined in (2.2). Then, for all \(t\in [0,1]\), \(\lim _{\Vert \theta \Vert \rightarrow 0}\gamma _\theta ^k(\sigma ^\theta ,\tau ^\theta ;t)\) exists.

The idea is to introduce a family of continuous-time stochastic processes indexed by \(\theta\), and prove that these processes converge as \(\Vert \theta \Vert\) goes to 0 (along any possible sequence of vanishing weights).

Let \((Y^{k,\theta }_m)_{m\ge 1}\) be the random process of states \((k_m)\) under the law \(\mathbb P^k_{\sigma ^\theta ,\tau ^\theta }\), which is an inhomogeneous Markov chain with transition matrices \((Q^\theta _m)\). Define the piece-wise constant process \((X^{k,\theta }_t)_{t\in [0,1)}\) with values in K as follows:

$$\begin{aligned}X^{k,\theta }_t:= Y^{k,\theta }_m\qquad \forall t\in [t_m^\theta , t_{m+1}^\theta )\,.\end{aligned}$$

The process \((X^{k,\theta }_t)_{t\in [0,1)}\) is clearly càdlàg. Note also that \(X^{k,\theta }_t=Y^{k,\theta }_1=k\) for all \(t\in [0,\theta _1)\).

In the sequel, we will use the following notation.

  • For any \(t,h\ge 0\) so that \(0\le t\le t+h< 1\), let \(J^\theta _{[t,t+h)}\) be the number of jumps (i.e. state changes) of the process \((Y^{k,\theta }_m)_{m\ge 1}\) between stage \(\varphi (\theta ,t)\) and stage \(\varphi (\theta ,t+h)\).

  • For any \(0\le t\le t+h<1\) define the matrix

    $$\begin{aligned}P_{t,t+h}^\theta :=\prod _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)-1}Q_m^\theta \in \mathbb R^{n\times n}\,. \end{aligned}$$
  • For all \(t\in [0,1)\) and \(1\le \ell \le n\), let \(\mathbb P^\ell _t\) denote the conditional probability on \(\{X_t^{k,\theta }=\ell \}\). Thus, for all \(1\le \ell , \ell '\le n\) and \(0\le t\le t+h< 1\),

    $$\begin{aligned}\mathbb P^\ell _t(X^{k,\theta }_{t+h}=\ell '):= \mathbb P(X^{k,\theta }_{t+h}=\ell '\,|\, X_t^{k,\theta }=\ell )=P_{t,t+h}^\theta (\ell ,\ell ')\,. \end{aligned}$$

Proposition 4.3

For any \(t\in [0,1)\), as \(h>0\) goes to 0 one has

(i):

\(\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(X^{k,\theta }_{t+h}=\ell )=\displaystyle 1+\frac{A(\ell ,\ell ) }{1-t}h + o(h)\).

(ii):

\(\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(X^{k,\theta }_{t+h}=\ell ')=\displaystyle \frac{A(\ell ,\ell ')}{1-t}h + o(h)\) for all \(\ell '\ne \ell\).

Proof

(i)

Conditional on \(\{X^{k,\theta }_{t}= \ell \}\), the event \(\{X^{k,\theta }_{t+h}= \ell \}\) is the disjoint union of \(\{J^\theta _{[t,t+h)}=0\}\) and \(\{X^{k,\theta }_{t+h}= \ell \}\cap \{J^\theta _{[t,t+h)}\ge 2\}\). For the former, one has

$$\begin{aligned} \lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(J^\theta _{[t,t+h)}=0)=1 + \frac{A(\ell ,\ell )}{1-t}h+o(h)\,. \end{aligned}$$
(4.1)

Indeed,

$$\begin{aligned} \lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(J^\theta _{[t,t+h)}=0)= & {} \lim _{\Vert \theta \Vert \rightarrow 0} \prod _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)-1} \mathbb P^\ell _t(Y^{k,\theta }_{m+1}=Y^{k,\theta }_m),\\= & {} \lim _{\Vert \theta \Vert \rightarrow 0} \prod _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)-1} \left( 1-\sum _{\ell '\ne \ell } Q^\theta _m(\ell ,\ell ')\right) ,\\= & {} \lim _{\Vert \theta \Vert \rightarrow 0} \prod _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)-1} \left( 1-\lambda ^\theta _m |A(\ell ,\ell )|+o(\lambda ^\theta _m)\right) ,\\= & {} \lim _{\Vert \theta \Vert \rightarrow 0} \exp \left( -|A(\ell ,\ell )|\sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)-1}\lambda ^\theta _m \right) ,\\ \end{aligned}$$

and the result follows from Lemma 3.5. For the latter, that is \(\{X^{k,\theta }_{t+h}= \ell \}\cap \{J^\theta _{[t,t+h)}\ge 2\}\), one has

$$\begin{aligned} \mathbb P^\ell _t(J^\theta _{[t,t+h)}\ge 2)\le \max _{1\le \ell '\le n} \mathbb P_t^{\ell '}(J^\theta _{[t,t+h)}\ge 1)^2=\max _{1\le \ell '\le n}\left( 1-\mathbb P_t^{\ell '}(J^\theta _{[t,t+h)}=0)\right) ^2\,. \end{aligned}$$
(4.2)

Therefore, \(\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(J^\theta _{[t,t+h)}\ge 2)=o(h)\), which together with (4.1) proves the desired result.

(ii) Similarly, conditional on \(\{X^{k,\theta }_{t}= \ell \}\),

$$\begin{aligned}\{X^{k,\theta }_{t+h}=\ell '\}= \{X^{k,\theta }_{t+h}=\ell '\} \cap \left( \{ J^\theta _{[t,t+h)}=1\} \cup \{J^\theta _{[t,t+h)}\ge 2\} \right) \,. \end{aligned}$$

Together with (4.2) this equality yields

$$\begin{aligned}\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(X^{k,\theta }_{t+h}=\ell ')=\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(J^\theta _{[t,t+h)}=1,\, X^{k,\theta }_{t+h}=\ell ')+o(h)\,. \end{aligned}$$

Conditional on leaving the state \(\ell\) at stage m, the probability of going to \(\ell '\ne \ell\) is given by

$$\begin{aligned}\mathbb P(Y^{k,\theta }_{m+1}=\ell '\,|\, Y^{k,\theta }_m= \ell , Y^{k,\theta }_{m+1}\ne \ell )=\frac{Q^\theta _m(\ell ,\ell ')}{\sum _{\ell ''\ne \ell } Q^\theta _m(\ell ,\ell '')}\,. \end{aligned}$$

By assumption, this converges to \(\frac{A(\ell ,\ell ')}{|A(\ell ,\ell )|}\) as \(\Vert \theta \Vert\) goes to 0. On the other hand, (4.2) implies

$$\begin{aligned}\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(J^\theta _{[t,t+h)}=1)=\lim _{\Vert \theta \Vert \rightarrow 0} 1-\mathbb P^\ell _t(J^\theta _{[t,t+h)}=0)+o(h)\,.\end{aligned}$$

Consequently, using (4.1) one obtains

$$\begin{aligned}\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(J^\theta _{[t,t+h)}=1,\, X^{k,\theta }_{t+h}=\ell ')=\, & {} \lim _{\Vert \theta \Vert \rightarrow 0} \frac{Q^\theta _m(\ell ,\ell ')}{\sum _{\ell ''\ne \ell } Q^\theta _m(\ell ,\ell '')}\left( 1-\mathbb P^\ell _t(J^\theta _{[t,t+h)}=0)+o(h)\right) \\= \,& {} \frac{A(\ell ,\ell ')}{|A(\ell ,\ell )|}\left( \frac{|A(\ell ,\ell )|}{1-t}h+o(h)\right) \\=\, & {} \frac{A(\ell ,\ell ')}{1-t}h+o(h)\,. \end{aligned}$$

\(\square\)

Corollary 4.4

The processes \((X^{k,\theta }_{t})_{t \in [0,1)}\) converge in law, as \(\Vert \theta \Vert\) tends to 0, to an inhomogeneous Markov process with generator \(\left( \frac{1}{1-t}A\right) _{t\in [0,1)}\).

Proof

The limit is identified by Proposition 4.3. The tightness is a consequence of (ii). Indeed, this point implies that, for any \(T>0\), uniformly in \(\theta\):

$$\begin{aligned}\lim _{\varepsilon \rightarrow 0} \mathbb P\left( \exists t_1,t_2\in [0,T] \, | \, t_1<t_2<t_1+\varepsilon ,\ X^{k,\theta }_{t_1^-}\ne X^{k,\theta }_{t_1},\ X^{k,\theta }_{t_2^-}\ne X^{k,\theta }_{t_2} \right) =0\,.\end{aligned}$$

This is precisely the tightness criterion for càdlàg processes with discrete values.\(\square\)

The following result is a direct consequence of Corollary 4.4 and Lemma 3.6.

Corollary 4.5

For all \(t\in [0,1)\) the following limit exists:

$$\begin{aligned}\Pi _t:=\lim _{\Vert \theta \Vert \rightarrow 0} \sum _{m= 1}^{\varphi (\theta ,t)} \theta _m \prod _{m'=1}^{m-1} Q^\theta _{m'} = \int _{0}^t e^{-\ln (1-s)A} ds\,. \end{aligned}$$

Proof of Proposition 4.2

The desired result follows directly from Proposition 4.1 and Corollary 4.5, as these two results give

$$\begin{aligned}\lim _{\Vert \theta \Vert \rightarrow 0}\gamma _\theta ^k(\sigma ^\theta ,\tau ^\theta ;t)=\int _{0}^t (e^{-\ln (1-s)A}g_0)(k)\,ds=\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)=t v^k\qquad \forall t\in [0,1]\,. \end{aligned}$$

\(\square\)

Final comments

Back to the Big Match. Recall that this stochastic game is both absorbing and smooth. The state space is \(\{k,0^*,1^*\}\) where k is the non-absorbing state and \(v^k=\frac{1}{2}\). As already discussed, any stationary strategy profile \((x_\lambda ,y_\lambda )\) satisfying \(x_\lambda ^k(T)=\frac{\lambda }{1+\lambda }\) and \(y_\lambda ^k(L)=\frac{1}{2}\) is optimal in the \(\lambda\)-discounted game; the corresponding limit payoff vector is given by \(g_0=(\frac{1}{2}, 0,1)\); and the corresponding transition matrix \(Q_\lambda\) satisfies

$$\begin{aligned}\lim _{\lambda \rightarrow 0} \frac{Q_\lambda -{\text {Id}}}{\lambda }=A:=\begin{pmatrix} -1&{} \frac{1}{2}&{} \frac{1}{2} \\ 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 \end{pmatrix}. \end{aligned}$$

For any sequence of nonnegative weights \(\theta =(\theta _m)_{m\ge 1}\) so that \(\sum _{m\ge 1} \theta _m=1\) and any \(m\ge 1\), recall that \(\lambda ^\theta _m=\theta _m(\sum _{m' \ge m} \theta _{m'})^{-1}\). Then, the asymptotically optimal strategy profile \((\sigma ^\theta , \tau ^\theta )\) is such that, at every stage \(m\ge 1\), Player 1 plays T at state k with probability \(\frac{\lambda ^\theta _m}{1+\lambda ^\theta _m}\) and Player 2 plays L with probability \(\frac{1}{2}\). The constant-payoff property then reads as follows:

$$\begin{aligned}\lim _{\Vert \theta \Vert \rightarrow 0} \gamma ^k_\theta (\sigma ^\theta , \tau ^\theta ; t)= (\Pi _t g_0)^k= t v^k= \frac{t}{2}\qquad \forall t\in [0,1] \end{aligned}$$

where \(\Pi _t= \int _{0}^t e^{-\ln (1-s)A}ds\).
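
As a final numerical illustration (ours), the matrix \(\Pi _t\) can be approximated by a Riemann sum; applying it to \(g_0=(\frac{1}{2},0,1)\) and reading the entry for state k indeed gives \(t/2\).

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 0.5, 0.5],
              [ 0.0, 0.0, 0.0],
              [ 0.0, 0.0, 0.0]])
g0 = np.array([0.5, 0.0, 1.0])                    # limit stage payoffs in states (k, 0*, 1*)

def Pi(t, steps=2000):
    """Midpoint-rule approximation of Pi_t = int_0^t e^{-ln(1-s) A} ds."""
    s = (np.arange(steps) + 0.5) * (t / steps)    # midpoints of the subintervals of [0, t]
    return sum(expm(-np.log(1 - si) * A) for si in s) * (t / steps)

for t in (0.25, 0.5, 0.75):
    print(t, round((Pi(t) @ g0)[0], 4), t / 2)    # entry for state k, compared with t*v^k = t/2
```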

Beyond absorbing or smooth stochastic games. The extension of Proposition 4.2 to all stochastic games is an important challenge, as it would settle the constant-payoff conjecture of Sorin, Venel and Vigeral  (2010), i.e. Conjecture 1.1. The main difficulty to overcome is the following: while the continuous-time process \((X^{k,\theta }_t)_{t\in (0,1]}\), with values on K, can be defined for any stochastic game, its limit as \(\Vert \theta \Vert\) goes to 0 will not be smooth in general. For this reason, it may be preferable to replace the state space K with a set of disjoint subsets of states, namely cycles, in the sense of Freidlin and Wentzell (1984), where the process spends a positive amount of time at every visit. Although the aggregation of states poses several technical questions, we conjecture that our approach can be used to solve the general case.

References

  1. Attia L, Oliu-Barton M (2019) A formula for the value of a stochastic game. Proc Natl Acad Sci 116(52):26435–26443

  2. Bewley T, Kohlberg E (1976) The asymptotic theory of stochastic games. Math Op Res 1:197–208

  3. Blackwell D, Ferguson TS (1968) The big match. Ann Math Stat 39:159–163

  4. Freidlin MI, Wentzell AD (1984) Random perturbations of dynamical systems. Springer, Berlin

  5. Gillette D (1957) Stochastic games with zero stop probabilities. In: Dresher M, Tucker AW, Wolfe P (eds) Contributions to the theory of games, vol III. Annals of Mathematics Studies, Princeton University Press, Princeton, pp 179–187

  6. Kohlberg E (1974) Repeated games with absorbing states. Ann Stat 2:724–738

  7. Laraki R (2010) Explicit formulas for repeated games with absorbing states. Int J Game Theory 39:53–69

  8. Mertens J-F, Neyman A (1981) Stochastic games. Int J Game Theory 10:53–66

  9. Mertens J-F, Neyman A (1982) Stochastic games. Proc Natl Acad Sci USA 79:2145–2146

  10. Neyman A, Sorin S (2010) Repeated games with public uncertain duration process. Int J Game Theory 39:29–52

  11. Neyman A (2017) Continuous-time stochastic games. Games Econ Behav 104:92–130

  12. Oliu-Barton M, Ziliotto B (2021) Constant payoff in zero-sum stochastic games, to appear in Annals of IHP Probability

  13. Shapley LS (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100

  14. Sorin S, Venel X, Vigeral G (2010) Asymptotic properties of optimal trajectories in dynamic programming. Sankhya A 72:237–245

  15. Sorin S, Vigeral G (2020) Limit optimal trajectories in zero-sum stochastic games. Dyn Games Appl 10:555–572

  16. Ziliotto B (2016) A Tauberian theorem for nonexpansive operators and applications to zero-sum stochastic games. Math Op Res 41:1522–1534

Cite this article

Oliu-Barton, M. Weighted-average stochastic games with constant payoff. Oper Res Int J (2021). https://doi.org/10.1007/s12351-021-00625-6

Keywords

  • Stochastic game
  • Value
  • Markov chain
  • Constant payoff