Abstract
In a zero-sum stochastic game, at each stage, two opponents make decisions which determine a stage reward and the law of the state of nature at the next stage, and the aim of the players is, respectively, to maximize and minimize the weighted average of the stage rewards. In this paper we solve the constant-payoff conjecture formulated by Sorin, Venel and Vigeral in 2010 for two classes of stochastic games with weighted-average rewards: (1) absorbing games, a well-known class of stochastic games where the state changes at most once during the game, and (2) smooth stochastic games, a newly introduced class of stochastic games where the state evolves smoothly under optimal play.
Introduction
Model Stochastic games were introduced by Shapley (1953) in order to model a repeated interaction between two opposing players in a changing environment. The game proceeds in stages. At each stage \(m\in \mathbb N\) of the game, players play a zero-sum game that depends on a state variable. Formally, knowing the current state \(k_m\), Player 1 chooses an action \(i_m\) and Player 2 chooses an action \(j_m\). Their choices occur independently and have two consequences: first, they produce a stage reward \(g(k_m,i_m,j_m)\) which is observed by the players and, second, they determine the law \(q(k_m,i_m,j_m)\) of the next period’s state \(k_{m+1}\). Thus, the sequence of states follows a Markov chain controlled by the actions of both players. To any sequence of nonnegative weights \(\theta =(\theta _m)\) and any initial state k corresponds the \(\theta\)-weighted average stochastic game, in which Player 1 maximizes the expectation of \(\sum \nolimits _{m \ge 1} \theta _m g(k_m,i_m,j_m)\), given that \(k_1=k\), while Player 2 minimizes the same amount. A crucial aspect of this model is that the current state is commonly observed by the players at every stage. Another one is stationarity: the transition and stage reward functions do not change over time.
A \(\theta\)-weighted average stochastic game is thus described by a tuple \((K,I,J,g,q,k,\theta )\) where K is a set of states, I and J are the sets of actions of both players, \(g:K\times I\times J\rightarrow \mathbb R\) is the reward function, \(q:K\times I\times J\rightarrow \Delta (K)\) is the transition function, k is the initial state and \(\theta\) is a sequence of nonnegative weights so that \(\sum _{m\ge 1}\theta _m=1\). As in Shapley’s seminal paper (1953), we assume throughout this paper that K, I, J are finite sets, and identify the set K with \(\{1,\dots ,n\}\).
Discounted and finitely repeated stochastic games The cases where, for all \(m\in \mathbb N\), one has \(\theta _m=\lambda (1-\lambda )^{m-1}\) for some \(\lambda \in (0,1)\) or \(\theta _m=\frac{1}{T} \mathbb {1}_{\{m\le T\}}\) for some \(T\in \mathbb N\) are referred to as \(\lambda\)-discounted stochastic games and T-stage repeated stochastic games, respectively.
An example: the “Big Match” Introduced by Gillette (1957), the Big Match is the most famous stochastic game. The state space is \(K=\{k,0^*,1^*\}\), where states \(0^*\) and \(1^*\) are absorbing with payoff 0 and 1 respectively, and the action sets are \(I=\{i,i'\}\) and \(J=\{j,j'\}\).
The game with initial state k, the non-absorbing state, can be described as follows.
As long as Player 1 plays action \(i'\), the state remains the same, and the stage rewards are 0 or 1 depending on Player 2’s action. By contrast, when Player 1 plays i, the state moves to an absorbing state, either \(0^*\) or \(1^*\) depending on Player 2’s action, and the future stage rewards are then fixed once and for all.
An asymptotically optimal strategy for Player 1. The \(\lambda\)-discounted Big Match has a value \(v_\lambda ^k=\frac{1}{2}\) and it is optimal for Player 1 to play the action i with the same probability \(\frac{\lambda }{1+\lambda }\) at every stage. Similarly, the T-stage game has a value \(v_T^k=\frac{1}{2}\). The optimal strategy of Player 1 consists in playing, at every stage \(1\le m\le T\), the action i with probability \(\frac{1}{T-m+2}\). More generally, the value \(v^k_\theta\) exists for every sequence of weights \(\theta\) and is equal to \(\frac{1}{2}\). An optimal strategy for Player 2 consists in playing actions j and \(j'\) with equal probability at every stage. An optimal strategy for Player 1 can be obtained recursively, using the so-called Shapley operator, which relies on the stationarity of the model. However, an even simpler strategy exists, which works well when \(\Vert \theta \Vert :=\max _m \theta _m\) is small enough: at stage \(m\ge 1\), play action i with probability \(\frac{\lambda _m}{1+\lambda _m}\), where \(\lambda _m=\frac{\theta _m}{\sum _{m'\ge m}\theta _{m'}}\) is the relative weight of the current stage with respect to the remaining weights of the game. Indeed, for every \(\varepsilon >0\), there exists \(\delta >0\) so that for any \(\theta\) satisfying \(\Vert \theta \Vert \le \delta\), this strategy guarantees at least \(v^k_\theta -\varepsilon\).
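These closed-form quantities are easy to verify numerically. The sketch below (illustrative Python, not part of the original argument) iterates the Shapley operator of the Big Match at state k; the encoding of the auxiliary 2×2 game, in particular which of Player 2's actions labels which column, is an assumption of the sketch, but the fixed point and the optimal mixture do not depend on it.

```python
def matrix_game_value(a, b, c, d):
    """Value and optimal row mixture (weight p on the first row) of the 2x2
    zero-sum game [[a, b], [c, d]], valid when the game has no pure saddle
    point (which is the case at every iteration below)."""
    denom = a + d - b - c
    return (a * d - b * c) / denom, (d - c) / denom

def big_match_discounted(lam, iterations=10_000):
    """Fixed-point iteration on the Shapley equation at state k.
    Row i (absorbing): total discounted payoff 1 against one column, 0
    against the other. Row i' (safe): stage reward 0 or 1, then state k
    again, worth v tomorrow."""
    v = 0.0
    for _ in range(iterations):
        v, p = matrix_game_value(1.0, 0.0,
                                 (1 - lam) * v, lam + (1 - lam) * v)
    return v, p

v, p = big_match_discounted(0.01)
# v recovers the value 1/2, and p the optimal probability lam/(1+lam)
```

The iteration contracts at rate \((1-\lambda )/(1+\lambda )\), so convergence is geometric.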
The constant-payoff property. Consider the T-stage game again, and suppose that both players use their optimal strategies. What is the average reward after M stages? In fact, the average reward over the first M stages is very close to \(v^k_T\) as long as M and T are large. This property is remarkable, as it means a constant flow of rewards for the players under optimal play. More precisely, and more generally, the constant-payoff property holds if, for any \(\varepsilon >0\), there exists \(\delta >0\) so that for any \(\theta\) satisfying \(\Vert \theta \Vert \le \delta\), the cumulated rewards over the first M stages are equal to \((\sum _{m=1}^M \theta _m) v^k_\theta\) up to an error of at most \(\varepsilon\).
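In the Big Match this can be checked exactly, without simulation. The sketch below (illustrative Python, using the optimal strategies described above) propagates the exact state distribution of the T-stage game and tracks the expected running average; under Player 2's uniform mixture the expected stage reward equals \(\frac12\) at every stage, which is precisely the constant-payoff phenomenon.

```python
def big_match_running_average(T):
    """Expected average reward over the first M stages (M = 1..T) of the
    T-stage Big Match when Player 1 plays the absorbing action i with
    probability 1/(T-m+2) at stage m and Player 2 mixes j, j' equally."""
    p_k, p_0, p_1 = 1.0, 0.0, 0.0   # P(state = k), P(state = 0*), P(state = 1*)
    total, averages = 0.0, []
    for m in range(1, T + 1):
        # expected stage reward: 1/2 in state k (each row of the stage game
        # pays 1 with probability 1/2 against the uniform column mixture),
        # 0 in state 0*, 1 in state 1*
        total += 0.5 * p_k + 1.0 * p_1
        averages.append(total / m)
        p_abs = p_k / (T - m + 2)   # probability of playing i at stage m
        p_0 += p_abs / 2            # absorption splits evenly between 0*, 1*
        p_1 += p_abs / 2
        p_k -= p_abs
    return averages

# every entry of big_match_running_average(1000) equals 1/2
```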
Selected past results Every stochastic game \((K,I,J,g,q,k,\theta )\) has a value, denoted by \(v_\theta ^k\). Although only stated for the discounted case, this result follows from Shapley (1953). Further, both players have optimal strategies that depend, at each stage, only on the current state and the current stage, and the dependence on the stage is not needed in the discounted case. The Big Match was solved by Blackwell and Ferguson (1968), who considered the T-stage stochastic game, where T is sufficiently large. In the sequel, \(v_\lambda ^k\) and \(v^k_T\) refer, respectively, to the value of a \(\lambda\)-discounted and a T-stage stochastic game. Bewley and Kohlberg (1976) proved the convergence of \(v_\lambda ^k\) as \(\lambda\) goes to 0, the convergence of \(v_T^k\) as T goes to infinity, and the equality of the two limits. Further, Bewley and Kohlberg (1976) proved that the map \(\lambda \mapsto v_\lambda\) admits a Puiseux series expansion near 0, and the existence of an optimal strategy profile with the same property (see Sect. 2.1.2 below). Mertens and Neyman (1981, 1982) proved the existence of the value \(v^k\): Player 1 can ensure that the average reward is at least \(v^k\) in any T-stage stochastic game with T large enough, and similarly Player 2 can ensure that the average reward is at most \(v^k\) in all such games. Neyman and Sorin (2010) studied stochastic games with a random number of stages, and proved that their values converge to \(v^k\) as the expected number of stages tends to \(+\infty\), under a monotonicity condition. Ziliotto (2016) proved that the values of \(\theta\)-weighted stochastic games \(v_\theta ^k\) converge to \(v^k\) as \(\Vert \theta \Vert :=\max _{m\ge 1}\theta _m\) goes to 0, provided that \(\sum _{m\ge 1} |\theta _{m+1}^p - \theta _m^p|\) converges to zero for some \(p > 0\).
The value \(v^k\) of a stochastic game was recently characterized by Attia and Oliu-Barton (2019) as the unique solution of a single equation.
Notation In the sequel, \(\mathbb E^k_{\sigma ^\theta ,\tau ^\theta }\) denotes the expectation under the unique law induced by \((\sigma ^\theta ,\tau ^\theta )\) and the initial state k, as explained below in Sect. 2.1. Further, quantities \(d_\theta\), indexed by sequences \(\theta\), are said to have a limit d as \(\Vert \theta \Vert\) goes to 0 if for every \(\varepsilon >0\) there exists \(\delta >0\) so that for all \(\theta\) satisfying \(\Vert \theta \Vert \le \delta\) one has \(|d_\theta - d| \le \varepsilon\).
The constant-payoff property The constant-payoff property was first noted by Sorin, Venel and Vigeral (2010) in the framework of single decision-maker problems, and conjectured to hold in finite two-player zero-sum stochastic games (henceforth stochastic games). Their conjecture can be stated as follows.
Conjecture 1.1
(Sorin, Venel and Vigeral (2010)) Let \((K,I,J,g,q,k,\theta )\) be a family of stochastic games indexed by a sequence of weights \(\theta\). Then, there exists a family of strategy profiles \((\sigma ^\theta ,\tau ^\theta )\) indexed by \(\theta\) so that the following two properties hold.

The strategy profile \((\sigma ^\theta ,\tau ^\theta )\) is asymptotically optimal, i.e. \(\sigma ^\theta\) guarantees that the \(\theta\)-weighted reward is at least \(v^k-\varepsilon _1(\theta )\), and \(\tau ^\theta\) guarantees that the \(\theta\)-weighted reward is at most \(v^k+\varepsilon _2(\theta )\), for some error functions satisfying \(\lim _{\Vert \theta \Vert \rightarrow 0}\varepsilon _1(\theta )=\lim _{\Vert \theta \Vert \rightarrow 0}\varepsilon _2(\theta )=0\).

The strategy profile \((\sigma ^\theta ,\tau ^\theta )\) induces a constant average reward throughout the game. Formally, for any family of integers \((M_\theta )\), indexed by \(\theta\), satisfying \(\lim _{\Vert \theta \Vert \rightarrow 0} \sum _{m=1}^{M_\theta }\theta _m>0\), one has
$$\begin{aligned} \lim _{\Vert \theta \Vert \rightarrow 0} \frac{\mathbb E^k_{\sigma ^\theta ,\tau ^\theta }\left[ \sum _{m=1}^{M_\theta } \theta _m g(k_m,i_m,j_m)\right] }{\sum _{m=1}^{M_\theta } \theta _m}= v^k\,.\end{aligned}$$(1.1)
The constant-payoff property was established for discounted absorbing games by Sorin and Vigeral (2020), that is, for the well-known class of stochastic games introduced by Kohlberg (1974) where the state changes at most once. The constant-payoff property was then established for discounted stochastic games by Oliu-Barton and Ziliotto (2021). So far, there is no such result for general weighted-average stochastic games.
Main result In this paper we solve Conjecture 1.1 for two classes of general weighted-average stochastic games: the well-studied class of absorbing games, and the newly introduced class of smooth stochastic games. (A precise statement of our results is condensed in Theorem 2.7 below.) Before we introduce the latter, let us point out that when both players use stationary strategies (that is, ones that depend only on the current state) the state follows a Markov chain. Further, both players have optimal stationary strategies in \(\lambda\)-discounted stochastic games.
Absorbing games A stochastic game (K, I, J, g, q, k) is absorbing if, for every \(\ell \ne k\) and every \((i,j)\in I\times J\) one has \(q(\ell \mid \ell ,i,j)=1\). Absorbing games are stochastic games with at most one transition. These games have been extensively studied since the term was coined by Kohlberg (1974). Without loss of generality, we assume that the states \(\ell \ne k\) are non-strategic, i.e. there exists \(g^\ell \in \mathbb R\) so that \(g(\ell ,i,j)=g^\ell\) for all \((i,j)\in I\times J\).
Smooth stochastic games A stochastic game (K, I, J, g, q, k) is smooth if there exists a family of stationary strategy profiles \((x_\lambda ,y_\lambda )_{\lambda \in (0,1)}\), so that \((x_\lambda ,y_\lambda )\) is an optimal strategy profile in the \(\lambda\)-discounted game, and the corresponding family of stochastic matrices \((Q_\lambda )_{\lambda \in (0,1)}\) satisfies \(\lim _{\lambda \rightarrow 0} \frac{Q_\lambda -{\text {Id}}}{\lambda }=A\), for some matrix \(A\in \mathbb R^{n\times n}\).
Comments

The Big Match is a prominent example of an absorbing game. Further, this game is also a smooth stochastic game. Indeed, for any discount rate \(\lambda \in (0,1)\), any stationary strategy profile \((x_\lambda ,y_\lambda )\) satisfying \(x_\lambda ^k(i)=\frac{\lambda }{1+\lambda }\) and \(y_\lambda ^k(j)=\frac{1}{2}\) is optimal in the \(\lambda\)-discounted game, and induces a Markov chain over \(\{k,0^*,1^*\}\) whose transition is given by
$$\begin{aligned} Q_\lambda =\begin{pmatrix} \frac{1}{1+\lambda } & \frac{\lambda }{2(1+\lambda )} & \frac{\lambda }{2(1+\lambda )} \\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}. \end{aligned}$$Clearly, this matrix satisfies the condition
$$\begin{aligned} \lim _{\lambda \rightarrow 0} \frac{Q_\lambda -{\text {Id}}}{\lambda }=A:=\begin{pmatrix} -1 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}. \end{aligned}$$
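This convergence is easy to check numerically. The sketch below (illustrative Python) measures the entrywise gap between \((Q_\lambda -{\text {Id}})/\lambda\) and the limit A; note that the first diagonal entry of A is \(-1\), so that each row of A sums to zero, as for the generator of a continuous-time Markov chain.

```python
def Q(lam):
    """Big Match transition matrix under the optimal stationary profile
    x(i) = lam/(1+lam), y(j) = 1/2, with states ordered (k, 0*, 1*)."""
    a = lam / (1 + lam)
    return [[1 / (1 + lam), a / 2, a / 2],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# candidate limit A of (Q(lam) - Id)/lam; each row sums to zero
A = [[-1.0, 0.5, 0.5], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
ID = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def generator_error(lam):
    """Max entrywise gap between (Q(lam) - Id)/lam and A; it is of order lam."""
    q = Q(lam)
    return max(abs((q[r][c] - ID[r][c]) / lam - A[r][c])
               for r in range(3) for c in range(3))
```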
Although not previously introduced, smooth stochastic games appear naturally in the asymptotic study of discounted stochastic games, as those where the players are restricted to stationary strategies which are power series in \(\lambda\), i.e. \(x_\lambda =x_0+\sum _{m\ge 1}\lambda ^m \alpha _m\) and \(y_\lambda =y_0+\sum _{m\ge 1}\lambda ^m \beta _m\). By Laraki (2010), both players have \(\varepsilon\)-optimal strategies of this form in absorbing games, for all \(\varepsilon >0\). This result can easily be extended to irreversible games, a broader class of stochastic games where states that are left can never be reached again. Further, smooth stochastic games are related to the continuous-time stochastic games of Neyman (2017), where the state variable is a continuous-time Markov chain controlled by both players. Similarly, the state process of a smooth stochastic game converges to a continuous-time Markov chain when both players play optimally and the weights tend to 0 (see Corollary 4.4 below).
Stochastic games
In the sequel \((K,I,J,g,q,k,\theta )\) denotes a fixed \(\theta\)weighted average stochastic game with initial state k. In order to state our results formally, we start by recalling some definitions.
Strategies, payoff function and value
The sequence \((k_1,i_1,j_1,...,k_m,i_m,j_m,...)\) generated along the game is called a play. The set of plays is \((K\times I\times J)^{\mathbb N}\).
Definition 2.1

A strategy for a player assigns a mixed action to each possible sequence of past observations. Formally, a strategy for Player 1 is a collection of maps \(\sigma =(\sigma _m)_{m \ge 1}\), where \(\sigma _m:(K \times I \times J)^{m-1} \times K \rightarrow \Delta (I)\). Similarly, a strategy for Player 2 is a collection of maps \(\tau =(\tau _m)_{m \ge 1}\), where \(\tau _m:(K \times I \times J)^{m-1} \times K \rightarrow \Delta (J)\).

A stationary strategy is one that plays according to the current state only. Formally, a stationary strategy for Player 1 is a mapping \(x:K\rightarrow \Delta (I)\), and a stationary strategy for Player 2 is a mapping \(y:K\rightarrow \Delta (J)\). (Note the use of different letters for stationary strategies.)
Notation The sets of strategies for Players 1 and 2 are denoted by \(\Sigma\) and \(\mathcal {T}\), respectively, and the sets of stationary strategies by \(\Delta (I)^n\) and \(\Delta (J)^n\).
For any pair \((\sigma ,\tau ) \in \Sigma \times \mathcal {T}\) we denote by \(\mathbb P^{k}_{\sigma ,\tau }\) the unique probability measure on the set of plays \((K\times I\times J)^\mathbb N\) induced by \((\sigma ,\tau )\), \(k_1=k\) and q. Note that the dependence on the transition function q is omitted. This probability is well-defined by the Kolmogorov extension theorem, and the expectation with respect to \(\mathbb P^k_{\sigma ,\tau }\) is denoted by \(\mathbb E^k_{\sigma ,\tau }\).
The payoff function For each \((\sigma ,\tau )\in \Sigma \times \mathcal {T}\) the payoff function is defined by
The value The game has a value, denoted by \(v^k_\theta\). That is
This result can be attributed to Shapley (1953).
Stationarity in the discounted case By stationarity, to every stationary strategy profile \((x,y)\in \Delta (I)^n\times \Delta (J)^n\) corresponds a Markov chain \((k_m)_{m\ge 1}\) with transition matrix \(Q_{xy}\in \mathbb R^{n\times n}\) and payoff vector \(g_{xy}\in \mathbb R^n\):

\(Q_{xy}(k,\ell ):=\sum _{(i,j)\in I\times J}x(k,i)y(k,j)q(\ell \mid k,i,j)\), for all \((k,\ell )\in K^2\).

\(g_{xy}(k):=\sum _{(i,j)\in I\times J}x(k,i)y(k,j)g(k,i,j)\), for all \(k\in K\).
For all \((k,\ell )\in K^2\) and \(m\ge 1\) the following equalities trivially hold:
Optimal and asymptotically optimal strategies
Definition 2.2
An optimal strategy of Player 1 is an element \(\sigma ^*\in \Sigma\) so that, for all \(\tau \in \mathcal {T}\),
An optimal strategy of Player 2 is defined in a similar way. That is, it is an element \(\tau ^* \in \mathcal {T}\) so that, for all \(\sigma \in \Sigma\) one has \(\gamma _\theta ^k(\sigma ,\tau ^*)\le v_\theta ^k\).
Definition 2.3
An asymptotically optimal strategy for Player 1 is a family of strategies \((\sigma _\theta )\) indexed by \(\theta\) so that for all \(\varepsilon >0\) there exists \(\delta >0\) so that for all sequences of weights \(\theta\) with \(\Vert \theta \Vert \le \delta\),
Asymptotically optimal strategies for Player 2 are defined in a symmetric way.
Puiseux strategies
A real map \(f:(a,b) \rightarrow \mathbb R\) is a Puiseux series if either \(f\equiv 0\) or there exists \(m_0\in \mathbb Z\), \(N\in \mathbb N\) and a real sequence \((a_m)_{m\ge m_0}\) so that \(a_{m_0}\ne 0\) and
A function \(f:(0,1] \rightarrow \mathbb R\) admits a Puiseux expansion at 0 if there exists \(\lambda _0\) so that f is a Puiseux series on \((0,\lambda _0)\).
Definition 2.4
A Puiseux strategy profile is a family of stationary strategy profiles \((x_\lambda ,y_\lambda )_{\lambda \in (0,1]}\) so that for all \((k,i,j)\in K\times I\times J\) the mappings \(\lambda \mapsto x_\lambda (k,i)\) and \(\lambda \mapsto y_\lambda (k,j)\) admit Puiseux expansions near 0. An optimal Puiseux strategy profile is one where, in addition, \(x_\lambda\) and \(y_\lambda\) are optimal strategies for \(\lambda\) sufficiently small.
Fact Bewley and Kohlberg (1976) proved that every finite stochastic game admits an optimal Puiseux strategy profile.
A concrete family of asymptotically optimal strategies
Let \((x_\lambda ,y_\lambda )\) be a fixed optimal Puiseux strategy profile of the game. For each sequence of nonnegative weights \(\theta =(\theta _m)_{m\in \mathbb N}\), and all \(m\in \mathbb N\), define the pair of strategies \((\sigma ^\theta ,\tau ^\theta )\) by setting
The family \((\sigma ^\theta ,\tau ^\theta )\), indexed by any possible sequence of weights \(\theta\), is asymptotically optimal by Ziliotto (2016). It is thus a good candidate for tackling the constantpayoff conjecture. In the sequel, the superscript \((\sigma ^\theta ,\tau ^\theta )\) denotes this concrete family of strategy profiles, while the subscript \((\sigma _\theta ,\tau _\theta )\) denotes any family of strategy profiles.
Remark 2.5
If \(\theta\) is a \(\lambda\)-discounted sequence of weights, i.e. \(\theta _m=\lambda (1-\lambda )^{m-1}\) for all \(m\in \mathbb N\) for some \(\lambda \in (0,1)\), then \(\lambda ^\theta _m=\lambda\) for all \(m\in \mathbb N\). In this case, \((\sigma ^\theta _m,\tau ^\theta _m)=(x_\lambda ,y_\lambda )\) for all \(m\in \mathbb N\). This property makes the discounted case central in the study of all weights.
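The relative weights \(\lambda ^\theta _m=\theta _m/\sum _{m'\ge m}\theta _{m'}\) are straightforward to compute for concrete sequences. The sketch below (illustrative Python; the discounted sequence is truncated to finitely many stages, which only perturbs the tail) recovers \(\lambda ^\theta _m=\lambda\) for discounted weights and \(\lambda ^\theta _m=\frac{1}{T-m+1}\) for T-stage uniform weights.

```python
def relative_weights(theta):
    """lambda^theta_m = theta_m / sum_{m' >= m} theta_{m'} for a finite
    sequence of weights (1-indexed in the text, 0-indexed here)."""
    lams, tail = [], sum(theta)
    for w in theta:
        lams.append(w / tail if tail > 0 else 0.0)
        tail -= w
    return lams

lam = 0.05
discounted = [lam * (1 - lam) ** (m - 1) for m in range(1, 201)]
# early stages: relative_weights(discounted)[m] is (almost) exactly lam

uniform = [1.0 / 10] * 10
# relative_weights(uniform)[m-1] equals 1/(T - m + 1) for T = 10
```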
Notation If \(\theta\) is the \(\lambda\)discounted sequence of weights for some \(\lambda \in (0,1)\) we denote the clock function by \(\varphi (\lambda ,t)\) and the cumulated payoff by \(\gamma _\lambda ^k(\sigma ,\tau ;t)\), for all \(t\in [0,1]\).
The cumulated payoffs
For each sequence of nonnegative weights \(\theta\) so that \(\sum _{m\ge 1}\theta _m=1\) we introduce the clock function \(\varphi (\theta ,\, \cdot \,):[0,1]\rightarrow \mathbb N\cup \{+\infty \}\) by setting
Note that \(\varphi (\theta ,0)=1\) for all \(\theta\) and \(\varphi (\theta ,1)=\sup \{m\ge 1, \ \theta _m>0\}\). Hence, \(\varphi (\theta ,1)=+\infty\) if and only if \(\theta\) has infinite support. We now introduce the cumulated payoff at time t. For any pair of strategies \((\sigma ,\tau )\in \Sigma \times \mathcal {T}\),
The case \(t=1\) corresponds to the expectation of the (total) \(\theta\)weighted average of the stage rewards. For simplicity, we use the notation \(\gamma _\theta ^k(\sigma ,\tau )\) in this case.
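For the discounted weights the clock can be computed explicitly. In the sketch below (illustrative Python; we read \(\varphi (\theta ,t)\) as the first stage at which the cumulated weight reaches t, which is consistent with the two properties just noted), the product \(\lambda \,\varphi (\lambda ,t)\) approaches \(-\ln (1-t)\), a fact used later in the study of absorbing games.

```python
import math

def clock(theta, t):
    """phi(theta, t): first stage m whose cumulated weight reaches t."""
    acc = 0.0
    for m, w in enumerate(theta, start=1):
        acc += w
        if acc >= t:
            return m
    return float("inf")

def discounted_clock(lam, t, horizon=10**6):
    """Clock of the lam-discounted weights, truncated at a large horizon."""
    theta = (lam * (1 - lam) ** (m - 1) for m in range(1, horizon + 1))
    return clock(theta, t)

# lam * phi(lam, t) approaches -ln(1-t) as lam -> 0; e.g. at t = 1/2 the
# product 0.001 * discounted_clock(0.001, 0.5) is close to ln 2
gap = abs(0.001 * discounted_clock(0.001, 0.5) - math.log(2))
```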
Discounted case If \(\theta\) is the \(\lambda\)discounted sequence of weights, and (x, y) is a stationary strategy profile, then using the expressions in (2.1) yields the following expression:
Remark 2.6
The term “time t” is an artificial notion that should not be confused with the stage number: it refers to the fraction of the game that has been played. For instance, in a game with T stages (each of weight \(\frac{1}{T}\)), the cumulated payoff at time t is the reward obtained over stages \(1, 2,\dots , \lceil t T\rceil\).
Main result
Theorem 2.7
Any stochastic game (K, I, J, g, q, k) that is either absorbing or smooth satisfies the constant-payoff property. More precisely, for these games the family of asymptotically optimal strategies defined in (2.2) satisfies
This result is established separately for absorbing and smooth stochastic games. The proof proceeds as follows.

First, the constant-payoff property holds in the discounted case. That is, there exists an optimal Puiseux strategy profile \((x_\lambda ,y_\lambda )\) so that
$$\begin{aligned}\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)=t v^k\qquad \forall t\in [0,1]\,. \end{aligned}$$ 
Second, for the strategy profile \((\sigma ^\theta ,\tau ^\theta )\), the limit cumulated payoff \(\lim _{\Vert \theta \Vert \rightarrow 0}\gamma _\theta ^k(\sigma ^\theta ,\tau ^\theta ;t)\) exists for all \(t\in [0,1]\). Notably, this implies
$$\begin{aligned} \lim _{\Vert \theta \Vert \rightarrow 0}\gamma _\theta ^k(\sigma ^\theta ,\tau ^\theta ;t)=\lim _{\lambda \rightarrow 0}\gamma _\lambda ^k(x_\lambda ,y_\lambda ;t)\qquad \forall t\in [0,1], \end{aligned}$$as the discounted weights \((\lambda (1-\lambda )^{m-1})_m\) are a particular family of weights.
The first step has recently been extended to all stochastic games by Oliu-Barton and Ziliotto (2021). For the sake of completeness, we nonetheless provide a proof here; our approach is particularly suited to absorbing and smooth stochastic games. The second step had not been achieved before, and its extension to all stochastic games is the next challenge. To do so, however, several difficulties need to be overcome (see the discussion at the end of Sect. 4).
Outline of the paper
We start by considering the class of absorbing games in Sect. 3. First, we establish the constant-payoff property in the discounted case (Sect. 3.1), and then extend it to the case of a general sequence of weights (Sect. 3.2). Section 4 is devoted to the class of smooth stochastic games: the discounted case is treated in Sect. 4.1, and the general case in Sect. 4.2.
The constant-payoff property in absorbing games
Throughout this section, we consider a fixed optimal Puiseux strategy profile \((x_\lambda ,y_\lambda )\) of the absorbing game (K, I, J, g, q, k). We denote the corresponding family of stochastic matrices by \((Q_\lambda )\), and set \(g_0:=\lim _{\lambda \rightarrow 0}g(\,\cdot \,,x_\lambda ,y_\lambda )\in \mathbb R^n\).
The discounted case
We start by proving the following result.
Proposition 3.1
Suppose that the stochastic game (K, I, J, g, q, k) is absorbing. Then,
This result can be found in Sorin and Vigeral (2019), and is also a particular case of Oliu-Barton and Ziliotto (2021). For this reason, and also because of its intrinsic interest, we provide below a new proof which is specific to the class of absorbing games.
We start with an elementary probabilistic result, and an immediate consequence.
Lemma 3.2
For each \(\lambda \in (0,1]\), let \((X^\lambda _m)_{m\ge 1}\) be a Markov chain with transition matrix \(Q_\lambda \in \mathbb R^{n\times n}\). Suppose that \(\lambda \mapsto Q_\lambda (k,\ell )\) admits a Puiseux expansion at 0 for all \(\ell \ne k\), so in particular the limits \(p_{k\ell }:=\lim _{\lambda \rightarrow 0} \frac{Q_\lambda (k,\ell )}{\sum _{\ell '\ne k}Q_\lambda (k,\ell ')}\in [0,1]\) exist (with the convention \(\frac{0}{0}=1\)). Further, suppose that \(Q_{\lambda }(\ell ,\ell )=1\) for all \(\ell \ne k\) and all \(\lambda\) sufficiently small (so all states other than k are absorbing states). Then, there exist \(c\ge 0\) and \(e\ge 0\) so that \(\sum _{\ell \ne k} Q_\lambda (k,\ell )=c\lambda ^e+o(\lambda ^e)\) as \(\lambda \rightarrow 0\), and for all \(t\in (0,1]\)
Proof
By assumption, there exists \(\lambda _0>0\) so that the map \(\lambda \mapsto \sum _{\ell \ne k} Q_\lambda (k,\ell )\) is a Puiseux series on \((0,\lambda _0)\). By boundedness, this map can be expressed as \(\sum _{m\ge 0} a_m \lambda ^{e_m}\), where \((a_m)\) is a real sequence and \((e_m)\) is a strictly increasing nonnegative sequence. Let \(m_0:=\inf \{m\ge 0, a_m\ne 0\}\). Then, either \(m_0=\infty\), in which case \(\sum _{\ell \ne k} Q_\lambda (k,\ell )\) is equal to 0 on \((0,\lambda _0)\), or \(m_0<+\infty\) and \(a_{m_0} \lambda ^{e_{m_0}}\) is the leading term of the Puiseux series on \((0,\lambda _0)\). Note that \(a_{m_0}>0\) in this case, since \(a_{m_0}<0\) would yield a negative expression for \(\sum _{m\ge 0} a_m \lambda ^{e_m}=\sum _{\ell \ne k} Q_\lambda (k,\ell )\) as \(\lambda\) goes to 0. Hence,
satisfies \((c,e)\ge 0\) and \(\sum _{\ell \ne k} Q_\lambda (k,\ell )=c\lambda ^e+o(\lambda ^e)\) as \(\lambda \rightarrow 0\). All states except k being absorbing, one can easily compute \(\lim _{\lambda \rightarrow 0} \mathbb P(X^\lambda _{\varphi (\lambda ,t)}=k\mid X^\lambda _1=k)\). First, \(Q_\lambda (k,k)=1-c\lambda ^e+o(\lambda ^e)\) as \(\lambda \rightarrow 0\) by complementarity. Second, \(\lim _{\lambda \rightarrow 0} \lambda \varphi (\lambda ,t)=-\ln (1-t)\) for all \(t\in [0,1)\). Hence,
On the other hand, the equality \(\mathbb P(X^\lambda _{m}=\ell \mid X^\lambda _{m}\ne k,\ X^\lambda _1=k)= \frac{Q_\lambda (k,\ell )}{\sum _{\ell '\ne k}Q_\lambda (k,\ell ')}\) holds for all \(m\ge 2\), and in particular for \(m=\varphi (\lambda ,t)\) when \(\lambda\) is small and \(t>0\). Thus, \(\lim _{\lambda \rightarrow 0} \mathbb P(X^\lambda _{\varphi (\lambda ,t)}=\ell \mid X^\lambda _{\varphi (\lambda ,t)}\ne k,\, X^\lambda _1=k)=p_{k\ell }\). The desired result follows directly from (3.1), since
\(\square\)
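The key limit of the proof, namely that in the critical case \(e=1\) the chain is still in k at stage \(\varphi (\lambda ,t)\) with probability tending to \((1-t)^c\), can be checked numerically. The sketch below is an illustration which assumes a per-stage absorption probability of exactly \(c\lambda\) and uses the approximation \(\varphi (\lambda ,t)\approx -\ln (1-t)/\lambda\).

```python
import math

def stay_probability(c, lam, t):
    """P(chain started at k is still at k at stage phi(lam, t)) when each
    stage absorbs with total probability c * lam."""
    steps = int(-math.log(1 - t) / lam)   # phi(lam, t) up to rounding
    return (1 - c * lam) ** steps

c, t = 2.0, 0.6
limit = (1 - t) ** c                      # the limit (1-t)^c = 0.16
approx = stay_probability(c, 1e-4, t)     # close to limit for small lam
```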
The following consequence is straightforward.
Corollary 3.3
Let \((X^\lambda _m)_{m\ge 1}\) be a family of Markov chains with transition matrices \((Q_\lambda )\) satisfying the assumptions of Lemma 3.2, and let (c, e) and \((p_{k\ell })_{\ell \ne k}\) be the same as therein. Then, for all \(t\in [0,1]\) and \(\ell \ne k\),
where \(\alpha (c,t)= \frac{1-(1-t)^{1+c}}{1+c} \in [0,t]\).
Proof
It follows directly from Lemma 3.2 and the fact that all states \(\ell \ne k\) are absorbing. Let us explain how \(\alpha (c,t)\), the expected time spent in k before time t when \(c>0\) and \(e=1\), is obtained. On the one hand, the state k has not been left at time t with probability \((1-t)^c\). On the other, the state k is left between times s and \(s+ds\) with probability \(c(1-s)^{c-1}ds\). These two probabilities follow from Lemma 3.2. Hence, the expected time in state k before time t is given by
\(\square\)
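Equivalently, \(\alpha (c,t)\) is the integral of the survival probability: \(\alpha (c,t)=\int _0^t (1-s)^c\,ds\). The sketch below (illustrative Python) confirms that the closed form of Corollary 3.3 matches a direct numerical integration.

```python
def expected_time_in_k(c, t, steps=100_000):
    """Midpoint-rule approximation of the integral of the survival
    probability (1-s)^c over [0, t]."""
    ds = t / steps
    return sum((1 - (i + 0.5) * ds) ** c * ds for i in range(steps))

def alpha(c, t):
    """Closed form from Corollary 3.3."""
    return (1 - (1 - t) ** (1 + c)) / (1 + c)

# expected_time_in_k(2.0, 0.7) agrees with alpha(2.0, 0.7), which lies in [0, t]
```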
Proof of Proposition 3.1
Because the game is absorbing with initial state k, the family of stochastic matrices \((Q_\lambda )\) satisfies the assumptions of Lemma 3.2, so let (c, e) and \((p_{k\ell })_{\ell \ne k}\) be as therein. That is, \(\sum _{\ell \ne k}Q_\lambda (k,\ell )=c\lambda ^e+o(\lambda ^e)\) as \(\lambda \rightarrow 0\), and \(\lim _{\lambda \rightarrow 0}\frac{Q_\lambda (k,\ell )}{\sum _{\ell '\ne k}Q_\lambda (k,\ell ')}=p_{k\ell }\).
We distinguish three cases, depending on the values of c and e.
Case 1 \(c>0\) and \(0\le e<1\). In this case, absorption occurs very early in the game by Lemma 3.2, and the constant-payoff property trivially holds. On the one hand, \(\lim _{\lambda \rightarrow 0}\gamma _\lambda (x,y;\,1)=v^k\) by the optimality of (x, y). On the other, for any \(t\in (0,1]\) one has \(\lim _{\lambda \rightarrow 0}\gamma _\lambda (x,y;t)= t \sum \nolimits _{\ell \ne k} p_{k\ell } g_0^\ell\) by Corollary 3.3. Hence, \(v^k=\sum \nolimits _{\ell \ne k} p_{k\ell } g_0^\ell\) and \(\lim _{\lambda \rightarrow 0}\gamma _\lambda (x,y;t)= tv^k\) for all \(t\in [0,1]\).
Case 2 \(c=0\) or \(e>1\). Similarly, in this case absorption occurs too late in the game (or never) by Lemma 3.2, and again the constant-payoff property trivially holds. On the one hand, \(\lim _{\lambda \rightarrow 0}\gamma _\lambda (x,y;1)=v^k\) by the optimality of (x, y). On the other, for any \(t\in [0,1]\) one has \(\lim _{\lambda \rightarrow 0}\gamma _\lambda (x,y;t)= t g_0^k\) by Corollary 3.3. Hence, \(v^k=g_0^k\), and \(\lim _{\lambda \rightarrow 0}\gamma _\lambda (x,y;t)= t v^k\) for all \(t\in [0,1]\).
Case 3 \(c>0\) and \(e=1\). This is the critical case: the total probability of absorption at every stage is \(c\lambda +o(\lambda )\), so that the game absorbs before time \(t\in (0,1)\) with probability \(1-(1-t)^c>0\) by Lemma 3.2. To establish the constant-payoff property, it is enough to prove the equality \(v^k=g_0^k= \sum \nolimits _{\ell \ne k} p_{k\ell } g_0^\ell\). Indeed, in this case, both the payoff prior to absorption and the payoff after absorption are equal to \(v^k\) in expectation.
We introduce some notation in order to partition the action sets in a convenient manner.
For all \((i,j)\in I\times J\), let \((c(i),e(i),c'(j), e'(j))\in \mathbb R^4_+\) be so that \(x_\lambda ^k(i)=c(i)\lambda ^{e(i)}+o(\lambda ^{e(i)})\) and \(y_\lambda ^k(j)=c'(j)\lambda ^{e'(j)}+o(\lambda ^{e'(j)})\). Suppose, without loss of generality, that \(c(i)=0\) implies \(e(i)=0\), and similarly that \(c'(j)=0\) implies \(e'(j)=0\). Partition the action sets I and J into four (possibly empty) sets as follows:
For each \(\ell \ne k\), define the transition rate from k to \(\ell\) by setting \(A^{\ell }:=A^{\ell }_{01}+ A^{\ell }_*+ A^{\ell }_{10}\) where
Let \(A,A_{01}, A_*,A_{10}\) denote the corresponding vectors in \(\mathbb R^{n-1}\). Note that, by definition, these are nonnegative vectors. Also, \(\sum _{\ell \ne k} A^\ell =c>0\) and \(\frac{A^{\ell }}{\sum \nolimits _{\ell '\ne k}A^{\ell '}}=p_{k\ell }\) for all \(\ell \ne k\). The situation can be pictured as follows.
The actions in \(I_0\times J_0\) determine the payoff \(g_0^k\), while the shaded areas correspond to the pairs of actions which determine the transitions from k to the set of absorbing states \(K\backslash \{k\}\). The actions in \(I_+\) and \(J_+\) are irrelevant: on the one hand, they do not affect the limit payoff \(g_0^k\); on the other, they induce lower-order transition probabilities, so that the probability that absorption occurs while an action in \(I_+\) or \(J_+\) is played goes to 0 as \(\lambda\) goes to 0. The limit payoff thus satisfies (see also Corollary 3.3) the following relation:
Case 3a Suppose, by contradiction, that \(g^k_0>v^k\) and \(\sum _{\ell \ne k}A^{\ell }_{01}>0\). In this case, Player 2 can deviate from \(y_\lambda\) to a strategy \(\widetilde{y}_\lambda\) which changes the probabilities of playing actions in \(J_1\) to \(c'(j)\lambda ^{1-\varepsilon }\) for a sufficiently small \(\varepsilon >0\) (say, smaller than all nonzero e(i) and \(e'(j)\)). By doing so, the probability that the state k is left before stage \(t/\lambda\) goes to 1 for any \(t>0\). Consequently,
On the other hand, if Player 1 deviates from \(x_\lambda\) to a strategy \(\widetilde{x}_\lambda\) which sets the probability of all actions outside \(I_0\) to 0, then the transition from k to the set of absorbing states depends only on \(A_{01}\), and one has
Yet, the optimality of \((x_\lambda )\) and \((y_\lambda )\) implies that
The relations (3.3), (3.4) and (3.5) are not compatible with \(g_0^k>v^k\), a contradiction.
Case 3b Suppose by contradiction that \(g^k_0>v^k\) and \(\sum \nolimits _{\ell \ne k}A^{\ell }_{01}=0\). In this case, for the strategy \((\widetilde{x}_\lambda )\) described in the previous case, one has
This contradicts the optimality of \((y_\lambda )\).
Together, Cases 3a and 3b imply \(g_0^k\le v^k\). Similarly, reversing the roles of the players, one obtains \(g_0^k\ge v^k\). Together with (3.2), the equality \(v^k=g_0^k\) implies \(v^k= \sum \nolimits _{\ell \ne k} p_{k\ell } g_0^\ell\) as well. The result follows from Corollary 3.3, since there exists \(\alpha _t:=\alpha (c,t)\in [0,t]\) so that
\(\square\)
The general case
Proposition 3.4
Suppose that (K, I, J, g, q, k) is absorbing. Let \((\sigma ^\theta ,\tau ^\theta )\) be the family of asymptotically optimal strategies defined in (2.2). Then, for all \(t\in [0,1]\), \(\lim _{\Vert \theta \Vert \rightarrow 0}\gamma _\theta ^k(\sigma ^\theta ,\tau ^\theta ;t)\) exists.
We start with two technical lemmas.
Lemma 3.5
Let \(0\le t<1\) and \(e\ge 0\). Then, as \(h>0\) tends to 0:
Proof
For each \(m\ge 1\) set \(t^\theta _m:=\sum _{r=1}^{m-1}\theta _r\) so that \(\lambda ^\theta _m= \frac{\theta _{m+1}}{1-t^\theta _m}\). Then, for any m between \(\varphi (\theta ,t)\) and \(\varphi (\theta ,t+h)\) one has \(t\le t^\theta _m\le t+h\), so that
We distinguish three cases, depending on whether \(e=1\), \(e>1\) or \(e<1\).
Case \(e=1\). Note that \(\lim _{\Vert \theta \Vert \rightarrow 0}\sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)}\theta _{m+1}= h\). Adding the inequalities of (3.6) for all \(\varphi (\theta ,t)\le m\le \varphi (\theta ,t+h)\) and then taking \(\Vert \theta \Vert\) to 0, one obtains
Hence, \(\lim _{\Vert \theta \Vert \rightarrow 0} \sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)}\lambda ^\theta _m=\displaystyle \frac{h}{1-t}+o(h)\) as \(h>0\) tends to 0.
Case \(e<1\). In this case \(\theta ^e_{m+1}\ge \Vert \theta \Vert ^{e1}\theta _{m+1}\). From (3.6) one derives
The result follows, since \(\lim _{\Vert \theta \Vert \rightarrow 0} \Vert \theta \Vert ^{e-1}=+\infty\) and \(\lim _{\Vert \theta \Vert \rightarrow 0} \sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)}\theta _{m+1}=h>0\).
Case \(e>1\). In this case \(\theta ^e_{m+1}\le \Vert \theta \Vert ^{e-1}\theta _{m+1}\), and the result follows from (3.6) as in the previous case, since \(\lim _{\Vert \theta \Vert \rightarrow 0} \Vert \theta \Vert ^{e-1}=0\). \(\square\)
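Since the displayed conclusion of Lemma 3.5 is omitted above, the statement that the three cases combine to give can be summarized as follows (our reconstruction from the proof):
$$\begin{aligned} \lim _{\Vert \theta \Vert \rightarrow 0}\sum _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)}(\lambda ^\theta _m)^e = {\left\{ \begin{array}{ll} \frac{h}{1-t}+o(h) &{} \text {if } e=1,\\ +\infty &{} \text {if } e<1,\\ 0 &{} \text {if } e>1. \end{array}\right. } \end{aligned}$$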
The following result follows from the convergence of Riemann sums to integrals, and its proof is omitted.
Lemma 3.6
Let \((a^\theta _m)_{m\ge 1}\) be a family of sequences in [0, 1] so that
for some function \(f:[0,1]\rightarrow [0,1]\). Then,
We are now ready to prove Proposition 3.4.
Proof of Proposition 3.4
For each \(m\ge 1\), let \(Q_m^\theta \in \mathbb R^{n\times n}\) be the transition matrix induced by \((\sigma ^\theta ,\tau ^\theta )\) at stage m. By the definition of these strategies, there exist pairs (c, e) and \((c_\ell ,e_\ell )_{\ell \ne k}\) of nonnegative numbers so that
Moreover one can assume without loss of generality that \(e_\ell =+\infty\) whenever \(c_\ell =0\) for all \(\ell \ne k\), so that \(e_\ell \ge e\) for all \(\ell \ne k\). Then, for all \(t\in (0,1)\), the probability of being at k after \(\varphi (\theta ,t)\) stages is given by
Taking \(\Vert \theta \Vert\) to 0, and using Lemma 3.5, one obtains
That is, the limit exists. Similarly, conditional on reaching an absorbing state at stage \(m+1\), the probability that \(k_{m+1}=\ell \ne k\) is given by
Therefore, the following limits exist and do not depend on the family of vanishing weights:
Together, (3.8) and (3.9) imply the existence of the following limits
Finally, \(\lim _{\Vert \theta \Vert \rightarrow 0} g(k,\sigma ^\theta ,\tau ^\theta )=g^k_0\in \mathbb R\) exists. Using Lemma 3.6, a computation similar to that of Corollary 3.3 then gives
where \(\alpha _t:= \int _{0}^t s\, p_s(k)\,ds +t\, p_t(k)\). As none of the quantities involved in this expression depends on the family of vanishing weights, the proof is complete. \(\square\)
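For intuition, consider the case \(e=1\) in the expansion of the transition matrices, writing \(Q^\theta _m(k,k)=1-c\lambda ^\theta _m+o(\lambda ^\theta _m)\) (a hedged reconstruction of the omitted displays, with c as in the proof above). The limit \(p_t(k)\) is then obtained as follows:
$$\begin{aligned} p_t(k)=\lim _{\Vert \theta \Vert \rightarrow 0}\prod _{m=1}^{\varphi (\theta ,t)}\bigl (1-c\lambda ^\theta _m+o(\lambda ^\theta _m)\bigr )=\lim _{\Vert \theta \Vert \rightarrow 0}\exp \Bigl (-c\sum \nolimits _{m=1}^{\varphi (\theta ,t)}\lambda ^\theta _m+o(1)\Bigr )=(1-t)^{c}, \end{aligned}$$
where the last equality uses \(\sum _{m=1}^{\varphi (\theta ,t)}\lambda ^\theta _m\rightarrow -\ln (1-t)\), which follows from the case \(e=1\) of Lemma 3.5 applied along a partition of [0, t].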
NB Together, Propositions 3.1 and 3.4 establish Theorem 2.7 for absorbing games, that is:
To complete the proof of Theorem 2.7, it remains to treat smooth stochastic games.
The constant-payoff property in smooth stochastic games
In this section we prove that the constant-payoff property holds for smooth stochastic games, that is, games for which there exists a family of optimal stationary strategy profiles \((x_\lambda ,y_\lambda )\) for the discounted game such that the corresponding family of stochastic matrices \((Q_\lambda )\) satisfies \(\lim _{\lambda \rightarrow 0} \frac{Q_\lambda -{\text {Id}}}{\lambda }=A\) for some real matrix \(A\in \mathbb R^{n\times n}\).
The discounted case
Consider a fixed optimal Puiseux strategy profile \((x_\lambda ,y_\lambda )\), and suppose that \(Q_\lambda\) satisfies \(\lim _{\lambda \rightarrow 0} \frac{Q_\lambda -{\text {Id}}}{\lambda }=A\) for some real matrix \(A\in \mathbb R^{n\times n}\). Let \(g_0:=\lim _{\lambda \rightarrow 0}g(\,\cdot \,,x_\lambda ,y_\lambda )\in \mathbb R^n\) be the corresponding limit payoff vector.
We start with the following result.
Proposition 4.1
Suppose that the stochastic game (K, I, J, g, q, k) is smooth. Then,
Proof
The proof is similar to the one for the class of absorbing games. By assumption, \(Q_\lambda ={\text {Id}}+ A\lambda + o(\lambda )\). Recall also that \(\lim _{\lambda \rightarrow 0} \lambda \varphi (\lambda ,t)=-\ln (1-t)\) holds for all \(t\in (0,1)\). Consequently,
Thus, by Lemma 3.6, \(\lim _{\lambda \rightarrow 0} \sum \nolimits _{m= 1}^{\varphi (\lambda ,t)} \lambda (1-\lambda )^{m-1} Q_\lambda ^{m-1}=\int _{0}^t e^{-\ln (1-s)A}ds\) for all \(t\in [0,1]\). Equivalently, we obtain
The exact same argument as in Case 3 of the proof of Proposition 3.1 can be used to establish that \(Ag_0=0\). That is, first partition the players' action sets I and J into \(I_0,I_*,I_1,I_+\) and \(J_0,J_*,J_1,J_+\) respectively, and define matrices \(A_{01},A_*,A_{10}\in \mathbb R^{n\times n}\) so that \(A=A_{01}+A_*+A_{10}\). Second, use the deviations introduced there to prove that one has
These equalities imply \(Ag_0=0\). But then, \(f''(t)=\frac{1}{1-t}e^{-\ln (1-t)A} A g_0=0\). Hence, \(f'(t)\) is constant. As \(f(0)=0\) and \(f(1)=v^k\), this implies that \(f(t)=t v^k\) for all t.\(\square\)
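For completeness, the identity used for \(f''\) is the routine differentiation of the matrix exponential \(P(t):=e^{-\ln (1-t)A}\), which solves the Kolmogorov forward equation for the time-dependent generator \(\frac{1}{1-t}A\):
$$\begin{aligned} \frac{d}{dt}P(t)=\frac{1}{1-t}\,P(t)A, \qquad f'(t)=P(t)g_0, \qquad f''(t)=\frac{1}{1-t}\,P(t)Ag_0, \end{aligned}$$
so that \(f''\equiv 0\) as soon as \(Ag_0=0\). Here A commutes with \(e^{-\ln (1-t)A}\), so the order of the factors is immaterial.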
The general case
We now establish Theorem 2.7 for smooth stochastic games. By Proposition 4.1, it is sufficient to prove the following result.
Proposition 4.2
Suppose that (K, I, J, g, q, k) is smooth. Let \((\sigma ^\theta ,\tau ^\theta )\) be the family of asymptotically optimal strategies defined in (2.2). Then, for all \(t\in [0,1]\), \(\lim _{\Vert \theta \Vert \rightarrow 0}\gamma _\theta ^k(\sigma ^\theta ,\tau ^\theta ;t)\) exists.
The idea is to introduce a family of continuous-time stochastic processes indexed by \(\theta\), and prove that these processes converge as \(\Vert \theta \Vert\) goes to 0 (along any sequence of vanishing weights).
Let \((Y^{k,\theta }_m)_{m\ge 1}\) be the random process of states \((k_m)\) under the law \(\mathbb P^k_{\sigma ^\theta ,\tau ^\theta }\), which is an inhomogeneous Markov chain with transition matrices \((Q^\theta _m)\). Define the piecewise constant process \((X^{k,\theta }_t)_{t\in [0,1)}\) with values in K as follows:
The process \((X^{k,\theta }_t)_{t\in [0,1)}\) is clearly càdlàg. Note also that \(X^{k,\theta }_t=Y^{k,\theta }_1=k\) for all \(t\in [0,\theta _1)\).
In the sequel, we will use the following notation.

For any \(t,h\ge 0\) so that \(0\le t\le t+h< 1\), let \(J^\theta _{[t,t+h)}\) be the number of jumps (i.e. state changes) of the process \((Y^{k,\theta }_m)_{m\ge 1}\) between stage \(\varphi (\theta ,t)\) and stage \(\varphi (\theta ,t+h)\).

For any \(0\le t\le t+h<1\) define the matrix
$$\begin{aligned}P_{t,t+h}^\theta :=\prod _{m=\varphi (\theta ,t)}^{\varphi (\theta ,t+h)-1}Q_m^\theta \in \mathbb R^{n\times n}\,. \end{aligned}$$ 
For all \(t\in [0,1)\) and \(1\le \ell \le n\), let \(\mathbb P^\ell _t\) denote the conditional probability on \(\{X_t^{k,\theta }=\ell \}\). Thus, for all \(1\le \ell , \ell '\le n\) and \(0\le t\le t+h< 1\),
$$\begin{aligned}\mathbb P^\ell _t(X^{k,\theta }_{t+h}=\ell '):= \mathbb P(X^{k,\theta }_{t+h}=\ell '\mid X_t^{k,\theta }=\ell )=P_{t,t+h}^\theta (\ell ,\ell ')\,. \end{aligned}$$
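To illustrate the convergence of the products \(P^\theta _{t,t+h}\), here is a minimal numerical sketch (not from the paper). It assumes uniform weights \(\theta _m=1/N\), so that \(\lambda ^\theta _m=1/(N-m+1)\), stage matrices of the form \(Q^\theta _m={\text {Id}}+\lambda ^\theta _m A\), and an illustrative generator A with zero row sums, borrowed from the Big Match example of the final section:

```python
import numpy as np

# Illustrative generator, states ordered (k, 0*, 1*); any matrix with
# zero row sums would do.
A = np.array([[-1.0, 0.5, 0.5],
              [ 0.0, 0.0, 0.0],   # 0* is absorbing
              [ 0.0, 0.0, 0.0]])  # 1* is absorbing

def P_discrete(t, N):
    """Product of the stage matrices Q_m = I + lam_m * A for uniform
    weights theta_m = 1/N, up to stage phi(theta, t) ~ t*N.
    Here lam_m = theta_m / sum_{m' >= m} theta_{m'} = 1/(N - m + 1)."""
    P = np.eye(3)
    for m in range(1, int(t * N) + 1):
        lam = 1.0 / (N - m + 1)
        P = P @ (np.eye(3) + lam * A)
    return P

# The product approaches exp(-ln(1-t) A); e.g. its (k,k) entry tends to
# 1 - t, since state k is left at rate 1/(1-s) at time s.
P = P_discrete(0.5, 100_000)
assert abs(P[0, 0] - 0.5) < 1e-3
```

For this particular A the (k, k) entry even telescopes exactly to \((N-\lfloor tN\rfloor )/N\), matching the limit \(e^{-\ln (1-t)A}(k,k)=1-t\).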
Proposition 4.3
For any \(t\in [0,1)\), as \(h>0\) goes to 0 one has
 (i):

\(\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(X^{k,\theta }_{t+h}=\ell )=\displaystyle 1+\frac{A(\ell ,\ell ) }{1-t}h + o(h)\).
 (ii):

\(\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(X^{k,\theta }_{t+h}=\ell ')=\displaystyle \frac{A(\ell ,\ell ')}{1-t}h + o(h)\) for all \(\ell '\ne \ell\).
Proof
(i)
Conditional on \(\{X^{k,\theta }_{t}= \ell \}\), the event \(\{X^{k,\theta }_{t+h}= \ell \}\) is the disjoint union of \(\{J^\theta _{[t,t+h)}=0\}\) and \(\{X^{k,\theta }_{t+h}= \ell \}\cap \{J^\theta _{[t,t+h)}\ge 2\}\). For the former, one has
Indeed,
and the result follows from Lemma 3.5. For the latter, that is \(\{X^{k,\theta }_{t+h}= \ell \}\cap \{J^\theta _{[t,t+h)}\ge 2\}\), one has
Therefore, \(\lim _{\Vert \theta \Vert \rightarrow 0} \mathbb P^\ell _t(J^\theta _{[t,t+h)}\ge 2)=o(h)\), which together with (4.1) proves the desired result.
(ii) Similarly, conditional on \(\{X^{k,\theta }_{t}= \ell \}\),
Together with (4.2) this equality yields
Conditional on leaving the state \(\ell\) at stage m, the probability of going to \(\ell '\ne \ell\) is given by
By assumption, this converges to \(\frac{A(\ell ,\ell ')}{-A(\ell ,\ell )}\) as \(\Vert \theta \Vert\) goes to 0. On the other hand, (4.2) implies
Consequently, using (4.1) one obtains
\(\square\)
Corollary 4.4
The processes \((X^{k,\theta }_{t})_{t \in [0,1)}\) converge, as \(\Vert \theta \Vert\) tends to 0, to an inhomogeneous Markov process with generator \(\left( \frac{1}{1-t}A\right) _{t\in [0,1)}\).
Proof
The limit is identified by Proposition 4.3. Tightness is a consequence of point (ii): indeed, it implies that, for any \(T>0\), uniformly in \(\theta\):
This is precisely the tightness criterion for càdlàg processes with discrete values.\(\square\)
The following result is a direct consequence of Corollary 4.4 and Lemma 3.6.
Corollary 4.5
For all \(t\in [0,1)\) the following limit exists:
Proof of Proposition 4.2
The desired result follows directly from Proposition 4.1 and Corollary 4.5, as these two results give
\(\square\)
Final comments
Back to the Big Match. Recall that this stochastic game is both absorbing and smooth. The state space is \(\{k,0^*,1^*\}\), where k is the nonabsorbing state, and \(v^k=\frac{1}{2}\). As already discussed, any stationary strategy profile \((x_\lambda ,y_\lambda )\) satisfying \(x_\lambda ^k(T)=\frac{\lambda }{1+\lambda }\) and \(y_\lambda ^k(L)=\frac{1}{2}\) is optimal in the \(\lambda\)-discounted game; the corresponding limit payoff vector is given by \(g_0=(\frac{1}{2}, 0,1)\); and the corresponding transition matrix \(Q_\lambda\) satisfies
For any sequence of nonnegative weights \(\theta =(\theta _m)_{m\ge 1}\) so that \(\sum _{m\ge 1} \theta _m=1\) and any \(m\ge 1\), recall that \(\lambda ^\theta _m=\theta _m(\sum _{m' \ge m} \theta _{m'})^{-1}\). Then, the asymptotically optimal strategy profile \((\sigma ^\theta , \tau ^\theta )\) is such that, at every stage \(m\ge 1\), Player 1 plays T at state k with probability \(\frac{\lambda ^\theta _m}{1+\lambda ^\theta _m}\) and Player 2 plays L with probability \(\frac{1}{2}\). The constant-payoff property then reads as follows:
where \(\Pi _t= \int _{0}^t e^{-\ln (1-s)A}ds\).
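The Big Match computation can be checked numerically. The sketch below (not from the paper; it uses numpy only, with a truncated-series matrix exponential) builds the generator A induced by the above profile, with states ordered \((k,0^*,1^*)\), and verifies that the instantaneous limit payoff \(\bigl (e^{-\ln (1-t)A}g_0\bigr )(k)\) equals \(v^k=\frac{1}{2}\) for every t, which is exactly the constant-payoff property:

```python
import numpy as np

def expm(B, terms=60):
    """Matrix exponential via truncated power series (adequate here,
    where the matrices involved have small norm)."""
    result = np.eye(B.shape[0])
    term = np.eye(B.shape[0])
    for n in range(1, terms + 1):
        term = term @ B / n
        result = result + term
    return result

# Big Match, states ordered (k, 0*, 1*): limit payoffs g0 = (1/2, 0, 1)
# and generator A induced by x_lambda(T) = lambda/(1+lambda), y(L) = 1/2:
# state k is left at rate 1, absorbing in 0* or 1* with probability 1/2 each.
g0 = np.array([0.5, 0.0, 1.0])
A = np.array([[-1.0, 0.5, 0.5],
              [ 0.0, 0.0, 0.0],   # 0* is absorbing
              [ 0.0, 0.0, 0.0]])  # 1* is absorbing

def payoff_rate(t):
    """Instantaneous limit payoff at time t, starting from k:
    first coordinate of exp(-ln(1-t) A) g0."""
    return (expm(-np.log(1.0 - t) * A) @ g0)[0]

# Constant payoff: the rate equals v^k = 1/2 at every time t in [0,1).
for t in (0.0, 0.25, 0.5, 0.9):
    assert abs(payoff_rate(t) - 0.5) < 1e-9
```

Concretely, \(e^{-\ln (1-t)A}\) keeps the play at k with probability \(1-t\) and splits the remaining mass t evenly between \(0^*\) and \(1^*\), so the payoff rate is \(\frac{1-t}{2}+\frac{t}{2}=\frac{1}{2}\) at every t.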
Beyond absorbing or smooth stochastic games. The extension of Proposition 4.2 to all stochastic games is an important challenge, as it would settle the constant-payoff conjecture of Sorin, Venel and Vigeral (2010), i.e. Conjecture 1.1. The main difficulty to overcome is the following: while the continuous-time process \((X^{k,\theta }_t)_{t\in [0,1)}\), with values in K, can be defined for any stochastic game, its limit as \(\Vert \theta \Vert\) goes to 0 will not be smooth in general. For this reason, it may be preferable to replace the state space K with a set of disjoint subsets of states, namely cycles, in the sense of Freidlin and Wentzell (1984), in which the process spends a positive amount of time at every visit. Although the aggregation of states poses several technical questions, we conjecture that our approach can be used to solve the general case.
References
Attia L, Oliu-Barton M (2019) A formula for the value of a stochastic game. Proc Natl Acad Sci 116(52):26435–26443
Bewley T, Kohlberg E (1976) The asymptotic theory of stochastic games. Math Oper Res 1:197–208
Blackwell D, Ferguson TS (1968) The big match. Ann Math Stat 39:159–163
Freidlin MI, Wentzell AD (1984) Random perturbations of dynamical systems. Springer, Berlin
Gillette D (1957) Stochastic games with zero stop probabilities. In: Dresher M, Tucker AW, Wolfe P (eds) Contributions to the theory of games, vol III. Annals of Mathematics Studies. Princeton University Press, Princeton, pp 179–187
Kohlberg E (1974) Repeated games with absorbing states. Ann Stat 2:724–738
Laraki R (2010) Explicit formulas for repeated games with absorbing states. Int J Game Theory 39:53–69
Mertens JF, Neyman A (1981) Stochastic games. Int J Game Theory 10:53–66
Mertens JF, Neyman A (1982) Stochastic games. Proc Natl Acad Sci USA 79:2145–2146
Neyman A, Sorin S (2010) Repeated games with public uncertain duration process. Int J Game Theory 39:29–52
Neyman A (2017) Continuous-time stochastic games. Games Econ Behav 104:92–130
Oliu-Barton M, Ziliotto B (2021) Constant payoff in zero-sum stochastic games. Ann Inst Henri Poincaré Probab Stat (to appear)
Shapley LS (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100
Sorin S, Venel X, Vigeral G (2010) Asymptotic properties of optimal trajectories in dynamic programming. Sankhya A 72:237–245
Sorin S, Vigeral G (2020) Limit optimal trajectories in zerosum stochastic games. Dyn Games Appl 10:555–572
Ziliotto B (2016) A Tauberian theorem for nonexpansive operators and applications to zero-sum stochastic games. Math Oper Res 41:1522–1534
Cite this article
Oliu-Barton, M. Weighted-average stochastic games with constant payoff. Oper Res Int J (2021). https://doi.org/10.1007/s12351-021-00625-6
Keywords
 Stochastic game
 Value
 Markov chain
 Constant payoff