Superfair Stochastic Games

Abstract

A two-person zero-sum stochastic game with a nonnegative stage reward function is superfair if the value of the one-shot game at each state is at least as large as the reward function at the given state. The payoff in the game is the limit superior of the expected stage rewards taken over the directed set of all finite stop rules. If the game has countable state and action spaces and if at least one of the players has a finite action space at each state, then the game has a value. The value of the stochastic game is obtained by a transfinite algorithm on the countable ordinals.


Introduction
A two-person zero-sum stochastic game is played in stages n = 0, 1, 2, . . . and has a nonnegative reward function defined on the state space. At each stage n of the game the players simultaneously select actions from their action sets. These actions, together with the current state, determine the distribution of the next state. The payoff from player 2 to player 1 is the limit superior over finite stop rules t of the expectation of the reward function at stage t. The state space and the action sets at each state are assumed to be countable, with at least one of the action sets finite at each state. The game is assumed to be superfair in the sense that, at every state, the value of the one-shot game in which the payoff is the expectation of the reward in the next state is at least as large as the reward in the current state. Superfair stochastic games can be seen as a game-theoretic analogue of a submartingale. They encompass, for example, all positive zero-sum stochastic games (cf. Flesch et al. [3]). Superfair games also include the leavable games of Maitra and Sudderth [6]. Leavable games are obviously superfair, since player 1 is allowed to stop and receive the value of the reward function at any stage. In the special case when the reward function is bounded and the action sets are finite, superfair games can be viewed as limsup games in the sense of [6].
The game is shown to have a value that can be obtained by a transfinite algorithm over the countable ordinals. This generalizes a similar result for positive zero-sum stochastic games in Flesch et al. [3].
Player 2 is proved to have an ε-optimal Markov strategy for every ε > 0. Whether such strategies exist for player 1 is an open question. If all the action sets of player 2 are finite, then player 2 has an optimal stationary strategy and, for every state and every ε > 0, player 1 has a stationary strategy that is ε-optimal if the game starts in the given state.
Organization of the Paper: In Sect. 2, we present our model. In Sect. 3, we discuss an illustrative example. In Sect. 4, we state our main result and provide an outline of the proof. In Sect. 5, we present an algorithm, which we use in Sects. 6 and 7 to prove the two main steps of the proof, on the upper value and the lower value, respectively. In Sect. 8, we prove our results on the value of the game. In Sect. 9, we discuss optimal strategies of the players. In Sect. 10, we make some concluding remarks.

The Game
We consider zero-sum stochastic games with countable state and action spaces such that, in each state, at least one of the players has only finitely many actions. The game is played at stages in N = {0, 1, 2, . . .} and begins at some state s_0 ∈ S. At every stage n ∈ N, the play is in a state s_n ∈ S. In this state, player 1 chooses an action a_n ∈ A(s_n) and simultaneously player 2 chooses an action b_n ∈ B(s_n). Then, player 1 receives the reward r(s_n) from player 2, and the state s_{n+1} is drawn according to the probability measure p(s_n, a_n, b_n), after which the play continues at stage n + 1.
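The stage dynamics described above can be sketched in code. The following is a minimal simulation under an entirely hypothetical three-state game (the states, action sets, transition rule, and rewards below are illustrative assumptions, not taken from the paper); it only shows how states, simultaneous mixed actions, and transitions interact.

```python
import random

# Hypothetical finite game (illustrative assumptions): reward depends on the
# state only, state 0 is absorbing, and matching actions move the state left.
r = {0: 1.0, 1: 0.0, 2: 0.0}
A = {0: ["stay"], 1: ["l", "r"], 2: ["l", "r"]}
B = {0: ["stay"], 1: ["l", "r"], 2: ["l", "r"]}

def p(s, a, b):
    """Transition law: a probability measure over next states."""
    if s == 0:                              # state 0 is absorbing
        return {0: 1.0}
    target = s - 1 if a == b else s + 1     # toy rule: matching actions move left
    return {min(target, 2): 1.0}            # cap the chain at state 2

def play(s0, x, y, stages, rng):
    """Simulate stages 0, ..., `stages` under stationary mixed actions
    x(s) and y(s); return the sequence of stage rewards r(s_0), ..., r(s_stages)."""
    s, rewards = s0, []
    for _ in range(stages + 1):
        rewards.append(r[s])
        a = rng.choices(A[s], weights=[x[s][i] for i in A[s]])[0]
        b = rng.choices(B[s], weights=[y[s][i] for i in B[s]])[0]
        dist = p(s, a, b)
        s = rng.choices(list(dist), weights=list(dist.values()))[0]
    return rewards

rng = random.Random(0)
x = {0: {"stay": 1.0}, 1: {"l": 0.5, "r": 0.5}, 2: {"l": 0.5, "r": 0.5}}
print(play(2, x, x, 10, rng))
```

Note that, exactly as in the model, the players' mixed actions are sampled independently at each stage and the reward is collected from the current state before the transition is drawn.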

Histories and Plays
A history at stage n ∈ N is a sequence h = (s_0, a_0, b_0, . . . , s_n) of states and pairs of actions, ending with a state. Let H_n denote the set of histories at stage n, and let H = ∪_{n∈N} H_n denote the set of all histories. For each history h ∈ H, let s_h denote the final state in h.
A play is an infinite sequence of the form h_∞ = (s_0, a_0, b_0, s_1, a_1, b_1, . . .) ∈ (S × A × B)^∞. We endow the set (S × A × B)^∞ with the product topology, where each of the sets S, A, and B has its discrete topology. The set H_∞ of plays is a closed subset of (S × A × B)^∞, which we endow with the induced subspace topology.

Strategies
A mixed action for player 1 in state s ∈ S is a probability measure x(s) = (x(s, a))_{a∈A(s)} on A(s). Similarly, a mixed action for player 2 in state s ∈ S is a probability measure y(s) = (y(s, b))_{b∈B(s)} on B(s). The respective sets of mixed actions in state s are denoted by Δ(A(s)) and Δ(B(s)).
A strategy for player 1 is a map π that to each history h ∈ H assigns a mixed action π(h) ∈ Δ(A(s_h)). Similarly, a strategy for player 2 is a map σ that to each history h ∈ H assigns a mixed action σ(h) ∈ Δ(B(s_h)). The set of strategies is denoted by Π for player 1 and by Σ for player 2.

Stop Rules
A finite stop rule is a mapping t : H_∞ → N with the following property: if for some play h_∞ = (s_0, a_0, b_0, s_1, . . .) ∈ H_∞ we have t(h_∞) = n, and h'_∞ ∈ H_∞ is another play that begins with the history (s_0, a_0, b_0, . . . , s_n), then t(h'_∞) = n as well. Intuitively, t assigns a stage to each play in such a way that the assigned stage only depends on the history up to that stage. For finite stop rules t and t' we write t' ≥ t if t'(h_∞) ≥ t(h_∞) for every play h_∞ ∈ H_∞. The set of finite stop rules equipped with the relation ≥ is a partially ordered directed set.

Payoffs
An initial state s ∈ S and a pair of strategies (π, σ) ∈ Π × Σ determine, by Kolmogorov's extension theorem, a unique probability measure P_{s,π,σ} on H_∞. The corresponding expectation operator is denoted by E_{s,π,σ}.
The payoff for a pair of strategies (π, σ) ∈ Π × Σ, when the initial state is s ∈ S, is defined as

u(s, π, σ) = lim sup_t E_{s,π,σ}[r(s_t)] = inf_t sup_{t'≥t} E_{s,π,σ}[r(s_{t'})],   (1)

where t and t' range over the directed set of finite stop rules. This payoff was introduced by Dubins and Savage ([2], p. 39), and several of its properties are explained in Section 4.2 of Maitra and Sudderth [6]. Player 1's objective is to maximize the payoff given by u, and player 2's objective is to minimize it. For a discussion of alternative payoffs, we refer to Sect. 10.

Superfair Games
A zero-sum stochastic game is called superfair if, in each state, player 1 can play so that the expectation of the reward in the next state is at least as large as the reward in the current state.To make this notion precise, we introduce an associated one-shot game.
For a function φ : S → [0, ∞] and state s ∈ S, the one-shot game M_φ(s) is the zero-sum game in which player 1 selects an action a ∈ A(s) and simultaneously player 2 selects an action b ∈ B(s), and then player 2 pays player 1 the amount Σ_{s'∈S} φ(s') p(s'|s, a, b). For mixed actions x(s) ∈ Δ(A(s)) and y(s) ∈ Δ(B(s)), let the expected payoff in the one-shot game M_φ(s) be denoted by

φ(s, x(s), y(s)) = Σ_{a∈A(s)} Σ_{b∈B(s)} x(s, a) y(s, b) Σ_{s'∈S} φ(s') p(s'|s, a, b).

The game M_φ(s) has a value, denoted by Gφ(s), by Theorems 2 and 3 in Flesch et al. [3]. Thus

Gφ(s) = sup_{x(s)∈Δ(A(s))} inf_{y(s)∈Δ(B(s))} φ(s, x(s), y(s)) = inf_{y(s)∈Δ(B(s))} sup_{x(s)∈Δ(A(s))} φ(s, x(s), y(s)).

Note that, generally, the supremum and infimum cannot be replaced with maximum and minimum, because the action set of one of the players can be infinite. The zero-sum stochastic game is called superfair if Gr(s) ≥ r(s) for all states s ∈ S.
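When both action sets at a state are finite, the one-shot game M_φ(s) is an ordinary matrix game. As a rough numerical sketch (not the paper's method), when the maximizer has only two pure actions the value sup_x min_y can be approximated by a one-dimensional grid search over the maximizer's mixed action; the matrix used below is an illustrative assumption.

```python
def matrix_game_value_2xn(M, grid=100_000):
    """Approximate the value of a zero-sum matrix game in which the maximizer
    has exactly 2 pure actions (rows) and the minimizer finitely many
    (columns). M[a][b] is the payoff from the minimizer to the maximizer."""
    best = float("-inf")
    for i in range(grid + 1):
        x = i / grid  # probability placed on the first row
        # against a known mix, the minimizer best-responds with a pure column
        worst = min(x * M[0][b] + (1 - x) * M[1][b] for b in range(len(M[0])))
        best = max(best, worst)
    return best

# Illustrative matrix (an assumption, not from the paper): matching pennies
# rescaled to [0, 1]; its value is 1/2, attained at the mix (1/2, 1/2).
M = [[1.0, 0.0],
     [0.0, 1.0]]
print(matrix_game_value_2xn(M))  # ≈ 0.5
```

The grid search exploits the fact that the minimizer's best response to a fixed mix is pure, so the inner infimum is a minimum over columns; an exact value could instead be obtained by linear programming.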

Value and Optimality
Consider a zero-sum stochastic game Γ. The lower value for the initial state s ∈ S is defined as

α(s) = sup_{π∈Π} inf_{σ∈Σ} u(s, π, σ).

Similarly, the upper value for the initial state s ∈ S is defined as

β(s) = inf_{σ∈Σ} sup_{π∈Π} u(s, π, σ).

The inequality α(s) ≤ β(s) always holds. If α(s) = β(s), then this quantity is called the value for the initial state s, and it is denoted by v(s). Then, for ε ≥ 0, a strategy π ∈ Π for player 1 is called ε-optimal for the initial state s if u(s, π, σ) ≥ v(s) − ε for every strategy σ ∈ Σ for player 2. Similarly, a strategy σ ∈ Σ for player 2 is called ε-optimal for the initial state s if u(s, π, σ) ≤ v(s) + ε for every strategy π ∈ Π for player 1.
If the value exists for every initial state, then for every ε > 0, each player has a strategy that is ε-optimal for every initial state. We call these strategies ε-optimal. A 0-optimal strategy is simply called optimal.

An Illustrative Example
In this section we discuss an illustrative example, which is based on Examples (6.6), p. 183 and (9.1), p. 191 in Maitra and Sudderth [6].
Example 1 Let S = N = {0, 1, 2, . . .}. The state 0 is absorbing. For s ≥ 1, the action sets are A(s) = B(s) = {1, 2} and the transition law satisfies

p(s − 1 | s, a, b) = p_ab and p(s + 1 | s, a, b) = 1 − p_ab,

where p_ab ∈ [0, 1] for every a, b ∈ {1, 2}. Let the reward function be r(0) = 1 and r(s) = 0 for all s ≥ 1. The game is superfair because Gr(0) = r(0) = 1 and Gr(s) ≥ 0 = r(s) for every s ≥ 1. In view of the reward function r, it is clear that, from any state s ≥ 1, player 1 wants to move to s − 1 and player 2 wants to move to s + 1. So it is not surprising, and is proved in [6], that optimal (stationary) strategies π and σ for players 1 and 2, respectively, are to play, at every stage and at every state s ≥ 1, optimal mixed actions in the matrix game with payoffs (p_ab)_{a,b∈{1,2}}, the probabilities of moving to s − 1. Let w be the value of this matrix game. Then, for s ≥ 1, the distribution P_{s,π,σ} of the process s, s_1, s_2, . . . of states is that of a simple random walk that moves to the right with probability 1 − w and to the left with probability w at every positive state, and is absorbed at state 0. The value v(s) of the stochastic game is exactly the P_{s,π,σ}-probability of reaching state 0, namely

v(s) = 1 if w ≥ 1/2, and v(s) = (w/(1 − w))^s if w < 1/2;

for a proof we refer to [6].

Now suppose that the reward function is the identity r(s) = s for all s ∈ N. Because r is an increasing function on S, from any state s ≥ 1, player 1 wants to maximize and player 2 wants to minimize the chance of moving to s + 1. Let w be the value of the matrix game M with payoffs (1 − p_ab)_{a,b∈{1,2}}, the probabilities of moving to s + 1. Assume also that w > 1/2. Then the game is superfair because, for each state s ≥ 1 (cf. also p. 183 in [6]),

Gr(s) = w(s + 1) + (1 − w)(s − 1) = s + (2w − 1) ≥ s = r(s).

Optimal (stationary) strategies π and σ for the players in the stochastic game are now to play optimally in the matrix game M at every stage and at every state s ≥ 1. The distribution P_{s,π,σ} of the process of states is then that of a simple random walk that moves right with probability w and left with probability 1 − w at every positive state, and is absorbed at state 0.

Because w > 1/2, there is positive probability that the random walk will diverge to infinity starting from any state s ≥ 1. Hence, the value function is v(0) = 0 and v(s) = ∞ for all s ≥ 1. If 1/2 < w < 1, then there is also positive probability of being absorbed at state 0. ♦
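The absorption probability appearing in the first part of Example 1 is the classical gambler's-ruin quantity, and it can be checked numerically. The following is a sketch under the random-walk description above, with an illustrative left-move probability w = 1/4 (an assumed matrix-game value, not one derived in the paper).

```python
import random

def value_first_reward(w, s):
    """P(the walk reaches 0 from s) when it steps left with probability w and
    right with probability 1 - w at every positive state (gambler's ruin)."""
    if w >= 0.5:
        return 1.0
    return (w / (1.0 - w)) ** s

def monte_carlo(w, s, runs=5_000, horizon=500, seed=0):
    """Crude truncated-horizon simulation of the same absorption probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        state = s
        for _ in range(horizon):
            if state == 0:
                hits += 1
                break
            state += -1 if rng.random() < w else 1
    return hits / runs

# From s = 1 with w = 1/4, the exact absorption probability is (1/4)/(3/4) = 1/3.
print(value_first_reward(0.25, 1), monte_carlo(0.25, 1))
```

The simulation truncates plays at a finite horizon, which is harmless here because, conditional on eventual absorption, the walk is absorbed quickly.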

Main Result
The main result of the paper is the following theorem.
Theorem 1 Every superfair zero-sum stochastic game has a value v(s) for every initial state s ∈ S. Moreover, there is a transfinite algorithm for calculating the value function v = (v(s))_{s∈S}.
A similar result was proven for so-called positive zero-sum stochastic games in Flesch et al. [3]. A positive zero-sum stochastic game is defined as a game in Definition 1, but instead of Eq. (1), the payoff is defined as the sum of the expected stage rewards. Because every positive zero-sum stochastic game can be transformed into an equivalent zero-sum stochastic game as in Definition 1 with the payoff given in Eq. (1), and because this latter game is by construction superfair, Theorem 1 above generalizes Theorem 1 in Flesch et al. [3] for positive zero-sum stochastic games. There is also a transfinite algorithm for calculating the value of a limsup game in Maitra and Sudderth [6]. Unlike the algorithms for superfair games and for positive games, the algorithm for a limsup game is not, in general, monotone. It may require infinitely many steps, both increasing and decreasing.
The proof of Theorem 1 consists of the following steps. In Sect. 5, we present a transfinite algorithm that results in a function r* : S → [0, ∞]. In Sect. 6 we show that r* is bounded from below by the upper value β of the game, while in Sect. 7 we show that r* is bounded from above by the lower value α of the game: thus, β ≤ r* ≤ α. Since α ≤ β, this will imply that r* is the value of the game.

The Algorithm
In this section, we consider the following transfinite algorithm on the countable ordinals. Let r_0 = r and define, for each countable ordinal ξ,

r_{ξ+1} = Gr_ξ, and r_ξ = sup_{η<ξ} r_η if ξ is a limit ordinal.

We remark that the first use of a transfinite algorithm to calculate the value of a stochastic game was by Blackwell [1]. Other examples are in Maitra and Sudderth [5,6], and Flesch et al. [3].
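The successor step of the algorithm, r_{ξ+1} = Gr_ξ, can be sketched numerically. The following toy computation uses a hypothetical three-state superfair game in which player 1 is a dummy (all states, actions, transitions, and rewards below are illustrative assumptions), so that G reduces to a minimum over player 2's actions; the iterates are nondecreasing and here reach their fixed point after finitely many steps.

```python
def G(phi, p, actions):
    """One-shot value operator when player 1 is a dummy:
    (G phi)(s) = min over player 2's actions b of sum_{s'} phi(s') p(s'|s,b)."""
    return {
        s: min(sum(phi[s2] * q for s2, q in p[(s, b)].items()) for b in actions[s])
        for s in phi
    }

# Hypothetical superfair game: state 0 is absorbing with reward 1; in state 1,
# player 2 either moves to 0 directly, or gambles (to 0 with prob. 1/2, to 2
# with prob. 1/2); state 2 moves to 0 deterministically.
r = {0: 1.0, 1: 0.0, 2: 0.0}
actions = {0: ["stay"], 1: ["safe", "gamble"], 2: ["go"]}
p = {
    (0, "stay"): {0: 1.0},
    (1, "safe"): {0: 1.0},
    (1, "gamble"): {0: 0.5, 2: 0.5},
    (2, "go"): {0: 1.0},
}

# Iterate r_{n+1} = G r_n; the sequence increases monotonically to a fixed point.
phi = dict(r)
for _ in range(4):
    phi = G(phi, p, actions)
print(phi)  # every play is eventually absorbed at state 0, so all values are 1
```

In this toy game the iterates at state 1 are 0, 1/2, 1, 1, . . ., illustrating both the monotonicity of the sequence and its stabilization, which the transfinite algorithm guarantees only at some countable ordinal in general.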
Lemma 1 The sequence (r_ξ)_ξ produced by the algorithm has the following properties:

Claim 1: Gr_ξ ≥ r_ξ for every countable ordinal ξ.
Claim 2: The sequence (r_ξ)_ξ is nondecreasing; in particular, r_ξ ≥ r_0 = r for every countable ordinal ξ.
Claim 3: There exists a countable ordinal ξ* such that r_ξ = r_{ξ*} for every ξ ≥ ξ*, and consequently Gr_{ξ*} = r_{ξ*}.

Claim 1 follows by transfinite induction from the superfairness assumption Gr ≥ r and the monotonicity of the operator G, and Claim 2 is then immediate, since r_{ξ+1} = Gr_ξ ≥ r_ξ. Finally, Claim 3 follows from a standard cardinality argument. Indeed, for every state s ∈ S, the sequence (r_ξ(s))_ξ is a nondecreasing sequence of real numbers. Hence, the sequence (r_ξ(s))_ξ can only have countably many different elements. Thus, for every state s ∈ S, there exists a countable ordinal ξ_s such that r_{ξ_s}(s) = r_ξ(s) for every ξ ≥ ξ_s. Now take ξ* = sup_{s∈S} ξ_s. Because S is countable, ξ* is a countable ordinal.
To simplify notation, let r* = r_{ξ*}, where ξ* is from Lemma 1. Now assume that φ satisfies conditions (a) and (b) of Lemma 2. We will show that φ ≥ r_ξ for all ξ by transfinite induction, and so φ ≥ r* = r_{ξ*}.
The next two sections will show that the value of the superfair game is r*(s) for every initial state s ∈ S.

The Upper Value
In this section we show that r*, which we defined below Lemma 1, is at least as large as the upper value of the game.

Lemma 3 Let ε > 0, and, for every stage n ∈ N and every state s ∈ S, let y_n(s) ∈ Δ(B(s)) be a mixed action for player 2 that is ε/2^{n+1}-optimal for player 2 in the one-shot game M_{r*}(s). Define σ to be the strategy for player 2 such that, for every stage n ∈ N and every history h_n = (s_0, a_0, b_0, . . . , s_n) ∈ H_n, we have σ(h_n) = y_n(s_n), i.e., for every mixed action x(s_n) ∈ Δ(A(s_n)) of player 1,

r*(s_n, x(s_n), y_n(s_n)) ≤ Gr*(s_n) + ε/2^{n+1}.   (2)

Then, for every initial state s ∈ S and every strategy π for player 1, we have

u(s, π, σ) ≤ r*(s) + ε.   (3)

Consequently, for every initial state s ∈ S, we have β(s) ≤ r*(s).
Proof Let ε > 0, and choose a strategy σ for player 2 as in the lemma. Fix an initial state s = s_0 ∈ S and a strategy π for player 1. By Eq. (2) and Gr* ≤ r* (condition (a) of Lemma 2), the process r*(s_n) − ε(1 − 2^{−n}), n ∈ N, is a P_{s,π,σ}-supermartingale. Hence, by an optional sampling theorem (cf. Maitra and Sudderth [6], Section 2.4),

E_{s,π,σ}[r*(s_t)] ≤ r*(s) + ε

for every finite stop rule t. Using Eq. (1) and r* ≥ r (cf. Claim 2 of Lemma 1), we have

u(s, π, σ) = lim sup_t E_{s,π,σ}[r(s_t)] ≤ lim sup_t E_{s,π,σ}[r*(s_t)] ≤ r*(s) + ε,

which completes the proof.

The Lower Value
In this section we show that r* is at most as large as the lower value of the game.

Lemma 4 For every initial state s ∈ S, we have α(s) ≥ r*(s).
Proof It is sufficient to verify that the function α satisfies conditions (a) and (b) of Lemma 2.
Step 1 First we verify that α satisfies condition (b) of Lemma 2. Let ε > 0, and choose a strategy π for player 1 such that, for every stage n ∈ N and every history h_n = (s_0, a_0, b_0, . . . , s_n) ∈ H_n, the mixed action π(h_n) is ε/2^{n+1}-optimal for player 1 in the one-shot game M_r(s_n), i.e., for every mixed action y(s_n) ∈ Δ(B(s_n)) of player 2,

r(s_n, π(h_n), y(s_n)) ≥ Gr(s_n) − ε/2^{n+1}.   (4)

Fix an initial state s = s_0 ∈ S and a strategy σ for player 2. Then, by Eq. (4) and Claim 1 of Lemma 1, for every history h_n ∈ H_n at stage n ∈ N we have

E_{s,π,σ}[r(s_{n+1}) | h_n] ≥ Gr(s_n) − ε/2^{n+1} ≥ r(s_n) − ε/2^{n+1}.

Thus, the process r(s_n) − ε·2^{−n}, n ∈ N, is a P_{s,π,σ}-submartingale. By an optional sampling theorem (cf. Maitra and Sudderth [6], Section 2.4),

E_{s,π,σ}[r(s_t)] ≥ r(s) − ε

for every finite stop rule t. Hence, by Eq. (1), u(s, π, σ) ≥ r(s) − ε, and therefore α(s) ≥ r(s) − ε. Because ε is an arbitrary positive number, α satisfies condition (b).
Step 2 Now we verify that α satisfies condition (a) of Lemma 2. Fix an initial state s = s_0 ∈ S and let R < Gα(s). Because Gα(s) is the value of the one-shot game M_α(s), there exists a mixed action x*(s) ∈ Δ(A(s)) for player 1 such that, for every action b ∈ B(s) for player 2,

Σ_{a∈A(s)} x*(s, a) Σ_{s'∈S} α(s') p(s'|s, a, b) > R.

Hence, by the monotone convergence theorem, for every action b ∈ B(s) of player 2, there is some m(b) ∈ N such that

Σ_{a∈A(s)} x*(s, a) Σ_{s'∈S} min{α(s'), m(b)} p(s'|s, a, b) > R.   (5)

The reason for taking the minimum with m(b) is that some values of α(s') may be infinite.
Let ε > 0. By the definition of α, for each state s' ∈ S and each m ∈ N, there is a strategy π^m_{s'} for player 1 such that, for every strategy σ for player 2, we have

u(s', π^m_{s'}, σ) ≥ min{α(s'), m} − ε.   (6)

Now let π be the following strategy for player 1: (i) in stage 0, in state s = s_0, the strategy π plays the mixed action x*(s); (ii) from stage 1 on, from state s_1, the strategy π follows the strategy π^{m(b_0)}_{s_1}, where b_0 denotes the action chosen by player 2 in stage 0.
Then, for every strategy σ of player 2,

u(s, π, σ) = E_{s,π,σ}[u(s_1, π^{m(b_0)}_{s_1}, σ[h_1])] ≥ E_{s,π,σ}[min{α(s_1), m(b_0)}] − ε ≥ R − ε,

where σ[h_1] denotes the continuation strategy of σ from stage 1 onward. The first equality in this calculation follows from Theorem 2.12, p. 64, in Maitra and Sudderth [6] when both sides are finite, and the proof of that result can be modified to handle the case when there are infinite values. The first inequality follows from Eq. (6), and the last inequality follows because Eq. (5) holds for every action b_0 ∈ B(s). Therefore, α(s) ≥ R − ε. As the number R < Gα(s) is otherwise arbitrary and ε is an arbitrary positive number, we conclude that α(s) ≥ Gα(s). This implies that α satisfies condition (a).

The Value
Theorem 1 now follows from Lemmas 3 and 4. It also follows that r* is equal to the value function v of the game.
We have two additional results on the value of the game in the following special cases: when the game is fair and when the action sets of player 2 are finite.
A zero-sum stochastic game as in Definition 1 is called fair if Gr = r .

Corollary 1 If the zero-sum stochastic game is fair, then v = r.
Proof It is clear from the algorithm that r_ξ = r for all ξ. Thus v = r* = r.
If all the action sets for player 2 are finite, then the algorithm for the value function reaches a fixed point not later than at ordinal ω (i.e., ξ * ≤ ω), where ω denotes the first infinite ordinal.

Theorem 2 If all the action sets B(s), s ∈ S, of player 2 are finite, then v = r* = r_ω.
Proof By Claim 2 of Lemma 1, r_ω ≥ r. So, by Lemma 2, it suffices to show that Gr_ω ≤ r_ω.
Fix s ∈ S and let R < Gr_ω(s). Because Gr_ω(s) is the value of the one-shot game M_{r_ω}(s), there exists a mixed action x(s) ∈ Δ(A(s)) for player 1 such that, for all actions b ∈ B(s) of player 2,

Σ_{a∈A(s)} x(s, a) Σ_{s'∈S} r_ω(s') p(s'|s, a, b) > R.

As r_ω(s') is the increasing limit of r_n(s') as n → ∞, by the monotone convergence theorem we have, for n sufficiently large and all b ∈ B(s),

Σ_{a∈A(s)} x(s, a) Σ_{s'∈S} r_n(s') p(s'|s, a, b) > R;

here we rely on the assumption that the set B(s) is finite. Thus, for large n, r_{n+1}(s) = Gr_n(s) > R, and hence r_ω(s) ≥ R. Since R was an arbitrary number smaller than Gr_ω(s), we can conclude that r_ω(s) ≥ Gr_ω(s), as desired.
It can happen that r_ω < v if player 2 has an infinite action set. The following example is a slight variation on Example 6 in Flesch et al. [3].
Example 2 Let S = {0, 1, 2, . . .} and let the reward r be such that r(1) = 1 and r(n) = 0 for all states n ∈ S \ {1}. Player 1 is a dummy (i.e., A is a singleton) and player 2's action set in state 0 is B(0) = {1, 2, . . .}. The transitions are as follows: (i) in state 0, playing action n leads to state n; (ii) state 1 is absorbing; and (iii) in each state n ≥ 2, regardless of the action of player 2, play moves to state n − 1. Let the initial state be state 0.
In this game, by choosing an initial action b > n + 1, player 2 can guarantee a reward of 0 for the first n stages. This implies that r_n(0) = (G^n r)(0) = 0 for each n ∈ N, and hence r_ω(0) = sup_n r_n(0) = 0. However, as state 1 is absorbing, the value of the game with initial state 0 is v(0) = 1. In fact, in this game, the value function v is equal to r_{ω+1} = Gr_ω. ♦
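The iterates of Example 2 can be computed directly. To keep the computation finite, the sketch below truncates player 2's action set in state 0 to {1, . . . , N} (the truncation is our device, not part of the example); within the truncated model, r_n(0) stays 0 exactly as long as n < N, mirroring the fact that r_n(0) = 0 for every finite n in the untruncated game.

```python
def G_example2(phi, N):
    """One step of the iteration for Example 2, with B(0) truncated to
    {1, ..., N}; player 1 is a dummy, so G is a minimum for player 2."""
    new = {}
    new[0] = min(phi[b] for b in range(1, N + 1))  # player 2 picks the start
    new[1] = phi[1]                                # state 1 is absorbing
    for n in range(2, N + 1):
        new[n] = phi[n - 1]                        # deterministic move to n - 1
    return new

N = 12
r = {s: (1.0 if s == 1 else 0.0) for s in range(N + 1)}

phi = dict(r)
values_at_0 = []
for _ in range(N - 1):
    phi = G_example2(phi, N)
    values_at_0.append(phi[0])
print(values_at_0)      # r_n(0) stays 0 while the truncation level exceeds n

phi = G_example2(phi, N)
print(phi[0])           # 1.0: in the truncated model the jump finally occurs
```

Taking N larger and larger shows that no finite number of iterations raises r_n(0) above 0, so the jump to v(0) = 1 genuinely requires the transfinite step at ω + 1.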

Markov Strategies and Stationary Strategies
In this section we examine the optimal and ε-optimal strategies of the players in the superfair zero-sum stochastic game.
A selector for player 2 is a mapping y : S → Δ(B) such that, for every s ∈ S, y(s) ∈ Δ(B(s)). A strategy σ for player 2 is called stationary if there is a selector y such that σ(h) = y(s_h) for every history h ∈ H; recall that s_h denotes the final state of the history h. Thus a stationary strategy ignores the past and the current stage of the game when choosing an action.
A strategy σ for player 2 is called Markov if there is a sequence of selectors y_0, y_1, . . . for player 2 such that σ(h) = y_n(s_h) for every history h ∈ H_n in every stage n ∈ N. So a Markov strategy ignores the past but may make use of the current stage of the game.
Stationary and Markov strategies for player 1 are defined similarly.

Theorem 3
In every superfair zero-sum stochastic game, player 2 has an ε-optimal Markov strategy for every ε > 0. If all the action sets B(s), s ∈ S, of player 2 are finite, then player 2 has an optimal stationary strategy.
Proof The strategy σ in the statement of Lemma 3 is Markov. It is also ε-optimal by Eq. (3), because r* = v is the value function for the game. This proves the first assertion. Now assume that the action sets B(s), s ∈ S, are all finite. By Theorem 2 in Flesch et al. [3], player 2 has, for every state s ∈ S, an optimal mixed action y(s) ∈ Δ(B(s)) for the one-shot game M_v(s). The argument for Lemma 3 can be repeated with y_n = y for all n and ε = 0 to show that the stationary strategy σ corresponding to the selector y is optimal.
We do not know whether the first assertion of Theorem 3 holds for player 1; that is, whether ε-optimal Markov strategies exist for player 1 for all ε > 0. Kiefer et al. [7] constructed a very nice and delicate Markov decision problem on a countable state space, with a so-called Büchi payoff, in which the decision maker has no ε-optimal Markov strategies for small ε > 0. This illustrates that Markov strategies are not always sufficient when the state space is countable. Their Markov decision problem, however, is not superfair in our sense.
The second assertion of Theorem 3 is false for player 1. Indeed, there is an example in Kumar and Shiau [4] with three states and two actions for each player in which player 1 has no optimal strategy. (The example is also on p. 192 in Maitra and Sudderth [6].) Here is a positive result for player 1 under further assumptions.
Theorem 4 If the reward function r is bounded and all the action sets B(s), s ∈ S, of player 2 are finite, then, for every ε > 0 and every state s ∈ S, player 1 has a stationary strategy that is ε-optimal for the initial state s.
Proof This follows from Theorem 2 and the argument for Corollary 3.9 in Secchi [8].

Related Payoff Functions
In our model, the payoff function u was defined in Eq. (1). There are a few related payoff functions that we briefly discuss here: for every pair of strategies (π, σ) ∈ Π × Σ and initial state s ∈ S, define

k(s, π, σ) = lim sup_{n→∞} E_{s,π,σ}[r(s_n)] and ℓ(s, π, σ) = E_{s,π,σ}[lim sup_{n→∞} r(s_n)],

where, as before, n denotes a stage of the game. The connection between k and ℓ and the payoff function u stems from the fact that stage n can be seen as the outcome of the stop rule that is constant n (i.e., always stops in stage n). In the specific case when the reward function r is bounded, we have k ≤ ℓ = u, where the inequality follows by Fatou's lemma and the equality follows from Theorem 2.2 in Maitra and Sudderth [6]. However, when the reward function r is unbounded, this is no longer true. Consider the double-or-nothing game of Example 3, in which ℓ < k = u. In this game, s_n eventually reaches state 0 with probability 1. Hence, the payoff function ℓ gives payoff 0 (recall that we follow the convention that 0 · ∞ = 0). On the other hand, starting from state 1, E[r(s_n)] = 2^n · 2^{−n} = 1 for every n ∈ N, which implies that the payoff function k gives payoff 1. One can similarly deduce that u also gives payoff 1 (this also follows directly from Corollary 1). ♦ We do not know if the statement of Theorem 1 holds for the payoff function ℓ: when the reward function r is unbounded, we do not know, in general, how to calculate the value of the game for the payoff function ℓ, or even whether the game has a value. We are also not aware of a paper where an algorithm is given for calculating the value of the game for the payoff function k, or even where it is proven that the game with the payoff function k admits a value.
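The gap between k and ℓ in the double-or-nothing game of Example 3 can be verified with an exact stage-wise computation: started from state 1, after n stages the state is 2^n with probability 2^{−n} and 0 otherwise.

```python
# Exact stage quantities for the double-or-nothing game of Example 3,
# started from state 1.

def expected_reward(n):
    """E[r(s_n)] = 2^n * 2^(-n) = 1 for every stage n, so k gives payoff 1."""
    return (2 ** n) * (2 ** -n)

def prob_alive(n):
    """P(s_n > 0) = 2^(-n) vanishes, so lim sup_n r(s_n) = 0 almost surely
    and the payoff under the function l is 0."""
    return 2.0 ** -n

print([expected_reward(n) for n in range(6)])  # every term equals 1.0
print([prob_alive(n) for n in range(6)])       # survival probability -> 0
```

The two printed sequences exhibit the strict inequality ℓ < k in one line each: the stage expectations are constantly 1, while the probability of still being away from the absorbing state decays geometrically.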

Approximating the Reward Function with Bounded Functions
In games with unbounded reward functions, it is a frequently used technique to approximate the reward function with bounded functions and investigate what happens when the error of the approximation tends to zero. In our case, it is natural to consider the reward function r_m : S → [0, ∞), for every m ∈ N, defined by r_m(s) = min{r(s), m} for every state s ∈ S.
For every m ∈ N, and for every pair of strategies (π, σ) ∈ Π × Σ and initial state s ∈ S, define the payoff as u_m(s, π, σ) = lim sup_t E_{s,π,σ}[r_m(s_t)]; the payoff is similar to Eq. (1), but the reward function r_m is used instead of the original r. Let v_m denote the value function of the game with respect to the payoff function u_m. As v_m is non-decreasing in m, it converges to some v_∞ ∈ [0, ∞]^S as m tends to infinity.
One may wonder whether v_∞ is always equal to the value function v of the game [with respect to the original payoff in Eq. (1)]. This is not true, as in Example 3 for the initial state 1 we have v_m(1) = 0 for every m ∈ N and hence v_∞(1) = 0, whereas the value is v(1) = 1.
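The claim v_m(1) = 0 can be checked by the same exact stage computation as before: under the truncated reward r_m, the stage expectation from state 1 is E[r_m(s_n)] = min(2^n, m) · 2^{−n}, which tends to 0 as n grows, for every fixed truncation level m.

```python
# Truncated stage expectations for the double-or-nothing game of Example 3,
# started from state 1, with truncated reward r_m = min(r, m).

def expected_truncated_reward(n, m):
    """E[r_m(s_n)] = min(2^n, m) * 2^(-n)."""
    return min(2 ** n, m) * (2.0 ** -n)

for m in (1, 4, 1000):
    tail = [expected_truncated_reward(n, m) for n in range(25, 30)]
    print(m, max(tail))   # the tail of E[r_m(s_n)] is already tiny
```

For small n (with 2^n ≤ m) the expectation equals 1, matching the untruncated game, but once n is large the factor m · 2^{−n} dominates; this is why the truncated values v_m miss the value v(1) = 1 entirely.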

Rewards that also Depend on the Actions
In our model of zero-sum stochastic games (cf. Definition 1), the reward function r only depends on the current state. One could extend the model by allowing the reward function to depend on the current actions as well, so that it has the form r : S × A × B → [0, ∞). However, in this extended model, it is not clear how to define the notion "superfair" without this notion being too restrictive. One natural idea would be to transform a zero-sum stochastic game in which the rewards depend on the current state and the current actions into a zero-sum stochastic game in which the rewards only depend on the current state (by extending the state space so that each state in the latter game corresponds to a triple consisting of a state of the former game and an action for each player). As it turns out, after applying such a transformation, the latter game is only superfair under rather restrictive conditions on the former game.
It remains an open problem to find an appealing definition of the notion "superfair" when the reward function may also depend on the current actions.

The third equality is by Theorem 4.2.7, p. 62, in Maitra and Sudderth [6]. We also used that, in the transformed game, the stage rewards are nondecreasing during the play, so the limit superior over finite stop rules can be replaced with the limit.

Lemma 2
The function r* is the least function φ : S → [0, ∞] such that (a) Gφ ≤ φ and (b) φ ≥ r.

Proof The function r* satisfies condition (a) by Claim 3 of Lemma 1. The function r* also satisfies condition (b) because, by Claim 2 of Lemma 1, we have r* = r_{ξ*} ≥ r_0 = r.
Definition 1 A zero-sum stochastic game (with countable state and action spaces) is a tuple Γ = (S, A, B, (A(s))_{s∈S}, (B(s))_{s∈S}, p, r), where:

1. S is a nonempty and countable state space.
2. A and B are nonempty and countable action sets for player 1 and player 2, respectively.
3. In each state s ∈ S, the sets A(s) ⊆ A and B(s) ⊆ B are nonempty, and at least one of the sets A(s) and B(s) is finite. The elements of A(s) and B(s) are called the available actions in state s for player 1 and player 2, respectively.
4. p is a transition law: to every state s ∈ S and actions a ∈ A(s), b ∈ B(s), the function p assigns a probability measure p(s, a, b) = (p(s'|s, a, b))_{s'∈S} on S.
5. r : S → [0, ∞) is a non-negative reward function.
Example 3 (Double-or-Nothing) Let S = {0, 1, 2, 2^2, . . . , 2^n, . . .} and r(s) = s for every state s ∈ S. Assume that both players are dummies (i.e., A and B are singletons), and therefore, to simplify the notation, we omit the strategies from the notation. The transitions are as follows: state 0 is an absorbing state, and p(2s|s) = p(0|s) = 1/2 for every state s > 0. Let the initial state be state 1. Note that the game is superfair and even fair.