Discrete stop-or-go games

Dubins and Savage (How to gamble if you must: inequalities for stochastic processes, McGraw-Hill, New York, 1965) found an optimal strategy for limsup gambling problems in which a player has at most two choices at every state x, at most one of which can differ from the point mass δ(x). Their result is extended here to a family of two-person, zero-sum stochastic games in which each player is similarly restricted. For these games we show that player 1 always has a pure optimal stationary strategy and that player 2 has a pure ϵ-optimal stationary strategy for every ϵ > 0. However, player 2 has no optimal strategy in general. A generalization to n-person games is formulated and ϵ-equilibria are constructed.

At about the same time, Dubins and Savage (1965) were formulating their theory of gambling problems, which encompassed a theory of when to stop playing. The relationship between the two theories is treated by Dubins and Sudderth (1977a).
A class of two-person, zero-sum stopping problems was defined by Dynkin (1969), who also showed his games to have a value. A number of mathematicians have extended these results in various directions, such as to non-zero-sum games and n-person games. (See, for example, Rosenberg et al. (2001), Shmaya et al. (2003, 2004), Mashiah-Yaakovi (2014), or Solan and Laraki (2013).) Here we propose a theory of two-person, zero-sum games related to Dubins-Savage gambling theory. In the formulation of Dynkin, exactly one player was able to halt play at every stage. Here it is possible that one or both or neither of the players can stop at some states. The payoff for Dynkin's game was zero if play was never stopped, whereas we use the Dubins-Savage payoff corresponding to the limsup of the values of a utility function. These games are shown to have a value and good strategies are found for the players. A related class of n-person games is defined and shown to have approximate equilibria.

The game
A two-person stop-or-go game G = (S, A, B, q, u) is a two-person, zero-sum stochastic game such that S is a countable non-empty state space; for each x ∈ S there are action sets A(x) = {g, s} or {g} and B(x) = {g, s} or {g} for players 1 and 2, respectively; the law of motion q satisfies q(·|x, g, g) = α(x) where α(x) is a countably additive probability measure defined on all subsets of S and q(·|x, a, b) = δ(x), the point mass at x, if either a = s or b = s; the utility function u : S → R is assumed to be bounded. Note that each player has available the "go" action g at every state but may only have the "stop" action s at certain assigned states. At any given state, the action s may be available to one or both of the players or to neither of them. It is possible that α(x) = δ(x) for some states x. Also a player may "stop" temporarily by playing action s at one stage and action g at the next stage.
The game is played at stages in N = {0, 1, . . .}. Play begins at an initial state x = x 0 ∈ S. At every stage n ∈ N, the play is in a state x n ∈ S. In this state, player 1 chooses an action a n ∈ A(x n ) and simultaneously player 2 chooses an action b n ∈ B(x n ). The next state x n+1 has distribution q(·|x n , a n , b n ). Thus, play of the game generates a random infinite history h = (x 0 , a 0 , b 0 , x 1 , a 1 , b 1 , . . .). The payoff from player 2 to player 1 is u * (h) = lim sup n u(x n ). Of course, this limsup is just u(x k ) if either player chooses the stop action s from stage k onwards.
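The play just described can be sketched in code. The following is a hedged sketch under our own naming conventions (not from the paper): `alpha(x)` is a zero-argument sampler for the next state, the strategies are pure stationary maps of the state, and the maximum of u over a late tail of states serves as a finite-horizon proxy for the limsup payoff, which of course cannot be computed exactly from finitely many stages.

```python
# Hedged sketch (all names are ours, not from the paper): simulate a finite
# prefix of a play of a stop-or-go game with pure stationary strategies.
def play(x0, A, B, alpha, u, pi, sigma, stages=1000):
    x, states = x0, [x0]
    for _ in range(stages):
        a = pi(x) if 's' in A(x) else 'g'      # player 1's action in A(x)
        b = sigma(x) if 's' in B(x) else 'g'   # player 2's action in B(x)
        if a == 'g' and b == 'g':
            x = alpha(x)()                     # draw the next state from alpha(x)
        # otherwise q(.|x, a, b) = delta(x): the state stays put
        states.append(x)
    # Finite-horizon proxy for lim sup_n u(x_n): max of u over a late tail.
    return max(u(y) for y in states[stages // 2:])
```

For instance, with S the positive integers, alpha(x) the point mass at x + 1, and u the indicator of the even states, the proxy payoff is u(x0) when player 1 stops forever at x0 and 1 when both players always go.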

Strategies and expected payoffs
The set of histories ending at stage n is denoted by H n . Let Z = {(x, a, b)|x ∈ S, a ∈ A(x), b ∈ B(x)}. Then H 0 = S and H n = Z n × S for every stage n ≥ 1. Let H = ∪ n∈N H n denote the set of all finite histories. For each history h ∈ H , let x h denote the final state in h. Let H ∞ = Z × Z × · · · be the collection of all infinite histories h = (x 0 , a 0 , b 0 , x 1 , a 1 , b 1 , . . .).
A strategy for player 1 is a map π that to each finite history h ∈ H assigns a probability distribution (that is, a mixed action) on A(x h ). Similarly, a strategy for player 2 is a map σ that to each history h ∈ H assigns a probability distribution on B(x h ). Let Π and Σ denote the sets of strategies for players 1 and 2 respectively. A strategy is called stationary if the assigned mixed actions only depend on the history through its final state.
Beginning at some initial state x = x 0 , player 1 chooses a strategy π and player 2 chooses a strategy σ . The strategies together with the law of motion and the initial state x determine the distribution P x,π,σ of the infinite history h ∈ H ∞ . The payoff from player 2 to player 1 is the expected value u(x, π, σ ) = E x,π,σ [u * (h)]. Player 1's objective is to maximize this expected payoff and player 2 seeks to minimize it.

Value and optimality
A two-person stop-or-go game is a special limsup stochastic game of the type treated in Maitra and Sudderth (1992) and it follows from their results, or from the more general result of Martin (1998), that it has a value V (x) for every initial state x ∈ S; that is, V (x) = sup π∈Π inf σ ∈Σ u(x, π, σ ) = inf σ ∈Σ sup π∈Π u(x, π, σ ).
For ϵ ≥ 0, a strategy π ∈ Π for player 1 is called ϵ-optimal for initial state x if u(x, π, σ ) ≥ V (x) − ϵ for every strategy σ ∈ Σ for player 2. Similarly, a strategy σ ∈ Σ for player 2 is called ϵ-optimal for initial state x if u(x, π, σ ) ≤ V (x) + ϵ for every strategy π ∈ Π for player 1. A strategy is called ϵ-optimal if it is ϵ-optimal for every initial state. A 0-optimal strategy is called optimal.

n-person games
The two-person model of this section will be generalized to games with an arbitrary finite number of players in Sect. 8 below. The results on two-person games in the earlier sections will be used to construct ϵ-equilibria for n-person games.

Theorems for two-person stop-or-go games
Our first result is that player 1 always has an easily described optimal stationary strategy.
Theorem 1 A pure optimal stationary strategy for player 1 is, at every state x, to play action s if u(x) = V (x) and s ∈ A(x), and to play action g otherwise.
If player 2 is a dummy with only one action at every state, then this theorem specializes to give a version of the original result of Dubins and Savage (1965, page 61) for one-person problems. It was their result which led us to the study of stop-or-go games.
Player 2 need not have an optimal strategy, much less a stationary optimal strategy, as the following example, adapted from Sudderth (1983), shows.
Example 1 Let S = {1, 2, . . .}; u(n) = n −1 − 1 for n odd, u(n) = 0 for n even; A(n) = {g}, B(n) = {s, g}; q(n + 1|n, g, g) = 1 and, by definition of action s, q(n|n, g, s) = 1. The value of the game is −1 at every state because player 2 can play g a large number of times and then stop at a large odd number. However, no strategy for player 2 can achieve the value. Note that player 1 is a dummy with only one action at every state.
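The arithmetic behind Example 1 can be checked directly. In this hedged sketch (function names are ours), player 2 plays g until an odd state n is reached and then stops forever, so the state freezes at n and the limsup payoff is simply u(n) = 1/n − 1.

```python
# Hedged illustration of Example 1 (names are ours): the utility function and
# the payoff obtained when player 2 stops forever at state n.
def u(n):
    # u(n) = 1/n - 1 for odd n, u(n) = 0 for even n.
    return 1.0 / n - 1.0 if n % 2 == 1 else 0.0

def payoff_if_stop_at(n):
    # Under action g the state advances 1 -> 2 -> ... -> n; stopping at n
    # freezes the state, so lim sup_k u(x_k) = u(n).
    return u(n)

for n in (1, 11, 101, 1001):
    print(n, payoff_if_stop_at(n))
```

Stopping at an even state gives payoff 0, and never stopping also gives limsup 0 because even states recur; so the minimizing player 2 must stop at a large odd state, approaching the value −1 without ever attaining it.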
However, player 2 does have nearly optimal stationary strategies.
Theorem 2 If ϵ > 0, then a pure ϵ-optimal stationary strategy for player 2 is to play action s if u(x) ≤ V (x) + ϵ and s ∈ B(x), and to play action g otherwise.
If the state space is finite, then player 2 has an optimal stationary strategy.
Corollary 1 If S is finite, then a pure optimal stationary strategy for player 2 is to play action s if u(x) ≤ V (x) and s ∈ B(x), and to play action g otherwise.

Proof
For each positive integer n, let σ n be the strategy of Theorem 2 when ϵ = 1/n. So, by the theorem, σ n is 1/n-optimal for player 2. Now each σ n is a pure stationary strategy and, because S is finite, there are only finitely many pure stationary strategies. So some strategy, say σ * , must occur infinitely often in the sequence of the σ n . It follows that σ * is optimal and also that σ * is the strategy described in the statement of the corollary.
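The stationary strategies of Theorems 1 and 2 and Corollary 1 are simple enough to write down directly once u and V are known. A hedged sketch (function names are ours; u, V, A, B are given as maps on the state space):

```python
# Hedged sketch (names are ours) of the stationary strategies of Theorems 1-2.
def player1_action(x, u, V, A):
    # Theorem 1: player 1 stops exactly where the utility equals the value.
    return 's' if u[x] == V[x] and 's' in A[x] else 'g'

def player2_action(x, u, V, B, eps):
    # Theorem 2 (eps > 0); with eps = 0 this is Corollary 1 for finite S.
    return 's' if u[x] <= V[x] + eps and 's' in B[x] else 'g'
```

Both strategies are pure and depend on the history only through the current state, as the theorems assert; the dependence on the value function V is the drawback noted in Remark 2.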
Remark 1 Suppose that the rules of a stop-or-go game are made more restrictive in the sense that, whenever a player uses the stop action, the game ends and the payoff to player 1 from player 2 is the utility at the current state. Then no temporary stops are available and a player must at each stage either play the go action g or stop permanently. The stationary strategies of Theorems 1 and 2 are still available because any stationary strategy that plays s at a state x must continue to do so. Thus the theorems still hold and the value of the game is unchanged.
Remark 2 A drawback of Theorems 1 and 2 is that the stationary strategies they specify depend on the value function V . However, an algorithm for calculating V is given in Maitra and Sudderth (1992). In general, this algorithm requires iterating an operator for a number of steps up to an arbitrary countable ordinal. In the special case when the state space is finite, the algorithm terminates at the first countable ordinal (Theorem 11.13, page 201, Maitra and Sudderth 1996).

Remark 3
The limsup payoff u * is more general than it may first appear. Suppose the state at stage n is defined to be y n = (x 0 , a 0 , b 0 , x 1 , . . . , x n−1 , a n−1 , b n−1 , x n ) and the utility u is taken to be a bounded function of the y n . For example, the utility function could be of the form u(y n ) = (1/(n + 1)) ∑ k=0,...,n r (x k ), where r is a bounded real-valued function, so that u * is the limsup of the average rewards. With this change of variable, the notion of a stationary strategy loses interest. However, Theorems 1 and 2 also tell us that there exist pure subgame perfect strategies for player 1 and pure subgame ϵ-perfect strategies for player 2.
The next section has some preliminary results. Section 5 is devoted to the proof of Theorem 1. Section 6 treats the special case of games with 0-1 valued utility functions. Section 7 is for the proof of Theorem 2. Section 8 is on n-person games. The final section mentions possible generalizations.

The optimality equation
For each x ∈ S, let M(x) be the one-shot game with action sets A(x) for player 1 and B(x) for player 2 and with payoff for actions a ∈ A(x) and b ∈ B(x) equal to ∑ y∈S V (y)q(y|x, a, b). The optimality equation asserts that V (x) is the value of M(x); that is,

V (x) = sup μ inf ν ∑ a∈A(x) ∑ b∈B(x) ∑ y∈S V (y)q(y|x, a, b)μ(a)ν(b),

where μ and ν range over the probability measures on A(x) and B(x), respectively.
This is a standard result for stochastic games. See, for example, Flesch et al. (2018).
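Since each action set has at most two elements, the value of the one-shot game M(x) can be computed by hand or with the classical closed-form solution for 2×2 zero-sum matrix games. A hedged sketch (the helper name is ours; the matrix entry for actions (a, b) is ∑ y V (y)q(y|x, a, b)):

```python
# Hedged sketch (names are ours): value of a zero-sum matrix game with at most
# two actions per player, as arises for the one-shot game M(x).
def one_shot_value(payoff):
    """payoff[a][b] = sum over y of V(y) q(y|x, a, b); rows are player 1's."""
    maximin = max(min(row) for row in payoff)
    minimax = min(max(row[b] for row in payoff)
                  for b in range(len(payoff[0])))
    if maximin == minimax:          # pure saddle point
        return maximin
    # No saddle point: both players mix; standard 2x2 closed-form value.
    (p, q), (r, s) = payoff
    return (p * s - q * r) / (p + s - q - r)
```

Note that in a stop-or-go game any row or column for action s is constant (all entries equal V(x), since the state stays put), so a pure saddle point always exists there; the mixing branch is only needed for general 2×2 games.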

Continuation strategies
Given a finite history h = (x 0 , a 0 , b 0 , . . . , x n ) and a strategy π for player 1, the continuation strategy of π at h is the map π [h] that to each finite history h ′ = (x n , a n , b n , . . . , x m ) beginning at x n assigns the probability distribution π(hh ′ ), where hh ′ = (x 0 , a 0 , b 0 , . . . , x n , a n , b n , . . . , x m ) is the concatenation of h and h ′ . Intuitively, in the subgame at h, this is the strategy induced by π . For a strategy σ of player 2, the continuation strategy σ [h] is defined analogously.

A useful equality
Let τ be a stopping time. The following equality can be thought of as separating the payoff into that earned before time τ and that after time τ :

u(x, π, σ ) = lim sup t E x,π,σ [u(x t ) 1 [t<τ ] + u(x τ , π [h τ ], σ [h τ ]) 1 [τ ≤t] ]. (1)

Here t varies over the directed set of stop rules. The limsup of a real-valued function of stop rules r (t) is defined by lim sup t r (t) = inf s sup t≥s r (t) where both s and t range over the collection of stop rules. Similarly, lim inf t r (t) = sup s inf t≥s r (t). The symbol h τ denotes that part of the infinite history h ∈ H ∞ up to time τ and π [h τ ] and σ [h τ ] are the continuation strategies for π and σ . Equality (1) first appeared in Dubins and Sudderth (1977b) and is also on page 66 of Maitra and Sudderth (1996). It will be convenient to have a slight variation of equality (1) for the liminf payoff u ∗ (h) = lim inf n u(x n ):

u ∗ (x, π, σ ) = lim inf t E x,π,σ [u(x t ) 1 [t<τ ] + u ∗ (x τ , π [h τ ], σ [h τ ]) 1 [τ ≤t] ]. (2)

This equality is easily obtained from equality (1) by replacing u with −u.

The proof of Theorem 1
Let a(x) = s if u(x) = V (x) and s ∈ A(x), and let a(x) = g otherwise. Let π be the stationary strategy for player 1 that plays action a(x) at each state x. Thus π is the strategy that is asserted in Theorem 1 to be optimal for player 1. (Note that if u(x) ≥ V (x) and s ∈ A(x), then u(x) = V (x) because player 1 can guarantee a payoff of u(x) by playing s forever.) We will now prove that π is optimal.

The strategy conserves V
The next lemma shows that, if player 1 uses the strategy π , then the value function cannot decrease in expectation.
Lemma 2 For every initial state x = x 0 and every strategy σ for player 2, the process V (x n ) is a submartingale under P x,π,σ ; that is, E x,π,σ [V (x n+1 ) | h n ] ≥ V (x n ) for every finite history h n = (x 0 , a 0 , b 0 , x 1 , . . . , a n−1 , b n−1 , x n ).

Proof Because π is the stationary strategy that plays action a(x) at each state x, it suffices to show that, for each state x, the action a(x) is optimal for player 1 in the one-shot game M(x).

Case 1: u(x) = V (x) and s ∈ A(x). Here a(x) = s gives payoff V (x) in the one-shot game M(x) and is clearly optimal in M(x).

Case 2: s ∉ A(x). In this case, a(x) = g is the unique action available to player 1 and must therefore be optimal in M(x).

Case 3: u(x) < V (x) and s ∈ A(x). In this case, s ∉ B(x). (If s ∈ B(x), player 2 could guarantee a payoff no larger than u(x) < V (x) by playing s forever, which is impossible.) So B(x) = {g} and player 2 must play g. Also a(x) = g in this case, so we need to show that ∑ y V (y)α(x)(y) = ∑ y V (y)q(y|x, g, g) ≥ V (x).

For an argument by contradiction, assume that ∑ y V (y)α(x)(y) < V (x). Choose ϵ > 0 such that ∑ y V (y)α(x)(y) ≤ V (x) − ϵ and u(x) ≤ V (x) − ϵ. Next choose ϵ 1 with 0 < ϵ 1 < ϵ/2 and let π 1 be an ϵ 1 -optimal strategy for player 1. Define the stopping time τ as the first time (if any) when π 1 uses the action g; that is, for each infinite history h, τ (h) is the least n such that π 1 plays g given h n = (x 0 , a 0 , b 0 , . . . , a n−1 , b n−1 , x n ). Note that prior to time τ the strategy π 1 is playing s and so the process of states remains at x 0 = x. In particular, x τ = x with probability 1 if τ < ∞. Thus at time τ player 1 plays g and player 2 must play g, so the conditional distribution of x τ +1 is q(·|x, g, g) = α(x). Let σ be a strategy for player 2 such that σ [h τ +1 ] is ϵ 1 -optimal at x τ +1 when τ < ∞.

We set P = P x,π 1 ,σ in the calculations below. By equation (1), we have

u(x, π 1 , σ ) = lim sup t E[u(x t ) 1 [t<τ ] + u(x τ , π 1 [h τ ], σ [h τ ]) 1 [τ ≤t] ].

If τ (h) = n, then, with probability one, h n is of the form (x, s, g, x, s, g, . . . , x) and π 1 (h n ) = σ (h n ) = g. Hence, the next state y = x n+1 has distribution α(x). So, by the choice of the strategy σ ,

u(x τ , π 1 [h τ ], σ [h τ ]) ≤ ∑ y (V (y) + ϵ 1 )α(x)(y) ≤ V (x) − ϵ + ϵ 1 .

Also, u(x t ) = u(x) ≤ V (x) − ϵ on the event [t < τ ]. Combining these inequalities with the ϵ 1 -optimality of π 1 , we have

V (x) − ϵ 1 ≤ u(x, π 1 , σ ) ≤ V (x) − ϵ + ϵ 1 ,

which contradicts our choice of ϵ 1 < ϵ/2. This completes the proof.
Lemma 3 For every initial state x, every strategy σ for player 2, and every stop rule t, E x,π,σ V (x t ) ≥ V (x).

Player 1 reaches good states by using π
The objective in this section is to show that, if player 1 plays the strategy π then, for every strategy of player 2, it is almost certain that states x will be reached where the utility u(x) is almost as large as the value V (x). To be precise, for ϵ > 0, define [u ≥ V − ϵ] = {x ∈ S | u(x) ≥ V (x) − ϵ} and let τ be the first time n (if any) such that x n ∈ [u ≥ V − ϵ].

Lemma 4 For every initial state x, every strategy σ for player 2 and all ϵ > 0, P x,π,σ [τ < ∞] = 1.

Proof Fix ϵ > 0. Suppose first that player 2 plays the stationary strategy σ 1 which always plays the action g. In this special case, player 1 faces a one-person problem that is equivalent to a stop-or-go gambling problem as defined in Section 5.4 of Maitra and Sudderth (1996). Let W be the value function for this one-person problem. Then it follows from Corollary 4.7, page 99 in Maitra and Sudderth (1996) that an optimal strategy for player 1 versus σ 1 is the strategy π 1 that plays action s at state x if u(x) ≥ W (x) and s ∈ A(x), and plays action g otherwise. Now W ≥ V because player 2 has been restricted to play σ 1 in the one-person problem. So π 1 certainly plays g at state x if u(x) < V (x) and thereby agrees with π on this set. By Theorems 7.2 and 7.7, pages 76-78 in Maitra and Sudderth (1996), the probability is one under P x,π 1 ,σ 1 that the process of states reaches the set [u ≥ W − ϵ] ⊆ [u ≥ V − ϵ]. Since π 1 agrees with π on the set [u < V ], the P x,π,σ 1 -probability of reaching the set [u ≥ V − ϵ] is also one. That is, the conclusion of the lemma holds for the special case when σ = σ 1 . Now let σ be an arbitrary strategy for player 2 and consider a state x such that u(x) < V (x) − ϵ. At such a state, s ∉ B(x). (If s ∈ B(x), then player 2 can guarantee a payoff no larger than u(x) by playing s repeatedly, so that V (x) ≤ u(x), a contradiction.) Thus the strategy σ must agree with σ 1 on the set [u < V − ϵ] and therefore P x,π,σ [τ < ∞] = P x,π,σ 1 [τ < ∞] = 1.
Lemma 5 For all strategies σ for player 2, all ϵ > 0, and every stop rule r , there is a stop rule t ≥ r such that P x,π,σ [u(x t ) ≥ V (x t ) − ϵ] ≥ 1 − ϵ.

Proof Fix σ and ϵ, and assume first that r is the identically 0 stop rule. By Lemma 4, P x,π,σ [τ < ∞] = 1 and so, by countable additivity, there is an n such that P x,π,σ [τ ≤ n] ≥ 1 − ϵ. Take t = min(τ, n); then u(x t ) ≥ V (x t ) − ϵ on the event [τ ≤ n]. So the lemma is proved for the special case when r = 0. Now let r be an arbitrary stop rule. By the previous case, there is, for each infinite history h ∈ H ∞ , a stop rule t h ≥ r depending on the finite history h r (h) such that, for the conditional probability given h r (h) , we have the inequality P x,π,σ [u(x t h ) ≥ V (x t h ) − ϵ | h r (h) ] ≥ 1 − ϵ for every value of h r (h) . Hence, the inequality also holds unconditionally.

Completion of the proof that π is optimal
Let x ∈ S and let σ be a strategy for player 2. We need to prove that u(x, π, σ ) ≥ V (x). By a "Fatou equation" (Maitra and Sudderth (1996), Theorem 2.2, page 60),

u(x, π, σ ) = E x,π,σ [lim sup n u(x n )] = lim sup t E x,π,σ u(x t ) = inf r sup t≥r E x,π,σ u(x t ).

The final equality is just the definition of the limsup over the directed set of stop rules.

So it suffices to show that, for every stop rule r , sup t≥r E x,π,σ u(x t ) ≥ V (x). To that end, fix r and ϵ > 0. By Lemma 5, there exists a stop rule t ≥ r such that P x,π,σ [u(x t ) ≥ V (x t ) − ϵ] ≥ 1 − ϵ. Let k be an upper bound on the absolute value of u and therefore also an upper bound on the absolute value of V . An elementary calculation then shows that E x,π,σ u(x t ) ≥ E x,π,σ V (x t ) − ϵ − 2kϵ and, hence by Lemma 3, E x,π,σ u(x t ) ≥ V (x) − (1 + 2k)ϵ. Since ϵ is an arbitrary positive number, the proof of Theorem 1 is complete.

Remark 4
In the language of Exercise 18.13, page 224 in Maitra and Sudderth (1996), Lemma 3 shows that π is uniformly thrifty and Lemma 5 shows that π is uniformly equalizing. The exercise is to show, as is done above, that these two conditions imply that π is optimal. These notions have their origin in the Dubins-Savage theory of thrifty and equalizing strategies for gambling problems (Dubins and Savage (1965), pages 46-54).

Games with 0-1 utility functions
Before proceeding to the proof of Theorem 2, we study a stop-or-go game G K in which the utility function u is the indicator function of a subset K of S. The results obtained for G K will be used to show that optimal strategies make it possible to reach and stay in good states (see Sect. 7.2).
In the game G K , the limsup payoff u * from player 2 to player 1 is the indicator of the set L of those h = (x 0 , a 0 , b 0 , x 1 , a 1 , b 1 , . . .) ∈ H ∞ such that x n ∈ K for infinitely many n.
In this special case, there are optimal strategies of a very simple form. Indeed, let π 0 be the pure stationary strategy for player 1 that plays action s at state x if x ∈ K and s ∈ A(x), and plays action g otherwise; and let σ 0 be the pure stationary strategy for player 2 that plays s at x if x / ∈ K and s ∈ B(x), and plays g otherwise. Unlike the strategies of Theorems 1 and 2, the strategies π 0 and σ 0 can be played without knowledge of the value function V . Note that V (x) = 1 if x ∈ K and s ∈ A(x), because player 1 can guarantee payoff 1 by always playing action s, and V (x) = 0 if x / ∈ K and s ∈ B(x), because player 2 can guarantee payoff 0 by always playing action s, but in all other cases we only know that V (x) ∈ [0, 1] and determining V (x) is not immediate. Indeed, the simplicity of the utility function u does not seem to lead to essential simplifications in the transfinite induction provided in Maitra and Sudderth (1992) to determine the value.
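Because π 0 and σ 0 depend only on membership in K and on which stop actions are available, they are immediate to implement. A hedged sketch (function names are ours; K is a set of states, A and B map states to action sets):

```python
# Hedged sketch (names are ours) of the value-free strategies for the game G_K.
def pi0(x, K, A):
    # Player 1 stops exactly at states of K where stopping is allowed.
    return 's' if x in K and 's' in A[x] else 'g'

def sigma0(x, K, B):
    # Player 2 stops exactly at states outside K where stopping is allowed.
    return 's' if x not in K and 's' in B[x] else 'g'
```

In contrast with the strategies of Theorems 1 and 2, no value function appears anywhere in these two rules.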
Theorem 3 In the game G K , the pure stationary strategy π 0 is optimal for player 1, and the pure stationary strategy σ 0 is optimal for player 2.
The proof will use three lemmas. In these lemmas we prove even more: the strategies in Theorem 3 are optimal responses to all stationary strategies of the opponent.

Lemma 6
In the game G K , the strategy π 0 is an optimal response to every stationary strategy for player 2.
Proof Fix a stationary strategy σ for player 2. Then player 1 faces a one-person problem of the type treated in Section 5.4 of Maitra and Sudderth (1996). Let Q be the value function for this one-person problem; that is, Q(x) = sup π u(x, π, σ ) for all x. By Corollary 4.7, page 99 in Maitra and Sudderth (1996), an optimal strategy π in this problem is to play action s if x ∈ K or Q(x) = 0, and s ∈ A(x), and to play action g otherwise. (The optimality of π also follows from Theorem 1 specialized to the one-person case.) Notice that π 0 and π differ only at states x such that x ∉ K , Q(x) = 0, and s ∈ A(x). Let D be the collection of such states. Since the value function Q equals 0 on the set D, the strategy π 0 is obviously optimal at every x ∈ D. If x ∉ D, then π 0 and π agree up to time τ (h) = inf{n | x n ∈ D} for h = (x 0 , a 0 , b 0 , x 1 , a 1 , b 1 , . . .) ∈ H ∞ . If τ (h) < ∞ then x τ (h) ∈ D and the continuation of strategy π 0 is optimal at x τ (h) . By Lemma 2.3, page 92 in Maitra and Sudderth (1996), π 0 is optimal.
Lemma 7 In the game G K , the strategy σ 0 is an optimal response to every stationary strategy for player 1.
Theorem 3 will follow from Lemmas 6 and 7. Indeed, these lemmas imply that (π 0 , σ 0 ) is a Nash equilibrium in the game G K . Since G K is a zero-sum game, it means that π 0 and σ 0 are optimal strategies.
For the application of this result in Sect. 7 below, it will be convenient to restate and prove Lemma 7 for the equivalent liminf stop-or-go problem G C , where C is the complement of the set K , the players are reversed and the payoff u * from player 2 to player 1 is the indicator of the set E of those infinite histories h such that x n ∈ C for all but finitely many n. Thus E = ∪ n ∩ m≥n [x m ∈ C] . Now let π 1 be the stationary strategy for player 1 in the game G C that plays action s at state x if x ∈ C and s ∈ A(x), and plays action g otherwise. The next lemma is equivalent to Lemma 7.

Lemma 8
In the game G C , the strategy π 1 is an optimal response to every stationary strategy for player 2.
Proof Fix a stationary strategy σ for player 2. We must show that, for every initial state x and every strategy π for player 1,

P x,π 1 ,σ (E) ≥ P x,π,σ (E). (3)

For each nonnegative integer n, let C n = ∩ m≥n [x m ∈ C]. Then the set E is the increasing union of the C n and, by countable additivity, P x,π 1 ,σ (E) = sup n P x,π 1 ,σ (C n ). Here is an intermediate step toward proving (3).

Step 1: P x,π 1 ,σ (C 0 ) ≥ P x,π,σ (C 0 ) for all x and π .
Observe that the set C 0 is the decreasing intersection of the sets F n , where F n = ∩ 0≤m≤n [x m ∈ C] for each nonnegative integer n. Thus P x,π,σ (C 0 ) = inf n P x,π,σ (F n ) for all x, π, σ . So, to verify Step 1, it suffices to show that, for every n,

P x,π 1 ,σ (F n ) ≥ P x,π,σ (F n ) for all x and π . (4)

The proof of (4) is by induction on n. The case n = 0 follows from the fact that the quantities P x,π,σ (F 0 ) are all equal to 1 if x ∈ C and are all equal to 0 if not.
So assume that n > 0 and that the desired inequalities hold for n − 1. If the initial state x ∉ C, then P x,π,σ (F n ) = 0 for all π, σ . So assume that x ∈ C. If s ∈ A(x), then π 1 plays s forever and P x,π 1 ,σ (F n ) = 1 ≥ P x,π,σ (F n ). So assume that s ∉ A(x). Then every strategy π for player 1 must play g at x. Condition on h 1 = (x, g, b, x 1 ) and use the inductive hypothesis to calculate as follows:

P x,π,σ (F n ) = E x,π,σ [P x 1 ,π [h 1 ],σ (F n−1 )] ≤ E x,π,σ [P x 1 ,π 1 ,σ (F n−1 )] = P x,π 1 ,σ (F n ).

(The two equalities use the fact that π and π 1 both play g at x, so (b, x 1 ) has the same distribution under both strategies, and that σ [h 1 ] = σ because σ is stationary.) This completes the proof of Step 1.
Step 2: P x,π 1 ,σ (E) ≥ P x,π,σ (C n ) for all x, π , and n. As noted above, P x,π,σ (E) = sup n P x,π,σ (C n ). So this step will complete the proof of the inequalities in (3). The proof of Step 2 is again by induction on n. The case n = 0 follows from Step 1 because C 0 ⊆ E and therefore P x,π 1 ,σ (E) ≥ P x,π 1 ,σ (C 0 ) ≥ P x,π,σ (C 0 ) for all x, π.
So assume that n > 0 and the desired inequalities hold for n − 1. Condition on h 1 = (x, a 0 , b 0 , x 1 ) and use the inductive hypothesis to get

P x,π,σ (C n ) = E x,π,σ [P x 1 ,π [h 1 ],σ (C n−1 )] ≤ E x,π,σ [P x 1 ,π 1 ,σ (E)].

The fact that σ [h 1 ] = σ , which was used in the line above, holds because σ is stationary. Now from the shift invariance of the set E, it follows that

E x,π,σ [P x 1 ,π 1 ,σ (E)] = P x,π ′ ,σ (E),

where π ′ is the strategy that agrees with π at the initial stage and plays π 1 thereafter.

So it suffices to show that P x,π ′ ,σ (E) ≤ P x,π 1 ,σ (E).

If A(x) = {g} is a singleton, then both π and π 1 must play g at x so that the two quantities above are the same. So assume that A(x) = {s, g}. If x ∈ C, then π 1 will play s forever and thus P x,π 1 ,σ (E) = 1. So assume that x ∉ C in which case π 1 plays g at x. If π plays g, then the two quantities above are equal. If π plays s, then with h 1 = (x, s, b, x) the state remains at x, the strategy π ′ continues with π 1 , and σ [h 1 ] = σ , so P x,π ′ ,σ (E) = P x,π 1 ,σ (E) by the shift invariance of E. Whether π plays g or s, the desired inequality holds. Consequently, it also holds if π plays a mixture of the two. The proofs of Step 2 and the lemma are now complete.
Let σ 1 be the stationary strategy for player 2 in the game G C that plays action s at state x if x / ∈ C and s ∈ B(x), and plays g otherwise. The following lemma is equivalent to Lemma 6, which we have already proved.

Lemma 9
In the game G C , the strategy σ 1 is an optimal response to every stationary strategy for player 1.
Lemmas 8 and 9 imply the next theorem, which amounts to a restatement of Theorem 3.

Theorem 4
In the game G C , the pure stationary strategy π 1 is optimal for player 1, and the pure stationary strategy σ 1 is optimal for player 2.

Stop-or-go games with liminf payoff
Let G ′ = (S, A, B, q, u) be a stop-or-go game as defined in Sect. 2 except that the payoff is now taken to be u ∗ , where u ∗ (h) = lim inf n u(x n ) for h = (x 0 , a 0 , b 0 , x 1 , a 1 , b 1 , x 2 , . . .). Because − lim inf n u(x n ) = lim sup n (−u(x n )), this liminf stop-or-go game is equivalent to a limsup stop-or-go game as in Sect. 2 with the players reversed and u replaced by −u. Let u ∗ (x, π, σ ) = E x,π,σ u ∗ denote the expected payoff at state x when the players choose the strategies π and σ , and let W be the value function for the liminf stop-or-go game G ′ . We will now prove the following theorem, which is equivalent to Theorem 2.
Theorem 5 If ϵ > 0, then a pure ϵ-optimal stationary strategy for player 1 in the game G ′ is to play action s if u(x) ≥ W (x) − ϵ and s ∈ A(x), and to play action g otherwise.
Let a ϵ (x) = s if u(x) ≥ W (x) − ϵ and s ∈ A(x), and let a ϵ (x) = g otherwise. Let π ϵ be the stationary strategy that plays action a ϵ (x) at each state x. To prove Theorem 5, and thereby Theorem 2, we need to show that π ϵ is ϵ-optimal for player 1 in the game G ′ .

The strategy conserves W
Here is the analogue to Lemma 2 for the strategy π ϵ .
Lemma 10 For every initial state x = x 0 and every strategy σ for player 2, the process W (x n ) is a submartingale under P x 0 ,π ϵ ,σ ; that is, E x 0 ,π ϵ ,σ [W (x n+1 ) | h n ] ≥ W (x n ) for every finite history h n = (x 0 , a 0 , b 0 , x 1 , . . . , a n−1 , b n−1 , x n ).
Proof The proof is similar to the proof of Lemma 2.
Let M ′ (x) be the one-shot game with action sets A(x), B(x) and payoff for actions a ∈ A(x), b ∈ B(x) equal to ∑ y∈S W (y)q(y|x, a, b). It suffices to show that, for each state x, a ϵ (x) is optimal for player 1 in M ′ (x). Note that the value of M ′ (x) is W (x) by the optimality equation for the game G ′ .

Case 1: u(x) ≥ W (x) − ϵ and s ∈ A(x).

In this case, a ϵ (x) = s and is clearly optimal in M ′ (x).

Case 2: u(x) < W (x) − ϵ and s ∉ A(x).

In this case, a ϵ (x) = g is the unique action available to player 1 and is therefore optimal.

Case 3: u(x) < W (x) − ϵ and s ∈ A(x).

In this case, s ∉ B(x). (If s ∈ B(x), player 2 can guarantee a payoff no larger than u(x) by playing s forever.) So B(x) = {g} and player 2 must play g. Also a ϵ (x) = g in this case. So we need to show that ∑ y W (y)α(x)(y) = ∑ y W (y)q(y|x, g, g) ≥ W (x).
For an argument by contradiction, assume that ∑ y W (y)α(x)(y) < W (x). Choose ϵ 0 > 0 such that ∑ y W (y)α(x)(y) ≤ W (x) − ϵ 0 . Next choose ϵ 1 such that 0 < ϵ 1 < ϵ 0 /2 and a strategy π 1 for player 1 that is ϵ 1 -optimal at x. Let τ be the first time (if any) when π 1 plays action g. Note that prior to time τ the strategy π 1 is playing s and so x n = x for 0 ≤ n ≤ τ . At time τ both players play g and the distribution of x τ +1 is therefore α(x) = q(·|x, g, g). Let σ be a strategy for player 2 such that σ [h τ +1 ] is ϵ 1 -optimal at x τ +1 when τ < ∞.
Repeat the calculation in case 3 of the proof of Lemma 2 using equality (2), rather than (1), with W in place of V , and ϵ 0 in place of ϵ to find that W (x) − ϵ 1 ≤ W (x) − ϵ 0 + ϵ 1 , contradicting our choice of ϵ 1 < ϵ 0 /2.

The next lemma follows from Lemma 10 as Lemma 3 did from Lemma 2.

Lemma 11
For every initial state x, every strategy σ for player 2, and every stop rule t, E x,π ϵ ,σ W (x t ) ≥ W (x).

Reaching and staying in good states
When the payoff is the limsup, a good strategy must, with high probability, reach states with utility close to the value infinitely often. However, a good strategy for the liminf payoff must, with high probability, reach such states and eventually stay in the collection of them.
Let ϵ > 0. Define C = {x ∈ S | u(x) ≥ W (x) − ϵ} and, as in Sect. 6, let E be the set of infinite histories h = (x 0 , a 0 , b 0 , x 1 , a 1 , b 1 , . . .) such that x n ∈ C for all but finitely many n. Here is the main result of this section.

Theorem 6 For every initial state x and every strategy σ for player 2, P x,π ϵ ,σ (E) = 1.
The first step in the proof of Theorem 6 is to show that, for every x and every stationary strategy σ , there exist strategies π such that P x,π,σ (E) is close to one.

Lemma 12 For every state x, every stationary strategy σ for player 2, and every δ > 0, there is a strategy π for player 1 such that P x,π,σ (E) ≥ 1 − δ.
Proof With σ fixed, player 1 faces a liminf one-person stop-or-go game equivalent to a liminf gambling problem as treated in Sudderth (1983). From position h n = (x 0 , a 0 , b 0 , x 1 , . . . , x n ) in the one-person problem the distribution of a n is selected by player 1 as a distribution on A(x n ) and b n has distribution σ (x n ). Then x n+1 has conditional distribution q(·|x n , a n , b n ). The payoff to player 1 from a strategy π at state x is then u ∗ (x, π, σ ). Let Q be the value function for this one-person problem. Then, for each x ∈ S, Q(x) ≥ W (x), where W is the value function for the two-person game.
Consider now the liminf stop-or-go game G C of Sect. 6 in which the utility function is the indicator of the set C. Let π 1 and σ 1 be the strategies as in Sect. 6. The strategy π ϵ is the same as the strategy π 1 . By Lemma 8, π ϵ is an optimal response to the stationary strategy σ 1 . So, by Lemma 12, P x,π ϵ ,σ 1 (E) = 1 for every x. Also, by Theorem 4, π ϵ is optimal for player 1 in the game G C , whose value function is therefore V C (x) = P x,π ϵ ,σ 1 (E) = 1. Hence, for all x ∈ S and all strategies σ for player 2, P x,π ϵ ,σ (E) ≥ V C (x) = 1. This completes the proof of Theorem 6.

n-person stop-or-go games
Let n ≥ 2 be a positive integer. An n-person stop-or-go game G = (S, A, q, u) is a stochastic game with players I = {1, . . . , n}; a countable non-empty state space S; for each x ∈ S an action set A(x) = A 1 (x) × · · · × A n (x) that is the product of the action sets A 1 (x), . . . , A n (x) for the players, where for each player i ∈ I , A i (x) = {g, s} or {g}; the law of motion q satisfies q(·|x, a) = q(·|x, a 1 , . . . , a n ) = α(x) if a 1 = · · · = a n = g, where α(x) is a countably additive probability measure defined on all subsets of S, and q(·|x, a) = δ(x) if at least one of the actions a i equals s; the function u = (u 1 , . . . , u n ) is the vector of the bounded real-valued utility functions u i , i ∈ I , for the players. Note that if any player plays the stop action s, then the state remains the same.
Play is similar to that in two-player games. At each stage k, the play is at some state x k , the players simultaneously choose actions a k = (a k1 , . . . , a kn ) ∈ A(x k ), and the next state has distribution q(·|x k , a k ). A play from an initial state x 0 generates an infinite history h = (x 0 , a 0 , x 1 , a 1 , . . . ). Each player i is either a limsup player and has payoff lim sup k u i (x k ) or a liminf player with payoff lim inf k u i (x k ).
For each finite history h = (x 0 , a 0 , . . . , x k−1 , a k−1 , x k ), let x h = x k denote the final state in h. A strategy σ i for player i ∈ I is a mapping that assigns to each finite history h a mixed action σ i (h) on A i (x h ). A strategy profile σ = (σ 1 , . . . , σ n ) consists of a strategy σ i for each player i ∈ I . An initial state x together with a profile σ and the law of motion q determine a distribution P x,σ for the infinite history h = (x 0 , a 0 , x 1 , a 1 , . . . ) and the expected payoff u i (x, σ ) = E x,σ [lim sup k u i (x k )] or u i (x, σ ) = E x,σ [lim inf k u i (x k )] according to whether player i is a limsup or a liminf player.
For ϵ ≥ 0, an ϵ-equilibrium is a strategy profile σ such that, for every i, σ i is an ϵ-optimal strategy for player i versus the remaining strategies denoted by σ −i . A 0-equilibrium is called simply an equilibrium.
For the purpose of constructing an ϵ-equilibrium, the game G can be viewed as a game of perfect information; that is, a game in which at most one player has a choice of actions at each stage. The reason is that, if two or more players have the action s available at some state, then an equilibrium is attained trivially when these players play action s. Such states can be viewed as absorbing. At all other states, at most one player can have the action s available and such a player we call the active player at the state.
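The reduction to perfect information just described can be phrased as a classification of states: a state where two or more players can stop is treated as absorbing, a state where exactly one can stop has that player as the active player, and elsewhere nobody has a real choice. A hedged sketch (names are ours):

```python
# Hedged sketch (names are ours): classify states of an n-person stop-or-go
# game for the perfect-information reduction described in the text.
def classify(x, action_sets):
    """action_sets(x) returns the list [A_1(x), ..., A_n(x)]."""
    stoppers = [i for i, acts in enumerate(action_sets(x)) if 's' in acts]
    if len(stoppers) >= 2:
        return ('absorbing', None)   # two or more can stop: treated as absorbing
    if len(stoppers) == 1:
        return ('active', stoppers[0])
    return ('free', None)            # everyone must play g
```

At a 'free' state the transition is forced to be α(x), so all strategic choice is concentrated at the 'active' states, which is exactly what makes the perfect-information viewpoint possible.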
Theorem 7 If G is an n-person stop-or-go game, then G has an ε-equilibrium for every ε > 0. If the state space S is finite or if every player has a limsup payoff function, then G has an equilibrium.
Proof The proof is based on an idea of Mertens and Neyman (cf. Mertens 1987; the idea also appears in Thuijsman and Raghavan 1997) and uses an auxiliary game for each player i ∈ I.
Step 1: The auxiliary game G_i. Consider the auxiliary zero-sum game G_i in which player i maximizes his own payoff and the other players jointly minimize it. The game G_i can be viewed as a two-person stop-or-go game, because the players −i can be treated as a single player.
For each finite history h, the continuation game has a value v_i(h), and player i has a pure stationary strategy σ_i that is ε/2-optimal in every subgame. This follows from Theorem 1 if player i is a limsup maximizer and from Theorem 5 if i is a liminf maximizer. Similarly, the players −i have a pure stationary profile σ^i_{−i} that is ε/2-optimal in every subgame. Let u_i(h, σ_i, τ_{−i}) be the expected payoff to player i in the subgame at h from playing σ_i versus a profile τ_{−i} of the players −i. Then, since σ_i is ε/2-optimal in every subgame,

u_i(h, σ_i, τ_{−i}) ≥ v_i(h) − ε/2 for every profile τ_{−i} of the players −i. (7)

Note that if player i is the active player, then none of his actions can increase his value in expectation. Indeed, suppose that h is the history, ending in state x. Then

Σ_{x′} q(x′|x, a) v_i(h, a, x′) ≤ v_i(h) for every action a ∈ A_i(x), (8)

where (h, a, x′) is the history obtained when, after h, action a is played and the new state is x′.
Step 2: In the original game. Let σ be the pure strategy profile σ = (σ_i)_{i∈I}. Consider the strategy profile σ* such that:
- The players follow σ as long as there is no deviation from σ. Note that because σ consists of pure strategies, a deviation is immediately noticed.
- If a player i deviates from σ_i, then all his opponents "punish" player i in the remaining game, by switching to the strategy profile σ^i_{−i} from the next stage.
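The construction of σ* can be sketched as a function of the observed history. This is a hypothetical encoding (the callables `sigma`, `punish`, and `deviator_of` and the demo values are our own assumptions, not the paper's notation):

```python
def sigma_star(i, history, sigma, punish, deviator_of):
    """Player i's action under sigma*: follow sigma_i as long as no one
    has deviated; once some player j != i deviates, switch to player i's
    component of the punishing profile sigma^j_{-j}.

    history: states observed so far (current state last);
    sigma[i]: pure stationary strategy, a map state -> action;
    punish[j][i]: player i's component of the profile punishing j;
    deviator_of(history): first deviator from sigma, or None (detectable
    because sigma consists of pure strategies)."""
    j = deviator_of(history)
    if j is None or j == i:
        return sigma[i](history[-1])   # no deviation observed: follow sigma_i
    return punish[j][i](history[-1])   # punish the deviator j

# Hypothetical demo: each sigma_i plays "g" everywhere; against a detected
# deviator, the opponents all switch to "s".
sigma = {1: lambda x: "g", 2: lambda x: "g"}
punish = {1: {2: lambda x: "s"}, 2: {1: lambda x: "s"}}
print(sigma_star(1, ["x0"], sigma, punish, lambda h: None))  # → g
print(sigma_star(2, ["x0"], sigma, punish, lambda h: 1))     # → s
```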
We argue that σ* is an ε-equilibrium. Let H denote the set of histories in which no deviation from σ has taken place. (So, according to σ*, the players should still follow σ.) Let h ∈ H end in some state x, and assume that player i is active at x.
If player i does not deviate and follows σ_i in the remaining game, then by (7) his payoff in the subgame at h is at least v_i(h) − ε/2. On the other hand, if player i deviates at h from σ_i(h) to some action a ≠ σ_i(h), then player i's payoff in the subgame at h is at most v_i(h) + ε/2, because players −i will "punish" player i from the next stage with the ε/2-optimal profile σ^i_{−i}, so that his continuation payoff from (h, a, x′) is at most v_i(h, a, x′) + ε/2, and because of (8).
Thus, no player can profitably deviate, up to ε, at any history in H. This means that σ* is indeed an ε-equilibrium. The proof of the first assertion is complete.
To verify the second assertion, note that, by Theorem 1 and Corollary 1, players have optimal pure stationary strategies in zero-sum stop-or-go games when S is finite or if the players are limsup maximizers. Thus the argument above can be repeated with ε taken to be zero.
The strategies of the ε-equilibria constructed in the proof of Theorem 7 are stationary except when punishments occur. So one might suspect that there exist stationary ε-equilibria. Here is a two-person example where stationary ε-equilibria do not exist for small ε.
There are four pure stationary profiles: (s, s), where both active players play s; (g, g), where both play g; (s, g), where player 1 plays s and player 2 plays g; and (g, s), where player 1 plays g and player 2 plays s. We need not consider mixed strategies because repeated play of a mixed action with positive mass on g is equivalent to playing g.
The profile (s, s) is not an equilibrium because it gives player 1 payoff 1 at state 1, while player 1 would get payoff (1/2) · 0 + (1/2) · 3 = 3/2 > 1 by playing g. The profile (g, g) is not an equilibrium because it gives player 1 payoff 0 at state 1, while player 1 would get payoff 1 by playing s.
In the special case when every player has a 0-1 valued utility function, there is a very simple stationary equilibrium.
Theorem 8 Suppose that, for every i ∈ I, the utility function u_i is the indicator of a subset K_i of S. Then an equilibrium is the profile σ = (σ_1, . . . , σ_n) where each strategy σ_i plays s at each state x such that x ∈ K_i and s ∈ A_i(x), and plays g otherwise.
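The equilibrium profile of Theorem 8 is simple enough to write out directly. The sketch below uses an assumed encoding of the sets K_i and the action sets A_i(x); the example state space and values are ours:

```python
def theorem8_strategy(i, K, A):
    """Player i's stationary strategy from Theorem 8: play "s" at x when
    u_i(x) = 1 (i.e. x is in K_i) and stopping is available, else play "g".

    K[i]: the set on which player i's indicator utility u_i equals 1;
    A: map (i, x) -> A_i(x), either {"g"} or {"g", "s"}."""
    def sigma_i(x):
        return "s" if x in K[i] and "s" in A(i, x) else "g"
    return sigma_i

# Hypothetical example: S = {0, 1, 2}; player 1 has K_1 = {2} and can
# stop only at states 1 and 2.
K = {1: {2}}
A = lambda i, x: {"g", "s"} if x in (1, 2) else {"g"}
s1 = theorem8_strategy(1, K, A)
print([s1(x) for x in (0, 1, 2)])  # → ['g', 'g', 's']
```

Player 1 keeps going until reaching a utility-1 state where stopping is available, and then freezes the state there.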

Proof
The strategies σ_{−i} can be viewed as a single stationary strategy in a two-person game versus player i. By Lemma 6, σ_i is optimal versus σ_{−i} if player i is a limsup maximizer. The same holds by Lemma 8 if player i is a liminf maximizer.

Extensions
It seems plausible that the results proved here for stop-or-go games with countable state space and a countably additive law of motion can be generalized to finitely additive games and also to Borel measurable games as in Maitra and Sudderth (1993a, b). The existence of the value follows in both cases from general theorems. Generalizations of some results of probability theory would be needed in the finitely additive case, and there are likely to be measurability obstacles in the Borel setting.
In Sect. 8 we could reduce n-player stop-or-go games to perfect information games. Such a reduction would also be possible in the zero-sum case for the proofs of Theorems 1 and 2, but would not lead to essential simplifications.
Theorem 7 was stated for games with finitely many players, but it also holds for a set of players of arbitrary cardinality, with a very similar proof.