Subgame-perfection in recursive perfect information games, where each player controls one state

We consider a class of multi-player games with perfect information and deterministic transitions, where each player controls exactly one non-absorbing state, and where rewards are zero in the non-absorbing states. With respect to the average reward, we provide a combinatorial proof that a subgame-perfect ε-equilibrium exists, for every game in our class and for every ε > 0. We believe that the proof of this result is an important step towards a proof of the more general conjecture that all perfect information stochastic games, with finite state space and finite action spaces, have a subgame-perfect ε-equilibrium for every ε > 0 with respect to the average reward criterion.


Introduction
We consider a subclass of stochastic games with finite state and action spaces. Shapley (1953) introduced the class of zero-sum stochastic games. He proved that such games have a value, under the assumption that there is a positive stopping probability after each move by the players, or, equivalently, under the assumption that stage rewards are discounted. Mertens and Neyman (1981) demonstrated that every such game also has a value with respect to the average reward. Vieille (2000a, b) showed that all non-zero-sum two-player stochastic games have an ε-equilibrium for the average reward, for every ε > 0.
It follows from the result of Mertens (1987) that multi-player stochastic games with deterministic transitions and with perfect information admit an ε-equilibrium for the average reward, for every ε > 0. Thuijsman and Raghavan (1997) showed that they even admit a 0-equilibrium. These ε-equilibria are, however, not always subgame-perfect. An example by Solan and Vieille (2003) demonstrated that a subgame-perfect 0-equilibrium need not exist.
The question remains open whether subgame-perfect ε-equilibria exist, for every ε > 0. In this paper, we consider a subclass of stochastic games, where each player controls exactly one non-absorbing state, and where non-zero rewards can only be obtained by entering an absorbing state. The technical novelty of this class is that it combines two features that both make it hard to analyse the game: (1) payoffs can be negative in the absorbing states, (2) it may be impossible to move from one non-absorbing state to another non-absorbing state immediately. The technique to deal with these difficulties builds further on those in Flesch et al. (2010a, b) and Kuipers et al. (2013), and may be flexible enough to deal with the entire class of perfect information stochastic games.

The model
We consider the class G of games, given by:

(1) a nonempty set of players N = {1, ..., n};
(2) exactly two states associated with each player t ∈ N: one non-absorbing state identified with t, and one absorbing state denoted by t*; the set of absorbing states is denoted by N*, and the set of all states is denoted by S = N ∪ N*;
(3) for each state t ∈ N, a set of states A(t) ⊆ N ∪ {t*} with t* ∈ A(t) and t ∉ A(t); for each state t* ∈ N*, the set A(t*) is defined as A(t*) = {t*};
(4) for each player t ∈ N, an associated (reward) vector r(t) ∈ R^N.
A game in G is played at stages in N in the following way. At any stage m, one state is called active. If t ∈ N is active, then player t announces a state in A(t), and the announced state will be active at the next stage. The rewards to the players are zero when this happens. If t* ∈ N* is active, then the unique state t* ∈ A(t*) will be active at the next stage (thus, t* will be active forever). The stage rewards to the players when this happens are according to r(t), and since r(t) will be the reward at every subsequent stage, r(t) is also the (expected) average reward. The game starts with an initial state s ∈ S.
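To make the model concrete, the following Python sketch encodes a game in G and simulates play under a pure stationary announcement rule (a simplification: the strategies defined later may depend on the full history and may randomize). The class name, the string encoding of states (the absorbing state of player t is written t + "*"), and the example rewards are our own illustration, not data from the paper.

```python
class Game:
    """A game in the class G: each player t controls one non-absorbing state t,
    with action set A(t) ⊆ N ∪ {t*}, and a reward vector r(t) that is paid
    to all players, at every stage, once play absorbs at t*."""

    def __init__(self, actions, rewards):
        self.actions = actions  # A(t) for each non-absorbing state t
        self.rewards = rewards  # r(t) for each player t, as a dict over players

    def play(self, start, announce):
        """Simulate play from `start` under a pure stationary rule announce(t).
        Returns the average-reward vector: r(t) if play absorbs at t*, and the
        zero vector if play cycles forever through non-absorbing states."""
        state, seen = start, set()
        while not state.endswith("*"):
            if state in seen:  # a stationary rule revisiting a state cycles forever
                return {p: 0 for p in self.actions}
            seen.add(state)
            nxt = announce(state)
            assert nxt in self.actions[state], "announced state must lie in A(t)"
            state = nxt
        return dict(self.rewards[state[:-1]])

# A two-player game shaped like the games of Example 2.1 (rewards hypothetical):
game = Game(actions={"1": ["2", "1*"], "2": ["1", "2*"]},
            rewards={"1": {"1": 1, "2": 0}, "2": {"1": 0, "2": 1}})
```

For instance, under the rule where player 1 announces 2 and player 2 announces 2*, play started at state 1 absorbs at 2* and the average rewards are r(2); if the two players keep announcing each other, play never absorbs and all average rewards are zero.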
We assume complete information (i.e. the players know all the data of the game), full monitoring (i.e. the players observe the active state and the action chosen by the active player), and perfect recall (i.e. the players remember the entire sequence of active states and actions).
Playing a game in G can be interpreted as making an infinite walk in the directed graph G = (S, E), where E = {(x, y) | x ∈ S and y ∈ A(x)}.In this paper, whenever we refer to an ordered pair (x, y) as an edge, it is implicit that x ∈ S and y ∈ A(x), i.e. we mean that (x, y) is an element of E.
Let H be a subgraph of G, and denote the edge-set of H by E(H).

Plans. A plan in H is an infinite sequence of states g = (t_m)_{m∈N} (where N = {1, 2, ...}), such that (t_m, t_{m+1}) ∈ E(H) for all m ∈ N. A plan in G is simply called a plan. A plan is interpreted as a prescription for play for a game with initial active state t_1. The set of states that become active during play if plan g is executed is denoted by S(g) ⊆ S, i.e. S(g) = {t ∈ S | ∃m ∈ N : t_m = t}, and the set of players that become active during play is denoted by N(g) ⊆ N, i.e. N(g) = {t ∈ N | ∃m ∈ N : t_m = t}. Notice that, if the initial state of g is an element of N*, then g is of the form (t*, t*, ...), with t* ∈ N*. Also, if plan g contains a state in N*, say t*, and the initial state of g is an element of N, then we must have t ∈ N(g) and there must be a stage M with t_M = t and with t_m = t* for all m > M. This is interpreted as a prescription for t to announce his absorbing state t* at stage M. We say that the plan absorbs at t if this is the case. If S(g) ⊆ N, then we say that the plan is non-absorbing. We denote by φ_t(g) the average reward to player t when play is according to g, i.e. φ_t(g) = r_t(x) if g absorbs at x, and φ_t(g) = 0 if g is non-absorbing. The initial state of plan g is denoted by first(g).

Paths. A path (or history) in H is a finite sequence p = (t_m)_{m=1}^k with k ≥ 1, such that (t_m, t_{m+1}) ∈ E(H) for all m ∈ {1, ..., k − 1}. A path in G is simply called a path. The number k − 1 is called the length of the path. The initial state t_1 of path p is denoted by first(p) and the final state t_k is denoted by last(p). If the length of the path is at least 1, i.e. if the path contains at least one edge, we allow ourselves to say that p is a path from t_1 to t_k. We will sometimes want to concatenate a number of paths to make a longer path or a plan, or we may want to concatenate a finite number of paths and a plan to make another plan. We allow concatenation if p_1, p_2, ..., p_m are paths that satisfy last(p_k) = first(p_{k+1}) for all k ∈ {1, ..., m − 1}. The concatenation of these paths is denoted by ⟨p_1, p_2, ..., p_m⟩, and it represents the path that follows the prescription of p_1 from first(p_1) to last(p_1) = first(p_2), then follows the prescription of p_2 until last(p_2) = first(p_3) is reached, and so on, until last(p_m) is reached. Also, if g is a plan with first(g) = last(p_m), then the plan that first follows the prescription of p_1, p_2, ..., p_m, and then switches to g, is denoted by ⟨p_1, ..., p_m, g⟩. Finally, if we have an infinite number of paths p_1, p_2, ... with the property last(p_k) = first(p_{k+1}) for all k ∈ N, then ⟨p_1, p_2, ...⟩ represents the path or plan that subsequently follows the prescription of p_1, p_2, etc. (The concatenation of an infinite number of paths is still a path if only finitely many of them have positive length.) Let P denote the set of all possible paths, and for t ∈ N, let P_t denote the set of all paths with endpoint t.

Strategies. A strategy π_t for player t is a decision rule that, for any path p ∈ P_t, prescribes a probability distribution π_t(p) over the elements of A(t). A strategy π_t is called pure if every prescription π_t(p) places probability 1 on one of the elements of A(t). A joint strategy is a tuple π = (π_t)_{t∈N}, where π_t is a strategy for player t. A joint strategy π = (π_t)_{t∈N} is pure if π_t is pure for all t ∈ N, in which case we say that play is deterministic.
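The plans that arise in this paper are eventually periodic (a finite prefix followed by either an absorbing tail or a repeated cycle of non-absorbing states), so they admit a finite encoding on which φ_t(g), S(g) and N(g) are computable. The (prefix, cycle) encoding below is our own device for illustration, not notation from the paper:

```python
# A plan g = (prefix, cycle) denotes the infinite sequence: prefix, then the
# cycle repeated forever.  An absorbing plan at x* has cycle == [x + "*"];
# a non-absorbing plan has a cycle of non-absorbing states.
def states(plan):
    """S(g): all states that become active when g is executed."""
    prefix, cycle = plan
    return set(prefix) | set(cycle)

def players(plan):
    """N(g): all players that become active when g is executed."""
    return {t for t in states(plan) if not t.endswith("*")}

def absorbs_at(plan):
    """The player x such that g absorbs at x, or None if g is non-absorbing."""
    _, cycle = plan
    return cycle[0][:-1] if cycle[0].endswith("*") else None

def phi(plan, rewards, t):
    """φ_t(g): r_t(x) if g absorbs at x, and 0 if g is non-absorbing."""
    x = absorbs_at(plan)
    return rewards[x][t] if x is not None else 0
```

For example, the plan (1, 2, 2*, 2*, ...) would be written (["1", "2"], ["2*"]); the concatenation ⟨p, g⟩ with last(p) = first(g) corresponds to (list(p)[:-1] + prefix, cycle).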
In this paper, we will sometimes define joint strategies by formulating a prescription of play for several stages of the game, possibly for the entire game, which holds only so long as players execute actions that are assigned a positive probability by the prescription.In the event that a player chooses an action that is assigned probability 0, a revised prescription is formulated.Specifically, we will consider two types of prescribed play, called default mode and threat mode.
Default mode is characterized by a plan g.During default mode, one of the players on g is active and he is required to follow the prescription of g.Play will stay in default mode characterized by g during the entire game, provided that players indeed follow the plan.
Threat mode is characterized by a triple (g, v, x), where g and v are plans and where x is a player located on g, such that first(v) ∈ A(x) and such that the state on g following the first occurrence of x differs from first(v). During threat mode, the active player is either located on g before the first occurrence of x, or it is the first occurrence of x. If the active player is located before x, then he is required to follow the prescription of g. If the active player is the first occurrence of x on g, then he is required to perform a lottery, in which he places probability ε on the switch to v and probability 1 − ε on the continuation of g. (The requirement that first(v) differs from the state following x on g ensures that the lottery places positive probability on two different options.) Threat mode ends after the lottery, and play returns to default mode characterized by either g or v, depending on the outcome of the lottery.
The triple (g, v, x) that characterizes threat mode is always chosen such that there exists y ∈ N located on g after the first occurrence of x on g, and such that φ y (v) < φ y (g).The interpretation here is that y is the threatened player and that the threat consists of the possible switch from plan g to plan v.
Here, we will not say how a revised prescription is formulated once a player plays an action that has probability 0 according to π. This is postponed until the proof of our main Theorem 2.1.

Expected rewards. Consider a joint strategy π and a path p ∈ P. Suppose that the game has developed along the path p and that state last(p) is now active. Suppose further that all players, starting at last(p), follow the joint strategy π, taking p as the history of the game. Denote the overall probability of absorption at t by P_{p,π}(t). In our model, where non-zero rewards only occur in absorbing states, we see that the expected average reward for player t exists, and that it can be expressed as

ψ_t^p(π) = Σ_{x∈N} P_{p,π}(x) · r_t(x).

If π is a pure joint strategy, then following π results in deterministic play, which can be described by a plan. If we denote this induced plan by g_{p,π}, then the expected average reward is given by ψ_t^p(π) = φ_t(g_{p,π}). If a joint strategy is given in terms of prescribed play, then it is sufficient to know the mode of play (default or threat) at last(p) and the data that describe that mode. If play is in default mode according to plan g, then the expected reward for an arbitrary player t ∈ N is given by ψ_t^p(π) = φ_t(g). If play is in threat mode according to (g, v, x), then the expected reward for an arbitrary player t ∈ N is given by ψ_t^p(π) = (1 − ε)φ_t(g) + εφ_t(v).
Equilibria. Still consider the joint strategy π and a game that has developed along the path p ∈ P. The joint strategy π = (π_t)_{t∈N} is called a (Nash) ε-equilibrium for path p, for some ε ≥ 0, if

ψ_t^p(σ) ≤ ψ_t^p(π) + ε for every player t ∈ N and every joint strategy σ that differs from π only in the strategy of player t,

which means that, given history p, no player t can gain more than ε by a unilateral deviation from his proposed strategy π_t to an alternative strategy σ_t. The joint strategy π is called an ε-equilibrium for initial state s ∈ N if π is an ε-equilibrium for path (s).
The joint strategy π is called a subgame-perfect ε-equilibrium if π is an ε-equilibrium for every path p ∈ P.

Strategic concepts
We now describe some strategic concepts that are necessary to compute and describe a subgame-perfect ε-equilibrium for a game in the class G.
For α ∈ R^N, a plan g, and a player x ∈ N, we say that x is α-satisfied by g if φ_x(g) ≥ α_x. We define

sat(g, α) = {x ∈ N | φ_x(g) ≥ α_x}.

We say that plan g is α-viable if N(g) ⊆ sat(g, α). This means that, if play is according to g, every player t that becomes active during play will receive an average reward of at least α_t. Notice that a plan of the form g = (t*, t*, ...) with t* ∈ N* is trivially α-viable, since N(g) = ∅. For every state t ∈ S, we denote the set of α-viable plans g with first(g) = t by viable(t, α). Notice that, for t* ∈ N*, the set viable(t*, α) consists of only the plan (t*, t*, ...).
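Using a finite encoding of plans as a prefix plus a repeating cycle (an illustration of our own, not notation from the paper), sat(g, α) and the α-viability test N(g) ⊆ sat(g, α) can be written down directly:

```python
def phi(plan, rewards, t):
    """φ_t(g) for a plan encoded as (prefix, cycle): the cycle [x*] means g
    absorbs at x and yields r_t(x); any other cycle means average reward 0."""
    _, cycle = plan
    return rewards[cycle[0][:-1]][t] if cycle[0].endswith("*") else 0

def sat(plan, alpha, rewards):
    """sat(g, α): the set of players α-satisfied by g, i.e. with φ_t(g) ≥ α_t."""
    return {t for t in alpha if phi(plan, rewards, t) >= alpha[t]}

def is_viable(plan, alpha, rewards):
    """g is α-viable iff N(g) ⊆ sat(g, α)."""
    prefix, cycle = plan
    active = {s for s in list(prefix) + list(cycle) if not s.endswith("*")}
    return active <= sat(plan, alpha, rewards)
```

For example, a non-absorbing cycle is α-viable exactly when α_t ≤ 0 for every player t on the cycle, matching the observation that non-absorbing plans give everyone average reward 0.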
Not all α-viable plans are equally credible as prescriptions for play. Let t ∈ N, let u ∈ A(t), and let g ∈ viable(u, α). Imagine that plan g is proposed as a prescription for play, and suppose that t is located on g. Then player t may prefer to announce u when he becomes active, instead of following the prescription of g. Indeed, if the other players keep restarting plan g each time that t announces u, and if player t keeps announcing u when he becomes active, then play will not absorb, and player t may profit from this. The following definition of admissible plans is meant to select the α-viable plans where such a deviation by player t is not possible, not profitable for t, or can be countered with a credible threat by one of the other players. For t ∈ N, and a plan g ∈ viable(u, α) with u ∈ A(t), say that g is (t, u, α)-admissible if at least one of the following holds:

AD-i: t ∉ N(g) or g is non-absorbing;
AD-ii: α_t > 0;
AD-iii: t ∈ N(g) and there exists a pair (x, v), such that x ≠ t is a player who resides on g in the part from u to the first occurrence of t, such that v is an α-viable plan with first(v) ∈ A(x), such that first(v) is not the state following the first occurrence of x on plan g, and such that x, t ∉ sat(v, α).
We denote the set of plans that are (t, u, α)-admissible by admiss(t, u, α).
Condition AD-i describes the situation where the announcement of u by player t is not possible (the case t ∉ N(g)) or would yield the same average reward as g (the case that g is non-absorbing). Condition AD-ii describes the situation where this deviation, if it is possible, would yield a strictly lower average reward for t, as the non-absorbing plan yields zero average reward, while g yields an average reward of at least α_t > 0. Condition AD-iii describes the situation where a player x with x ≠ t, participating in the non-absorbing plan created by t's deviation to u, has the threat of switching to an α-viable plan v with t, x ∉ sat(v, α). Now, if prescribed play is not exactly g, but if player x is required to place a very small probability on a switch to v the first time he is active, then play will still be according to g with very high probability if players follow the prescription. Player t will not be able to profit from continuous deviation to u, as this would eventually result in a switch to v with probability 1, and t ∉ sat(v, α). Player x will not be tempted to increase the probability of a switch to v, since x ∉ sat(v, α).
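Conditions AD-i and AD-ii are mechanical to check on a finitely encoded plan; AD-iii is the hard part, since it quantifies over α-viable plans v. The sketch below (again with our own (prefix, cycle) plan encoding) checks AD-i and AD-ii exactly, and for AD-iii only verifies the reward conditions on a caller-supplied candidate pair (x, v), omitting the positional requirements on g, so it is a partial test rather than the paper's full definition:

```python
def phi(plan, rewards, t):
    """φ_t(g) under the (prefix, cycle) encoding: r_t(x) if the cycle is [x*], else 0."""
    _, cycle = plan
    return rewards[cycle[0][:-1]][t] if cycle[0].endswith("*") else 0

def is_admissible(t, g, alpha, rewards, witness=None):
    """Partial (t, u, α)-admissibility check for a plan g starting at u."""
    prefix, cycle = g
    active = {s for s in list(prefix) + list(cycle) if not s.endswith("*")}  # N(g)
    non_absorbing = not cycle[0].endswith("*")
    if t not in active or non_absorbing:                      # AD-i
        return True
    if alpha[t] > 0:                                          # AD-ii
        return True
    if witness is not None:                                   # AD-iii, partial check
        x, v = witness
        return (x != t
                and phi(v, rewards, x) < alpha[x]             # x ∉ sat(v, α)
                and phi(v, rewards, t) < alpha[t])            # t ∉ sat(v, α)
    return False
```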
Example 2.1 Figure 1 represents two games, each with two players, 1 and 2. Only the two nodes that correspond to the non-absorbing states of these players are depicted, and the vectors below these states are the rewards that will be obtained if the controlling player chooses his absorbing state. For both games, the arc from 2 to 1 indicates that 1 ∈ A(2), hence A(2) = {1, 2*}, and the other arc indicates 2 ∈ A(1), hence A(1) = {2, 1*}.
Update procedure. We now propose a method for updating one coordinate of a vector α ∈ R^N. For t ∈ N, u ∈ A(t), and α ∈ R^N, we define

β(t, u, α) = min {φ_t(g) | g ∈ admiss(t, u, α)}

and

δ(t, α) = max {β(t, u, α) | u ∈ A(t)},  B(t, α) = {u ∈ A(t) | β(t, u, α) = δ(t, α)}.

We use the convention min ∅ = ∞, so that β(t, u, α) and δ(t, α) are well defined. Moreover, by the definition of δ(t, α), the set B(t, α) is always nonempty. The update of vector α is done by replacing coordinate α_t by δ(t, α).
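The update procedure then amounts to a coordinate-wise fixed-point iteration. The sketch below treats β(t, u, α) as a black-box function, since computing it requires minimizing over the (t, u, α)-admissible plans; the function beta and the toy values in the test are hypothetical stand-ins, not the actual values of game I:

```python
def delta(t, actions, alpha, beta):
    """δ(t, α): the maximum of β(t, u, α) over the actions u ∈ A(t)."""
    return max(beta(t, u, alpha) for u in actions[t])

def update_until_fixed(actions, alpha, beta, max_iter=1000):
    """Repeatedly replace coordinates alpha[t] by delta(t, alpha) until
    delta(t, alpha) == alpha[t] holds for every player t."""
    alpha = dict(alpha)
    for _ in range(max_iter):
        changed = False
        for t in actions:
            d = delta(t, actions, alpha, beta)
            if d != alpha[t]:
                alpha[t], changed = d, True
        if not changed:
            return alpha
    raise RuntimeError("no fixed point reached within max_iter sweeps")
```

With a toy β that returns 1 and 2 for player 1's two actions and −1 for both of player 2's, the iteration reproduces the shape of Example 2.2: α = (1, −1) is updated to the fixed point (2, −1).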
Example 2.2 Let us apply the updating procedure to the two games depicted in Fig. 1, with the initial α-values from Example 2.1.
For game I, we chose α_1 = 1 and α_2 = −1. Let us first update the state controlled by player 1. We trivially have β(1, 1*, α) = r_1(1) = 1, and we have β(1, 2, α) = 2. Thus, δ(1, α) = max(1, 2) = 2, and the updated vector is given by α*_1 = 2 and α*_2 = −1. Let us now update the updated vector α*. This time, we update the state controlled by player 2. We trivially have β(2, 2*, α*) = r_2(2), and we observe that this update had no effect. One can verify that an update of the already updated state controlled by 1 will have no further effect either.

Main result
In Sects. 3 and 4, we analyze the update procedure. The analysis demonstrates that, starting with an appropriate initial vector α^0 ∈ R^N, repeated application of the update procedure will produce, in a finite number of iterations, a vector α* ∈ R^N with δ(t, α*) = α*_t for all t ∈ N. The existence of such a 'fixed point' α* is proven in Theorem 4.16, and it allows for the construction of a subgame-perfect ε-equilibrium for all ε > 0. The following theorem is our main result.
Theorem 2.1 There exists a subgame-perfect ε-equilibrium for every game in class G and every ε > 0.
The iterative procedure in this paper differs from those in earlier papers in the following respects. In comparison with results for games where all payoffs in absorbing states are non-negative, such as in Kuipers et al. (2009) and Flesch et al. (2010a), or where the payoffs are lower semi-continuous as in Flesch et al. (2010b), the main role is played by the set of all viable plans, whereas in the current paper we need to consider the more sophisticated concept of admissible plans. The need for this was already clear from the game in Solan and Vieille (2003) (see game I in Example 2.1), where the unique type of subgame-perfect ε-equilibrium requires randomization where a player puts a small but positive probability on a suboptimal action. This is reflected in condition AD-iii of the definition of admissible plans. The only other paper where a similar type of iteration was applied to a game with possibly negative payoffs is Kuipers et al. (2013), which deals with a much smaller class of games and for which most of the concepts introduced in Sects. 3 and 4 were not needed.
Further, we would like to relate our main result, Theorem 2.1, to two recent existence results. The first one is the existence result in Flesch and Predtetchinski (2015a) for more general perfect information games. Their main result implies, in the context of our model, that a subgame-perfect ε-equilibrium exists if the number of non-absorbing plans is countable. In our model this condition is very restrictive, and it is typically violated in games which have two non-disjoint cycles of non-absorbing states. As an example in which the condition does hold, we refer to the game by Solan and Vieille (2003) (see game I in Example 2.1), which has only one non-absorbing plan from the initial state.
Another existence result that we mention is the one in Flesch and Predtetchinski (2015b). Their result implies, in the context of our model, that a subgame-perfect 0-equilibrium exists in pure strategies provided that the following condition holds for every cycle c of non-absorbing states: the set S*(c) of all absorbing states that can eventually be reached from the cycle c can be partitioned into two sets S*−(c) and S*+(c) such that (1) S*+(c) is non-empty, (2) all payoffs in the states in S*+(c) are positive, (3) each player prefers the payoff in any state in S*+(c) to that in any state in S*−(c). Intuitively, the players would like to obtain absorption in a state in S*+(c), even though they may still disagree on which one exactly, so this condition allows the players to safely ignore absorbing states having a negative payoff.

Proof of Theorem 2.1
I: Description of a joint strategy. Take a game in G. As we remarked, a vector α* ∈ R^N with δ(t, α*) = α*_t for all t ∈ N exists. We can thus choose a plan g_t ∈ viable(t, α*) for all t ∈ N. We can also choose, for all t ∈ N and all u ∈ A(t), a plan g_tu ∈ admiss(t, u, α*) such that φ_t(g_tu) = β(t, u, α*). In case the choice g_tu violates both AD-i and AD-ii, and therefore satisfies AD-iii, we can make the additional choice of a player x_tu ∈ N\{t} and an α*-viable plan v_tu, such that first(v_tu) ∈ A(x_tu), such that first(v_tu) is not the state following the first occurrence of x_tu on plan g_tu, such that x_tu resides on the non-absorbing plan that would result if player t were to deviate from g_tu by announcing u every time he is active, and such that x_tu, t ∉ sat(v_tu, α*). Now, if s ∈ N is the initial state of the game, then we start the game in default mode with plan g_s as initial prescription. Whenever a deviation from prescribed play is detected, say that player t deviates to u, then we check if plan g_tu, associated with the pair (t, u), satisfies AD-i or AD-ii. If so, then revised play will be in default mode according to the (newly) prescribed plan g_tu. If not, then AD-iii is satisfied, and play will resume in threat mode according to the triple (g_tu, v_tu, x_tu). Let us denote the joint strategy described in this way by π.

II: Verification. We will verify that π is a subgame-perfect 2εM-equilibrium, where M = max_{t,x∈N} |r_t(x)|. Let y ∈ N and let σ_y be a strategy for player y. Let σ denote the joint strategy where player y uses strategy σ_y and all players t ≠ y use strategy π_t. We will prove that ψ_y^p(σ) ≤ ψ_y^p(π) + 2εM for every path p.
If there exists a mixed strategy for the deviating player such that he profits more than 2εM, then there also exists a pure strategy for him with this property.Therefore, we may further assume that σ y is a pure strategy.Now, let p be an arbitrary path and suppose that play has developed along p. Suppose further that strategy σ is used after p.We will prove that player y can profit at most 2εM in expectation from this, compared to playing strategy π y .
For a path q with last(q) = y, say that σ deviates at q if σ y (q) and π y (q) are not the same probability distributions.Say that player y deviates (during play according to σ ) whenever play according to σ develops along a path q such that σ deviates at q.
We divide the proof in three cases, depending on the type and number of deviations by player y after p. In each case, we bound the conditional expected reward of the deviating player y, or we prove that the case has probability 0 of happening.

IIa: First assume that player y deviates a finite number of times and that, at the last deviation, σ_y assigns probability 1 to an action u ∈ A(y) that is assigned probability 0 in π_y (hence the last deviation causes a revision of prescribed play).
If the plan g_yu satisfies AD-i or AD-ii, then after y's last deviation, play will resume in default mode with prescription g_yu. The plan g_yu was selected such that φ_y(g_yu) = β(y, u, α*). By definition of δ(y, α*), we have β(y, u, α*) ≤ δ(y, α*), and by the properties of α*, we have δ(y, α*) = α*_y. It follows that the expected reward for player y, after deviations, is φ_y(g_yu) ≤ α*_y. If the plan g_yu violates both AD-i and AD-ii, then after y's last deviation, play will resume in threat mode according to the triple (g_yu, v_yu, x_yu). Since no further deviations will take place, plan g_yu will be executed with probability 1 − ε, with an expected average reward of at most α*_y for player y; plan v_yu will be executed with probability ε, with an expected average reward strictly less than α*_y for player y, since y ∉ sat(v_yu, α*). Thus, the expected average reward for y, after deviations, is strictly less than (1 − ε)α*_y + εα*_y = α*_y.

Let us now demonstrate that player y has an expected average reward of at least α*_y − 2εM if he follows π_y. If y becomes first active during default mode, it means that prescribed play is deterministic according to an α*-viable plan, say g. Then y's average reward will be φ_y(g), and since y ∈ N(g), we have indeed φ_y(g) ≥ α*_y > α*_y − 2εM. If player y becomes active during threat mode, then play is according to one of the plans g_tu. If the first occurrence of player x_tu comes before y on plan g_tu, then there is no chance of a switch to v_tu anymore, and the average reward for player y will be φ_y(g_tu) ≥ α*_y > α*_y − 2εM. If the first occurrence of x_tu comes after y, or if x_tu = y, then there is probability 1 − ε that plan g_tu will be executed, with an average reward of at least α*_y for player y. There is also a probability of ε that v_tu will be executed, with an average reward of at least −M for player y. Thus indeed, the expected average reward for player y is at least (1 − ε)α*_y − εM ≥ α*_y − 2εM.

IIb: Now assume that player y deviates a finite number of times after p, and that, at the last deviation, player y chooses an action u ∈ A(y) that is played with positive probability according to π (hence the last deviation does not cause a revision of prescribed play). This implies that, at the last deviation by y, the game is in threat mode, and that according to π, player y is supposed to use a lottery to determine his action. Say that the game is in threat mode according to the triple (g, v, y).
Notice that the player who causes a revision of play is never the one assigned the task of performing a lottery in revised play. Therefore, player y did not deviate in the period between the stage at the end of p and the stage at which he is supposed to perform the lottery. So, the last deviation of y is in fact his only deviation in the relevant time period, and after p, play is in threat mode according to (g, v, y). We conclude that the expected average reward for y by following π is (1 − ε)φ_y(g) + εφ_y(v).
Since we assume that the last deviation of y has positive probability under π, there are precisely two possibilities for that action. If player y deviates by choosing continuation of g (with probability 1, as σ_y is pure), then plan g will be executed entirely (with probability 1), and the average reward for player y, after deviations, is φ_y(g). If player y deviates by choosing first(v), then plan v will be executed with probability 1, and the average reward for player y, after deviations, is φ_y(v). In both cases, the reward is bounded by φ_y(g), since φ_y(v) < α*_y ≤ φ_y(g). For this case, it now remains to see that φ_y(g) ≤ (1 − ε)φ_y(g) + εφ_y(v) + 2εM, which holds because φ_y(g) − φ_y(v) ≤ 2M.

IIc: Let us finally investigate the possibility that player y deviates infinitely many times. As this implies infinite play along non-absorbing states, the average reward to all players will be zero if this happens. If α*_y ≥ 0, then player y will profit at most 2εM, as he will receive at least α*_y − 2εM by sticking to the plan. So we can assume that α*_y < 0. Notice that, if player y deviates infinitely many times, he causes infinitely many revisions of prescribed play. Each time at such a revision, it will be checked if the plan g_yu satisfies AD-i or AD-ii, where u is the state to which y deviates. By the choice of g_yu as a minimizer for β(y, u, α*), we have φ_y(g_yu) = β(y, u, α*) ≤ δ(y, α*) = α*_y < 0. This implies that g_yu is an absorbing plan. Also, since y will deviate again, we must have y ∈ N(g_yu). This means that AD-i is violated, and clearly AD-ii is violated too, by our assumption α*_y < 0. Thus, only AD-iii is satisfied, and play will resume in threat mode according to the triple (g_yu, v_yu, x_yu).
We conclude that, in the event that σ deviates infinitely many times during play and α * y < 0, play will enter threat mode infinitely many times.Each time that play enters threat mode, there is probability ε that a switch to one of the plans v yu is made.Thus, with probability 1, such a switch is eventually made.Notice that after the switch, player y will not become active again, since y / ∈ sat(v yu , α * ), and therefore y / ∈ N (v yu ).This directly contradicts that player y deviates infinitely many times.We thus see that the event of infinitely many deviations during play has probability 0 if α * y < 0. This completes the proof of Theorem 2.1.
With these choices, the proof of Theorem 2.1 now prescribes the players to always announce each other, indefinitely. Deviation from the plan means absorption, which is indeed not profitable for the deviating player.
Of the restrictions imposed on our class G of games, the requirement that each player controls just one non-absorbing state seems especially severe. The following example demonstrates that at least one specific attempt to deal with multiple states per player does not work.
Example 2.4 The update procedure could be applied to the 1-player game in Fig. 2, as if the two states were controlled by two different players, say 1 and 1′. If we initiate the update procedure with α_t = −1 for t = 1, 1′, then we find that α is in fact a fixed point. The construction of Theorem 2.1 then allows, for any ε > 0, for the strategy profile in which player 1 absorbs immediately with probability 1, in any subgame. So clearly, the construction of Theorem 2.1 fails here. ♦

Semi-stable vectors and their properties
The purpose of this section is to provide a sufficient condition for α ∈ R N that guarantees the existence of a (t, u, α)-admissible plan for every t ∈ N and every u ∈ A(t).
For α ∈ R^N, t ∈ N and u ∈ S, let us say that t is α-safe at u if u ∈ A(t) and if t ∈ sat(g, α) for all g ∈ viable(u, α). For t* ∈ N*, it will be convenient to say that t* is α-safe at t*. We define, for all t ∈ S,

safestep(t, α) = {u ∈ A(t) | t is α-safe at u}.

We also define, for X ⊆ S,

esc(X, α) = {x ∈ X | safestep(x, α) ⊄ X} and pos(X, α) = {t ∈ X ∩ N | α_t > 0},

and for α ∈ R^N, we define the collection X(α) of subsets of S. We say that an edge e = (x, y) is an α-exit from X if x ∈ X and y ∈ S\X, and if, for all g ∈ viable(y, α), We say that e is a trivial α-exit from X if x ∈ esc(X, α) and a non-trivial one if x ∈ X\esc(X, α).
We now say that α ∈ R^N is semi-stable if safestep(x, α) ≠ ∅ for all x ∈ N, and if there exists a non-trivial α-exit from X for every X ∈ X(α).
Remark The vector α = (−1, 2, 0, 0) can be obtained by updating player 2 with respect to the vector ρ = (−1, 1, 0, 0). The updated value of 2 for player 2 can be associated with action 1 ∈ A(2), followed by the minimizing (2, 1, ρ)-admissible plan (1, 2, 3, 4, 4*, 4*, ...). We see that, under the logic of the updating procedure, it is in this example smart for player 2 to announce state 1 before possibly announcing another state at a later stage of the game, as this will prevent possible absorption at 3. We see in this example also how the update from ρ to α naturally creates an α-exit from X. ♦

Lemma 3.1 The vector ρ, defined by ρ_t = r_t(t) for all t ∈ N, is semi-stable.
For any subgraph H of G and a subset X of the vertex set V (H ) of H , say that X is an ergodic set of H if (i) for all x, y ∈ X , there exists a path p in H from x to y with N( p) ⊆ X , and (ii) for all x ∈ X and y ∈ V (H )\X , there is no path in H from x to y.
The following lemma is an easy result in graph theory.It is stated without proof.

Lemma 3.3 Let H = (V(H), E(H)) be a (directed) graph, such that for every vertex x ∈ V(H), there exists y ∈ V(H) with (x, y) ∈ E(H). Then, for every x ∈ V(H), there is a path from x to an element of an ergodic set of H.
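In algorithmic terms, an ergodic set of a finite directed graph is a bottom strongly connected component: a set whose every vertex reaches exactly that set. The following Python sketch is our own illustration (the example graph is hypothetical); it computes ergodic sets this way and checks the conclusion of Lemma 3.3 on a small graph in which every vertex has an outgoing edge.

```python
def reachable(adj, s):
    """Set of vertices reachable from s (s itself included)."""
    seen, stack = {s}, [s]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def ergodic_sets(adj):
    """Ergodic sets = bottom SCCs: every vertex of the set reaches exactly the set."""
    found = []
    for s in adj:
        r = reachable(adj, s)
        if all(reachable(adj, x) == r for x in r) and r not in found:
            found.append(r)
    return found

# Hypothetical graph; every vertex has an out-edge, as in Lemma 3.3.
adj = {1: [2], 2: [3], 3: [2], 4: [1, 4]}
print(ergodic_sets(adj))  # [{2, 3}]
# Conclusion of Lemma 3.3: every vertex reaches some ergodic set.
assert all(any(E & reachable(adj, x) for E in ergodic_sets(adj)) for x in adj)
```

Condition (i) of the definition (paths between any two members, staying inside the set) is automatic for a bottom strongly connected component of a graph without sinks, and condition (ii) is exactly the requirement that no path leaves the set.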
For α ∈ , define the graph G(α) as the graph with vertex set S and edge set {(x, y) ∈ E | y ∈ safestep(x, α)}.Notice that, for all α ∈ and for all t * ∈ N * , the singleton {t * } is an ergodic set of the graph G(α), since (t * , t * ) is a path in G(α) from t * to t * and since there is no edge leaving the set {t * }.The definition of ergodic set implies that different ergodic sets of a graph are disjoint.Therefore, any ergodic set of G(α) is either a singleton from the set N * or a subset of N .The following corollary follows directly from Lemma 3.2.
An immediate insight from Corollary 3.4 is that, for all α ∈ , a non-absorbing plan v is α-viable if N(v) is a subset of an ergodic set of G(α).

Lemma 3.5 Let α ∈ , let p be a path in G(α), and let g be an α-viable plan such that first(g) = last(p). Then the plan p, g is α-viable.

Lemma 3.6 Let α ∈ . Then, for all t ∈ N , a plan g in G(α) exists with g ∈ viable(t, α).
Proof Let t ∈ N. We have safestep(x, α) ≠ ∅ for all x ∈ N, since α ∈ . Thus, the graph G(α) satisfies the conditions of Lemma 3.3. Therefore, there is a path p in G(α) from t to an element x of an ergodic set X of G(α). By the properties of an ergodic set, a path q in G(α) exists from x to x, and with N(q) ⊆ X. First we prove that the plan q, q, . . . is α-viable. Indeed, this is true by definition if the plan q, q, . . . is absorbing, i.e. if x ∈ N* and q = (x, x). Otherwise, we have X ⊆ N, and pos(X, α) = ∅ follows by Corollary 3.4. The plan q, q, . . . is then α-viable, due to the fact that it is non-absorbing, hence it gives average reward 0 to all players. Finally, it follows that g := p, q, q, . . . ∈ viable(t, α), by Lemma 3.5.
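The proof of Lemma 3.6 is constructive: walk along safe steps to an ergodic set, then cycle inside it forever (the plan p, q, q, . . .). A self-contained Python sketch of that construction follows; the adjacency list stands in for a hypothetical graph G(α), and the viability bookkeeping (rewards, the set pos) is deliberately omitted.

```python
from collections import deque

def reachable(adj, s):
    """Vertices reachable from s, s included."""
    seen, stack = {s}, [s]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def bfs_path(adj, s, goal):
    """Shortest path from s into the vertex set `goal`, as a list of vertices."""
    prev, queue = {s: None}, deque([s])
    while queue:
        x = queue.popleft()
        if x in goal:
            path = [x]
            while prev[path[-1]] is not None:
                path.append(prev[path[-1]])
            return path[::-1]
        for y in adj[x]:
            if y not in prev:
                prev[y] = x
                queue.append(y)
    return None

def plan(adj, t):
    """Prefix p from t into an ergodic set, plus a cycle q inside it.
    The plan of Lemma 3.6 is then p followed by q repeated forever."""
    for s in adj:
        if s not in reachable(adj, t):
            continue
        X = reachable(adj, s)
        if all(reachable(adj, x) == X for x in X):   # X is an ergodic set
            p = bfs_path(adj, t, X)
            x = p[-1]
            q = [x] + bfs_path(adj, adj[x][0], {x})  # cycle x -> ... -> x inside X
            return p, q
    return None

print(plan({1: [2], 2: [3], 3: [2], 4: [1, 4]}, 4))  # ([4, 1, 2], [2, 3, 2])
```

The absorbing case of the proof is covered as well: for a single absorbing state with only a self-loop, the sketch returns the prefix [x] and the cycle [x, x], matching q = (x, x).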
The main result of this section concerns the existence of admissible plans.
Proof Let t ∈ N and u ∈ A(t). By Lemma 3.6, viable(u, α) ≠ ∅. If v ∈ viable(u, α) exists with t ∉ N(v), then v ∈ admiss(t, u, α) since v satisfies AD-i, and we are done. Assume further that t ∈ N(v) for all v ∈ viable(u, α). Notice that this implies t ∈ sat(v, α) for all v ∈ viable(u, α). Thus, u ∈ safestep(t, α), i.e. (t, u) is an edge of G(α). Define Y as the set of states y ∈ S for which there is a path in G(α) from u to y not containing t, and define X = Y ∪ {u, t}. We claim that esc(X, α) ⊆ {t}. To prove our claim, we let x ∈ X\{t}, and we will show that y ∈ X for all y ∈ safestep(x, α). So, let y ∈ safestep(x, α). If y = t, then trivially y ∈ X. If y ≠ t, then there is a path in G(α) from u to y not containing t. Indeed, if x = u, then (x, y) = (u, y) is such a path, and if x ≠ u, then there is a path p in G(α) from u to x not containing t by the fact that x ∈ X\{t, u} ⊆ Y, and p, (x, y) is then a path in G(α) from u to y not containing t. Thus, y ∈ Y ⊆ X as claimed. We now distinguish between the cases pos(X, α) = ∅ and pos(X, α) ≠ ∅. First assume that pos(X, α) = ∅. Notice that, for all x ∈ X, an element y ∈ X exists such that (x, y) is an edge of G(α). In particular also, (t, u) is an edge of G(α). Then it is possible to construct a non-absorbing plan g with first(g) = u, with N(g) ⊆ X, and such that every edge of g is in the edge set of G(α). Then g ∈ viable(u, α) by the assumption that pos(X, α) = ∅, and by the fact that a non-absorbing plan gives reward 0 to all players. Since the non-absorbing plan g satisfies condition AD-i, it also follows that g ∈ admiss(t, u, α). Now assume that pos(X, α) ≠ ∅, i.e.
assume that X ∈ P(α). We then have esc(X, α) ≠ ∅ by Lemma 3.2, so we must have esc(X, α) = {t}. If t ∈ pos(X, α), then admiss(t, u, α) = viable(u, α) ≠ ∅, where the equality is by the fact that AD-ii is satisfied by all plans in viable(u, α) and the inequality is by Lemma 3.6. So we can further assume that t ∉ pos(X, α). Under this assumption, we have esc(X, α) ∩ pos(X, α) = ∅, i.e. X ∈ E(α). We also have that A(x) ∩ X ≠ ∅ for all x ∈ X, which follows from our earlier observation that, for all x ∈ X, an element y ∈ X exists such that (x, y) is an edge of G(α). Thus, we have X ∈ C, and hence X ∈ X(α). Since α is semi-stable, we can choose a non-trivial α-exit (x, y) from X.
We choose z ∈ S such that (x, z) is an edge of G(α), which is possible by the fact that α ∈ . Notice that x ∈ X\{t}. Indeed, we have x ≠ t, since x ∈ X\esc(X, α) and t ∈ esc(X, α). It follows that z ∈ Y ∪ {t} ⊆ X by the properties of the set Y.
We also choose h ∈ viable(z, α), which is possible by Lemma 3.6. Now, in case x = u, we define g := (x, z), h. In case x ∈ X\{u}, we actually have x ∈ X\{u, t} ⊆ Y, and we can choose a path q in G(α) from u to x with t ∉ N(q). We then define g := q, (x, z), h. We claim that g ∈ admiss(t, u, α).
By Lemma 3.5, we have g ∈ viable(u, α). Notice that the first occurrence of x in this plan is before the first occurrence of t. Moreover, we have y ∈ A(x) and a plan v ∈ viable(y, α) with t, x ∉ sat(v, α). Also notice that first(v) = y and the state z, which follows the first occurrence of x in g, are different states, since y ∈ S\X and z ∈ X. This demonstrates that g satisfies condition AD-iii of admissibility, hence g ∈ admiss(t, u, α).

Stable vectors and their properties
In the previous section we showed that for all α ∈ , the updated vector, say δ(α), is finite and satisfies δ(α) ≥ α. If we could also prove δ(α) ∈ for all α ∈ , then it would be an easy corollary to establish a 'fixed point' in , i.e. the existence of a vector α* ∈ with the property δ(t, α*) = α*_t for all t ∈ N. We begin this section with an example of α ∈ with δ(α) ∉ . The example demonstrates that, if we initiate the updating process with a vector in , the process may terminate with a vector that is not finite.
This 'negative result' will motivate the rather intricate definition of the set * of stable vectors, later in this section. The set * will be a subset of , so that all results derived in Sect. 3 will also hold for all α ∈ *. Most importantly, however, we will be able to prove that δ(α) ∈ * for all α ∈ *.
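The fixed-point argument alluded to here needs only two ingredients: each update is weakly increasing (δ(α) ≥ α) and the updated values range over a finite set. Under those two assumptions the naive iteration below must terminate. The operator `delta` in the demo is a toy stand-in of our own, not the paper's actual update.

```python
def fixed_point(delta, alpha):
    """Iterate a weakly increasing update until it stabilizes.
    Terminates whenever delta(a) >= a componentwise and the
    values are drawn from a finite set."""
    while True:
        beta = delta(alpha)
        if beta == alpha:
            return alpha
        alpha = beta

# Toy stand-in: each coordinate takes the max of itself and its right
# neighbour, capped at 2; weakly increasing and finite-valued.
toy = lambda a: tuple(min(max(a[i], a[(i + 1) % len(a)]), 2)
                      for i in range(len(a)))
print(fixed_point(toy, (-1, 1, 0)))  # (1, 1, 1)
```

The example below in the text shows why monotonicity alone is not enough for the paper's update: an intermediate vector can leave the set on which the update is well-behaved, which is exactly what the notion of stability is designed to prevent.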
Fig. 3 Update of a semi-stable vector may not be semi-stable

Any update on α is still finite, even though Corollary 3.8 does not apply. However, two consecutive updates of the vector α will result in a vector that is not finite anymore. If we first update the state controlled by 3, then its new value will be β(3, 1, α) = 1: indeed, any α-viable plan with initial state 1 and with a reward lower than 1 for player 3 must absorb at 5, which is not (3, 1, α)-admissible. If we subsequently update the state controlled by 2, we see that there are no (2, 3)-admissible plans anymore, hence the value of 2 becomes infinite. The reader may wish to verify that a different order of updates does not solve the problem.
Let α ∈ , X ⊆ N and Z ⊆ X. We say that an edge (x, y) is an (α, Z)-exit from X if x ∈ X and y ∈ S\X, and if, for all v ∈ viable(y, α), We say that a sequence of edges e = (x_i, y_i)_{i=1}^k is an α-exit sequence from X if, for all i ∈ {1, . . ., k}, the edge (x_i, y_i) is an (α, {x_1, . . ., x_{i−1}})-exit from X. For technical reasons, we allow k = 0, i.e. the empty sequence will also be called an α-exit sequence from X. We say that the α-exit sequence from X is trivial if x_i ∈ esc(X, α) for all i ∈ {1, . . ., k}. We say that it is non-trivial if the sequence is non-empty and if there exists i ∈ {1, . . ., k} such that x_i ∈ X\esc(X, α). We say that the α-exit sequence from X is positive if the sequence is non-empty and if there exists i ∈ {1, . . ., k} such that x_i ∈ pos(X, α).
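Once the single-edge exit test is available as an oracle, verifying these definitions is mechanical. The sketch below is our own illustration: `is_exit(x, y, Z)` is a placeholder oracle for "(x, y) is an (α, Z)-exit from X", and the sets `esc` and `pos` stand for esc(X, α) and pos(X, α).

```python
def check_exit_sequence(seq, X, S, is_exit, esc, pos):
    """Check that seq = [(x1, y1), ..., (xk, yk)] is an alpha-exit sequence
    from X and classify it; returns None if it is not one."""
    Z = set()
    for x, y in seq:
        # (x_i, y_i) must be an (alpha, {x_1, ..., x_{i-1}})-exit from X
        if not (x in X and y in S - X and is_exit(x, y, set(Z))):
            return None
        Z.add(x)
    xs = {x for x, _ in seq}
    return {
        "trivial": xs <= esc,           # every x_i lies in esc(X, alpha)
        "non_trivial": bool(xs - esc),  # non-empty, some x_i outside esc
        "positive": bool(xs & pos),     # non-empty, some x_i in pos
    }

# Hypothetical data: the oracle accepts every candidate edge.
X, S, esc, pos = {1, 2}, {1, 2, 3, 4}, {1}, {2}
print(check_exit_sequence([(1, 3), (2, 4)], X, S,
                          lambda x, y, Z: True, esc, pos))
# {'trivial': False, 'non_trivial': True, 'positive': True}
```

Note that the empty sequence is accepted and classified as trivial but neither non-trivial nor positive, matching the conventions above.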
We now say that a vector α ∈ R^N is stable if safestep(x, α) ≠ ∅ for all x ∈ N and if a positive α-exit sequence from X exists for all X ∈ X(α). We denote the set of stable vectors in R^N by *.
Example 4.2 Let us illustrate the definitions of this section with the 3-player game in Fig. 4. Notice that X = {1, 2, 3} is the unique element of C, and thus the only candidate for an element of X(α) = P(α) ∩ E(α) ∩ C.
Remark The vector α = (0, 2, −1) can be obtained in two updates from the vector ρ = (−1, 1, −1) (the vector defined by ρ_t = r_t(t) for all t ∈ N). In the example, each update adds to the edge-sequence from X, which is typically what happens if an element t ∈ X is updated and if B(t, α) ⊆ X. In Example 4.1, we saw how an α-exit from X can disappear by an update of t ∉ X. However, such an update outside X cannot make a positive α-exit sequence disappear. ♦

The following lemma states some elementary facts about exit sequences.
Lemma 4.1 Let X ⊆ N and let α ∈ .
(i) If e is a non-empty α-exit sequence from X, then its first edge is an α-exit from X.
(ii) If e is an α-exit sequence from X, and if (x, y) is an edge of e with x ∈ esc(X, α), then the sequence e′ obtained from e by deleting the edge (x, y) is also an α-exit sequence from X. Moreover, if e is a positive α-exit sequence from X and if X ∈ X(α), then e′ is also a positive α-exit sequence from X. (iii) If a non-trivial α-exit sequence from X exists, then a non-trivial α-exit from X exists. (iv) If e and f are both α-exit sequences from X, then the concatenation of these two sequences, denoted by (e, f), is also an α-exit sequence from X.

For an α-exit sequence e = (x_i, y_i)_{i=1}^k from X, we write x(e) = {x_1, . . ., x_k}, y(e) = {y_1, . . ., y_k}, and k(e) = k. We will further use the notation ∅ for the empty sequence.

Let X ∈ W(t, δ) and let e be an α-exit sequence from X disregarding t. It will be convenient to have terminology for an edge e′ = (x, y) such that (e, e′) fails to be an α-exit sequence from X disregarding t only because y ∈ X. Say that (x, y) is an α-cap for (X, e, t) if x ∈ X\{t} and y ∈ X, and if, for all v ∈ viable(y, α), we have x(e) ∪ (esc(X, α)\{t}) ⊆ sat(v, α) ⇒ x ∈ sat(v, α).
We denote by F(X, e, t, α) the set of α-caps for (X, e, t). Notice that ∅ is an α-exit sequence from X disregarding t, so F(X, ∅, t, α) is well-defined.
We remark that Lemma 4.1 remains valid when every occurrence of the phrase "sequence from X" in the lemma is replaced by the phrase "sequence from X disregarding t". The proof of this is completely analogous to the proof of Lemma 4.1. In particular, the concatenation of two α-exit sequences from X disregarding t is an α-exit sequence from X disregarding t. Since there exists at least one α-exit sequence from X disregarding t for every X ∈ W(t, δ) (the empty sequence), it is obvious then that, for every X ∈ W(t, δ), an α-exit sequence e* from X disregarding t exists such that x(e) ⊆ x(e*) for all α-exit sequences e from X disregarding t. Let us call e* with this property a maximal α-exit sequence from X disregarding t. The following corollary now follows immediately from Lemma 4.8.

Corollary 4.9 Let X ∈ W(t, δ) and let e* be a maximal α-exit sequence from X disregarding t. Then F(X, e, t, α) ⊆ F(X, e*, t, α) for every α-exit sequence e from X disregarding t.
For X ∈ W(t, δ) and an α-exit sequence e from X disregarding t, let us define H(X, e, t, α) as the graph with vertex set X and edge set F(X, e, t, α) ∪ {(t, u) | u ∈ B(t, α)}. Notice that this graph is well-defined. Indeed, for every edge in F(X, e, t, α), the endpoints are by definition in X. For every edge in {(t, u) | u ∈ B(t, α)}, we have t ∈ X by the fact that X ∈ W(t, δ), and we have u ∈ X, since u ∈ B(t, α) ⊆ safestep(t, δ) ⊆ X, where the first inclusion is by Lemma 3.9-(iii) and the second inclusion is by the fact that t ∈ X\esc(X, δ).

Lemma 4.10 Let X ∈ W(t, δ), let e be an α-exit sequence from X disregarding t, let p be a path in H(X, e, t, α), and let g ∈ viable(last(p), α). Then x(e) ∪ (esc(X, α)\{t}) ⊆ sat(g, α) ⇒ p, g ∈ viable(first(p), α).
Let X ∈ W(t, δ) and let e be an α-exit sequence from X disregarding t. Notice that, for every x ∈ X, there exists y ∈ X such that (x, y) is an edge of H(X, e, t, α). This follows directly from Lemma 4.7 and the fact that B(t, α) ≠ ∅. Thus, by Lemma 3.3, the graph H(X, e, t, α) has at least one ergodic set, and for every x ∈ X, there is a path in H(X, e, t, α) from x to an ergodic set of H(X, e, t, α).

Lemma 4.11 Let X ∈ W(t, δ) and let e be an α-exit sequence from X disregarding t such that x(e) ∩ pos(X, α) = ∅. Then H(X, e, t, α) has an ergodic set Y with pos(Y, α) ≠ ∅.

Lemma 4.13 Let X ∈ W(t, δ) and let e* be a maximal α-exit sequence from X disregarding t. Then one of the following holds: (i) e* is a positive α-exit sequence from X disregarding t; or (ii) for all v ∈ viable(t, α), x(e*) ∪ esc(X, α) ⊆ sat(v, α) ⇒ t ∈ sat(v, δ).
Proof If e* is a positive α-exit sequence from X, we are done. We assume from here that this is not the case. Then Lemma 4.11 applies, and we can choose an ergodic set Y of H = H(X, e*, t, α) with pos(Y, α) ≠ ∅.

A: We will first prove that t ∈ Y. Suppose that t ∉ Y.

A1: We claim that Y ∈ X(α). We trivially have Y ∈ P(α), and by the properties of an ergodic set, we have Y ∈ C. It remains to prove that Y ∈ E(α).
We obviously have pos(Y, α) ⊆ pos(X, δ). We also have Here, the equality is because t ∉ Y, the first inclusion is by Lemma 4.12, the second inclusion follows by Lemma 3.9-(iv), and the third inclusion is trivial. We now conclude that where the equality is because X ∈ X(δ) ⊆ E(δ). This proves that Y ∈ E(α).

A2: Since Y ∈ X(α) and since α ∈ *, we can choose a positive α-exit sequence e from Y. We claim that e is an α-exit sequence from X disregarding t.
We have x(e) ⊆ Y ⊆ X\{t}, where the first inclusion follows by the fact that e is an α-exit sequence from Y, and the second inclusion is by the fact that Y ⊆ X and t ∉ Y. By definition of an α-exit sequence from Y, the sequence e satisfies, for all i ∈ {1, . . ., k(e)} and all v ∈ viable(y_i(e), α), (v, α).
It remains to prove that y(e) ⊆ S\X. Assume that this is not true. Since e is an α-exit sequence from Y, we have y(e) ⊆ S\Y = (X\Y) ∪ (S\X). Then, by our assumption y(e) ⊄ S\X, there must be h ∈ {1, . . ., k(e)} with the property y_h(e) ∈ X\Y ⊆ X. Choose the smallest h with this property. Let f = (x_h(e), y_h(e)), and let f′ = (x_j(e), y_j(e))_{j=1}^{h−1}. Observe then that f′ is an α-exit sequence from X disregarding t and that f is an α-cap for (X, f′, t). Thus, f ∈ F(X, f′, t, α) ⊆ F(X, e*, t, α), where the inclusion is by Corollary 4.9. We also have first(f) ∈ Y and last(f) ∈ S\Y, as f is an edge of an α-exit sequence from Y. The set Y is an ergodic set of the graph H, so f is not an edge of H. Thus, f ∉ F(X, e*, t, α). Contradiction. So indeed, e is an α-exit sequence from X disregarding t. Then it is even a positive α-exit sequence from X disregarding t, since it is a positive α-exit sequence from Y. It also follows that e* is a positive α-exit sequence from X disregarding t, by the maximality of e*. Contradiction.

B: So we have indeed t ∈ Y. We will prove by contradiction that condition (ii) of the lemma holds. Assume therefore that g ∈ viable(t, α) exists with x(e*) ∪ esc(X, α) ⊆ sat(g, α) and t ∉ sat(g, δ).
We further exploit the properties of the plan g and the path p that were chosen in B-1 to prove that t ∉ pos(Y, α). Choose x ∈ N(p)\{t}. Now, from the fact that p, g violates AD-iii, we deduce that

∀y ∈ A(x)\Y, ∀v ∈ viable(y, α) : t ∈ sat(v, α) ∨ x ∈ sat(v, α), (1)

as this formally negates the existence of a 'threat' plan for player x that starts at an element of A(x)\Y. (Notice that the elements of A(x)\Y do not coincide with the follower of x on path p, since N(p) ⊆ Y.) We need to prove that Eq. (1) not only holds for x ∈ N(p)\{t}, but for all x ∈ Y\{t}. Choose x ∈ Y\{t} arbitrarily. By the properties of an ergodic set and the fact that t ∈ Y, a path q in H exists from t to x with N(q) ⊆ Y. Obviously, we may require that there is only one occurrence of t on this path, at the beginning. Let q′ denote the part of q that starts at the second state. There also exists a path r in H from x to t with N(r) ⊆ Y, and we may require that there is only one occurrence of t on this path, at the end. Let p′ = q′, r. Clearly, p′ is a path in H from first(p′) to t with N(p′) ⊆ Y and with only one occurrence of t on this path, at the end. We claim that first(p′) ∈ B(t, α). Indeed, this is true because (t, first(p′)) is an edge of the graph H and edges of the type (t, u) in H all have the property u ∈ B(t, α).
We now let p′ play the role of the earlier chosen path p, and argue similarly that p′, g is an α-viable plan such that p′, g ∉ admiss(t, first(p′), α). From the fact that plan p′, g violates AD-iii, and the fact that plan p′, g is constructed such that the first occurrence of x is before t, it follows that Eq. (1) holds indeed.

B-3: Proof that Y ∈ X(α).

We trivially have Y ∈ P(α). We also have Y ∈ C by the properties of an ergodic set. Since t ∉ pos(Y, α), we may write We have where the first inclusion is by Lemma 4.12 and the second inclusion is by Lemma 3.9-(iv).
We also have pos(Y, α) ⊆ pos(X, δ), by the fact that Y ⊆ X and α ≤ δ. Therefore, where the last equality is by the fact that X ∈ X(δ) ⊆ E(δ).

Since Y ∈ X(α) and since α ∈ *, we can choose a positive α-exit sequence e from Y. Let e′ denote the sequence that results from e by deleting all edges (x_j(e), y_j(e)) with the property x_j(e) = t. We claim that e′ is a positive α-exit sequence from Y disregarding t.

Since e is an α-exit sequence from Y, we have x(e) ⊆ Y. Then x(e′) ⊆ Y\{t} due to the construction of e′. Obviously, we have y(e′) ⊆ S\Y, as e is an α-exit sequence from Y.

This proves that e′ is an α-exit sequence from Y disregarding t. Since e is a positive α-exit sequence from Y, we have x(e) ∩ pos(X, α) ≠ ∅. We have t ∉ pos(X, α), by the result of B-1. Therefore, x(e′) ∩ pos(X, α) = x(e) ∩ pos(X, α) ≠ ∅. It follows that e′ is a positive α-exit sequence from Y disregarding t.

The sequence e′ will now be used to derive a contradiction. We distinguish between the cases y(e′) ⊆ S\X and y(e′) ⊄ S\X. Case 1: Suppose that y(e′) ⊆ S\X. Then, arguing as in A2, e′ is an α-exit sequence from X disregarding t.

So e′ is a positive α-exit sequence from X disregarding t. Then e* is also a positive α-exit sequence from X disregarding t, since we have x(e*) ∩ pos(X, α) ⊇ x(e′) ∩ pos(X, α) ≠ ∅ by the maximality of e*. This contradicts our assumption at the beginning of the proof. Case 2: Suppose that y(e′) ⊄ S\X. Then there must be h ∈ {1, . . ., k(e′)} with the property y_h(e′) ∈ X. Choose the smallest h with this property. Let f = (x_h(e′), y_h(e′)), and let f′ = (x_j(e′), y_j(e′))_{j=1}^{h−1}. Observe then that f′ is an α-exit sequence from X disregarding t and that f is an α-cap for (X, f′, t). Then f ∈ F(X, f′, t, α) ⊆ F(X, e*, t, α), where the inclusion is by Corollary 4.9. We also have first(f) ∈ Y and last(f) ∈ S\Y, as f is an edge of an α-exit sequence from Y. The set Y is an ergodic set of the graph H, so f is not an edge of H. Thus, f ∉ F(X, e*, t, α). Contradiction.
Lemma 4.14 For all X ∈ W(t, δ), a positive δ-exit sequence from X exists.
Proof Let X ∈ W(t, δ) and let e* be a maximal α-exit sequence from X disregarding t. If e* is a positive α-exit sequence from X disregarding t, then it is also a positive δ-exit sequence from X by Lemma 4.6, and we are done. Assume from here that e* is not a positive α-exit sequence from X disregarding t. Then, by Lemma 4.13, we have, for all v ∈ viable(t, α),

x(e*) ∪ esc(X, α) ⊆ sat(v, α) ⇒ t ∈ sat(v, δ). (2)

We distinguish between the cases t ∈ esc(X, α) ∩ pos(X, α) and t ∉ esc(X, α) ∩ pos(X, α).
It is straightforward to prove that an edge in (e*, f) of the form (x_j(f), y_j(f)) and with x_j(f) = t is a (δ, {x_1(f), . . ., x_{j−1}(f)})-exit from X, using the fact that f is an α-exit sequence from X. Then the edge is also a (δ, x(e*) ∪ {x_1(f), . . ., x_{j−1}(f)})-exit from X.
The following result is a direct consequence of Corollary 4.5 and Lemma 4.14.

Fig. 1 Representation of two example games

B-1: Proof that t ∉ pos(Y, α). Choose u ∈ B(t, α). Since t ∈ Y and since (t, u) is an edge of the graph H, it follows that u ∈ Y, by the properties of an ergodic set. Also by the properties of an ergodic set, a path p in H from u to t exists with N(p) ⊆ Y. Notice that p, g ∈ viable(u, α), by Lemma 4.10. Then, plan p, g violates all three conditions AD-i, AD-ii, and AD-iii. Now, from the fact that p, g violates AD-ii, we deduce that t ∉ pos(Y, α).