Subgame perfection in recursive perfect information games

We consider sequential multi-player games with perfect information and with deterministic transitions. The players receive a reward upon termination of the game, which depends on the state where the game was terminated. If the game does not terminate, then the rewards of the players are equal to zero. We prove that, for every game in this class, a subgame perfect ε-equilibrium exists, for all ε > 0. The proof is constructive and suggests a finite algorithm to calculate such an equilibrium.


Introduction
We study multi-player games where play proceeds from one state to another and where each transition is decided by one of the players. That is, each state is controlled by one of the players and it is the controlling player of a state who decides what the next state will be. We do not consider chance moves in our model and the number of states is finite. The players receive a (possibly negative) reward upon termination of the game. Termination is decided by the controlling player of the active state, who always has this option instead of moving to another state. The rewards to the players depend on the state where the game is terminated. Infinite play is a possibility in these games, as no player is forced to terminate the game (unless the structure of the game leaves him no other option). Infinite play is associated with a zero reward for all players.
We would like to thank the associate editor and three anonymous referees for their very valuable comments.
Our games belong to the more general class of dynamic games with perfect information, which have numerous applications in economic theory, computer science, and other disciplines. One of the main goals in the literature has always been to identify conditions that guarantee the existence of a subgame perfect equilibrium, or at least of a subgame perfect ε-equilibrium for every positive error term ε. For our class of games, we prove the existence of a subgame perfect ε-equilibrium, for every ε > 0. Our existence result extends several earlier results, where further restrictions were imposed on either the transition structure or the reward structure of the game. The existence of a subgame perfect ε-equilibrium, for every ε > 0, was previously established for games with only nonnegative rewards (Flesch et al. 2010a), for free transition games (Kuipers et al. 2013), and for games where each player only controls one state (Kuipers et al. 2016).
In most economic models, payoffs are bounded and discounted, and this automatically guarantees continuity at infinity, a condition defined by Fudenberg and Levine (1983). For the topological meaning of continuity at infinity, we refer to Ritzberger (2016, 2017). Even though in our model payoffs are not discounted, our results have an implication for the discounted case. Indeed, the joint strategies that we construct are not only subgame perfect ε-equilibria in the undiscounted game, but also in the discounted game, provided that the discount factor is sufficiently close to 1 [cf. the notion of uniform ε-equilibrium, e.g., the survey by Jaśkiewicz and Nowak (2016)]. The strategy profile is thus independent of the discount factor, provided it is large enough, so the knowledge of the exact discount factor is not required.
The undiscounted game on its own is also interesting in the context of negotiations or delegation problems, when there is no specific deadline given for an agreement. An example of this can be found in the paper by Bloch (1996), where the negotiation process for coalition formation is modeled as a positive recursive game. A positive recursive model is limited to situations where any agreement is always better than no agreement, for all players. The generalization to recursive games that are not necessarily positive removes this limitation and allows for models in which some players may wish to sabotage certain outcomes.
The relevance of our paper, we think, mostly lies in the fact that we obtain insight into the structure of equilibria in perfect information games with deterministic transitions. Let us briefly discuss this. In general, one can distinguish two essentially different reasons why an error term may be needed for equilibrium play in a dynamic game. It could be that, in subgame perfect ε-equilibrium play, every action that is played with positive probability gives the player a reward that is very close or equal to his initially expected reward. Let us say that this is an error term of the first type. It could also be that, in subgame perfect ε-equilibrium play, a player must place a very small probability on an action that will lead to a substantially lower reward than initially expected. Let us say that this is an error term of the second type. For a game in our class, if an error term ε is required, it is always of the second type, and the suboptimal action that is played with small probability invariably serves as a threat to one of the other players to make him follow the plan (Fig. 1).

Fig. 2 Detours in a subgame perfect ε-equilibrium

[...] player 1, which retaliates against any deviation by player 1. Note that the error term ε is of the second type. We remark that all subgame perfect ε-equilibria in this game have this feature: player 2 threatens player 1 with termination at s 2 .
[...] probability 1 − ε and terminates with probability ε (regardless of the history). Player 3's strategy is not stationary: if state s 3 is reached from state s 1 then player 3 nominates state s 2 with probability 1, whereas if state s 3 is reached from state s 2 then player 3 nominates state s 1 with probability 1.
In this strategy profile, if player 1 deviates by nominating state s 3 , then instead of moving back to state s 1 directly, a detour is made via player 2, because player 2 is the only player who has a threat action against player 1. One can verify that such a detour, at least with a positive probability, is necessary to obtain a subgame perfect ε-equilibrium. This underlines the difficulty of constructing subgame perfect ε-equilibria in our class of games. We remark that, in more complex games, a threat is sometimes not immediate termination by a player with small probability, but rather a complete sequence of actions that a player can start with small probability.
Interestingly, our analysis shows that just one computational effort suffices to find a subgame perfect ε-equilibrium for every ε > 0. The only difference between these equilibria is the probability with which a threat action should be executed. Examples 1 and 2 indicate how this works. The analysis in this paper suggests that it is unlikely that such a computation can be done efficiently: A naive implementation of the procedure we propose in this paper obviously requires super-exponential time. This is in contrast with the situation of nonnegative rewards, for which Flesch et al. (2010a) proved that a subgame perfect equilibrium can be computed in polynomial time.
Although the exact computation of a subgame perfect ε-equilibrium likely becomes intractable already for moderately sized problems, our results are probably useful for finding solutions of good quality. As an illustration, let us see what happens if we introduce a discount factor β ∈ (0, 1) in the model to simplify the analysis. It follows from a result by Fink (1964) and Takahashi (1964) that the discounted model has a subgame perfect equilibrium in stationary strategies. For Example 1, we then have precisely one stationary equilibrium, which is also subgame perfect, and where both players should terminate the game with probability (1 − β²)/(β(2 − β)) when they are active. This means that the game will terminate with probability 1, and when β is close to 1, both states have a probability of approximately 1/2 of termination. This totally ignores the fact that, given termination of the game, both players have an interest in termination at state 1.
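The limiting behavior claimed for the discounted equilibrium can be checked numerically. The following is a minimal sketch, not part of the analysis of this paper: the function names are hypothetical, and it assumes that play in Example 1 starts in state 1 and alternates between the two states, with the active player terminating with the quoted probability at every visit. The quoted expression is a valid probability only when β ≥ 1/2.

```python
def termination_probability(beta: float) -> float:
    """Per-visit termination probability (1 - beta^2) / (beta * (2 - beta))."""
    return (1 - beta ** 2) / (beta * (2 - beta))


def absorption_split(p: float) -> tuple:
    """Eventual termination probabilities (at state 1, at state 2).

    Assumption: play starts in state 1, alternates between the two states,
    and the active player terminates with probability p at every visit.
    """
    cycle_survives = (1 - p) ** 2          # one round trip without termination
    at_1 = p / (1 - cycle_survives)
    at_2 = (1 - p) * p / (1 - cycle_survives)
    return at_1, at_2


for beta in (0.9, 0.99, 0.999):
    p = termination_probability(beta)
    at_1, at_2 = absorption_split(p)
    print(f"beta={beta}: p={p:.4f}, at state 1: {at_1:.4f}, at state 2: {at_2:.4f}")
```

Under these assumptions the two absorption probabilities equal 1/(2 − p) and (1 − p)/(2 − p), so both tend to 1/2 as the per-visit termination probability p tends to 0.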
Readers who are only interested in the construction of a subgame perfect ε-equilibrium for a game in our class, and in why it is indeed an equilibrium, can limit themselves to reading the first four sections of this paper. We formally introduce our model in Sect. 2, we introduce terminology and strategic concepts in Sect. 3, and we give a proof of our main result in Sect. 4. The proof in Sect. 4 makes use of a fixed point theorem, which we prove in Sect. 5.

Formal model
Our class of games was informally introduced as consisting of games that potentially have infinite horizon, but where players only obtain a nonzero reward if one of them chooses to terminate. We formally introduce our class as always having infinite play. This is done by letting termination correspond to entering an absorbing state, after which the game continues, but is strategically over. We consider the class 𝒢 of dynamic games given by
(1) a non-empty set of players N = {1, . . . , n}, where n ∈ ℕ,
(2) a non-empty and finite set S of non-absorbing states and a set S * of absorbing states such that there is a one-to-one correspondence between states in S and S * ; the state in S * that corresponds to t ∈ S is denoted by t * ,
(3) for each state t ∈ S ∪ S * , an associated controlling player i t ∈ N ,
(4) for each state t ∈ S, a set of actions A(t) ⊆ {t * } ∪ (S\{t}) with t * ∈ A(t); for each state t * ∈ S * , we have A(t * ) = {t * },
(5) for each state t ∈ S, an associated reward vector r (t) ∈ ℝ^N .
A game in 𝒢 is to be played at stages in ℕ in the following way. At any stage m one state is called active. If t ∈ S is active, then player i t announces a state in A(t), and the announced state will be active at the next stage. If t * ∈ S * becomes active, then the unique state t * ∈ A(t * ) will be active at the next stage and thus, t * will be active forever. The game is then strategically finished and the rewards to the players are given by r (t). The game starts with an initial state s ∈ S.
We assume complete information (i.e., the players know all the data of the game), full monitoring (i.e., the players observe the active state and the action chosen by the active player), and perfect recall (i.e., the players remember the entire sequence of active states and actions).
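To make the definition concrete, a game in this class can be stored in a small data structure. The sketch below is only an illustration under stated assumptions: the class name `Game`, its field names, and the convention of writing the absorbing state t * as the string `"t*"` are hypothetical and not notation of this paper, and the rewards of the toy instance are made up.

```python
from dataclasses import dataclass


def star(t: str) -> str:
    """Name of the absorbing state t* that corresponds to t (assumed convention)."""
    return t + "*"


@dataclass
class Game:
    players: list      # N = {1, ..., n}
    states: list       # S, the non-absorbing states
    controller: dict   # t -> i_t, the controlling player of t
    actions: dict      # t -> A(t), a subset of {t*} together with S \ {t}
    reward: dict       # t -> r(t), stored as a dict player -> real number

    def validate(self) -> None:
        """Check the requirements on A(t), i_t and r(t) for every t in S."""
        for t in self.states:
            assert self.controller[t] in self.players
            assert star(t) in self.actions[t]      # termination is always available
            for u in self.actions[t]:
                assert u == star(t) or (u in self.states and u != t)
            assert set(self.reward[t]) == set(self.players)

    def next_states(self, x: str) -> set:
        """A(x) for x in S or S*; an absorbing state only leads to itself."""
        return {x} if x.endswith("*") else set(self.actions[x])


# A toy two-player instance with illustrative (made-up) rewards.
toy = Game(
    players=[1, 2],
    states=["a", "b"],
    controller={"a": 1, "b": 2},
    actions={"a": {"a*", "b"}, "b": {"b*", "a"}},
    reward={"a": {1: 1.0, 2: -1.0}, "b": {1: -2.0, 2: 3.0}},
)
toy.validate()
```

Storing only the states of S and deriving t * by a naming convention keeps the one-to-one correspondence between S and S * implicit, which is one possible design choice among several.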

Basic concepts and terminology
It will be necessary to develop a rather extensive notation and terminology in this paper. Here, we introduce the basics.
Let us define the directed graph G by G = (S ∪ S * , {(x, y) | x ∈ S ∪ S * and y ∈ A(x)}).
This graph can obviously be interpreted as the graph on which the game is played. Whenever we refer to an ordered pair (x, y) as an edge, it is implicit that (x, y) is an edge of the directed graph G, and hence that y ∈ A(x). Let us also have notation for the set of non-absorbing states that are controlled by one particular player. For every i ∈ N , we define
$$S_i = \{t \in S \mid i_t = i\}.$$
Obviously, the sets S i form a partition of the set S of non-absorbing states.
Let us now introduce the basic concepts of this paper.
Plans: A plan is an infinite sequence of states g = (t m ) m∈ℕ , such that (t m , t m+1 ) is an edge for all m ∈ ℕ. A plan is interpreted as a prescription for play for a game with initial active state t 1 . The set of non-absorbing states that become active during play if plan g is executed is denoted by S(g), i.e., S(g) = {t ∈ S | ∃m ∈ ℕ : t m = t}.
Notice that, if the initial state of g is an element of S * , then g is of the form (t * , t * , . . .), with t * ∈ S * . Such a plan will also be denoted as (t * ). Also, if plan g contains a state in S * , say t * , and the initial state of g is an element of S, then we must have t ∈ S(g) and there must be a stage M with t M = t and with t m = t * for all m > M. This is interpreted as a prescription for player i t to announce his absorbing state t * at stage M . We say that the plan absorbs at t if this is the case. Otherwise, we say that the plan is non-absorbing. An absorbing plan, for example (r , s, t, t * , t * , . . .) will also be denoted as (r , s, t, t * ). We denote by φ i (g) the reward to player i ∈ N when play is according to g, i.e., φ i (g) = r i (t) if g absorbs at t, and φ i (g) = 0 if g is non-absorbing. The initial state of plan g is denoted by first(g).
Paths: A path (or history) is a finite sequence p = (t m ) k m=1 with k ≥ 1, such that (t m , t m+1 ) is an edge for all m ∈ {1, . . . , k − 1}. The number k − 1 is called the length of p. The initial state t 1 of p is denoted by first( p) and the final state t k is denoted by last( p). We will sometimes want to concatenate a number of paths to make a longer path or a plan, or we may want to concatenate a finite number of paths and a plan to make another plan. We allow concatenation if p 1 , p 2 , . . . , p m are paths that satisfy last( p k ) = first( p k+1 ) for all k ∈ {1, . . . , m − 1}. The concatenation of these paths is denoted by ⟨ p 1 , p 2 , . . . , p m ⟩ and it represents the path that follows the prescription of p 1 from first( p 1 ) to last( p 1 ) = first( p 2 ), then follows the prescription of p 2 until last( p 2 ) = first( p 3 ) is reached, and so on, until last( p m ) is reached. Also, if g is a plan with first(g) = last( p m ), then the plan that first follows the prescription of ⟨ p 1 , p 2 , . . . , p m ⟩ and then switches to g is denoted by ⟨ p 1 , . . . , p m , g⟩. Finally, if we have an infinite number of paths p 1 , p 2 , . . . with the property last( p k ) = first( p k+1 ) for all k ∈ ℕ, then ⟨ p 1 , p 2 , . . .⟩ represents the path or plan that subsequently follows the prescription of p 1 , p 2 , etc.
Strategies: A strategy π i for player i ∈ N is a decision rule that, for any path p with last( p) ∈ S i , prescribes a probability distribution π i ( p) over the elements of A(last( p)). We use the notation Π i for the set of strategies for player i. A strategy π i ∈ Π i is called pure if every prescription π i ( p) places probability 1 on one of the elements of A(last( p)). We use the notation Π for the set of joint strategies, i.e.,
$$\Pi = \prod_{i \in N} \Pi_i.$$
Expected rewards: Consider a joint strategy π ∈ Π and a path p. Suppose that the game has developed along the path p and that state last( p) is now active. Suppose further that all players, starting at last( p), follow the joint strategy π , taking p as the history of the game. Denote the overall probability of absorption at t by P p,π (t). In our model, where nonzero rewards are only obtained in absorbing states, the expected reward for player i ∈ N can then be expressed as
$$\phi_i(p,\pi) = \sum_{t \in S} P_{p,\pi}(t)\, r_i(t).$$
Equilibria: Consider a joint strategy π ∈ Π and a game that has developed along the path p. The joint strategy π = (π i ) i∈N ∈ Π is called a (Nash) ε-equilibrium for path p, for some ε ≥ 0, if
$$\phi_i(p,(\sigma_i,\pi_{-i})) \le \phi_i(p,\pi) + \varepsilon \quad \text{for all } i \in N \text{ and all } \sigma_i \in \Pi_i,$$
where (σ i , π −i ) denotes the joint strategy in which player i uses σ i and all other players follow π . This means that, given history p, no player i can gain more than ε by a unilateral deviation from his proposed strategy π i to an alternative strategy σ i . The joint strategy π is called an ε-equilibrium for initial state s ∈ S if π is an ε-equilibrium for path (s).
The joint strategy π is called a subgame perfect ε-equilibrium if π is an ε-equilibrium for every path p.
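The expected reward introduced in this section is a weighted sum over the states where absorption can take place, with non-absorbing play contributing zero. A minimal sketch of this computation follows; the function name and the dictionary layout are hypothetical, the absorption probabilities are assumed to be given, and the numbers in the example are made up for illustration.

```python
def expected_reward(player, absorb_prob, reward):
    """Sum over t of P(t) * r_player(t).

    absorb_prob maps a state t in S to the probability of absorption at t
    (given the history and the joint strategy); the probabilities may sum to
    less than 1, the remainder being non-absorbing play with reward 0.
    """
    assert sum(absorb_prob.values()) <= 1 + 1e-12
    return sum(prob * reward[t][player] for t, prob in absorb_prob.items())


# Illustrative numbers only: absorption at s1 w.p. 0.5, at s2 w.p. 0.25,
# and non-absorbing play with the remaining probability 0.25.
value = expected_reward(
    1,
    {"s1": 0.5, "s2": 0.25},
    {"s1": {1: -1.0}, "s2": {1: -2.0}},
)
```

Checking the ε-equilibrium condition for a given history then amounts to comparing such values across unilateral deviations, which this sketch does not attempt.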

Strategic concepts and an update procedure
In this section, we introduce the strategic concepts that we need for the description of a subgame perfect ε-equilibrium. These concepts all involve the assignment of a real number to each of the non-absorbing states, represented by a real vector α ∈ R S . One of the key concepts in the paper is that of an α-viable plan. These are plans g where every player who controls a state t ∈ S on g will receive a reward of at least α t when plan g is executed. The vector α is chosen such that every plan that can possibly occur in a subgame perfect ε-equilibrium is surely contained in the set of α-viable plans. Initially, the set of α-viable plans may also contain plans that do not occur in any subgame perfect ε-equilibrium play for small enough ε. Our aim is to eliminate those plans by increasing one or more coordinates of α in an update procedure. The update procedure is repeated until no further increase in the coordinates of α is possible. The final vector α will then be used to construct a subgame perfect ε-equilibrium for every ε > 0.
Viable plans: For α ∈ R S , a plan g and a state t ∈ S, we say that t is α-satisfied by g if φ i t (g) ≥ α t . We define sat(g, α) = {t ∈ S | t is α-satisfied by g}. We say that plan g is α-viable if S(g) ⊆ sat (g, α). This means that, if play is according to g, the controlling player of every non-absorbing state t that becomes active during play will receive a reward of at least α t . For every state t ∈ S ∪ S * , we denote the set of α-viable plans g with first(g) = t by viable(t, α). Notice that a plan of the form g = (t * , t * , . . .) with t * ∈ S * is trivially α-viable, since S(g) = ∅, and that the set viable(t * , α) consists of only the plan (t * ).
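The definitions of sat(g, α) and α-viability can be illustrated on plans that are eventually periodic. The sketch below is an assumption-laden illustration: it represents a plan as a finite prefix followed by a cycle that is repeated forever (not every plan has this form), it writes t * as the string `"t*"`, and all function names are hypothetical. The data are those of the 1-player game with states 1 A and 1 B in which termination gives the player a reward of −1 at either state.

```python
def is_absorbing_state(x):
    return x.endswith("*")


def phi(player, prefix, cycle, reward):
    """Reward to `player` when play follows prefix + cycle repeated forever."""
    if len(cycle) == 1 and is_absorbing_state(cycle[0]):
        t = cycle[0][:-1]              # the plan absorbs at t
        return reward[t][player]
    return 0.0                          # non-absorbing plans give zero reward


def visited(prefix, cycle):
    """S(g): the non-absorbing states that become active under the plan."""
    return {x for x in list(prefix) + list(cycle) if not is_absorbing_state(x)}


def sat(prefix, cycle, alpha, controller, reward):
    """States t whose controlling player receives at least alpha[t]."""
    return {t for t in alpha
            if phi(controller[t], prefix, cycle, reward) >= alpha[t]}


def is_viable(prefix, cycle, alpha, controller, reward):
    """A plan is alpha-viable if every visited state is alpha-satisfied."""
    return visited(prefix, cycle) <= sat(prefix, cycle, alpha, controller, reward)


controller = {"1A": 1, "1B": 1}
reward = {"1A": {1: -1}, "1B": {1: -1}}
low = {"1A": -1, "1B": -1}
zero = {"1A": 0, "1B": 0}

print(is_viable(["1A"], ["1A*"], low, controller, reward))
print(is_viable(["1A"], ["1A*"], zero, controller, reward))
print(is_viable([], ["1A", "1B"], zero, controller, reward))
```

A plan of the form (t * , t * , . . .) has an empty prefix and the cycle `["t*"]`, so it visits no state of S and is trivially viable in this encoding as well.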
Compatible plans: Consider that a player i ∈ N can influence play by choosing a specific action if play visits one of his states, say t ∈ S. Now, if every α-viable plan after the selected action yields a strictly higher reward for player i than α t , then α t can be increased without eliminating any plan that may occur in equilibrium. This idea formed the basis for the iterative procedure in Flesch et al. (2010b) and Kuipers et al. (2016). In those papers, it was sufficient to consider only one state at a time per iteration to eventually eliminate all non-equilibrium plans. The approach fails for the trivial 1-player game in Fig. 3. Note that this game has one subgame perfect equilibrium, which is for the one player to never terminate the game. The values α 1 A = α 1 B = 0 correspond to this equilibrium. As an illustration, we set α 1 A = α 1 B = −1, which are the rewards of the game at termination. Then every plan is α-viable. If player 1 specifies action 1 B ∈ A(1 A ) for when state 1 A is visited, but does not specify a particular action for when 1 B is visited, then termination at 1 B is in accordance with the specification and α-viable. An iterative procedure that considers one state at a time would reflect this by leaving α unchanged and would thus fail to eliminate any of the absorbing plans. Our iterative procedure should reflect the fact that player 1 is able to coordinate his actions in 1 A and 1 B . We therefore consider that a player can select an action at multiple states simultaneously. This leads to the definition of compatible plans.
For α ∈ ℝ^S and t, u ∈ S, we say that state t is α-safe at state u if t ∈ sat(g, α) for all g ∈ viable(u, α). For t * ∈ S * , it will be convenient to say that t * is α-safe at t * . We define, for all t ∈ S,
$$\mathrm{safestep}(t,\alpha) = \{u \in A(t) \mid t \text{ is } \alpha\text{-safe at } u\}.$$
For α ∈ ℝ^S and a non-empty set F ⊆ S, we say that F is an α-plateau if there exists i ∈ N such that i t = i for all t ∈ F and if α s = α t for all s, t ∈ F. An α-plateau that is maximal with respect to inclusion is called an α-level.
For α ∈ R S , we say that a function U : F → S ∪ S * is an α-safe combination if the domain F of U is an α-plateau and if we have U (t) ∈ safestep(t, α) for all t ∈ F. If the domain of an α-safe combination U is not explicitly specified, then it will be denoted by F(U ). We denote the set of all α-safe combinations by U(α) and the set of α-safe combinations with given domain F by U(F, α).
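Since an α-level is a maximal set of states with a common controlling player and a common α-value, the α-levels can be found by grouping the states on the pair (i t , α t ); every non-empty subset of a level is then an α-plateau. The following sketch illustrates this. The function name is hypothetical, and the first data set uses the α-values −2, 1, −1, −1 at the states 1 a , 2, 1 b , 3 (controlled by players 1, 2, 1, 3), while the second one changes one value purely for illustration.

```python
from collections import defaultdict


def alpha_levels(alpha, controller):
    """Group the states of S by (controlling player, alpha value).

    Each group is an alpha-level. Values are compared with exact equality,
    so alpha should hold exactly representable numbers.
    """
    groups = defaultdict(set)
    for t, value in alpha.items():
        groups[(controller[t], value)].add(t)
    return [frozenset(group) for group in groups.values()]


controller = {"1a": 1, "2": 2, "1b": 1, "3": 3}

alpha = {"1a": -2, "2": 1, "1b": -1, "3": -1}
levels = alpha_levels(alpha, controller)

# Hypothetical variant in which both states of player 1 share one value.
alpha_equal = {"1a": -1, "2": 1, "1b": -1, "3": -1}
levels_equal = alpha_levels(alpha_equal, controller)
```

Note that the states 1 b and 3 never share a level, even when their α-values coincide, because they are controlled by different players.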
For a plan g and an α-safe combination U , we now say that plan g is U -compatible if, for every state t ∈ S(g) ∩ F(U ), the first occurrence of t on g is followed by U (t). A path p is U -compatible if, for every state t ∈ F(U ) that occurs on p, the first occurrence of t on p is followed by U (t) unless the first occurrence of t is at the end of p. For every t ∈ S we denote the set of plans in viable(t, α) that are U -compatible by viacomp(t, U , α).
Now consider again the 1-player game in Fig. 3, where we set α by α 1 A = −1 and α 1 B = −1. We define U by U (1 A ) = 1 B and U (1 B ) = 1 A . Notice that U is indeed an α-safe combination. Also notice that the α-viable plans (1 A , 1 * A ), [...] are not elements of viacomp(t, U , α). Nevertheless, there are still plans in viacomp(t, U , α) that should be eliminated if we wish to find the unique subgame perfect equilibrium associated with this example. [...] are examples of this. The set viacomp(t, U , α) thus only serves as a pre-selection of plans that are subject to further scrutiny to see if they can remain. For this, we introduce the concept of an admissible plan.
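U -compatibility of a path only constrains what happens after the first occurrence of each state in F(U ), which makes it easy to check in a single pass. The sketch below is a minimal illustration with hypothetical names; U is stored as a dictionary from the states of F(U ) to the selected actions, absorbing states are written as strings ending in `*`, and the data are those of the 1-player game with U (1 A ) = 1 B and U (1 B ) = 1 A .

```python
def is_U_compatible(path, U):
    """Check that the first occurrence of every state of F(U) on the path
    is followed by U(t), unless that occurrence is the last state of the path."""
    seen = set()
    for k, t in enumerate(path):
        if t in U and t not in seen:
            seen.add(t)
            if k + 1 < len(path) and path[k + 1] != U[t]:
                return False
    return True


U = {"1A": "1B", "1B": "1A"}

print(is_U_compatible(["1A", "1A*"], U))              # terminates at once
print(is_U_compatible(["1A", "1B", "1A", "1A*"], U))  # terminates on the second visit
```

An eventually periodic plan can be checked with the same function by passing its prefix followed by one full cycle and the first state of the cycle, since every first occurrence and its successor appear in that finite piece.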
Admissible plans: Consider again the game depicted in Fig. 1, where we set α by α s 1 = −1 and α s 2 = 2. We define U by U (s 1 ) = s 2 . Then U is an α-safe combination, and the plan (s 1 , s 2 , s 1 , s * 1 ) is an element of viacomp(t, U , α). Here, we do not wish to eliminate the plan, as it is a plan that can occur in equilibrium play. The reason this plan will be considered admissible is the fact that player 2, who controls state s 2 on the plan, can threaten player 1 with termination of the game at s 2 . If player 1 does not follow the plan and always nominates s 2 when s 1 is active, then the threat will eventually become reality if player 2 places a small probability on executing the threat. So the intuition here is that player 1 has no possibility to force a better outcome than (s 1 , s 2 , s 1 , s * 1 ). This is in contrast with the plan (1 A , 1 B , 1 A , 1 * A ) for the game in Fig. 3, where player 1 can easily force a non-absorbing plan without the possibility of retaliation. The formal criteria for admissibility distinguish between these two situations and are given below.
Let α ∈ ℝ^S , let U ∈ U(α), and let t ∈ F(U ). For a plan g ∈ viacomp(t, U , α), we say that g is (t, U , α)-admissible if it satisfies at least one of the following four conditions.
AD-i α t > 0 or there exists a state x on g with i x = i t and α x > α t that appears on g before any state of F(U ) has appeared for the second time;
AD-ii g is non-absorbing;
AD-iii each state of F(U ) occurs at most once on g;
AD-iv there exists a threat pair (x, v) for g. Here, x and v are a state and a plan respectively that satisfy the following properties: (a) x ∈ S and x appears on g before any state of F(U ) has appeared for the second time on g, [...]
We denote the set of plans that are (t, U , α)-admissible by admiss(t, U , α). We can gain some additional insight into the definition of a (t, U , α)-admissible plan by considering, for a plan g ∈ viacomp(t, U , α), an associated plan g U . Plan g U is the plan where player i t always chooses his selected actions defined by U , and the other players keep their actions the same as in g. We may have g U = g, for example when every state in F(U ) appears at most once on g. The plans g and g U may also differ, which happens when at least one state t ∈ F(U ) appears at least twice on g and t is not always followed by U (t). In the latter case g U is a non-absorbing plan, where a certain part of g is followed infinitely many times. We compare the plans g and g U . If the comparison comes out in favor of g U , then plan g can be discarded, i.e., plan g will not be considered admissible. Let us interpret the conditions for admissibility one by one in this way.
Condition AD-i: If α t > 0 and the plan g U is non-absorbing, then g U gives a lower reward to i t than g does. If α t > 0 and the plan g U is absorbing then g U = g. In either case, g cannot be discarded in favor of g U . If there exists x on the plan g with i x = i t = i and α x > α t , then g is guaranteed to give a strictly higher reward than α t . Here, we keep g because this will not hinder an increase in α t . Also, it will be convenient to exclude this situation when we later consider plans that satisfy AD-iv, but not AD-i, AD-ii, or AD-iii.
Condition AD-ii: If g is non-absorbing, then both g and g U are non-absorbing, with the same reward 0. There is therefore no reason to discard g.
Condition AD-iii: If each state of F(U ) appears at most once on g, then g = g U .
Condition AD-iv: This describes a situation where a player other than i t , who controls a state x on the U -compatible plan g, has the possibility to switch from g to an α-viable plan v with t, x ∉ sat(v, α). Due to condition AD-iv-(a), state x is also on plan g U . State x does not necessarily lie on the part of g U that is repeated, but to obtain intuition we assume that state x does lie on that part. Now imagine that the players are supposed to follow plan g, except for the player i x , who is required to place a very small probability on the switch to v when state x is active. Then play will be according to g with very high probability if players indeed follow this prescription. If however player i t deviates by always playing U (s) for all s ∈ F(U ) when s is active, in an attempt to force play according to g U , then this will eventually fail, since the switch to v will then be made with probability 1. Thus, the deviation by player i t is not profitable for him, since t ∉ sat(v, α). The requirement x ∉ sat(v, α) is there because player i x should not be tempted to increase the probability of a switch to v. These considerations are the motivation to call g admissible and to not discard g in favor of g U .
An update procedure: Let α ∈ ℝ^S and let U ∈ U(α). We define, for all t ∈ F(U ),
$$\beta(t,U,\alpha) = \min\{\phi_{i_t}(g) \mid g \in \mathrm{admiss}(t,U,\alpha)\}.$$
We use the convention min ∅ = ∞, so that β(t, U , α) is well defined for all t ∈ F(U ).
We also define
$$\gamma(U,\alpha) = \min\{\beta(t,U,\alpha) \mid t \in F(U)\}.$$
Note that the plans in admiss(t, U , α) are all α-viable, for every t ∈ F(U ). Thus, β(t, U , α) ≥ α t for all t ∈ F(U ), and hence also γ (U , α) ≥ α t for any representative t ∈ F(U ). One can interpret the number γ (U , α) as the worst possible reward for the player controlling the states of F(U ) when play visits a state t ∈ F(U ) and if he selects action U (t) when this happens. Now, we replace in α the number α t by the number γ (U , α) at every coordinate t with t ∈ F(U ). Let us denote the updated vector by δ(U , α).
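One step of the update procedure can be sketched as follows. This is only an illustration under stated assumptions: the hard part, enumerating the rewards of the plans in admiss(t, U , α), is assumed to be done elsewhere and passed in as plain lists, the function and variable names are hypothetical, and the numbers in the example are made up.

```python
import math


def update(alpha, F, admissible_rewards):
    """Return the updated vector and the common new value on F.

    admissible_rewards[t] is assumed to list the rewards to the controlling
    player of t over all admissible plans starting at t. An empty list gives
    the value infinity, following the convention min(empty set) = infinity.
    """
    beta = {t: min(admissible_rewards[t], default=math.inf) for t in F}
    gamma = min(beta.values())
    new_alpha = dict(alpha)
    for t in F:
        new_alpha[t] = gamma          # simultaneous update on the whole plateau
    return new_alpha, gamma


# Made-up data: a plateau {a, b} with common value -1 and a third state c.
alpha = {"a": -1.0, "b": -1.0, "c": 2.0}
new_alpha, gamma = update(alpha, {"a", "b"}, {"a": [0.0, 2.0], "b": [1.0]})
```

Because all states of the plateau receive the same new value, the domain of U remains a plateau after the update, which is what allows the procedure to be repeated.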
The update procedure performs a simultaneous update on the states of a given α-plateau. The idea is to repeat the procedure over and over until the updates do not change any α-values, for any given α-plateau.
The game admits the following subgame perfect ε-equilibrium. When state 1 a is active, player 1 should nominate state 2, when state 2 is active, player 2 should nominate state 1 b , and when state 1 b is active, player 1 should nominate state 3 with high probability and place a small probability on terminating the game. Finally, player 3 should terminate the game when state 3 is active. We will go through the process of updating to see how it all works.

Initialization:
We choose α such that every plan in equilibrium play will surely be α-viable. A good choice is to set α t at the reward for player i t if play terminates at t. Thus, we set α 1 a = −2, α 2 = 1, α 1 b = −1, α 3 = −1.
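The initialization can be written in one line once the controlling players and the rewards are stored. In the sketch below the function name and the data layout are hypothetical. Only the reward of the controlling player at each state is filled in, since these are the values needed for the initialization; the rewards of the other players are deliberately left out of this illustration.

```python
def initial_alpha(controller, reward):
    """Set alpha[t] to the reward of the controlling player of t
    when play terminates at t."""
    return {t: reward[t][controller[t]] for t in controller}


controller = {"1a": 1, "2": 2, "1b": 1, "3": 3}
reward = {"1a": {1: -2}, "2": {2: 1}, "1b": {1: -1}, "3": {3: -1}}

alpha0 = initial_alpha(controller, reward)
```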
Iteration 1: To obtain an overview of all α-safe combinations, we determine the sets safestep(t, α) for all t ∈ S. The result can be seen in the picture below, where (x, y) is represented by a solid arrow if y ∈ safestep(x, α) and by a dashed arrow otherwise (Fig. 5).
Iteration 3: We now have [...] (Fig. 7). Here, we have several α-safe combinations to consider, all involving the states of player 1. The good choice is to define U by U (1 a ) = 2 and U (1 b ) = 3. Observe that all plans in viacomp(1 a , U , α) and viacomp(1 b , U , α) terminate at state 3, and that admiss(t, [...]
Iteration 4: Further attempts to update α do not lead to an increase in any of its coordinates. The current α-values indicate that, under equilibrium play, the game will terminate at state 3 (with high probability) (Fig. 8).
A final calculation will demonstrate how the subgame perfect ε-equilibrium should be played. Note that 1 a ∈ safestep(3, α). We define the α-safe combination U by U (3) = 1 a . Then the plan (3, 1 a , 2, 1 b , 3, 3 * ) is an element of viacomp(3, U , α). Observe that this plan is also an element of admiss(3, U , α) due to condition AD-iv: The [...], which is also the threat pair needed in equilibrium play.

Introduction
For this section, we choose an arbitrary game G in the class 𝒢 of games defined in Sect. 2. We also choose the parameter ε > 0. We keep G and ε fixed throughout this section and we prove that the game G has a subgame perfect ε-equilibrium.
For the description of a subgame perfect ε-equilibrium for G, we will use the fact that a vector α * ∈ ℝ^S exists with the following properties:
F-i viable(t, α * ) ≠ ∅ for all t ∈ S;
F-ii δ(U , α * ) = α * for all U ∈ U(α * ).
The proof that such a vector indeed exists is delayed until Sect. 5. The properties F-i and F-ii essentially describe the existence of a fixed point for the update procedure from Sect. 3.
Property F-i can be used to formulate a (pure) joint strategy such that, for every state t ∈ S that is visited during play, player i t can expect a reward of at least α * t . This can be achieved by prescribing an α * -viable plan that should be executed in its entirety with probability 1.
Property F-ii can be used to formulate a (pure) joint strategy such that, for every state t ∈ S that is visited during play, player i t can expect a reward of at most α * t if he plays an action that is not prescribed by the joint strategy. This can be achieved by selecting a new α * -viable plan to follow after a deviation. The following lemma shows that property F-ii makes the selection of such a new plan indeed possible.
Lemma 1 Let t ∈ S and u ∈ A(t).
(i) If u ∈ safestep(t, α * ), then there exists a plan g ∈ viable(u, α * ) with φ i t (g) = α * t .
(ii) If u ∉ safestep(t, α * ), then there exists a plan g ∈ viable(u, α * ) with φ i t (g) < α * t .
Proof Proof of (i): Let t ∈ S and let u ∈ safestep(t, α * ). Denote by U the α * -safe combination with domain {t} and with U (t) = u. By property F-ii, we have β(t, U , α * ) = α * t . By definition of the number β(t, U , α * ), there exists a plan h ∈ admiss(t, U , α * ) with φ i t (h) = α * t . The plan h is α * -viable, since admiss(t, U , α * ) is by definition a subset of viable(t, α * ). The part of h that starts at the second state (i.e., at u) is the required plan g ∈ viable(u, α * ) with φ i t (g) = α * t . Claim (ii) of the lemma follows by the definition of the set safestep(t, α * ).
An informal description of a subgame perfect ε-equilibrium. Consider a deterministic joint strategy, where initially, an α * -viable plan is selected for the players to follow in its entirety. Only if a player deviates, a new α * -viable plan is selected, such that the new plan minimizes the payoff to the deviating player. Note that a single deviation or even finitely many deviations do not profit a deviating player, by the result of Lemma 1. If α * ≥ 0, then infinitely many deviations do not help the deviating player either. Indeed, if α * ≥ 0, then the formulated joint strategy constitutes a subgame perfect 0-equilibrium. The situation is more complex when the vector α * has negative coordinates. The game of Example 1 is typical for this situation, where play according to a subgame perfect ε-equilibrium was achieved by placing a small probability on a non-credible threat. This way of playing a subgame perfect ε-equilibrium can be generalized to work for every game in our class. Specifically, at each stage of the game, the players are given a prescription of play that consists of a main plan g and possibly, depending on the properties of g, a threat pair (x, v) for g. If the prescription consists of only the main plan g, then the players are supposed to follow plan g. If the prescription consists of a main plan g together with a threat pair (x, v), then the players are supposed to follow plan g until the first occurrence of state x on g is reached. The controlling player of state x is then required to perform a lottery, where he places a high probability on the continuation of plan g and a small probability on the switch to plan v.
A joint strategy can now be formulated as follows. The game begins with an initial prescription, which could be any α * -viable main plan g. A new prescription is selected when a player does not choose an action with positive probability according to the current prescription. Note that the lottery player may deviate from the prescription without instigating a new prescription, as long as he chooses a continuation of the main plan or a switch to the threat plan. A new prescription is chosen such that its main plan minimizes the reward of the deviating player among the available admissible plans. Note that a threat pair (x, v) can be part of the new prescription only if the main plan g is admissible due to condition AD-iv, as threat pairs are defined only for such plans. If this happens, then the execution of plan v is indeed a threat to the deviating player (who is identified as the player controlling the initial state of g), since this player strictly prefers g over v, by AD-iv-(e). Moreover, the execution of v is a non-credible threat, since player i x , who must make the switch from g to v, also strictly prefers g over v, by AD-iv-(e). A non-credible threat makes sure that player i x cannot make a profit by increasing the probability of a switch to v. Now, prescriptions consisting of a main plan with a threat pair are essentially there to make it impossible for a deviating player to deviate infinitely many times. Conceivably however, the deviating player may still establish infinite play when lotteries with a threat are prescribed as retaliation for his deviations. This would happen if the deviating player became active again and again after every deviation, before the lottery state is reached and before absorption takes place. By an appropriate choice of the prescriptions, we can however establish a bound on the number of times that a deviating player can avoid absorption or the execution of a lottery. 
This will ensure a lottery at more or less regular intervals and finally execution of the threat plan with probability 1 when a player keeps deviating.
In Sect. 4.2, we will establish a ranking of the states of each α * -level. The ranking will be the tool to make sure that infinitely many deviations cannot occur. In Sect. 4.3, we give a description of a joint strategy π ε , which is the detailed and complete version of the description given here. Then, in Sect. 4.4 we prove our main result, which is that π ε is a subgame perfect ε-equilibrium.
Let us choose α * ∈ R S such that it has properties F-i and F-ii and let us keep α * fixed for the remainder of this section.

A ranking of the states
Let U ∈ U(α * ). We will be interested in all admissible plans that can be associated with the α * -safe combination U . We define therefore

$$\mathrm{admiss}(U, \alpha^*) = \bigcup_{t \in F(U)} \mathrm{admiss}(t, U, \alpha^*).$$

For every g ∈ admiss(U , α * ), we wish to identify the set of states on g, where the deviating player could sensibly deviate from g, to avoid a lottery or to avoid absorption at a state with negative reward for him. The following definition of the set D(g, U ) simply lists the cases. We define, for g ∈ admiss(U , α * ), the set D(g, U ) by

D-i If g satisfies condition AD-i or AD-ii, then we define D(g, U ) = ∅. (Here, infinitely many deviations do not profit the deviating player.)

D-ii If g violates the conditions AD-i and AD-ii, but satisfies condition AD-iii, then we define D(g, U ) = S(g − ) ∩ F(U ), where g − is the part of g that starts at the second state of g. (Here, any deviation before absorption could be profitable, if it could be infinitely repeated.)

D-iii In all other cases, i.e., if g violates the conditions AD-i, AD-ii, and AD-iii, but satisfies condition AD-iv, there exists a threat pair (x, v) for g. In this case, we choose state x as close as possible to the initial state of g, and we define D(g, U ) as the set of states in F(U ) that appear on g from the second state of g until the first occurrence of x on g. (Here, a deviation should really be before the lottery, as there are only finitely many opportunities available after the lottery.)

Notice that the first state of g can only be a member of D(g, U ) if that state reappears on g. This is because we will interpret the first state of a plan in admiss(U , α * ) as the state where a deviation just took place and the second state as the deviation.

Let t ∈ S and u ∈ A(t). Imagine that the choice for u ∈ A(t) at state t is not according to the prescription and that u ∈ safestep(t, α * ). Then, for the purpose of punishment, we choose U ∈ U(α * ) with t ∈ F(U ) and U (t) = u, and a plan g in admiss(t, U , α * ) that minimizes the reward to player i t . By property F-ii, the reward equals α * t . 
Ideally, we choose g such that also D(g, U ) = ∅ holds. This may not always be possible, but we do have the following lemma.

Lemma 2 For every U
where i is the controlling player of the states in F(U ) and t is any state in F(U ).
then the claim of the lemma follows immediately by setting t = s and g = h. We assume further that D(h, U ) ≠ ∅. This rules out the possibility that h satisfies AD-i or AD-ii. We distinguish between the two remaining possibilities.
Case 1: Assume that plan h satisfies AD-iii. Then each element of D(h, U ) is visited exactly once on h. We define t as the state of D(h, U ) that is visited last on h and we define g as the plan with first(g) = t that follows the prescription of h from the unique occurrence of t on h. It is obvious that g ∈ admiss(t, U , α * ) due to property AD-iii and that D(g, U ) = ∅.
Case 2: Assume that plan h satisfies AD-iv but not AD-iii. Then there exists a threat pair (x, v) for h. By condition AD-iv, every element of D(h, U ) is visited exactly once on h before the first occurrence of x on h. Define t as the state of D(h, U ) visited last on h before x. Construct plan g with first(g) = t as follows.
Follow plan h from the first occurrence of t on h until the next occurrence of a state in F(U ), say r . If r is a state of F(U ) that is visited for the first time during the construction of g and if the corresponding location of r on h is not the first occurrence of r on h, then we jump back to the first occurrence of r on h. From there, we follow h again. We proceed, jumping back to an earlier location on h every time a state of F(U ) is visited for the first time during construction of g and if the corresponding location on h is not the first occurrence of that state on h.
The construction trivially results in a plan g with S(g) ⊆ S(h). It is also clear that a jump back during the construction can occur only a finite number of times. The resulting plan g will therefore have the same tail as h, which implies that φ(g) = φ(h); together with S(g) ⊆ S(h), this shows that g is α * -viable. Further, g is U -compatible, since at the first visit of a state in F(U ) during construction of g, the action of that state's first occurrence on h is copied to g. Thus, g ∈ viacomp(t, U , α * ).
Notice that the construction of g starts at the first occurrence of t on h, after which the construction of g proceeds uninterrupted by jump backs until x is reached. Indeed, by the choice of t, there are no states of F(U ) on h between t and x where such a jump back might occur. This demonstrates that x appears on plan g, that the only element of F(U ) appearing on g before x is t, and that t appears exactly once before x. Thus, the threat pair (x, v) for h can also serve as threat pair for plan g, and we may conclude that g ∈ admiss(t, U , α * ) due to property AD-iv. Now, if g does not satisfy AD-iii, then definition D-iii applies, and we may conclude that D(g, U ) consists of the states of F(U )\{t} that appear before x on g. That is, we may conclude D(g, U ) = ∅.

It remains to prove that AD-iii does not apply to g, i.e., that one of the states of F(U ) appears more than once on g. By assumption, plan h does not satisfy AD-iii, so we have a state r ∈ F(U ) that appears more than once on h. At least one of the occurrences of r on h comes after the first occurrence of x on h, as all states of F(U ) before x are different. It follows that state r appears on plan g, since all states on h that come after t are eventually visited during the construction of g. If the first appearance of r during the construction of g is upon arrival at the first location of r on h, then r will reappear during the construction of g at a later stage, at the latest upon arrival at the second location of r on h. If the first appearance of r during the construction of g corresponds to the arrival at the second location of r on plan h, then a jump back to the first location on h will take place. Then too state r will reappear during the construction of g, as there will be another arrival at the second location of r on h.
The result of Lemma 2 does not guarantee that, for a given t ∈ S and u ∈ safestep(t, α * ), an appropriate U ∈ U(α * ) and g ∈ admiss(t, U , α * ) exist that we consider ideal for punishment. However, the result is sufficient to prove that, for an arbitrary α * -plateau F, there is at least one t ∈ F, such that for every u ∈ safestep(t, α * ), an ideal pair U ∈ U(α * ) and g ∈ admiss(t, U , α * ) for punishment exists.
Let F be an α * -plateau and let t ∈ F.

Lemma 3 For every α * -plateau F, the set tied(F) is a non-empty subset of F.
Proof Let F be an α * -plateau and suppose that tied(F) = ∅. Then, for every s ∈ F, Let us apply Lemma 3 to an α * -level L. The fact that tied(L) is non-empty shows that there exist states in L, where a deviation can always be retaliated by an ideal punishment plan, that is, a punishment plan which avoids all states of L, until absorption or until a lottery is executed. Let us apply Lemma 3 again, now to the set L\tied(L) (assuming that this set is non-empty). The lemma then shows that there is a non-empty subset of states of L\tied(L), where any deviation can be retaliated by a plan that may visit other states of L before absorption or lottery, but only those in tied(L). So, after a deviation at a state in tied(L\tied(L)), another deviation before absorption or lottery may be possible, but after the second deviation, there will be an ideal punishment plan in place. This suggests that an α * -level L can be partitioned into a hierarchy of α * -plateaus, where each plateau is given a rank indicating the maximum number of deviations to go before an ideal punishment plan is in place.
Let L be an α * -level. We define

$$\mathrm{rank}(1, L) = \mathrm{tied}(L).$$

Then, for k > 1, we define recursively

$$\mathrm{rank}(k, L) = \mathrm{tied}\Big(L \setminus \bigcup_{\ell=1}^{k-1} \mathrm{rank}(\ell, L)\Big).$$

We stop the recursive definitions when $\bigcup_{\ell=1}^{k} \mathrm{rank}(\ell, L) = L$. It follows by repeated application of Lemma 3 that the process will indeed terminate, say at iteration K , and that the sets rank(k, L) for k = 1, . . . , K form a partition of L. We define the rank of a state t ∈ L as the unique index k for which t ∈ rank(k, L) and we denote its rank by r(t).
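The recursive peeling of a level into ranks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `tied` is passed in as a hypothetical callable (its definition is the one given in the text and is not implemented here), and the names `rank_partition` and `rank_of` are ours.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List

State = Hashable


def rank_partition(
    level: Iterable[State],
    tied: Callable[[FrozenSet[State]], Iterable[State]],
) -> List[FrozenSet[State]]:
    """Split a level into rank(1, L), rank(2, L), ... by repeatedly removing tied(.).

    `tied` is a hypothetical stand-in for the operator of the text; it must
    return a non-empty subset of its argument (this is what Lemma 3 asserts).
    """
    remaining = frozenset(level)
    ranks: List[FrozenSet[State]] = []
    while remaining:
        layer = frozenset(tied(remaining))
        if not layer or not layer <= remaining:
            raise ValueError("tied(F) must be a non-empty subset of F")
        ranks.append(layer)
        remaining = remaining - layer  # strictly smaller, so the loop terminates
    return ranks


def rank_of(ranks: List[FrozenSet[State]]) -> Dict[State, int]:
    """Map every state t to its rank r(t), counting ranks from 1."""
    return {t: k for k, layer in enumerate(ranks, start=1) for t in layer}
```

Termination of the loop mirrors the argument in the text: each iteration removes a non-empty set from a finite set, so at most |L| iterations are needed.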
We now demonstrate that, for a state t ∈ L and a deviation u ∈ safestep(t, α * ), a punishment plan exists that, insofar as it visits other states of L before absorption or lottery, visits only those of rank strictly less than r(t). The implication is that at state t, at most r(t) deviations are possible (including the one at t) before absorption or a lottery will take place.

Lemma 4 Let L be an
Then, by definition of the set tied(F) and by the fact that t is an element of this set, there exists an α * -safe

Description of the joint strategy π ε
We will associate with each pair (t, u) with t ∈ S and u ∈ A(t) a main plan g tu . In some cases, depending on the properties of the main plan g tu , we may associate additionally a threat pair (x tu , v tu ) with (t, u). These plans and combinations of a plan and a threat pair will be used in prescriptions for play as outlined in Sect. 4.1. Let t ∈ S and u ∈ A(t).

Case 1: α * t ≥ 0. Then we choose g tu ∈ viable(u, α * ) with φ i t (g tu ) ≤ α * t . This is possible by Lemma 1. We do not choose a threat pair.

Case 2: α * t < 0 and u ∉ safestep(t, α * ). Then we choose g tu ∈ viable(u, α * ) with φ i t (g tu ) < α * t . This is possible by Lemma 1. We do not choose a threat pair.

Case 3: α * t < 0 and u ∈ safestep(t, α * ). Then let L denote the α * -level to which t belongs and let If g tu is admissible due to condition AD-iv and not due to AD-i, AD-ii, or AD-iii then we choose additionally a threat pair (x tu , v tu ) for g tu .

(The reader may note that the plan g tu starts at t in Case 3, and that it starts at u in Cases 1 and 2. This is inconsequential regarding its use as a prescription. In all three cases, if the prescription becomes current, the active state is already u when that happens.) In Table 1, we listed the choices of the main plan g tu and the threat pair (x tu , v tu ) for the game of Example 3, for every (t, u).
With the above choices, we are set to formulate a joint strategy in the way that was already outlined in Sect. 4.1, by providing a prescription for play at every stage of the game. Here, we fill in the details.
The prescription for the players, at any stage, is given in two possible forms. A type I prescription consists of a main plan g alone. A type II prescription consists of a main plan g together with a threat pair (x, v). If the prescription is of type I, then the players are supposed to follow the main plan g in its entirety. If the prescription is of type II, then the players are supposed to follow the main plan g until the first occurrence of x on g. The player who controls x is then required to perform a lottery, where it is decided whether plan g is continued or whether a switch to plan v is made. A type II prescription will only be current until the lottery. After the lottery, a prescription of type II automatically reduces to a prescription of type I. It reduces to the main plan g if the lottery player chose continuation of g, or to the threat plan v if the lottery player decided to make the switch to v.
A renewal of the prescription becomes necessary if one of the players chooses an action with zero probability according to the current prescription. If this should happen, the new prescription becomes the one associated with the pair (t, u), where t ∈ S is the state where the deviation took place, and u ∈ A(t) is the state that was nominated.
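The bookkeeping of prescriptions described above can be sketched as a small state machine. The sketch rests on several simplifying assumptions of ours, not of the text: a plan is represented by a finite prefix (a tuple of state names) whose first entry is the active state, the threat plan v is taken to start at the state reached by the switch, and the lookup `table`, which maps a pair (t, u) to its prescription, is a hypothetical stand-in for the choices of g tu and (x tu , v tu ).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

Plan = Tuple[str, ...]  # finite prefix of a plan (at least two states); plan[0] is the active state


@dataclass(frozen=True)
class Prescription:
    plan: Plan                                 # remaining part of the main plan g
    threat: Optional[Tuple[str, Plan]] = None  # threat pair (x, v); None means type I


def supported_moves(p: Prescription) -> Set[str]:
    """Next states chosen with positive probability under the prescription."""
    moves = {p.plan[1]}                         # continuation of the main plan
    if p.threat is not None and p.plan[0] == p.threat[0]:
        moves.add(p.threat[1][0])               # at the lottery state, the switch is allowed too
    return moves


def advance(
    p: Prescription,
    chosen: str,
    table: Dict[Tuple[str, str], Prescription],
) -> Prescription:
    """Return the prescription that is current after the move `chosen`."""
    active = p.plan[0]
    if chosen not in supported_moves(p):
        return table[(active, chosen)]          # zero-probability action: renew the prescription
    if p.threat is not None and active == p.threat[0]:
        # The lottery has been executed: type II reduces to type I.
        if chosen == p.plan[1]:
            return Prescription(p.plan[1:])     # continuation of the main plan
        return Prescription(p.threat[1])        # switch to the threat plan
    return Prescription(p.plan[1:], p.threat)   # ordinary step along the main plan
```

Because the plan is shortened at every step, the first time the active state equals x is the first occurrence of x on the main plan, which is where the lottery is prescribed. Note that the lottery player may pick either supported move without triggering a renewal, in line with the description above.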
To complete our description of a joint strategy, it remains to provide the specifics of the lotteries that may have to take place. This is where the parameter ε plays a role. Let us determine an upper bound M on the absolute value of the expected reward to any player in the game G. We define To play according to the joint strategy π ε , a lottery player must always place probability 1 − q ε on continuation of the main plan and probability q ε on a switch to the threat plan.
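As a numerical illustration of the lottery, the following sketch computes the expected reward of a player when the main plan is continued with probability 1 − q and the threat plan is executed with probability q. The concrete choice q = ε/(4M) is an assumption made here for illustration only; it is one value that satisfies 2qM ≤ ε/2, the property of q ε that is used in the proof of the main result. The function names are ours.

```python
def lottery_probability(eps: float, M: float) -> float:
    """A switch probability q with 2*q*M <= eps/2 (assumed choice: q = eps / (4*M))."""
    if eps <= 0 or M <= 0:
        raise ValueError("eps and M must be positive")
    return eps / (4.0 * M)


def expected_reward(phi_g: float, phi_v: float, q: float) -> float:
    """Expected reward when plan g is executed w.p. 1-q and plan v w.p. q."""
    return (1.0 - q) * phi_g + q * phi_v
```

Whenever both rewards lie in [−M, M], the value returned by `expected_reward` differs from the reward of the main plan by at most 2qM, which is the margin that appears in the bounds of the main result.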

Main result
Before we prove the claim that π ε is a subgame perfect ε-equilibrium for the game G, let us first establish a property of play when one player deviates while the other players stick to π ε .
Lemma 5 Let p be an arbitrary path. Assume that play has developed along path p and that t = last( p) is the current state. Suppose that player i = i t chooses an action that has probability 0 according to the prescription dictated by π ε at state t, and suppose that each player j ∈ N \{i} is going to use strategy π j ε after p. Suppose further that player i becomes active again after his deviation at t, say at state s. (Player i may or may not be active between t and s.) Then α * s ≤ α * t . Moreover, if α * s = α * t < 0 and if no lottery took place during play from t to s, then r(s) < r(t).
Proof Say that player i violates the prescription at t with the action u ∈ A(t). After this, the new prescription of π ε is given by the main plan g tu , possibly together with the threat pair (x tu , v tu ).
Let us first assume that all players acted according to prescription from u to s. For the claim α * s ≤ α * t , we distinguish between two cases. Case 1: Assume that state s does not lie on plan g tu . This is only possible if the main plan g tu is associated with a threat pair (x tu , v tu ), if state x tu was active before s, and if the lottery player chose the switch to plan v tu . Thus, state s lies on plan v tu . We then have t / ∈ sat(v tu , α * ) by the properties of a threat plan (see AD-iv), and For the second claim of the lemma, assume that α * s = α * t < 0 and that no lottery took place during play from t to s. Then u ∈ safestep(t, α * ), as otherwise a plan g tu with φ i (g tu ) < α * t would have been chosen, and we would have and where L denotes the α * -level to which t belongs. Moreover, we have D(g tu , U ) = ∅ and φ i (g tu ) = α * t , by the choice of g tu . We see that g tu does not satisfy AD-i as we assume α * t < 0. We also see that g tu does not satisfy AD-ii, since φ i (g tu ) = α * t < 0 implies that g tu is absorbing. Thus, plan g tu satisfies AD-iii or AD-iv.
Notice that s and t must be different states. Indeed, if g tu satisfies AD-iii, this follows from the fact that all states of F on g tu are different. If g tu satisfies AD-iv, then state t is also different from s, by our assumption in the lemma that s comes before the lottery and the fact that, by AD-iv-(a), all states of F on g tu before x tu are different.
We now prove that s / ∈ F. Suppose to the contrary that s ∈ F, and hence that s ∈ (S(g tu )∩ F)\{t}. If g tu satisfies AD-iii, then it follows that s ∈ (S(g tu )∩ F)\{t} = D(g tu , U ). If g tu satisfies AD-iv, then it follows that s is a state in (S(g tu ) ∩ F)\{t} that comes before the lottery, hence that s ∈ D(g tu , U ). This contradicts that g tu has the property D(g tu , U ) = ∅. We Now assume that player i did not only deviate at state t, but that he deviated multiple times before s was reached. Then we apply the result for a single deviation multiple times, for each play between one deviation and the next. This then shows that the lemma also holds for multiple deviations.
Theorem 1 Joint strategy π ε is a subgame perfect ε-equilibrium for the game G.
Proof Let p be an arbitrary path and let i ∈ N . Assume that play has developed along path p, that all players j ∈ N \{i} are going to use strategy π j ε after p, and that i is the only player who does not necessarily play according to π ε . Let us denote the strategy of player i by σ i and the resulting joint strategy by σ . We will prove that the reward to player i is at most ε higher in expectation if, after p, play is according to σ , compared to i's reward if play is according to π ε .

Let us first provide a lower bound for the expected reward for i if play is according to the joint strategy π ε . For this, we denote by g p the main plan from the prescription of π ε , given to the players, when play has reached the last state of path p. If the prescription of π ε comes without a threat pair, then plan g p will be executed with probability 1, and the expected reward for player i equals φ i (g p ). If the prescription consists of the main plan g p together with a threat pair (x p , v p ), then the expected reward for player i equals $(1 - q_\varepsilon)\,\phi_i(g_p) + q_\varepsilon\,\phi_i(v_p)$, as g p will be executed with probability 1−q ε and v p with probability q ε . In both cases, the number φ i (g p )−2q ε M is a lower bound for the expected reward for player i under joint strategy π ε . Recall that, by definition of q ε , we have $2 q_\varepsilon M \le \tfrac{1}{2}\varepsilon$. Thus, it will be sufficient to prove that the expected reward for player i under joint strategy σ is bounded from above by $\phi_i(g_p) + 2 q_\varepsilon M$; this is the bound referred to as (2) below.

We divide the proof into three cases, depending on the number of deviations by player i after p. For each case, we either bound the expected reward of the deviating player i from above by φ i (g p ) + 2q ε M, or we prove that the case has probability 0 of happening.
Case I: Player i does not deviate during play after history p under σ . We distinguish three subcases.
(a): Assume that the prescription is given by the main plan g p without a threat pair. Then the expected reward to player i is equal to φ i (g p ).
(b): Assume that the prescription is given by the main plan g p together with the threat pair (x p , v p ), and that player i is the controlling player of state x p . Then it is still true that either plan g p or plan v p will be executed, because we assume no deviations. By the properties of a threat pair, we have x p ∈ sat(g p , α * ) and Therefore, the expected reward for player i in this subcase is bounded from above by φ i (g p ).
(c): Assume that the prescription is given by the main plan g p together with the threat pair (x p , v p ), and that player i is not the controlling player of state x p . Then, like under strategy π ε , plan g p will be executed with probability 1 − q ε and plan v p will be executed with probability q ε . Here, player i may gain if v p is executed, but in expectation the gain will be small:

$$(1 - q_\varepsilon)\,\phi_i(g_p) + q_\varepsilon\,\phi_i(v_p) \le \phi_i(g_p) + 2 q_\varepsilon M.$$

We see that (2) holds in each of the cases (a), (b), and (c).

Case II: Player i deviates at least once after history p under σ , but only finitely many times. Let us denote the state where player i makes his first deviation by t ∈ S and the state where i makes his last deviation by s ∈ S. Let us further denote the chosen action by player i at state s by u ∈ A(s). After the last deviation by player i, prescribed play according to π ε is given by the main plan g su , possibly together with the threat pair (x su , v su ). Since no further deviations will take place, either plan g su or v su will be executed in its entirety. We have φ i (g su ) ≤ α * s and φ i (v su ) < α * s by the choices for g su and v su . Therefore, the expected reward for player i is at most α * s . By Lemma 5, we have α * s ≤ α * t , so we can bound the expected reward to player i from above by α * t . Let h denote the main plan in the prescription of π ε just before player i wants to make his first deviation at t. (We have h = g p or possibly h = v p if a lottery takes place before player i makes his first deviation.) Then φ i (h) ≥ α * t , since t lies on h and since h is an α * -viable plan. Thus, the reward to player i will be at least as good if he just follows the prescription of the main plan with probability 1, which is a strategy where he does not deviate. We have already seen in Case I that his reward is then bounded from above by φ i (g p ) + 2q ε M.
Case III: Player i deviates infinitely many times after history p under σ . We will prove that this case occurs with probability 0.
Let t denote the state where player i first deviates. First assume that α * t ≥ 0. Infinitely many deviations by player i imply infinite play along non-absorbing states. Therefore, the reward to player i will be zero if this happens. Let h denote the main plan in the prescription of π ε just before player i wants to make his first deviation at t. We have φ i (h) ≥ α * t , since t lies on h and since h is α * -viable. Thus, the reward to player i will be at least as good if he just follows the prescription of the main plan with probability 1, which is a strategy where he does not deviate. We have seen in Case I that his reward is then bounded from above by φ i (g p ) + 2q ε M.

Now assume that α * t < 0. By assuming that player i deviates infinitely many times, it is implied that infinitely many times a state in S i becomes active. The α * -value of subsequent states in S i does not increase, by Lemma 5. Therefore, after a while, the α * -value of the visited states in S i becomes a constant, say c. Then we have c < 0, since we assume α * t < 0. Then, by Lemma 5, the rank of visited states in S i strictly decreases until a lottery is executed by a player j ≠ i. Since the rank of a state can decrease only finitely many times, the execution of a lottery will happen infinitely many times. If at the lottery, where the prescription is given by say main plan h together with a threat pair (y, w), player i y chooses plan w, then player i will not be able to deviate again at a state with the constant α * -value c. Thus, at every lottery the outcome must be continuation of the main plan. The probability of this happening is 0.
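The last step of Case III is a simple geometric argument: if each lottery independently continues the main plan with probability 1 − q, then n consecutive continuations have probability (1 − q)^n, which tends to 0 as n grows. The sketch below makes this concrete; the function names and the tolerance parameter `delta` are ours and serve only as an illustration.

```python
import math


def prob_all_continue(q: float, n: int) -> float:
    """Probability that n independent lotteries all result in continuation."""
    return (1.0 - q) ** n


def lotteries_needed(q: float, delta: float) -> int:
    """Smallest n for which (1 - q)**n <= delta, for 0 < q < 1 and 0 < delta < 1."""
    if not (0.0 < q < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("q and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(delta) / math.log(1.0 - q))
```

For any fixed q > 0, however small, `lotteries_needed` returns a finite number for every tolerance, which is the quantitative content of the statement that infinitely many continuations occur with probability 0.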

A fixed point theorem
There is one thing left to do, which is to prove that a vector α ∈ R S with properties F-i and F-ii exists. For this, we introduce, in Sect. 5.1, a non-empty set ⊆ R S of semi-stable vectors, for which we prove that This implies that, for α ∈ and U ∈ U(α), the updated vector δ(U , α) is finite and satisfies δ(U , α) ≥ α (see Sect. 5.2). Now, if we could prove additionally that δ(U , α) ∈ , then it would be an easy corollary to establish the existence of a fixed point in . However, as was demonstrated in Kuipers et al. (2016) by means of an example, for certain vectors α ∈ and U ∈ U(α), we have δ(U , α) ∉ . This motivates the definition of a set * ⊆ of stable vectors, in Sect. 5.3. The results derived in Sects. 5.1 and 5.2 for vectors of the set hold for vectors of the set * as well, as * is by definition a subset of . The main effort in Sects. 5.3, 5.4, and 5.5 will therefore go into proving that, for all α ∈ * and all U ∈ U(α), the vector δ(U , α) is an element of * . The fixed point theorem is subsequently established in Sect. 5.6.

Semi-stable vectors and their properties
In this subsection, we present a condition for α ∈ R S , which we call semi-stability. This condition guarantees the existence of a (t, U , α)-admissible plan for all U ∈ U(α) and all t ∈ F(U ), even with the additional property that, for every edge (x, y) of the plan, x is α-safe at state y. For α ∈ R S , let us therefore define the edge set In the following, our aim is to impose an appropriate set of properties on the subsets of S, and then deduce the existence of a plan g ∈ admiss(t, U , α) in G(α) for all U ∈ U(α) and all t ∈ F(U ). For α ∈ R S and X ⊆ S, we define We also define Now we provide some intuition for the set X (α). Consider U ∈ U(α) and t ∈ F(U ). Let us assume the existence of a plan in G(α) that is element of admiss(t, U , α). To consider the critical case, let us assume that every such plan satisfies AD-iv, but not AD-i, AD-ii, or AD-iii. Choose such a plan, say g. Now, the crucial role is played by the states that are visited on g from the start at t to the point where a state in F(U ) is visited for the second time. Let X denote the set of these states. Notice that there is indeed a second occurrence of a state in F(U ), since g does not satisfy AD-iii, so X is well-defined. It is not difficult to argue from the assumptions that X ∈ X (α). So, the set X (α) contains all sets of states where AD-iv is crucial.
Let X ⊆ S and let e = (x, y) be an edge. We say that e is an α-exit from X if x ∈ X , y ∈ (S ∪ S * )\X , and if, for all v ∈ viable(y, α), The implication in (3) is trivially satisfied by the edge (x, y) if there exists z ∈ esc(X , α) with i z = i x and α z ≥ α x (and in particular if x ∈ esc(X , α)). To filter out such edges, we say that e = (x, y) is a legitimate α-exit from X if it is an α-exit from X and if α z < α x for all z ∈ esc(X , α) with i z = i x . Legitimate α-exits from X with X ∈ X (α) are used to derive the existence of a threat pair for admissible plans that happen to visit all states of X . To see how, let (x, y) be a legitimate α-exit from X . The fact that (x, y) is legitimate ensures that y ∉ safestep(x, α), so it ensures the existence of v ∈ viable(y, α) with x ∉ sat(v, α). From the fact that (x, y) is an α-exit from X , we conclude that t ∈ esc(X , α) exists with t ∉ sat(v, α). We see that (x, v) is a candidate to serve as a threat pair for an admissible plan g that starts at t and visits all states of X .
We say that α ∈ R S is semi-stable if the following two conditions hold: (i) for all t ∈ S, there exists a plan in G(α) that is element of viable(t, α), (ii) for all X ∈ X (α), there exists a legitimate α-exit from X .
We denote the set of semi-stable vectors in R S by . In Lemma 12 we prove that, for all α ∈ , for all U ∈ U(α), and for all t ∈ F(U ), a plan in G(α) exists that is element of admiss(t, U , α). In Lemma 17 we prove that, for all α ∈ and for all U ∈ U(α), the vector δ(U , α) satisfies condition (i) of semi-stability.
Example 4 Let us return to Example 3 for an illustration of the definitions. For this example, we have C = {{1 a , 1 b , 2}, {1 a , 1 b , 2, 3}}. Therefore, for any α ∈ R S , the collection X (α) contains at most two subsets of S. For each stage of the update procedure, we calculate the set X (α) and the α-exits, we check that conditions (i) and (ii) are satisfied, and we verify that Lemma 17 holds. Initialization: We started the update procedure with α 1 a = −2, α 2 = 1, α 1 b = −1, α 3 = −1.
Proof Notice that, for all t ∈ S, the plan (t, t * ) is a ρ-viable plan in G(ρ). This shows immediately that the vector ρ satisfies condition (i) of semi-stability.
We now prove that X (ρ) = ∅. Notice that by proving this claim of the lemma, we also prove that condition (ii) of semi-stability holds trivially true for ρ, hence it finishes the proof. Let where the equality is by the fact that esc(X , ρ) = X and non-emptiness is by the fact that X ∈ P(ρ). This shows that Proof Proof of (i): For t * ∈ S * , we have safestep(t * , α) = {t * } by convention, and for t ∈ S, the set safestep(t, α) is non-empty due to condition (i) of semi-stability.
Proof of (ii): Let X ⊆ S and assume that esc(X , α) = ∅. Let t ∈ X . We will demonstrate that α t ≤ 0.
Choose a plan g in G(α) that is element of viable(t, α), which is possible by condition (i) of semi-stability. Notice that there is no path in G(α) from t to an element of S * , as the existence of such a path would also imply the existence of an edge (x, y) in G(α) with x ∈ X and y ∈ (S ∪ S * )\X , contradicting that esc(X , α) = ∅. Thus, plan g is a non-absorbing plan. As g is also α-viable, it follows that α t ≤ φ i t (g) = 0.
For any subgraph H of G and a subset X of the vertex set V (H) of H, we say that X is an ergodic set of H if (i) for all x, y ∈ X , there exists a path p in H from x to y that has positive length and that lies entirely in X , (ii) for all x ∈ X and y ∈ V (H)\X , there is no path in H from x to y.
The following lemma is an easy result in graph theory. It is stated without proof.

Lemma 8 Let H = (V (H), E(H)) be a directed graph, such that every vertex x ∈ V (H) has an outgoing edge. Then, for every x ∈ V (H), there is a path from x to an element of an ergodic set of H.
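For small graphs, the ergodic sets of the definition above can be found by a direct reachability computation. The sketch below is an illustration and not part of the argument; it assumes that the graph is finite and given as a dictionary in which every vertex is a key mapped to its list of successors, and the function names are ours.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set

Vertex = Hashable
Graph = Dict[Vertex, Iterable[Vertex]]  # every vertex must appear as a key


def reachable(graph: Graph, start: Vertex) -> FrozenSet[Vertex]:
    """Vertices reachable from `start` by a path of positive length."""
    seen: Set[Vertex] = set()
    stack = list(graph[start])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph[v])
    return frozenset(seen)


def ergodic_sets(graph: Graph) -> Set[FrozenSet[Vertex]]:
    """All ergodic sets: closed sets whose members reach each other by positive-length paths."""
    reach = {v: reachable(graph, v) for v in graph}
    return {
        reach[v]
        for v in graph
        # v lies on a cycle, and every vertex reachable from v reaches exactly the same set
        if v in reach[v] and all(reach[w] == reach[v] for w in reach[v])
    }
```

The test used in `ergodic_sets` relies on the observation that an ergodic set containing v must coincide with the set of vertices reachable from v: condition (i) puts all of its members in that set, and condition (ii) excludes everything else.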

Lemma 9 Let α ∈ . Then
(i) for all t ∈ S, there exists a path in G(α) from t to an element in an ergodic set of G(α), (ii) for all t * ∈ S * , the set {t * } is an ergodic set of the graph G(α), (iii) an ergodic set of G(α) is either a singleton from the set S * or a subset of S.
Proof (i): The graph G(α) satisfies the condition that every vertex has an outgoing edge by Lemma 7-(i). Therefore, Lemma 8 applies.
(ii): The edge (t * , t * ) is a path of positive length in G(α) from t * to t * , and there is no other outgoing edge from t * in G(α). Thus, the set {t * } satisfies the conditions for an ergodic set.
(iii): This result follows from (ii) and the general fact that different ergodic sets in the same graph are always disjoint.
Lemma 10 Let α ∈ , let p be a path in G(α), and let g be an α-viable plan such that first(g) = last( p). Then the plan p, g is α-viable.

Lemma 11 Let α ∈ . Then for all t ∈ S and for all U ∈ U(α), there exists a plan in G(α) that is element of viacomp(t, U , α).
Proof Let t ∈ S and let U ∈ U(α). Consider the set κ of U -compatible paths in G(α) with first( p) = t. The set κ is non-empty as it contains the path (t) of length 0.
Among the paths in κ, we choose one, say p, for which the cardinality of the set F(U ) ∩ S( p) is maximal. We denote last( p) by s. We also choose r with r ∈ safestep(s, α), which is possible by Lemma 7-(i). If s ∈ F(U ), then we further specify our choice of r and we choose r = U (s). Note that the choice of r makes the edge (s, r ) a U -compatible path. We complete our choices with a plan g in G(α) that is element of viable(r , α), which is possible, since α satisfies condition (i) of semi-stability.
We will now prove the lemma by showing that the plan h = p, (s, r ), g is a plan in G(α) and an element of viacomp(t, U , α). Plan h is obviously a plan in G(α) and it is an element of viable(t, α) by Lemma 10. To see that h = p, (s, r ), g is U -compatible, we prove that F(U ) ∩ S(g) ⊆ S( p). Suppose that this is not true. Then we choose u ∈ F(U ) that is on plan g and not on path p. If there is more than one candidate, we choose u as close to the beginning of g as possible. Let g′ denote the path that follows g from start until the first occurrence of u.
We claim that the path p′ = p, (s, r ), g′ is U -compatible. To prove the claim, consider a first occurrence of x ∈ F(U ) on p′. If x = u, then we found the only occurrence of x on p′ and it is at the end of p′. We then have no condition to check for U -compatibility. If x ∈ F(U )\{u}, then the first occurrence of x on p′ is on path p, and not on g′, by the choice of u. If the first occurrence of x is on p, not at the end, then x is followed by U (x), because p is a U -compatible path. If the first occurrence of x is at the end of path p, then x = s and x is followed by r = U (s) = U (x). Thus, p′ is indeed U -compatible.
We proved that p′ ∈ κ. We obviously have F(U ) ∩ S( p′) ⊇ F(U ) ∩ S( p). The inclusion is strict, because we have u ∈ F(U ) ∩ S( p′) and u ∉ F(U ) ∩ S( p). This contradicts the choice of p as an element of κ that maximizes the cardinality of F(U ) ∩ S( p). Thus, F(U ) ∩ S(g) ⊆ S( p) as claimed.
We now see that, for any u ∈ F(U ) appearing on h = p, (s, r ), g , its first occurrence lies on path p. Then we also see that the first occurrence of u on h is followed by U (u), as required for a U -compatible plan.
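The condition verified in this proof can be phrased as a short check on a finite path. The sketch below reads U -compatibility as it is used in the proof: the first occurrence of every state of F(U ) on the path, unless it is the last state of the path, must be followed by the action that U prescribes. Representing U as a dictionary whose keys form F(U ) is an assumption made for illustration, as is the function name.

```python
from typing import Dict, Hashable, Sequence

State = Hashable


def is_u_compatible(path: Sequence[State], U: Dict[State, State]) -> bool:
    """Check that each first occurrence of x in F(U) is followed by U[x].

    `U` maps every state of F(U) to its prescribed successor (assumed encoding).
    A first occurrence at the very end of the path imposes no condition.
    """
    seen = set()
    for idx, x in enumerate(path):
        if x in U and x not in seen:
            seen.add(x)
            if idx + 1 < len(path) and path[idx + 1] != U[x]:
                return False
    return True
```

Only first occurrences are constrained, so a later visit to the same state may be followed by a different successor without violating the condition.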
The main result of this section concerns the existence of admissible plans.

Lemma 12 Let α ∈ . Then for all U ∈ U(α) and for all t ∈ F(U ), there exists a plan in G(α) that is element of admiss(t, U , α).
Proof Let U ∈ U(α) and let t ∈ F(U ). By Lemma 11, there exists a plan in G(α) that is element of viacomp(t, U , α). If among these plans one satisfies AD-i, AD-ii or AD-iii, we are done. Assume further that no such plan exists. We will demonstrate that a plan satisfying AD-iv exists.
Let us say that a U-compatible path p is strongly U-compatible if each element of F(U) appears at most once on p. Now define

X = {x ∈ S ∪ S* | there exists a strongly U-compatible path in G(α) from t to x}.

The proof relies on the fact that X ∈ X(α). This and more will be shown in the following.
I: Proof that t ∈ X and X ⊆ S: We have t ∈ X , since the path (t) of length 0 is a strongly U -compatible path from t to t.
To see that X ⊆ S, suppose to the contrary that s * ∈ X ∩ S * exists. Then let p be a strongly U -compatible path in G(α) from t to s * . The plan g = (s * , s * , . . .) is trivially in G(α) and is trivially α-viable. Now, the plan p, g is also trivially in G(α), it is obviously U -compatible, and it is α-viable by Lemma 10. Thus, p, g ∈ viacomp(t, U , α). Now notice that plan p, g satisfies condition AD-iii, by the fact that p is a strongly U -compatible path and the fact that g contains no elements of F(U ). This contradicts our assumption that there is no (t, U , α)-admissible plan in G(α) that satisfies AD-iii.
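The set X used in this proof can be computed by a plain graph search. The following is a minimal sketch, assuming that X consists of the states reachable from t by a strongly U-compatible path, that G(α) is encoded as a dictionary `safe_edges` of successor lists, and that U is a dictionary on F(U); these encodings and the function name are illustrative choices, not part of the paper. It relies on the observation that a strongly U-compatible path leaves every state of F(U) via U(x), and that a breadth-first search tree only contains paths without repeated states.

```python
from collections import deque

def strongly_compatible_reach(t, safe_edges, U):
    """Return the states reachable from t along a strongly U-compatible path.

    safe_edges: dict mapping each state to its successors in G(alpha)
                (hypothetical encoding of the graph).
    U:          dict mapping each state x of F(U) to its prescribed successor U(x).
    """
    reached = {t}
    queue = deque([t])
    while queue:
        x = queue.popleft()
        # A state of F(U) may only be left via U(x); other states use any edge.
        successors = [U[x]] if x in U else safe_edges.get(x, [])
        for y in successors:
            if y not in reached:
                reached.add(y)
                queue.append(y)
    return reached

# Toy instance (made up for illustration): t is the only state of F(U).
safe_edges = {"t": ["a", "b"], "a": ["t"], "b": ["c"], "c": ["c"]}
X = strongly_compatible_reach("t", safe_edges, {"t": "a"})
```

In the toy instance the edge (t, b) is never used, because t ∈ F(U) must be followed by U(t) = a, so b and c stay outside X.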

II: Proof that esc(X , α) ⊆ F(U ):
The set esc(X , α) is indeed defined, by the result of I. Suppose x ∈ esc(X , α)\F(U ). We will prove a contradiction by showing that y ∈ X for all y ∈ safestep(x, α). Let y ∈ safestep(x, α). Choose a strongly Ucompatible path p in G(α) from t to x. If y appears on path p, then obviously the part of p that goes from t to y is a strongly U -compatible path in G(α) from t to y. Then it follows immediately that y ∈ X . Assume further that y does not appear on p. It suffices to prove that the path q = p, (x, y) is a strongly U -compatible path in G(α) from t to y. The path q is a path in G(α), since p is a path in G(α) and the edge (x, y) is also an edge of G(α). The path q is U -compatible, by the fact that p is U -compatible and by the fact that the additional edge (x, y) in q does not originate from a state in F(U ). Further, the occurrence of y at the end of q cannot be the second occurrence of a state in F(U ), since we assume that y does not appear on p. Thus, each state of F(U ) appears at most once on q. So indeed, q is a strongly U -compatible path in G(α) from t to y, and hence y ∈ X . Contradiction.
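For readers who want to experiment with the sets used in parts II to VI, here is a small sketch. It assumes that esc(X, α) collects the states of X that have an α-safe step leaving X, and that pos(X, α) collects the states of X with α_x > 0, which is how both sets are used in this section; the dictionary encodings are hypothetical.

```python
def esc(X, safestep):
    """States of X with at least one safe step to a state outside X (assumed definition)."""
    return {x for x in X if any(y not in X for y in safestep[x])}

def pos(X, alpha):
    """States of X whose alpha-value is strictly positive (assumed definition)."""
    return {x for x in X if alpha[x] > 0}

# Toy data, made up for illustration.
safestep = {1: [2], 2: [1, 3], 3: [3]}
alpha = {1: -1.0, 2: 0.5, 3: 0.0}
X = {1, 2}
escape_states = esc(X, safestep)
positive_states = pos(X, alpha)
```

With these sets, the statement of part II amounts to checking `esc(X, safestep) <= F_U` for the set `F_U` of updated states.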

III: Proof that U(x) ∈ safestep(x, α) ∩ X for all x ∈ F(U) ∩ X:
We have U (x) ∈ safestep(x, α) by definition of an α-safe combination, so it remains to prove that U (x) ∈ X . Choose a strongly U -compatible path p in G(α) from t to x. If U (x) appears on path p, then obviously the part of p that goes from t to U (x) is a strongly U -compatible path in G(α) from t to U (x). Then it follows immediately that U (x) ∈ safestep(x, α) ∩ X . Assume further that U (x) does not appear on p. It suffices to prove that the path q = p, (x, U (x)) is a strongly U -compatible path in G(α) from t to U (x). The path q is a path in G(α), since p is a path in G(α) and the edge (x, U (x)) is also an edge of G(α). The path q is U -compatible, by the fact that p is U -compatible and by the fact that the (unique) occurrence of x ∈ F(U ) is followed by U (x) on q. Further, the occurrence of U (x) at the end of q cannot be the second occurrence of U (x), since we assume that U (x) does not appear on p. Thus, each state of F(U ) appears at most once on q. Therefore, q is a strongly U -compatible path in G(α) from t to U (x). This demonstrates that U (x) ∈ X .
IV: Proof that safestep(x, α) ∩ X ≠ ∅ for all x ∈ X: Let x ∈ X. If x ∈ F(U) ∩ X, then the fact that safestep(x, α) ∩ X is non-empty follows by the result of III. Assume further that x ∈ X\F(U). Then by the result of II, we have x ∈ X\esc(X, α). Hence, safestep(x, α) ∩ X = safestep(x, α) ≠ ∅, where non-emptiness is by Lemma 7-(i).
V: Proof that X ∈ C: For all x ∈ X, we have A(x) ∩ X ⊇ safestep(x, α) ∩ X ≠ ∅. Here, the inclusion is trivial and non-emptiness is by the result of IV.
VI: Proof that X ∈ P(α): Suppose to the contrary that pos(X , α) = ∅. By the result of III, we have for all x ∈ X , an element y ∈ X such that (x, y) is an edge of G(α). Moreover, for every x ∈ F(U ) ∩ X we may choose y = U (x). Thus, it is possible to construct a non-absorbing U -compatible plan g with first(g) = t, with S(g) ⊆ X , and such that every edge of g is in the edge set of G(α). Notice that g ∈ viable(t, α), by the assumption that pos(X , α) = ∅, and by the fact that a non-absorbing plan gives reward 0 to all players. Then g is a plan in G(α) that is element of admiss(t, U , α) due to condition AD-ii. This contradicts our earlier assumption that no such plan exists.
VIII: Proof that, for all x ∈ X with i_x = i_t, we have α_x ≤ α_t: Suppose to the contrary that x ∈ X with i_x = i_t and α_x > α_t exists. Then we choose a strongly U-compatible path p in G(α) from t to x, which is possible because x ∈ X. We also choose a plan g in G(α) that is an element of viacomp(x, U, α), which is possible by I and Lemma 11. Now, the plan h = p, g is obviously U-compatible, and it is α-viable by Lemma 10. Thus, h ∈ viacomp(t, U, α). Observe that h satisfies condition AD-i of admissibility, due to the fact that x with i_x = i_t and α_x > α_t appears on plan h before an element of F(U) has appeared for the second time. Thus, h ∈ admiss(t, U, α) due to AD-i. Also, by construction, h is a plan in G(α). This contradicts our assumption that there exists no plan in G(α) that is (t, U, α)-admissible due to AD-i, AD-ii or AD-iii.

IX: Construction of a plan in G(α) that is element of admiss(t, U , α):
We have X ∈ X (α) by V, VI and VII. Then, by condition (ii) of semi-stability, a legitimate α-exit from X exists. Choose one and denote it by (x, y).
Since x ∈ X , there exists a strongly U -compatible path p in G(α) from t to x. Also, there exists a plan g in G(α) that is element of viacomp(x, U , α), by Lemma 11. We claim that h = p, g is a plan in G(α) that is element of admiss(t, U , α).
Plan h = p, g is obviously in G(α). The plan is U -compatible, because both p and g are U -compatible. The plan is α-viable by Lemma 10. Thus, h ∈ viacomp(t, U , α). It now suffices to demonstrate that h satisfies AD-iv.
We have x ∈ X\esc(X, α) because the α-exit (x, y) is legitimate. Therefore y ∉ safestep(x, α), and hence we can choose an α-viable plan v with first(v) = y and with x ∉ sat(v, α). We will prove that (x, v) is a threat pair for h.
(a): The location of the first occurrence of x on h is such that each element of F(U) appears at most once on h before the first occurrence of x. Indeed, this follows from the fact that x lies on the strongly U-compatible path p.
(b): Proof that i_x ≠ i_t: We have X ⊆ S by I and pos(X, α) ≠ ∅ by VI. Then esc(X, α) ≠ ∅ by Lemma 7-(ii), so we can choose s ∈ esc(X, α). It follows by II that s ∈ F(U), so we have s ∈ F(U) ∩ esc(X, α) and i_s = i_t. Now suppose that i_x = i_t. We then have α_x ≤ α_t by the result of VIII. Since s ∈ esc(X, α) and since i_x = i_t = i_s, it follows by the definition of a legitimate α-exit that α_x > α_s. Thus, α_s < α_x ≤ α_t. This contradicts that α_s = α_t, as s, t ∈ F(U).
(c): Plan v is obviously an α-viable plan with first(v) ∈ A(x).
(d): The state following x on plan h, say z, is not equal to state first(v) = y. Indeed, we have z ∈ safestep(x, α) because h is a plan in G(α), and we have y ∉ safestep(x, α).
(e): It remains to prove that t ∉ sat(v, α). Suppose to the contrary that t ∈ sat(v, α). Then F(U) ⊆ sat(v, α), since α_s = α_t for all s ∈ F(U). It subsequently follows that esc(X, α) ⊆ sat(v, α), since esc(X, α) ⊆ F(U), by the result of II. By definition of an α-exit, we have esc(X, α) ⊆ sat(v, α) ⟹ x ∈ sat(v, α), so we conclude that x ∈ sat(v, α). This contradicts our choice of v such that x ∉ sat(v, α).

Properties of an updated semi-stable vector
Let us continue with some fairly immediate consequences of Lemma 12.
Case 2: Assume that F(U) ∩ S(g) ≠ ∅, say s ∈ F(U) is on plan g. Then it follows from the δ-viability of g that s ∈ sat(g, δ). It subsequently follows that t ∈ sat(g, δ) from the fact that δ_t = δ_s.
Case 2: Assume that F(U) ∩ S(g) ≠ ∅, say that s ∈ F(U) is on plan g. Then it follows from the α-viability of g that s ∈ sat(g, α). It subsequently follows that t ∈ sat(g, α) from the fact that α_t = α_s.
Proof of (iv): This follows immediately from (ii).
Proof Let g ∈ admiss(t, U, α). By definition of the vector δ, we have δ_t ≤ φ_{i_t}(g). For every state x ∈ F(U) that appears on g, we then have δ_x = δ_t ≤ φ_{i_t}(g) = φ_{i_x}(g). For every state x ∉ F(U) that appears on g, we have δ_x = α_x ≤ φ_{i_x}(g), where the inequality follows, since g ∈ admiss(t, U, α) ⊆ viable(t, α). Thus, plan g is indeed δ-viable.
Lemma 16 Let α ∈ , let U ∈ U(α), and let t ∈ F(U ). If g is an absorbing plan in G(α) that is element of admiss(t, U , α), then g is a plan in G(δ), where δ denotes the vector δ(U , α).
Proof Let g be an absorbing plan in G(α) that is element of admiss(t, U , α) and let (x, y) be an edge of g. We will prove that (x, y) is an edge of G(δ). We distinguish four different cases.
Case 1: x ∈ S * . In this case, we have y = x and (x, y) = (x, x) is trivially an edge of G(δ).
Let p denote the path that follows g from start t to the occurrence of x on g that corresponds with the edge (x, y). The path p, (x, y) is a part of g and is therefore a path in G(α). The plan v is α-viable by Lemma 14-(i). Therefore, the plan h = p, (x, y), v is α-viable by Lemma 10. Notice that the occurrence of x on g that corresponds with the edge (x, y) is not the first occurrence of x on g, as the first occurrence of x is followed by U (x) because g ∈ admiss(t, U , α). The path p contains the occurrence of x that corresponds with the edge (x, y), and hence p has at least two different occurrences of x.
We further distinguish between two subcases. (4a): Assume that plan v is non-absorbing. Then plan h = p, (x, y), v is also non-absorbing. Since we demonstrated that h is α-viable, it follows that α z ≤ 0 for all states z that lie on the path p. Now, path p contains a path of positive length from x to x, as it contains the first two occurrences of x on g. Denote this path by q and observe that the plan ( p, q, q, . . .) is a non-absorbing U -compatible and α-viable plan. Therefore, ( p, q, q, . . .) is an element of admiss(t, U , α) by AD-ii. It follows that δ x = δ t ≤ 0. It also follows that x ∈ sat(v, δ).
(4b): Assume that plan v is absorbing. If one of the elements of F(U ) is located on v, say s, then it is obvious that s ∈ sat(v, δ), hence that x ∈ sat(v, δ). We may therefore assume additionally that there are no elements of F(U ) located on v. We claim that h = p, (x, y), v is an element of admiss(t, U , α).
The path p, (x, y) is part of g and is therefore a U -compatible path. The plan v does not contain any states of F(U ), hence the plan h = p, (x, y), v is U -compatible. Since h is also α-viable, we conclude that h ∈ viacomp(t, U , α).
If plan g is (t, U , α)-admissible due to AD-i, then α t > 0 or there exists a state z on g with i z = i t and α z > α t , located on g before any state of F(U ) appears for the second time on g. In the former case, h is (t, U , α)-admissible due to AD-i. In the latter case, notice that state z must be located on p, since there are at least two occurrences of x ∈ F(U ) located on p. We then see that plan h is also (t, U , α)-admissible due to AD-i, since h has path p as its initial part.
Plan g is not (t, U , α)-admissible due to AD-ii, because g is absorbing. Also, plan g is not (t, U , α)-admissible due to AD-iii, because x ∈ F(U ) appears more than once on g.
If plan g is (t, U , α)-admissible due to AD-iv, then there exists a threat pair for g, say (z, w), such that state z appears on g before any state of F(U ) appears for the second time on g. State z must then be located on p, since there are at least two occurrences of x ∈ F(U ) located on p. We now see that plan h is also (t, U , α)-admissible due to AD-iv, since plan h has path p as its initial part.
Thus, we have indeed h ∈ admiss(t, U, α). It follows that δ_x = δ_t ≤ φ_{i_t}(h) = φ_{i_x}(v), and hence that x ∈ sat(v, δ).

We conclude this section with a proof that the vector δ(U, α) satisfies condition (i) of semi-stability, for all semi-stable vectors α and all U ∈ U(α).

Lemma 17 Let α ∈ and let U ∈ U(α). Then for all t ∈ S ∪ S * , there exists a plan in G(δ) that is element of viable(t, δ), where δ = δ(U , α).
Proof Let t ∈ S ∪ S * . We distinguish between the cases t ∈ F(U ) and t / ∈ F(U ). Case 1: Assume that t ∈ F(U ). Then we can choose a plan g in G(α) that is element of admiss(t, U , α), by Lemma 12. We distinguish three subcases.
(1a): Plan g is absorbing. Then g is the required plan, as it is in G(δ) by Lemma 16, and it is an element of viable(t, δ) by Lemma 15.
(1b): Plan g is a non-absorbing plan in G(δ). Then g is the required plan in G(δ) that is element of viable(t, δ), by Lemma 15.
(1c): Plan g is non-absorbing and not every edge of g is an edge of G(δ). Then let (x, y) denote the first edge of g that is not of G(δ).
We have y ∈ safestep(x, α), as (x, y) is an edge of G(α), and we have y ∉ safestep(x, δ), as (x, y) is not an edge of G(δ). It follows that x ∈ F(U) by Lemma 14-(ii). It also follows that y ≠ U(x) by Lemma 14-(iii). The fact that x ∈ F(U) and y ≠ U(x) implies that x appears more than once on g and that the occurrence of x associated with the edge (x, y) is not the first occurrence, because g is a U-compatible plan. Now, let p denote the path with first(p) = t and last(p) = x that follows g from its start to the occurrence of x on g that corresponds with the edge (x, y) of g. Let further q denote the path with first(q) = last(q) = x that follows g from the first occurrence of x until the second occurrence. Then the plan g′ = (p, q, q, . . .) is a non-absorbing plan in G(δ). Plan g′ is also an element of viable(t, δ), since all states on the plan are states that also lie on the non-absorbing plan g. Thus, g′ is the required plan.
Case 2: Assume that t ∉ F(U). Then we choose a plan g in G(α) that is an element of viable(t, α), which is possible by Lemma 11. If plan g does not contain any elements of F(U), then every edge of g is an edge of G(δ) by Lemma 14-(ii). In that case, g is the required plan. We assume further that plan g has at least one state of F(U). Then let x ∈ F(U) denote the first such state on g. Let p denote the path with first(p) = t and last(p) = x that follows g from its start to the first occurrence of x. Notice that p is a path in G(δ) by Lemma 14-(ii). We now choose a plan h in G(δ) that is an element of viable(x, δ), which is possible by the proof of Case 1. Then the plan g′ = p, h is a plan in G(δ). The plan g′ is also an element of viable(t, δ), which follows by Lemma 10 (applied to δ ∈ R^S). Thus, g′ is the required plan.
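The construction of the non-absorbing plan (p, q, q, . . .) in subcase (1c) can be illustrated with a lazy sequence. The sketch below is only an illustration under assumed encodings: a finite initial part of g is given as a list of states, and `bad_index` (a hypothetical parameter) is the position of the occurrence of x whose outgoing edge is not an edge of G(δ). It presupposes, as in the proof, that this is not the first occurrence of x.

```python
from itertools import chain, cycle, islice

def lasso_plan(g_prefix, bad_index):
    """Build the infinite plan (p, q, q, ...) from a finite initial part of g.

    p follows g from its start to the occurrence of x at bad_index;
    q follows g from the first to the second occurrence of x.
    """
    x = g_prefix[bad_index]
    first = g_prefix.index(x)
    second = g_prefix.index(x, first + 1)  # raises ValueError if x occurs only once
    p = g_prefix[:bad_index + 1]
    q = g_prefix[first:second + 1]
    # p already ends in x, so each repetition of q contributes q without its first state.
    return chain(p, cycle(q[1:]))

# Toy initial part of a plan (made up): x occurs at positions 1 and 3.
plan = lasso_plan(["t", "x", "a", "x", "b"], bad_index=3)
first_states = list(islice(plan, 8))
```

Every edge of the resulting plan already occurs on g before the offending edge, which mirrors the reason given in the proof for the plan being in G(δ).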
We showed that, for all α ∈ and all U ∈ U(α), the vector δ(U , α) is finite and satisfies δ(U , α) ≥ α (see Lemma 13). The vector δ(U , α) also satisfies condition (i) of semi-stability (see Lemma 17). If we could now prove that δ(U , α) satisfies condition (ii) of semi-stability as well (and hence δ(U , α) ∈ ), then it would be an easy corollary to establish the existence of a fixed point in . However, for certain vectors α ∈ we have δ(U , α) / ∈ , as was demonstrated in Kuipers et al. (2016) by means of an example. A similar example is given below in Fig. 9.
Example 5 For the game depicted in Fig. 9, one can verify that the vector α defined by is an element of , and that (3 a , 3 b ) is a legitimate α-exit from {1, 2, 3 a } ∈ X (α). Now, an update of state 3 b increases the value of α 3 b from −2 to −1. One can check that, for the updated vector α, we still have {1, 2, 3 a } ∈ X (α), but there is no longer a legitimate α-exit from {1, 2, 3 a }. Thus, the updated α is not in . (If we now continue with an update of state 3 a , then the value of α 3 a increases from −1 to 0. Finally, an update of state 2 increases the value of α 2 from −1 to ∞, since we then have admiss(2, U , α) = ∅, where U is defined by U (2) = 3 a .)

Stable vectors and exit sequences
The findings of Example 5 are the motivation for the definition of a set * of stable vectors. The set * is defined by replacing condition (ii) of semi-stability by a stronger condition and by keeping condition (i) the same. The strengthened condition (ii) requires the existence of a certain sequence of edges for every X ∈ X (α), which contains a legitimate α-exit from X , but also contains an edge (x, y) with α x > 0. The set * is thus by definition a subset of . Therefore, all results derived in Sects. 5.1 and 5.2 for vectors of the set hold for vectors of the set * as well. The main effort in this section will therefore go into proving that, for all α ∈ * and all U ∈ U(α), the vector δ(U , α) satisfies condition (ii) of stability, and that hence δ(U , α) ∈ * .
Let α ∈ , X ⊆ S and Z ⊆ X. We say that an edge (x, y) is an (α, Z)-exit from X if x ∈ X and y ∈ (S ∪ S*)\X, and if, for all v ∈ viable(y, α),

Z ∪ esc(X, α) ⊆ sat(v, α) ⟹ x ∈ sat(v, α).

Note that an (α, ∅)-exit from X is simply an α-exit from X. We say that a sequence of edges e = (x_j, y_j)_{j=1}^{k} is an α-exit sequence from X if, for all j ∈ {1, . . . , k}, the edge (x_j, y_j) is an (α, {x_1, . . . , x_{j−1}})-exit from X. For technical reasons, we allow k = 0, i.e., the empty sequence will also be called an α-exit sequence from X.
We say that an edge (x, y) = (x j , y j ) in the sequence e is legitimate if α x > α z for every z ∈ {x 1 , . . . , x j−1 } ∪ esc(X , α) with i z = i x . We say that the sequence e is legitimate if it is non-empty and if at least one of its edges is legitimate. We say that the edge (x, y) is positive if x ∈ pos(X , α). We say that the sequence e is positive if it is non-empty and if at least one of its edges is positive.
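The two edge labels just defined are easy to compute once an exit sequence is given. The following sketch takes the sequence as a list of pairs and labels each edge as legitimate and/or positive; it does not check the exit property itself, which requires the viable plans of the game. The dictionaries `alpha` and `player`, the set `esc_X` standing for esc(X, α), and the reading of pos(X, α) as the states of X with α_x > 0 are assumptions made for the illustration.

```python
def classify_edges(seq, alpha, player, esc_X):
    """Return two lists of booleans: legitimate flags and positive flags.

    seq:    list of edges (x_j, y_j), assumed to be an alpha-exit sequence from X.
    alpha:  dict state -> value; player: dict state -> controlling player.
    esc_X:  the set esc(X, alpha), supplied by the caller.
    """
    legitimate, positive = [], []
    earlier = set()
    for x, _y in seq:
        # States that x is compared with: earlier x's and escape states of the same player.
        rivals = [z for z in earlier | set(esc_X) if player[z] == player[x]]
        legitimate.append(all(alpha[x] > alpha[z] for z in rivals))
        positive.append(alpha[x] > 0)
        earlier.add(x)
    return legitimate, positive

# Toy data, made up for illustration.
alpha = {"a": -1, "b": -2, "c": 1, "s": -3}
player = {"a": 1, "b": 1, "c": 2, "s": 2}
seq = [("a", "u"), ("b", "v"), ("c", "w")]
legit_flags, pos_flags = classify_edges(seq, alpha, player, esc_X={"s"})
```

In the toy data the second edge is not legitimate, because its state b belongs to the same player as the earlier state a and has a smaller value.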
We now say that a vector α ∈ R S is stable if (i) for all t ∈ S, there exists a plan in G(α) that is element of viable(t, α), (ii) for all X ∈ X (α), there exists a positive α-exit sequence from X .
We denote the set of stable vectors in R S by * . The motivation for these definitions can be explained by observing what goes 'wrong' in Example 5. In that example, we initially have a legitimate α-exit (x, y) from a set X ∈ X (α). After the update, which is an update of a state not in X , the set X is still in X (α), but (x, y) is no longer a legitimate α-exit from X , since now x is α-safe at y. No other α-exit comes in its place, so the updated vector is not in . The issue is solved when we have a positive α-exit sequence from X ∈ X (α). Then it is easy to prove that the sequence contains a legitimate α-exit from X (see Lemma 18). After an update of a state not in X , essentially two things can happen. If, after the update, x is α-safe at y, where (x, y) is a positive edge in the sequence, then X is no longer an element of X (α), because we then have x ∈ esc(X , α) ∩ pos(X , α). Otherwise, if x is not α-safe at y after the update, then it is easy to prove that a positive α-exit sequence from X remains, where (x, y) will be a positive edge in the sequence.
The following lemma states some basic facts about exit sequences.

Lemma 18
Let α ∈ and let X ⊆ S.
(i) If e is a non-empty α-exit sequence from X, then its first edge is an α-exit from X.
(ii) If e is an α-exit sequence from X, and if (x, y) is an edge of e that is not legitimate, then the sequence e′ obtained from e by deleting the edge (x, y) is also an α-exit sequence from X. Moreover, every edge that is legitimate in e is also legitimate in e′.
(iii) If a legitimate α-exit sequence from X exists, then a legitimate α-exit from X exists.
(iv) If e and f are both α-exit sequences from X, then the concatenation of these two sequences, denoted by (e, f), is also an α-exit sequence from X.
(v) If e is a positive α-exit sequence from X and if X ∈ X(α), then the first positive edge of e is a legitimate edge.
Proof Proof of (i): If e is a non-empty α-exit sequence from X , then its first edge is by definition an (α, ∅)-exit from X . That is also the definition of an α-exit from X .
Proof of (ii): Let e = (x_j, y_j)_{j=1}^{k} be an α-exit sequence from X. Suppose that the edge (x, y) = (x_h, y_h) is not legitimate, i.e., there exists z ∈ {x_1, . . . , x_{h−1}} ∪ esc(X, α) with i_z = i_x and α_x ≤ α_z. Now let e′ denote the sequence obtained by deleting the edge (x, y) from e. To see that e′ is an α-exit sequence from X, we need to check that (x_j, y_j) is an (α, {x_ℓ | ℓ < j, ℓ ≠ h})-exit from X, for all j ∈ {1, . . . , k} with j ≠ h.
In this case x j ∈ sat(g, α) follows immediately by the fact that (x j , y j ) is an α). Assumption (4) therefore implies that z ∈ sat(g, α). Then also x h = x ∈ sat(g, α), because α x ≤ α z and i x = i z . This result combined with assumption (4) implies α). Now x j ∈ sat(g, α) follows by the fact that (x j , y j ) is an (α, {x 1 , . . . , x j−1 })-exit from X .
We proved that e′ is an α-exit sequence from X. Now assume that the edge (x_ℓ, y_ℓ) is a legitimate edge of the sequence e. We then have α_{x_ℓ} > α_z for all z ∈ {x_1, . . . , x_{ℓ−1}} ∪ esc(X, α) with i_z = i_{x_ℓ}. Obviously, ℓ ≠ h, so the edge (x_ℓ, y_ℓ) is also an edge of the sequence e′. Also obviously, α_{x_ℓ} > α_z for all z ∈ ({x_1, . . . , x_{ℓ−1}}\{x_h}) ∪ esc(X, α) with i_z = i_{x_ℓ}. Then (x_ℓ, y_ℓ) is a legitimate edge of the sequence e′.
Proof of (iii): Let e = (x_j, y_j)_{j=1}^{k} be a legitimate α-exit sequence from X. Let h be the smallest index such that the edge (x_h, y_h) is legitimate. Denote by e′ the edge-sequence obtained from e by deleting all edges (x_j, y_j) with j < h. Then the edge-sequence e′ is an α-exit sequence from X, by (ii). The first edge of e′ (i.e., (x_h, y_h)) is an α-exit from X, by (i). Then obviously, it is a legitimate α-exit from X.
Proof of (v): Let (x_j, y_j)_{j=1}^{k} be a positive α-exit sequence from X with X ∈ X(α) and let h denote the smallest index in {1, . . . , k} with the property x_h ∈ pos(X, α). We need to prove that (x_h, y_h) is a legitimate edge, i.e., we need to show that α_{x_h} > α_z for every z ∈ {x_1, . . . , x_{h−1}} ∪ esc(X, α) with i_z = i_{x_h}.
Proof of (vi): Let e = (x_j, y_j)_{j=1}^{k} be an α-exit sequence from X. Suppose (x_h, y_h) and (x_ℓ, y_ℓ) with h < ℓ are two edges with i_{x_h} = i_{x_ℓ} and with α_{x_h} ≥ α_{x_ℓ}. Then the edge (x_ℓ, y_ℓ) violates the conditions for a legitimate edge.

Lemma 19
We have ρ ∈ * ⊆ , where ρ is the vector defined by ρ t = r i t (t) for all t ∈ S.
Proof First we prove that ρ ∈ * . The vector ρ is semi-stable by Lemma 6; hence, it satisfies condition (i) of semi-stability. Then the vector ρ satisfies condition (i) of stability, as this is the same condition.
We have X (ρ) = ∅ by Lemma 6. Thus, the vector ρ trivially satisfies condition (ii) of stability. It follows that ρ ∈ * . Now we prove that * ⊆ . Let α ∈ * . The vector α trivially satisfies condition (i) of semi-stability. To see that α satisfies condition (ii) of semi-stability, let X ∈ X (α). Then a positive α-exit sequence from X exists, as α satisfies condition (ii) of stability. The sequence is a legitimate α-exit sequence from X , by Lemma 18-(v). Then a legitimate α-exit from X exists, by Lemma 18-(iii). We see that α indeed satisfies condition (ii) of semi-stability. Thus α ∈ .
Until further notice, we fix a stable vector α and U ∈ U(α). We further denote the vector δ(U, α) by δ. The vector δ then satisfies condition (i) of stability by Lemma 17, as every stable vector is semi-stable (Lemma 19). The work needed to demonstrate that δ also satisfies condition (ii) of stability, and that hence δ is stable, will be divided over two subsections as follows. We partition the set X(δ) into two subsets:

V(U, δ) = {X ∈ X(δ) | F(U) ∩ X = ∅ or F(U) ∩ esc(X, δ) ≠ ∅},
W(U, δ) = {X ∈ X(δ) | F(U) ∩ X ≠ ∅ and F(U) ∩ esc(X, δ) = ∅}.

Note that the set V(U, δ) contains the sets X ∈ X(δ) for which the updated states are all outside of X, i.e., F(U) ∩ X = ∅. Recall that this is the situation for which Example 5 demonstrated that the mere existence of a legitimate α-exit does not guarantee the existence of a legitimate δ-exit, and which motivated us to consider the concept of a positive α-exit sequence. The intuition which we developed for that situation translates into a relatively easy proof that a positive δ-exit sequence from X exists, so that condition (ii) of stability indeed holds for these sets X: for every set X ∈ V(U, δ), we will show that X ∈ X(α) (Lemma 20) and that every α-exit sequence from X is also a δ-exit sequence from X (Lemma 21). Then, a positive δ-exit sequence from X exists, since we know that a positive α-exit sequence from X exists (Lemma 22). This is all handled in Sect. 5.4. We are then left to prove that a positive δ-exit sequence from X exists for the sets X ∈ W(U, δ). This turns out to be the difficult case. It is handled in Sect. 5.5.
In Sect. 5.4, we prove that, for all X ∈ V(U , δ), there exists a positive δ-exit sequence from X . In Sect. 5.5, we deal with the set W(U , δ). The two results together then imply that δ satisfies condition (ii) of stability.
Our approach for demonstrating that every X in the set V(U , δ) has a positive δ-exit sequence from X is straightforward. We first prove that every member of V(U , δ) is also present in X (α). We then prove that, for all X ∈ V(U , δ), every α-exit sequence from X is a δ-exit sequence from X .
Lemma 21 Let X ∈ V(U , δ). Then every α-exit sequence from X is a δ-exit sequence from X .
Proof Let e = (x_j, y_j)_{j=1}^{k} be an α-exit sequence from X. Let j ∈ {1, . . . , k}, let g ∈ viable(y_j, δ) and assume that

{x_1, . . . , x_{j−1}} ∪ esc(X, δ) ⊆ sat(g, δ). (5)

We need to prove that x_j ∈ sat(g, δ). We distinguish between two cases.
Case 1: x_j ∈ F(U). Then apparently F(U) ∩ X ≠ ∅, so we must have F(U) ∩ esc(X, δ) ≠ ∅, by the fact that X ∈ V(U, δ). The fact that a representative of F(U), say t, exists in esc(X, δ) implies that t ∈ sat(g, δ), by Assumption (5). Then obviously also x_j ∈ sat(g, δ).

Lemma 22 Let X ∈ V(U , δ). Then there exists a positive δ-exit sequence from X .
Proof We have X ∈ X (α) by Lemma 20. Therefore, a positive α-exit sequence from X exists. The sequence is a δ-exit sequence from X by Lemma 21. The sequence has an edge (x, y) with x ∈ pos(X , α), because it is a positive α-exit sequence. Then x ∈ pos(X , δ), as δ ≥ α. The sequence is thus a positive δ-exit sequence from X .

5.5 The existence of δ-exit sequences: the difficult case

Introduction
Recall that α ∈ * , U ∈ U(α) and δ = δ(U , α) are still fixed. We still need to prove that a positive δ-exit sequence from X exists for every X in the set W(U , δ).
In this subsection, we will sometimes make use of additional notation. If e = (x j , y j ) k j=1 is a sequence of edges, we will use the notation x j (e) = x j and y j (e) = y j for j ∈ {1, . . . , k}. We will also use the notation x(e) = {x 1 , . . . , x k }, y(e) = {y 1 , . . . , y k }, and k(e) = k. We will further use the notation ø for the empty sequence.
Let us give a quick overview of the approach we will take. We first define, for any X ⊆ S, a certain type of α-exit sequences from X , called α-exit sequences from X disregarding U , that do not involve the members of F(U ) in any way. We order such sequences e by the cardinality of the set x(e), and we are interested in the sequences that are maximal in this sense.
It easily follows from the definition that every α-exit sequence from X disregarding U is a δ-exit sequence from X (see Lemma 24). Therefore, if an α-exit sequence from X disregarding U exists that is positive, then we are done. Otherwise, a maximal sequence e * can serve as the initial part of a positive δ-exit sequence from X . The crucial result regarding the sequence e * , when non-positive, can be found in Lemma 32. It will imply that, for certain t ∈ F(U ) ∩ X and for every v ∈ viable(s, δ) with s ∈ A(t), we have The result hints at the fact that an edge of the form (t, s) with t ∈ F(U ) can be placed directly after the sequence e * to make a δ-exit sequence from X . The details of how to extend the sequence e * with an appropriate edge (t, s) are handled in Lemma 33. The final details are then handled in Lemma 34: If δ t > 0, then (e * , (t, s)) makes a positive δ-sequence from X , and we are done. If δ t ≤ 0, then we show that X ∈ X (α), so we can choose a positive α-exit sequence from X , say f. We then show that (e * , (t, s), f) is a positive δ-exit sequence from X .

Exit sequences disregarding U, α-caps and α-hats
Let X ⊆ S, and let e = (x_j, y_j)_{j=1}^{k} be an α-exit sequence from X. We will say that e is an α-exit sequence from X disregarding U if F(U) ∩ {x_1, . . . , x_k} = ∅, and if for all j ∈ {1, . . . , k} and all v ∈ viable(y_j, α),

{x_1, . . . , x_{j−1}} ∪ (esc(X, α)\F(U)) ⊆ sat(v, α) ⟹ x_j ∈ sat(v, α).

Later in Sect. 5.5 we will need the fact that Lemma 18-(iv) is also valid for exit sequences from X disregarding U. This is expressed in the following lemma.
Lemma 23 Let X ⊆ S. If e and f are both α-exit sequences from X disregarding U , then the concatenation of these two sequences, denoted by (e, f), is also an α-exit sequence from X disregarding U .
Proof Let e = (x_j, y_j)_{j=1}^{k} and f = (x_j, y_j)_{j=k+1}^{ℓ} be two α-exit sequences from X disregarding U. We need to prove that (x_j, y_j)_{j=1}^{ℓ} is an α-exit sequence from X disregarding U. To see this, let j ∈ {1, . . . , ℓ} and let v ∈ viable(y_j, α) be such that {x_1, . . . , x_{j−1}} ∪ (esc(X, α)\F(U)) ⊆ sat(v, α). If j ≤ k, we use the fact that e is an α-exit sequence from X disregarding U to deduce that x_j ∈ sat(v, α). If j > k, we use the fact that f is an α-exit sequence from X disregarding U to deduce that x_j ∈ sat(v, α).
Later in Sect. 5.5 we will also use the fact that the claim of Lemma 21 is valid for exit sequences from X disregarding U and does not need the restriction X ∈ V(U , δ). This is expressed in the following lemma.
Lemma 24 Let X ⊆ S. Then every α-exit sequence from X disregarding U is a δ-exit sequence from X .
Proof Let e = (x j , y j ) k j=1 be an α-exit sequence from X disregarding U . We will prove that e is a δ-exit sequence from X .
It remains to prove that x j ∈ sat(g, δ).
Then x j ∈ sat(g, α) follows by the fact that e is an α-exit sequence from X disregarding U . We also have x j / ∈ F(U ) by the fact that F(U ) ∩ {x 1 , . . . , x k } = ∅. The combination x j ∈ sat(g, α) and x j / ∈ F(U ) implies that x j ∈ sat(g, δ).
Let X ⊆ S and let e = (x_j, y_j)_{j=1}^{k} be an α-exit sequence from X disregarding U. It will be convenient to have terminology for an edge (x, y) such that the concatenation (e, (x, y)) fails to be an α-exit sequence from X disregarding U only because y ∈ X. We say that (x, y) is an α-cap for (X, e, U) if x ∈ X\F(U) and y ∈ X, and if we have for all v ∈ viable(y, α) that

{x_1, . . . , x_k} ∪ (esc(X, α)\F(U)) ⊆ sat(v, α) ⟹ x ∈ sat(v, α).

We denote by cap(X, e, U, α) the set of α-caps for (X, e, U). Note that the set cap(X, ø, U, α) is well-defined, as ø is an α-exit sequence from X disregarding U.
We also introduce terminology for an edge (x, y) expressing that (e, (x, y)) fails to be an α-exit sequence from X disregarding U for the following two reasons: x ∈ F(U) ∩ X and y ∈ X. We say that (x, y) is an α-hat for (X, e, U) if x ∈ F(U) ∩ X and y ∈ X, and if we have for all v ∈ viable(y, α) that

{x_1, . . . , x_k} ∪ (esc(X, α)\F(U)) ⊆ sat(v, α) ⟹ x ∈ sat(v, α).

We denote by hat(X, e, U, α) the set of α-hats for (X, e, U). Note that the set hat(X, ø, U, α) is well-defined, as ø is an α-exit sequence from X disregarding U.
Non-emptiness of the set {(x, y) | y ∈ A(x) ∩ X } follows from the fact that x ∈ X and X ∈ C.

Graphs of α-caps and α-hats and their basic properties
For X ∈ W(U , δ) and an α-exit sequence e from X disregarding U , let us define K(X , e, U , α) as the graph with vertex set X and edge set cap(X , e, U , α) ∪ hat(X , e, U , α).
Notice that K(X , e, U , α) is indeed a graph with vertex set X , i.e., for every edge (x, y) of K(X , e, U , α), both x and y are trivially elements of X , by definition of the sets cap(X , e, U , α) and hat(X , e, U , α).
Let us also define H(X, e, U, α) as the graph with vertex set X and edge set

cap(X, e, U, α) ∪ {(t, U(t)) | t ∈ F(U) ∩ X}.

The graph H(X, e, U, α) is a subgraph of K(X, e, U, α), since for every t ∈ F(U) ∩ X, we have (t, U(t)) ∈ hat(X, e, U, α), by Lemmas 25-(iii) and 26.
Both graphs K(X, e, U, α) and H(X, e, U, α) have the property that, for every x ∈ X, there exists y ∈ X such that (x, y) is an edge of the graph. Indeed, for x ∈ X\(esc(X, α) ∪ F(U)), this is implied by Lemma 25-(i) and Lemma 26; for x ∈ esc(X, α)\F(U), this is implied by Lemma 25-(ii) and Lemma 26; for x ∈ F(U), it is implied by Lemmas 25-(iii) and 26. As we now see that Lemma 8 applies to these two graphs, we can freely use the fact that they have ergodic sets, and that there is a path in these graphs from each x ∈ X to one of their ergodic sets.
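The two facts quoted from Lemma 8 can be checked on small examples. The sketch below assumes that an ergodic set is a set of vertices that is closed under the edges of the graph and in which every vertex can reach every other vertex; since the statement of Lemma 8 is not reproduced here, this reading is an assumption. The graph is encoded as a dictionary of successor lists in which every vertex has at least one outgoing edge.

```python
def reachable(graph, start):
    """All vertices reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in graph[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def ergodic_sets(graph):
    """Closed, strongly connected vertex sets (assumed meaning of 'ergodic set')."""
    reach = {x: reachable(graph, x) for x in graph}
    result = []
    for x in graph:
        candidate = reach[x]
        # The reachable set of x is ergodic if every vertex in it reaches the same set.
        if all(reach[y] == candidate for y in candidate) and candidate not in result:
            result.append(candidate)
    return result

# Toy graph, made up for illustration; every vertex has an outgoing edge.
graph = {1: [2], 2: [3], 3: [2], 4: [4, 1]}
sets_found = ergodic_sets(graph)
all_reach_one = all(any(reachable(graph, x) & Z for Z in sets_found) for x in graph)
```

The quadratic reachability computation is chosen for readability; for larger graphs one would use a strongly connected components algorithm instead.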
Lemma 27 Let X ∈ W(U , δ), let e be an α-exit sequence from X disregarding U . Then Proof Let X ∈ W(U , δ) and let e be an α-exit sequence from X disregarding U .
Proof of (i): Let Z be an ergodic set of K(X , e, U , α). We will prove first that esc(Z , α) ⊆ Z ∩ esc(X , α). Let x ∈ esc(Z , α). We trivially have x ∈ Z , so it remains to prove that x ∈ esc(X , α).
Suppose that x ∈ X\(esc(X, α) ∪ F(U)). Then for all y ∈ safestep(x, α), the edge (x, y) is an edge of the graph K(X, e, U, α), by Lemmas 25-(i) and 26. Therefore, every such y ∈ safestep(x, α) is also an element of Z, by the properties of an ergodic set. It follows that safestep(x, α) ⊆ Z. This implies x ∉ esc(Z, α). Contradiction. We can already conclude that x ∈ esc(X, α) ∪ F(U). Suppose that x ∈ F(U). Then for every y ∈ safestep(x, α) ∩ X, the edge (x, y) is an edge of the graph K(X, e, U, α), by Lemmas 25-(iii) and 26. Therefore, every such y is also an element of Z by the properties of an ergodic set. It follows that safestep(x, α) ∩ X ⊆ Z. Now, we choose y ∈ safestep(x, α) with y ∈ (S ∪ S*)\Z, which is possible by the fact that x ∈ esc(Z, α). We must have y ∉ X then. The fact that y ∈ safestep(x, α) and y ∉ X demonstrates that x ∈ esc(X, α).
Proof of (ii): Let Y be an ergodic set of H(X , e, U , α).
Let X ∈ W(U , δ). The α-exit sequences e from X disregarding U can be ordered by the cardinality of the set x(e). Let us say that an α-exit sequence e from X disregarding U is maximal if |x(e)| is maximal among the α-exit sequences from X disregarding U .
Lemma 28 Let X ∈ W(U , δ), let e be an α-exit sequence from X disregarding U and let e * be a maximal α-exit sequence from X disregarding U . Then x(e) ⊆ x(e * ).
Proof Define f = (e * , e). The sequence f is an α-exit sequence from X disregarding U by Lemma 23. Then |x(e * )| ≥ |x(f)| = |x(e * ) ∪ x(e)|, where the inequality is by the maximality of e * . This is only possible if x(e) ⊆ x(e * ).
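The counting argument in this proof uses only two facts: the sets x(e) are closed under union (by concatenation, Lemma 23), and e* has maximum cardinality. A minimal brute-force sketch on a hypothetical family of sets shows that these two facts force every member to be contained in the maximal one. The family below is invented for illustration and does not come from an actual game.

```python
from itertools import combinations


def closed_under_union(family):
    """True if the union of any two members is again a member."""
    return all((a | b) in family for a, b in combinations(family, 2))


def maximal_member(family):
    """A member of maximum cardinality (plays the role of x(e*))."""
    return max(family, key=len)


# Hypothetical stand-in for the collection {x(e)} over all exit sequences e.
family = {
    frozenset(),
    frozenset({1}),
    frozenset({2}),
    frozenset({1, 2}),
    frozenset({1, 2, 3}),
}

top = maximal_member(family)
# If some member a were not contained in top, then a | top would be a
# strictly larger member, contradicting the maximality of top.
all_contained = all(a <= top for a in family)
```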
Lemma 29 Let X ∈ W(U , δ) and let e * be a maximal α-exit sequence from X disregarding U . Let Y be an ergodic set of the graph H(X , e * , U , α) and let f be an α-exit sequence from Y disregarding U . Then f is an α-exit sequence from X disregarding U .
Proof We trivially have x(f) ⊆ Y ⊆ X and x(f) ∩ F(U ) = ∅. The sequence f thus trivially satisfies these two requirements for an α-exit sequence from X disregarding U .
We next prove the requirement y(f) ⊆ (S ∪ S*)\X by contradiction. Therefore, suppose that y(f) ⊄ (S ∪ S*)\X. Then there must be h ∈ {1, . . . , k(f)} with the property y_h(f) ∈ X. Choose the smallest h with this property. Consider the edge (x_h(f), y_h(f)) and let f̄ = (x_j(f), y_j(f))_{j=1}^{h−1}. In our proof, we need the fact that f̄ is an α-exit sequence from X disregarding U. The requirements x(f̄) ⊆ X and x(f̄) ∩ F(U) = ∅ are trivially satisfied and the requirement y(f̄) ⊆ (S ∪ S*)\X is satisfied due to the choice of h. Let j ∈ {1, . . . , h − 1}, let v ∈ viable(y_j(f̄), α), and assume that We need to prove that x_j(f̄) ∈ sat(v, α). We have esc(Y, α)\F(U) ⊆ esc(X, α)\F(U), which follows by Lemma 27-(ii) applied to e*. Also, we trivially have Then x_j(f̄) = x_j(f) ∈ sat(v, α), as f is an α-exit sequence from Y disregarding U. This proves our claim that f̄ is an α-exit sequence from X disregarding U. Now observe that the edge (x_h(f), y_h(f)) is an α-cap for (X, f̄, U). Indeed, the only reason why the concatenation of f̄ with this edge fails to be an α-exit sequence from X disregarding U is that y_h(f) ∈ X. We have x(f̄) ⊆ x(e*) by Lemma 28, as f̄ is an α-exit sequence from X disregarding U and e* is a maximal one. It subsequently follows by Lemma 26 that (x_h(f), y_h(f)) ∈ cap(X, f̄, U, α) ⊆ cap(X, e*, U, α). Thus, (x_h(f), y_h(f)) is an edge of the graph H(X, e*, U, α). Now, since x_h(f) ∈ Y and since Y is an ergodic set of H(X, e*, U, α), it follows that y_h(f) ∈ Y. This contradicts that f is an α-exit sequence from Y.
Proof We first prove that an ergodic set Z of the graph K(X, e, U, α) exists with pos(Z, α) ≠ ∅. Suppose therefore that pos(Z, α) = ∅ for every ergodic set Z of K(X, e, U, α). We will demonstrate that pos(X, δ) = ∅, contradicting that X ∈ Assume that x is an element of an ergodic set Z of K(X, e, U, α). Then obviously α_x ≤ 0, by the assumption that pos(Z, α) = ∅. If x ∈ X\F(U), then it follows immediately that δ_x = α_x ≤ 0. We assume further that x ∈ F(U). The edge (z, U(z)) is an edge of K(X, e, U, α) for all z ∈ F(U) ∩ X, since it is by definition an edge of H(X, e, U, α) and H(X, e, U, α) is a subgraph of K(X, e, U, α). Therefore, we have U(z) ∈ Z for all z ∈ F(U) ∩ Z, by the properties of an ergodic set of K(X, e, U, α). Thus, it is possible to choose a non-absorbing and U-compatible plan g with S(g) ⊆ Z and with first(g) = x. Plan g is α-viable by the assumption that pos(Z, α) = ∅. Then g ∈ admiss(x, U, α) as it satisfies AD-ii. It follows that δ_x ≤ φ i x (g) = 0. Now assume that x is not an element of any ergodic set of K(X, e, U, α). Then we can choose an ergodic set Z of K(X, e, U, α) and a U-compatible path p in K(X, e, U, α) from x to an element of Z. We can also choose a non-absorbing and U-compatible plan g with S(g) ⊆ Z and first(g) = last(p). Plan g is α-viable by the assumption pos(Z, α) = ∅. We claim that plan p, g is also α-viable.

Contradiction.
So, we can choose an ergodic set Z of K(X, e, U, α) with pos(Z, α) ≠ ∅. Now suppose that pos(Y, α) = ∅ for every ergodic set Y of H(X, e, U, α) that is a subset of Z. We will derive a contradiction by demonstrating that pos(Z, α) = ∅. Let x ∈ Z.
Assume that x is an element of an ergodic set Y of H(X, e, U, α). Then α_x ≤ 0, since we suppose that pos(Y, α) = ∅. Now assume that x is not an element of any ergodic set of H(X, e, U, α). Then we choose an ergodic set Y of H(X, e, U, α) and a path p in H(X, e, U, α) from x to an element of Y. Notice that path p lies entirely inside Z, because p is a path in the graph K(X, e, U, α), the path starts in Z, and Z is an ergodic set of K(X, e, U, α). Similarly, we can argue that Y ⊆ Z. We can also choose a non-absorbing plan g with S(g) ⊆ Y and with first(g) = last(p). Plan g is α-viable by the assumption pos(Y, α) = ∅. We claim that plan p, g is also α-viable.

Existence of δ-exit sequences
Let X ∈ W(U, δ) and let e* be a maximal α-exit sequence from X disregarding U. We will exploit the properties of the graphs H(X, e*, U, α) and K(X, e*, U, α) to establish the existence of a δ-exit sequence from X.

I:
We prove by contradiction that F(U) ∩ Y ≠ ∅, which also proves F(U) ∩ Z ≠ ∅. Suppose therefore that F(U) ∩ Y = ∅. We claim that Y ∈ X(α). We trivially have Y ∈ P(α) and we have Y ∈ C by the properties of an ergodic set. It remains to prove that Y ∈ E(α). We obviously have pos(Y, α) ⊆ pos(X, δ). We also have Here, the equality holds because we suppose F(U) ∩ Y = ∅, the first inclusion is by Lemma 27-(ii), the second inclusion follows by Lemma 14-(iv), and the third inclusion is trivial. We now conclude that where the equality is because X ∈ X(δ) ⊆ E(δ). This proves that Y ∈ E(α).
So we have indeed Y ∈ X (α). We can thus choose a positive α-exit sequence e from Y , by the fact that α ∈ * . We now claim that e is an α-exit sequence from Y disregarding U .
The requirements x(e) ⊆ Y and y(e) ⊆ (S ∪ S * )\Y are obviously satisfied, because e is an α-exit sequence from Y . The requirement F(U ) ∩ x(e) = ∅ is satisfied due to the assumption F(U )∩Y = ∅. We further need to prove that, for all j ∈ {1, . . . , k(e)} and all v ∈ viable(y j (e), α), we have By definition of an α-exit sequence from Y , the sequence e satisfies, for all j ∈ {1, . . . , k(e)} and all v ∈ viable(y j (e), α), We see that (10) indeed holds, since we have esc(Y , α) = esc(Y , α)\F(U ) due to the fact that we suppose F(U ) ∩ Y = ∅. This shows that e is indeed an α-exit sequence from Y disregarding U . Now, it follows by Lemma 29 that e is an α-exit sequence from X disregarding U . It follows even that e is a positive α-exit sequence from X disregarding U , as it is a positive α-exit sequence from Y . But then e * must be a positive α-exit sequence from X disregarding U as well, by the maximality of e * . This contradicts that x(e * ) ∩ pos(X , α) = ∅, a given assumption of this lemma.
We will prove that t ∈ sat(g, δ). The proof will be by contradiction, so we suppose that t ∉ sat(g, δ). The contradiction will be derived in six steps.

II-i:
For all x ∈ Y\{t}, we construct a plan h_x ∈ viacomp(t, U, α) such that x lies on h_x, such that all states on h_x from start to the first occurrence of x are different, and such that h_x ∉ admiss(t, U, α). For all x ∈ Y\{t}, we can choose a path p_x in H(X, e*, U, α) from t to x, as both t and x are elements of the ergodic set Y. We choose p_x of minimum length to ensure that each state appears at most once on p_x. We can also choose a path q_x in H(X, e*, U, α) from x to t. We now define the plan h_x by h_x = p_x, q_x, (t, s), g.
Further, p_x, q_x is a path in K(X, e*, U, α), as p_x and q_x are both paths in the subgraph H(X, e*, U, α) of K(X, e*, U, α). Thus, Lemma 30 applies to the plan (t, s), g and the path p_x, q_x. It follows that h_x = p_x, q_x, (t, s), g ∈ viable(t, α). Further, plan h_x is U-compatible, because the path p_x, q_x, (t, s) and plan g are both U-compatible. Thus, h_x ∈ viacomp(t, U, α). Now, by definition, we have t ∈ sat(v, δ) for all v ∈ admiss(t, U, α). Apparently then, h_x ∉ admiss(t, U, α), as we suppose t ∉ sat(g, δ) = sat(h_x, δ).
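The choice of a minimum-length path in step II-i can be illustrated with a breadth-first search, which always returns a path without repeated vertices. The sketch below is a hypothetical example: the graph stands in for the restriction of H(X, e*, U, α) to the ergodic set Y, and the vertex names are invented. For convenience it also takes the return path to be a shortest one, although the argument only needs some path from x back to t.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Vertices of a minimum-length path from source to target, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in graph[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None


# Hypothetical strongly connected graph playing the role of Y.
graph = {"t": ["a"], "a": ["b", "x"], "b": ["t"], "x": ["b"]}

p_x = shortest_path(graph, "t", "x")  # path from t to x
q_x = shortest_path(graph, "x", "t")  # some path from x back to t
# Concatenate, dropping the duplicated junction vertex x.
walk = p_x + q_x[1:]
```

The walk starts and ends in t, and its initial segment up to x visits each state once, which is the property required of the plan up to the first occurrence of x.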
where the last equality is by the fact that X ∈ X (δ) ⊆ E(δ). This proves that Y ∈ E(α),

II-v:
We now prove the existence of a positive α-exit sequence from Y disregarding U .
Since we have Y ∈ X (α) by the result of II-iv and since we have α ∈ * , we can choose a positive α-exit sequence e from Y . We can choose e such that every edge is legitimate. Indeed, if this is not already the case, then we remove, one by one, every edge that is not legitimate. This results in a non-empty α-exit sequence from Y in which every edge is legitimate, by Lemma 18-(ii). The resulting sequence is still positive, by .
Let ē be the sequence that results from e by deleting all edges (x_j(e), y_j(e)) with the property x_j(e) ∈ F(U). We claim that ē is a positive α-exit sequence from Y disregarding U.
Let us investigate the relation between e and ē a bit more. If F(U) ∩ esc(Y, α) = ∅, then the sequence e contains no edges of the form (x, y) with i_x = i_t and α_x ≤ α_t, as these are then not legitimate. Then in fact, the sequence e contains no edges of the form (x, y) with i_x = i_t at all, as there are no states in Y with i_x = i_t and α_x > α_t, by the result of II-ii. Thus, If F(U) ∩ esc(Y, α) ≠ ∅, then we may have ē ≠ e. Assume that this is indeed the case. Then let h ∈ {1, . . . , k(e)} be such that x_h(e) ∈ F(U). Now, there may exist j ∈ {1, . . . , k(e)} with i_{x_j(e)} = i_{x_h(e)} = i_t and j ≠ h. If this is so, then the relevant thing to see is that this implies j < h. Indeed, if we suppose j > h, then α_{x_j(e)} > α_t by Lemma 18-(vi), which contradicts the result of II-ii. Thus, F(U) ∩ esc(Y, α) ≠ ∅ and x_h(ē) ∈ S^{i_t} ⇒ ∀ j ≤ h : x_j(ē) = x_j(e). (14) Let us now prove that ē is a positive α-exit sequence from Y disregarding U. We have x(ē) ∩ F(U) = ∅ due to the construction of ē. We have x(ē) ⊆ x(e) ⊆ Y and y(ē) ⊆ y(e) ⊆ (S ∪ S*)\Y due to the fact that e is an α-exit sequence from Y. Now, let j ∈ {1, . . . , k(ē)}, let v ∈ viable(y_j(ē)), and assume that {x_1(ē), . . . , x_{j−1}(ē)} ∪ (esc(Y, α)\F(U)) ⊆ sat(v, α).
It follows that x_j(e) ∈ sat(v, α) by the fact that e is an α-exit sequence from Y. Then x_j(ē) ∈ sat(v, α), as x_j(ē) = x_j(e) by the result of (14). Now assume that x_j(ē) ∈ Y\S^{i_t}. Then the result of II-iii applies, since we have x_j(ē) ∈ Y\S^{i_t} and y_j(ē) ∈ A(x)\Y. Thus, we have t ∈ sat(v, α) or x_j(ē) ∈ sat(v, α). In fact, x_j(ē) ∈ sat(v, α) always holds. Indeed, when t ∈ sat(v, α) holds, we have s ∈ sat(v, α) for all s ∈ F(U). It then follows that {x_1(e), . . . , where i ∈ {1, . . . , k(e)} is such that x_i(e) = x_j(ē). It subsequently follows that x_j(ē) = x_i(e) ∈ sat(v, α), by the fact that e is an α-exit sequence from Y. We proved that ē is indeed an α-exit sequence from Y disregarding U. The sequence ē is also positive, since e is positive and since ē was obtained from e by the deletion of edges (x, y) that satisfy α_x = α_t ≤ 0.

II-vi:
We can now derive the desired contradiction. By Lemma 29, the sequence e is not only an α-exit sequence from Y disregarding U , but it is also an α-exit sequence from X disregarding U . It follows that x(e) ⊆ x(e * ), by Lemma 28. Hence, x(e) ∩ pos(X , α) ⊆ x(e * ) ∩ pos(X , α) = ∅. This contradicts that e is positive.
Lemma 33 Let X ∈ W(U , δ) and let e * be a maximal α-exit sequence from X disregarding U . If x(e * ) ∩ pos(X , α) = ∅, then there exists a (δ, x(e * ))-exit from X of the form (t, s) with t ∈ F(U ).
To see that (t, s) is indeed the required edge, it now suffices to prove that t ∈ sat(g, δ).
This proves Z ∈ X(α). We can therefore choose a positive α-exit sequence f from Z, as α ∈ * . Define f̄ = (x_j(f), y_j(f))_{j=1}^{h}, where h is the largest index such that x_j(f) ∉ F(U) for all j ≤ h. Thus, h = 0 in case x_1(f) ∈ F(U) and h = k(f) in case x(f) ∩ F(U) = ∅. We claim that f̄ is an α-exit sequence from X disregarding U. We have x(f̄) ∩ F(U) = ∅ by construction, and we trivially have x(f) ⊆ Z ⊆ X. To prove that f̄ satisfies the other requirements as well, we define ℓ as the largest index in {0, . . . , h} such that {y_1(f), . . . , y_ℓ(f)} ⊆ (S ∪ S*)\X. We will first prove that the sequence f̂ = (x_j(f), y_j(f))_{j=1}^{ℓ} is an α-exit sequence from X disregarding U, and we will then prove that ℓ = h.
Then indeed x_j(f) ∈ sat(g, α), since f is an α-exit sequence from Z. We proved that the sequence f̂ = (x_j(f), y_j(f))_{j=1}^{ℓ} is an α-exit sequence from X disregarding U. We now prove that f̂ = f̄, i.e., that ℓ = h, by contradiction. So suppose that ℓ < h. Observe then that the edge (x_{ℓ+1}(f), y_{ℓ+1}(f)) is an α-cap for (X, f̂, U). The edge is then also an α-cap for (X, e*, U), by Lemma 26 and the maximality of e*. This means that (x_{ℓ+1}(f), y_{ℓ+1}(f)) is an edge of the graph K(X, e*, U, α). It follows that y_{ℓ+1}(f) ∈ Z, since Z is an ergodic set of K(X, e*, U, α). This contradicts that f is an α-exit sequence from Z.
We proved that f̄ is an α-exit sequence from X disregarding U. We therefore have x(f̄) ⊆ x(e*) by the maximality of e*. Since f is a positive sequence and e* is not, we have x(f) ⊄ x(e*). Thus, we have f̄ ≠ f, i.e., we have h < k(f). The edge (x_{h+1}(f), y_{h+1}(f)) therefore exists. The edge is of the form (t, s) with t ∈ F(U), because h was defined as the largest index such that x_j(f) ∉ F(U) for all j ≤ h. Now, let g ∈ viable(s, δ), and assume that x(e*) ∪ esc(X, δ) ⊆ sat(g, δ).
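The index h used in this argument is simply the length of the longest prefix of the sequence whose first coordinates avoid F(U), and the edge (t, s) is the edge that follows this prefix. A minimal sketch, with a hypothetical sequence and a hypothetical set F_U standing in for F(U):

```python
def split_at_first_forbidden(sequence, forbidden):
    """Split an edge sequence at the first edge whose tail is forbidden.

    Returns (h, prefix, next_edge), where h is the largest index such that
    the first h edges all have their first coordinate outside `forbidden`,
    prefix is that initial segment, and next_edge is the edge at position
    h + 1 (or None if the whole sequence avoids `forbidden`).
    """
    h = 0
    for x, _y in sequence:
        if x in forbidden:
            break
        h += 1
    prefix = sequence[:h]
    next_edge = sequence[h] if h < len(sequence) else None
    return h, prefix, next_edge


# Hypothetical exit sequence given as (x_j, y_j) pairs; F_U stands for F(U).
f = [("a", "u"), ("b", "v"), ("t", "s"), ("c", "w")]
F_U = {"t"}

h, prefix, edge = split_at_first_forbidden(f, F_U)
```

Here h = 2 and the edge at position h + 1 is ("t", "s"), whose first coordinate lies in F_U. If the first edge already starts in F_U, then h = 0, and if no edge does, then h equals the length of the sequence and there is no next edge.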
Here, the inclusion follows by Lemma 14-(iv) and the final equality is by the fact that X ∈ E(δ).
We can choose a positive α-exit sequence f from X , since we proved that X ∈ X (α) and since we have α ∈ * . We claim that h = (e * , (t, s), f) is the required positive δ-exit sequence from X .
We need to prove that x ∈ sat(g, δ). If x ∈ F(U ), this follows immediately, because t ∈ sat(g, δ) is implied by assumption 21. We assume further that x ∈ X \F(U ).
Then x = x j (f) ∈ sat(g, α) by the fact that f is an α-exit sequence from X . As we assume x ∈ X \F(U ), it follows immediately that x ∈ sat(g, δ).
The following result is a direct consequence of Lemmas 17, 22 and 34.

Existence of a fixed point in Ä *
We have arrived at the main result of this section, which is the existence of a fixed point with respect to the update procedure described in Sect. 3. Of course, since α ∈ * and U ∈ U(α) were fixed, but chosen arbitrarily, the results of Sects. 5.3, 5.4 and 5.5 are valid for all α ∈ * and U ∈ U(α). From here, we do not assume anymore that α ∈ * and U ∈ U(α) are fixed.
To prove (23), let α ∈ * * , let U ∈ U(α), and let t ∈ S. If t ∈ S\F(U ), then we have δ t (U , α) = α t , in which case δ t (U , α) = 0 or the existence of x ∈ S such that δ t (U , α) = r i t (x) follows trivially from the fact that α t has this property.