Submixing and shift-invariant stochastic games

We study optimal strategies in two-player stochastic games that are played on a finite graph, equipped with a general payoff function. The existence of optimal strategies that make no use of memory and randomisation is a desirable property that vastly simplifies the algorithmic analysis of such games. Our main theorem gives a sufficient condition for the maximizer to possess such a simple optimal strategy. The condition is imposed on the payoff function: the payoff must not depend on any finite prefix (shift-invariant), and combining two trajectories must not give a higher payoff than the payoff of the parts (submixing). The core technical property that enables the proof of the main theorem is the existence of ε-subgame-perfect strategies when the payoff function is shift-invariant. Furthermore, the same techniques can be used to prove a finite-memory transfer-type theorem: namely, for shift-invariant and submixing payoff functions, the existence of optimal finite-memory strategies in one-player games for the minimizer implies the existence of the same in two-player games. We show that numerous classical payoff functions are submixing and shift-invariant.


Introduction
The games that we study are played between two players on a finite graph. Every vertex of the graph belongs to one of the players, who decides which edge should be taken next. The result of such a play is an infinite path in the graph. The objective of the game is given by a payoff function, which maps infinite paths to real numbers. The maximizer, or Player 1, wants to maximize the payoff, while his adversary (the minimizer) wants the opposite.
The study of such games has been an active area of research for a few decades, in a variety of communities, especially those of theoretical computer science and economics. They are used to model simplified adversarial (zero-sum) situations. In computer science they are used for verifying properties of systems, but also as a very beneficial theoretical tool in logic and automata theory.
In this paper we consider stochastic games, a more general model where in every step, after an action is chosen, the next vertex is chosen according to a probability distribution on the set of vertices. In this scenario, Player 1 wants to maximize the expected payoff, and his adversary to minimize it.
Well-known examples of games played on graphs are discounted games, mean-payoff games, games equipped with the limsup payoff function, and parity games. These four classes of games share a common property: both players have very simple optimal strategies, namely optimal strategies that are both deterministic and stationary. These are strategies that guarantee maximal expected payoff, choose actions deterministically (without randomisation), and make this deterministic choice depend only on the current vertex (without using memory). Games that admit such strategies for the maximizer are called half-positional; games that admit them for both players are called positional. This property is highly desirable and it is often the starting point for further algorithmic analysis.
The broad purpose of the present paper is to study the common quality of games that makes it possible for them to admit deterministic and stationary optimal strategies.
Context. There have been numerous papers about the existence of deterministic and stationary optimal strategies in games with different payoff functions. Shapley proved that stochastic games with the discounted payoff function are positional, using an operator approach [Sha53]. Derman showed the positionality of one-player games with expected mean-payoff reward, using an Abelian theorem and a reduction to discounted games [Der62]. Gilette extended Derman's result to two-player games [Gil57], but his proof was found to be wrong and was corrected by Ligget and Lippman [LL69]. The positionality of one-player parity games was addressed in [CY90] and later extended to two-player games in [CJH03,Zie04]. Counter games were extensively studied in [BBE10], where several examples of positional counter games are given. There are also several examples of one-player and two-player positional games in [Gim07,Zie10]. A whole zoology of half-positional games is presented in [Kop09], and another example is given by mean-payoff co-Büchi games [CHJ05]. The proofs of these various results are quite heterogeneous, making it difficult to find a common property that explains why they are positional or half-positional. Some effort has been made to better understand the conditions that make games (half-)positional, which has made apparent that payoff functions that are shift-invariant and submixing play a crucial role. Our contributions lie in this direction.
Contributions. The results of the present paper can be summarised as follows.
Theorem 1.1. Games equipped with a payoff function that is shift-invariant and submixing are half-positional.
As mentioned above, half-positional games are those where the maximizer has a simple kind of strategy that is optimal. There is nothing special about this player: if, instead of the submixing condition, we define an "inverse" submixing condition, namely one requiring that the combined payoff is larger than the minimum of the parts, we obtain an analogous theorem that proves the existence of simple optimal strategies for the minimizer. Furthermore, there are payoff functions for which both versions of the submixing condition hold, and for these games the theorem proves positionality. The conditions in the statement of the theorem are not necessary; we will provide examples and discuss this fact. The proof of Theorem 1.1 is by induction on the number of edges; it uses Lévy's 0-1 law, as well as the following crucial property of the games under consideration: games equipped with a payoff function that is both bounded and Borel-measurable admit ε-subgame-perfect strategies, for every ε > 0. A proof of this fact can be found in [MY15].
The second contribution says that having a shift-invariant payoff function is sufficient for the existence of ε-subgame-perfect strategies.
Theorem 1.2. Games equipped with a shift-invariant payoff function admit, for every ε > 0, ε-subgame-perfect strategies.
The proof of this theorem uses martingale theory and takes a large part of the paper; however, it is independent of the rest.
A third contribution comes as a corollary of the techniques developed for the main theorem. It is a transfer-type theorem that lifts the existence of optimal finite-memory strategies in one-player games (also known as Markov decision processes) to the same for two-player games.
Theorem 1.3. Let f be a payoff function that is both shift-invariant and submixing. Assume that in all games equipped with f and fully controlled by the minimizer, for every ε > 0, the minimizer has an ε-optimal strategy with finite memory. Then in every (two-player) game, for every ε > 0, the minimizer has an ε-subgame-perfect strategy with finite memory.

The statement also holds for ε = 0, that is: if the minimizer has an optimal strategy with finite memory in every game that he fully controls, then in every (two-player) game he also has a subgame-perfect strategy with finite memory.
Furthermore, this theorem is proved by effectively constructing the ε-subgame-perfect strategies in the two-player games. Those are obtained by combining and simplifying ε-optimal strategies in one-player games.
A more general result about the transfer of simple classes of strategies for the minimizer from one-player to two-player games is formulated in Theorem 6.2.
Related work. For one-player games it was proved by the first author that every one-player game equipped with a payoff function that is both shift-invariant and submixing is positional [Gim07]. This result was successfully used in [BBE10] to prove the positionality of counter games. A weaker form of this condition was presented in [GZ04] to prove the positionality of deterministic games (i.e. games where transition probabilities are equal to 0 or 1, not stochastic). Kopczynski proved that two-player deterministic games equipped with a shift-invariant and submixing payoff function that takes only two values are half-positional [Kop06].
A result of Zielonka [Zie10] provides a necessary and sufficient condition for the positionality of one-player games. The condition is expressed in terms of the existence of particular optimal strategies in multi-armed bandit games. When trying to prove positionality for a particular payoff function, the condition in [Zie10] is harder to check than the submixing property, which is purely syntactic.
Some results on finite-memory determinacy have been obtained in [BRO+20], with different requirements: there, the size of the memory should be independent of the arena, whereas in this paper we make no such assumption.
The pre-print version of the present paper [GK14] has already been used in a number of works, mostly pertaining to the algorithmic game theory community. We mention the papers that we are aware of. In [CD16], Chatterjee and Doyen study payoff functions that are a conjunction of mean-payoff objectives, and prove that they are in co-NP for finite-memory strategies. They use Theorem 1.1; and for Theorem 1.2 they observe that in the special case of finite-memory strategies there is a simple combinatorial proof, which bypasses the use of martingale theory. In [BKW18] the authors consider arbitrary boolean combinations of expected mean-payoff objectives; the main theorem of the present paper appears there as Theorem 1 and is the starting point of their further algorithmic analysis. Games played on finite graphs where the information flow is perturbed by non-deterministic signalling delays are considered in [BvdB15], where submixing and shift-invariant payoff functions play a central rôle. Our results and proof techniques were also used by Mayr, Schewe, Totzke and Wojtczak to establish a finite-memory transfer theorem analogous to the second part of Theorem 1.3, and to prove that games with energy-parity objectives and almost-sure semantics lie in NP ∩ co-NP [MSTW21].
Organisation of the paper. We fix the notation and give the relevant definitions in Section 2, where one can also find an overview of the proof. We give examples of shift-invariant and submixing payoff functions in Section 3, and show how Theorem 1.1 can be used to recover numerous classical determinacy results. In Section 4, we define reset strategies as a method of obtaining ε-subgame-perfect strategies, which exist due to Theorem 1.2. The proof of the main theorem, Theorem 1.1, is given in Section 5, and that of the transfer theorem for finite-memory strategies, Theorem 1.3, in Section 7.

Preliminaries
The purpose of this section is to introduce the basic notions that we need about stochastic games with perfect information, that is, the definitions of games, payoff functions, strategies and values.
Games. A game is specified by an arena and a payoff function. While the arena determines how the game is played, the payoff function specifies the objectives that the players want to reach.
We use the following notation throughout the paper. Let A be a finite set. The set of finite (respectively infinite) sequences on A is denoted A* (respectively A^ω). A probability distribution on A is a function δ : A → [0, 1] such that ∑_{a∈A} δ(a) = 1. The set of probability distributions on A is denoted Δ(A).
Definition 2.1 (Arena). A stochastic arena with perfect information is a tuple (S, A, (A(s))_{s∈S}, p) where:
• S is a finite set of states (the nodes of the graph), partitioned into two sets (S₁, S₂),
• A is a finite set of actions,
• for each state s ∈ S, a non-empty set A(s) ⊆ A of actions available in s,
• and transition probabilities p : S × A → Δ(S).
An arena is fully controlled by the minimizer if A(s) is a singleton for every s ∈ S₁. An infinite play in an arena 𝒜 is an infinite sequence p = s₀a₁s₁a₂s₂⋯ ∈ (SA)^ω such that for every n ∈ ℕ, a_{n+1} ∈ A(s_n). A finite play in 𝒜 is a finite sequence in (SA)*S which is a prefix of an infinite play.
With each infinite play is associated a payoff, computed by a payoff function. Player 1 (the maximizer) wants to maximize the expected payoff while Player 2 (the minimizer) has the exact opposite preference. Formally, a payoff function for the arena 𝒜 is a bounded and Borel-measurable function f : (SA)^ω → ℝ which associates with each infinite play h a payoff f(h).
Definition 2.2 (Stochastic game with perfect information). A stochastic game with perfect information is a pair (𝒜, f) where 𝒜 is an arena and f a payoff function for the arena 𝒜.
Strategies. A strategy in an arena 𝒜 for Player 1 is a function σ mapping finite plays to probability distributions on actions, such that for any finite play s₀a₁⋯s_n and every action a ∈ A, if σ(s₀a₁⋯s_n)(a) > 0 then the action a belongs to A(s_n), i.e. the played action is available. Strategies for Player 2 are defined similarly and are typically denoted τ. General strategies can have infinite memory as well as randomise among the available actions at every step. We are interested in a very simple sub-class of strategies, namely those that use neither memory nor randomisation.
Definition 2.3 (Deterministic and stationary strategies). A strategy σ for Player 1 is deterministic if for every finite play h ∈ (SA)*S₁ and action a ∈ A, σ(h)(a) ∈ {0, 1}. A strategy σ is stationary if σ(h) only depends on the last state of h. In other words, σ is stationary if for every state s ∈ S₁ and every finite play h = s₀a₁⋯s ending in s, σ(h) = σ(s).

Given an initial state s ∈ S and strategies σ and τ for Players 1 and 2 respectively, the set of infinite plays that start at state s is naturally equipped with a sigma-field and a probability measure denoted ℙ^s_{σ,τ}, defined as follows. Given a finite play h and an action a, the sets of infinite plays that extend h and ha are cylinders that we abusively denote h and ha. The sigma-field is the one generated by the cylinders, and ℙ^s_{σ,τ} is the unique probability measure on the set of infinite plays that start at s such that for every finite play h that ends in state t, for every action a ∈ A and state r ∈ S,

ℙ^s_{σ,τ}(ha ∣ h) = σ(h)(a) if t ∈ S₁, and τ(h)(a) if t ∈ S₂,
ℙ^s_{σ,τ}(har ∣ ha) = p(r ∣ t, a).

For n ∈ ℕ, we denote by S_n and A_n the random variables mapping an infinite play to its n-th state and n-th action.

Values and optimal strategies. Let (𝒜, f) be a game with a bounded measurable payoff function f. The expected payoff associated with an initial state s and two strategies σ and τ is the expected value of f under ℙ^s_{σ,τ}, denoted 𝔼^s_{σ,τ}[f]. The maxmin and minmax values of a state s ∈ S in the game are:

maxmin(f)(s) = sup_σ inf_τ 𝔼^s_{σ,τ}[f],   minmax(f)(s) = inf_τ sup_σ 𝔼^s_{σ,τ}[f].

By definition of maxmin and minmax, for every state s ∈ S, maxmin(f)(s) ≤ minmax(f)(s). As a corollary of Martin's determinacy theorem for Blackwell games [Mar98, Section 1], the converse inequality holds as well: maxmin(f)(s) = minmax(f)(s). This common value is called the value of state s in the game and is denoted val(f)(s).
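The probability measure over plays can be made concrete by sampling. The following Python sketch (all names and the toy arena are ours, purely for illustration) generates a finite play under stationary randomised strategies:

```python
import random

# A toy arena: states and actions are strings.
# transition[s][a] maps successor states to probabilities.
# A stationary strategy maps each state to a distribution on actions.

def sample_play(start, transition, strategy, steps, rng):
    """Sample a finite play s0 a1 s1 ... a_steps s_steps."""
    play = [start]
    state = start
    for _ in range(steps):
        action_dist = strategy[state]
        action = rng.choices(list(action_dist), weights=action_dist.values())[0]
        succ_dist = transition[state][action]
        state = rng.choices(list(succ_dist), weights=succ_dist.values())[0]
        play.extend([action, state])
    return play

# A deterministic two-state example: the play is forced.
transition = {
    "s1": {"go": {"s2": 1.0}},
    "s2": {"back": {"s1": 1.0}},
}
strategy = {"s1": {"go": 1.0}, "s2": {"back": 1.0}}
play = sample_play("s1", transition, strategy, 2, random.Random(0))
print(play)  # ['s1', 'go', 's2', 'back', 's1']
```

With genuinely stochastic transitions, repeated calls to sample_play approximate the measure over cylinders; here all probabilities are 0 or 1, so the play is forced.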
The existence of a value guarantees the existence of ε-optimal strategies for both players, for every ε > 0.
Definition 2.5 (Optimal and ε-optimal strategies). Let (𝒜, f) be a game, ε > 0 and σ a strategy for Player 1. Then σ is ε-optimal if for every strategy τ and every state s ∈ S,

𝔼^s_{σ,τ}[f] ≥ val(f)(s) − ε.

The definition for Player 2 is symmetric. A 0-optimal strategy is simply called optimal.
A stronger class than the ε-optimal strategies is that of ε-subgame-perfect strategies, which are strategies that are not only ε-optimal from the initial state but stay ε-optimal throughout the game. More precisely, given a finite play h = s₀⋯s_n and a function g whose domain is the set of (in)finite plays, by g[h] we denote the function g shifted by h: g[h](p) = g(hp).

Definition 2.6 (ε-Subgame-Perfect Strategy). Let (𝒜, f) be a game equipped with a payoff function f. A strategy σ̂ for Player 1 is said to be ε-subgame-perfect if for every finite play h := s₀⋯s_n, the shifted strategy σ̂[h] is ε-optimal from the state s_n in the game equipped with the payoff function f[h].

Shift-invariant and submixing. Without loss of generality we can assume that there is a finite set C (colours assigned to the states of the game) such that the payoff function is a function f : C^ω → ℝ that is Borel-measurable and bounded. We define the two conditions with respect to such payoff functions.
Definition 2.7 (Shift-Invariant). The payoff function f is shift-invariant if and only if for all finite prefixes u ∈ C* and trajectories w ∈ C^ω, f(uw) = f(w).

Note that shift-invariance is a stronger condition than the following: if one can get w′ ∈ C^ω from w ∈ C^ω by replacing finitely many letters then f(w) = f(w′). Sometimes in the literature the stronger condition is called "prefix-independent" or "tail-measurable". Intuitively, shift-invariant payoff functions measure only asymptotic properties, and do not talk about indices.
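For ultimately periodic words u·v^ω, shift-invariance means the payoff can be read off the repeated cycle v alone, ignoring the prefix u entirely. A minimal Python illustration (the representation and function name are ours), using the payoff "1 if the letter a occurs infinitely often":

```python
def inf_many_a(prefix, cycle):
    """Payoff of the word prefix + cycle^omega: 1 iff 'a' occurs
    infinitely often. Only the cycle matters, since the payoff
    is shift-invariant; the prefix argument is deliberately unused."""
    return 1 if "a" in cycle else 0

# The payoff ignores any finite prefix:
assert inf_many_a("", "ab") == inf_many_a("bbbb", "ab") == 1
assert inf_many_a("aaaa", "b") == 0  # only finitely many a's overall
```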

Definition 2.8 (Submixing). The payoff function f is submixing if and only if for all u, v, w ∈ C^ω such that w is a shuffle of u and v, we have f(w) ≤ max(f(u), f(v)).

The submixing condition says that one cannot shuffle two losing trajectories to make a winning one. This requirement simplifies the kind of strategies that the players need.
The submixing condition is not symmetric over the players, and it implies different results for different players (notice the difference between Theorem 1.1 and Theorem 1.3). We define the inverse-submixing condition, which is its reflection about the players:

Definition 2.9 (Inverse-Submixing). The payoff function f is inverse-submixing if and only if for all u, v, w ∈ C^ω such that w is a shuffle of u and v, we have f(w) ≥ min(f(u), f(v)).

There are payoff functions that are both submixing and inverse-submixing (e.g. the parity function); for such payoffs Theorem 1.1 implies simple optimal strategies for both players, i.e. positionality.

Applications and Examples
In this section we give a variety of examples of payoff functions that are shift-invariant and submixing, some very well-known, others less so. Thus we unify a number of classical positional determinacy results and also sketch how straightforward it is to apply Theorem 1.1 to novel payoff functions. Furthermore, we comment on the hypotheses of Theorem 1.1: Are the conditions necessary? What do they imply about the optimal strategies of the minimizer? Under what operations is this class of payoff functions closed? We start by listing a few well-known examples.
The parity condition is used in automata theory and logic [GTW02]. Each state s is labeled with some colour c(s) ∈ {0, …, d}. The payoff is 1 if the highest colour seen infinitely often is even, and 0 otherwise. For c₀c₁⋯ ∈ {0, …, d}^ω,

parity(c₀c₁⋯) = 1 if limsup_n c_n is even, and 0 otherwise.

The limsup payoff function has been used in the theory of gambling games [MS96]. States are labeled with immediate rewards r(s) ∈ ℝ and the payoff is the limit supremum of the rewards:

limsup(r₀r₁⋯) = limsup_{n→∞} r_n.

The liminf payoff function can be defined similarly.
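On ultimately periodic plays both payoffs can be computed exactly, since the colours (or rewards) seen infinitely often are exactly those occurring in the repeated cycle. A small Python sketch (representation and names are ours):

```python
def parity(prefix, cycle):
    """Parity payoff of prefix + cycle^omega: the colours seen
    infinitely often are exactly those in the cycle, so the
    highest recurring colour is max(cycle)."""
    return 1 if max(cycle) % 2 == 0 else 0

def limsup_payoff(prefix, cycle):
    """limsup of the rewards of prefix + cycle^omega: the largest
    reward occurring in the cycle (the prefix is seen only once)."""
    return max(cycle)

assert parity([1, 3], [2, 1]) == 1      # highest recurring colour is 2
assert parity([4], [1, 3]) == 0         # highest recurring colour is 3
assert limsup_payoff([9], [0, 5]) == 5  # the 9 occurs only finitely often
```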
The following proposition follows easily from Theorem 1.1.
Proposition 3.2. In every two-player stochastic game equipped with the parity, limsup, liminf, mean or discounted payoff function, Player 1 has a deterministic and stationary strategy which is optimal. The same is true for Player 2 for the parity, limsup and liminf payoffs.
One comment should be made about the discounted payoff function: while it is not shift-invariant, it is possible to reduce games equipped with this function to games with the mean-payoff function, by interpreting discount factors as stopping probabilities, as was done in the seminal paper of Shapley [Sha53]. One can find details of this reduction in [Gim07,Gim06].
Thus we have unified a number of classical results, thereby giving a common reason for the half-positionality of seemingly unrelated games. The approaches found in the literature for proving that these games are (half-)positional are diverse, as one can see, for example, by consulting the papers [CY90] and [MS96], which show positionality for parity games and limsup games, respectively. The existence of deterministic and stationary optimal strategies in mean-payoff games has a colourful history attached. The first proof was given by Gilette [Gil57], based on a variant of a theorem of Hardy and Littlewood. Later on, Ligget and Lippman found the variant to be wrong and proposed an alternative proof based on the existence of Blackwell optimal strategies plus a uniform boundedness result of Brown [LL69]. For one-player games, Bierth [Bie87] gave a proof using martingales and elementary linear algebra, while [VTRF83] provided a proof based on linear programming, and a modern proof can be found in [NS03], based on a reduction to discounted games and the use of analytical tools. For two-player games, a proof based on a transfer theorem from one-player to two-player games can be found in [Gim06,GZ09,GZ16].

Other Examples
We mention a few more recent examples of games.
One-counter stochastic games have been introduced in [BBE10]; in these games each state s ∈ S is labeled by an integer r(s) ∈ ℤ. Three different winning conditions were defined and studied in [BBE10]. The positive average condition defined by (5) is a variant of the mean payoff, which may be more suitable to model quality-of-service constraints or decision makers with a loss aversion. One can naturally define a payoff function posavg that outputs 1 if the condition holds, and 0 otherwise. Although posavg seems similar to the mean function, maximizing the expected value of posavg and doing the same for mean are two different goals. For example, a positive average maximizer prefers seeing the sequence 1, 1, 1, … for sure rather than seeing, with equal probability 1/2, the sequences 0, 0, 0, … or 3, 3, 3, …, while a mean-value maximizer prefers the second situation to the first one. To the best knowledge of the authors, the classical techniques developed in [Bie87, NS03, VTRF83] cannot be used to prove positionality of games equipped with the positive average condition. However, since posavg can be defined as the composition of the submixing function mean with an increasing function, it is submixing itself. As a consequence of the main theorem of the present paper, it then follows that games equipped with posavg are half-positional.
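The preference reversal described above can be checked by a short computation; the following Python sketch (names are ours) evaluates both payoffs on constant plays, each represented by its repeated cycle:

```python
def mean(cycle):
    """Mean payoff of a play that repeats `cycle` forever."""
    return sum(cycle) / len(cycle)

def posavg(cycle):
    """Positive-average payoff: 1 if the mean payoff is positive."""
    return 1 if mean(cycle) > 0 else 0

# Lottery A: the play 1, 1, 1, ... for sure.
# Lottery B: with probability 1/2 the play 0, 0, 0, ..., else 3, 3, 3, ...
exp_mean_A, exp_mean_B = mean([1]), 0.5 * mean([0]) + 0.5 * mean([3])
exp_posavg_A, exp_posavg_B = posavg([1]), 0.5 * posavg([0]) + 0.5 * posavg([3])

assert exp_mean_A < exp_mean_B      # 1 < 1.5: the mean maximizer prefers B
assert exp_posavg_A > exp_posavg_B  # 1 > 0.5: the posavg maximizer prefers A
```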
Another recent example is given by the generalized mean-payoff games, introduced in [CDHR10]. Each state is labeled by a fixed number of immediate rewards r⁽¹⁾, …, r⁽ᵏ⁾, which define as many mean-payoff conditions mean⁽¹⁾, …, mean⁽ᵏ⁾; the winning condition (6) requires all of these mean-payoff conditions to hold. In the special case of mean-payoff co-Büchi games, a subset of the states are called Büchi states, and the payoff of Player 1 is −∞ if Büchi states are visited infinitely often, and the mean-payoff value of the rewards otherwise. One can easily check that such a payoff mapping is shift-invariant and submixing. Although we do not explicitly handle payoff mappings that take infinite values, it is possible to approximate the payoff function by replacing −∞ by arbitrarily small values, to prove half-positionality of mean-payoff co-Büchi games. The general payoffs captured by the condition in (6) are not submixing; however, a natural variant is. Optimistic generalized mean-payoff games are defined similarly, except that the winning condition requires at least one of the mean-payoff conditions to hold. It is an exercise to show that this winning condition is submixing. More generally, if f₁, …, f_k are submixing payoff mappings then max{f₁, …, f_k} is submixing as well. As a consequence of this observation and Theorem 1.1, games with the optimistic generalized mean-payoff condition are half-positional. Such games are not positional, however: one can show that the minimizer requires (finite) memory. Intuitively, he needs to use the memory to remember which dimensions have to be decreased, in order to render the condition false. There are even examples of shift-invariant and submixing payoff functions where the minimizer requires infinite memory to play optimally. Here is one of them.
The set of colours is {a, b}. The payoff function equals −1 if and only if the input word w ∈ {a, b}^ω contains infinitely many a's and infinitely many b's, and moreover, writing w = a^{n₁} b a^{n₂} b a^{n₃} b ⋯, we have lim inf_i n_i = ∞; otherwise it is equal to 0.

One final but interesting example of a payoff function that is shift-invariant, submixing, and even inverse-submixing (hence positional for both players in two-player games) is the positive frequency payoff. Every state is labeled by a colour from a set C, and each colour c has a payoff r(c). An infinite play generates an infinite word of colours c₀c₁c₂⋯. For a colour c and n ∈ ℕ define #(c, c₀c₁⋯c_n) to be the number of occurrences of the colour c in the prefix c₀c₁⋯c_n. The frequency of the colour c in w is defined as:

freq(c, w) = limsup_n #(c, c₀c₁⋯c_n) / (n + 1),

and the payoff of w is the largest r(c) over the colours c that have positive frequency in w. Other examples can be found in [Gim07,Kop09,Gim06], and in the papers cited in the introduction.

The Class of Shift-Invariant and Submixing Functions
In this section we have already used two operators under which the class of shift-invariant and submixing functions is closed:
• If f₁, …, f_k are shift-invariant and submixing then so is max{f₁, …, f_k}.
• If f is shift-invariant and submixing, and g : ℝ → ℝ is an increasing function, then g ∘ f is shift-invariant and submixing.
The proofs are routine. The class of shift-invariant and submixing functions does not seem to have any further non-trivial closure property. For example, even though this class is closed under max as above, it is not closed under addition. That is, if f₁ and f₂ are submixing, then f(w) := f₁(w) + f₂(w) need not be. To see this, consider the example with colours a and b, with f₁ mapping w to 1 if a occurs infinitely often in w and to 0 otherwise, and f₂ defined symmetrically.
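This counterexample is easy to machine-check on ultimately periodic words, each represented by its repeated cycle (a Python sketch; the representation is ours):

```python
def f1(cycle):
    """1 if a's occur infinitely often in a play repeating `cycle`."""
    return 1 if "a" in cycle else 0

def f2(cycle):
    """Symmetric to f1, for the colour b."""
    return 1 if "b" in cycle else 0

def f(cycle):
    return f1(cycle) + f2(cycle)

# (ab)^omega is a shuffle of a^omega and b^omega, yet:
assert f("ab") == 2
assert max(f("a"), f("b")) == 1  # the submixing inequality f(w) <= max fails
```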
Furthermore, neither condition is necessary in Theorem 1.1: discounted games are positional but not shift-invariant, and the mean payoff with lim inf instead of lim sup is positional but not submixing. However, as we have seen, this class contains many interesting payoff functions, and being shift-invariant and submixing is the salient property that allows one to prove the existence of positional optimal strategies. Perhaps even more importantly, it is typically trivial to check whether a given payoff function is shift-invariant and submixing.

ε-Subgame-Perfect Strategies
The proof of Theorem 1.1 hinges on a crucial property of games with perfect information, namely the fact that they admit ε-subgame-perfect strategies, for all ε > 0.
Theorem 1.2. Games equipped with a shift-invariant payoff function admit, for every ε > 0, ε-subgame-perfect strategies.
Note that we cannot lift the shift-invariance hypothesis from Theorem 1.2: one can easily find an example of a game with no ε-subgame-perfect strategies, even a game with only one player.
Note that a version of Theorem 1.2 holds for arbitrary payoff functions and the weaker notion of ε-subgame-perfect strategy, which requires only that inequality (7) hold for every finite play h = s₀⋯s_n. Indeed, this was proved independently by Mashiah-Yaakovi [MY15, Proposition 11] for concurrent games. That result implies Theorem 1.2, since the weak and the strong notions coincide when the payoff function is shift-invariant: for such functions condition (7) is the same as that of Definition 2.6. On the one hand, our proof works only for the strong notion of subgame-perfectness of Definition 2.6. On the other hand, it makes transparent how to construct ε-subgame-perfect strategies from ε/2-optimal ones, in a way that preserves some important properties of the strategy, notably its use of finite memory.
The proof of the theorem is symmetric with respect to the players, so we will only show that Player 1 has ε-subgame-perfect strategies. We will do this by taking an ε-optimal strategy σ with some additional structure, and using it to construct a reset strategy σ̂, which will be 2ε-subgame-perfect. The reset strategy is conceptually very simple: a strategy fails to be 2ε-subgame-perfect if and only if after some finite play h := s₀⋯s_n the expected payoff it guarantees drops more than 2ε below the value of s_n; the reset strategy simply resets its memory when such a drop happens. We give the formal definitions.
It is plain that one can factorise any infinite play into h₁h₂⋯ where each h_i is a (σ, ε)-drop, but no strict prefix of h_i is a (σ, ε)-drop.

Definition 4.2. We define Λ(σ, ε)(s₀⋯s_n), the date of the most recent (or latest) drop, inductively: it is the largest k ≤ n such that a (σ, ε)-drop ends at date k, and 0 if there is no such k.

The reset strategy resets its memory whenever a drop occurs, i.e. it keeps only the memory since the most recent drop:

Definition 4.3 (Reset Strategy). For any strategy σ we define the reset strategy σ̂ by σ̂(s₀⋯s_n) = σ(s_k⋯s_n), where k = Λ(σ, ε)(s₀⋯s_n).

By construction, the reset strategy has the property that if it is ε-optimal then it is also 2ε-subgame-perfect.
Lemma 4.4. Let σ̂ be a reset strategy that is ε-optimal; then it is also 2ε-subgame-perfect.
Proof. Let s₀⋯s_n be a finite play; the goal is to show inequality (9). If a drop occurs at date n, that is Λ(σ, ε)(s₀⋯s_n) = n, then (9) holds by the definition of a reset strategy that is ε-optimal. Assume then that the most recent drop happened at a date k < n, which means that (10) holds, where k = Λ(σ, ε)(s₀⋯s_{n−1}). Towards a contradiction, assume that the goal (9) does not hold, i.e. there exists a strategy τ that yields payoff strictly less than val(s_n) − 2ε; we will construct another strategy τ′ that contradicts (10). Let D be the set of prefixes from s_n to the next (σ, ε)-drop, and D the event generated by the cylinders in D (note that the complement ¬D is the event that no drop occurs). Define τ′ to be the strategy that plays like τ except when a prefix in D is met, in which case it switches to the ε-response strategy τ″. From the assumption that the goal does not hold we obtain inequality (11). In the equality the strategy σ₁, respectively τ, has been replaced by σ₂, respectively τ′, because on infinite plays without a drop they coincide. The first term is bounded by the definition of the ε-optimal reset strategy and the fact that f is shift-invariant; the last term is bounded by the construction of the ε-response strategy τ″ and of τ′. The strategies σ₁ and σ₂ on one hand, and τ and τ′ on the other, coincide up to the first drop; consequently we can interchange them when measuring the cylinders, which implies that the two inequalities above contradict (10) when plugged into (11).
As a consequence of this lemma, in order to prove Theorem 1.2 we only have to demonstrate that there exists a reset strategy that is ε-optimal. In the rest of this section we will prove that there are strategies with more and more desirable properties, culminating in the proof that there is some σ whose reset strategy σ̂ is ε-optimal.

Properties of the Reset Strategy
We will show that there is a strategy σ with the following properties:
1. σ is ε-optimal,
2. σ is locally optimal,
3. for any strategy τ, when playing with σ̂ and τ, almost surely there are only finitely many (σ, ε)-drops, and
4. σ̂ is ε-optimal.
We will do this in a manner that accumulates more and more structure: for strategies with Properties 1 and 2 we can prove the third property, and for strategies with all of the first three properties it is possible to prove that the reset strategy is ε-optimal. Each subsection below corresponds to the proof of one of the last three properties (Property 1 is a consequence of Martin's theorem, Theorem 2.4).
We are going to make use of some results from the theory of martingales, which we introduce first.

Definition 4.5 (Martingale). A sequence of real-valued random variables X₀, X₁, … is called a martingale if for all n ∈ ℕ,

X_n = 𝔼[X_{n+1} ∣ X₀, …, X_n].

It is called a supermartingale, respectively a submartingale, if instead of the equality we have ≥, respectively ≤.
In our case the sequence val(S₀), val(S₁), … will, under suitable strategies, be a supermartingale, which will allow us to use the following results in particular. It follows from the definition of martingales that for all n ∈ ℕ, the expected value of X_n is equal to the expected value of X₀: the process stopped at a fixed time n is on average equal to the process at time 0. The next result from martingale theory that we will make use of has an analogous statement, namely that the process stopped at some random time is on average equal to the process at time zero. This theorem is known as Doob's optional stopping theorem; see for example Section 10.10 in [Wil91]. We give a variant of this theorem.
Definition 4.7 (Stopping Time). A random variable T taking values in ℕ ∪ {∞} is called a stopping time with respect to random variables X₀, X₁, … if for every n ∈ ℕ the event {T = n} is (X₀, …, X_n)-measurable, meaning that it depends only on the random variables X₀, …, X_n.
Theorem 4.8 (Doob's Optional Stopping Theorem). Let T be a stopping time with respect to the random variables C₀, C₁, … and (X_n)_{n∈ℕ} a uniformly bounded martingale such that for all n ∈ ℕ, X_n is (C₀, …, C_n)-measurable. Define the random variable X_T, which represents the process stopped at time T, as: X_T = X_n if T is finite and equal to n, and X_T = lim_n X_n otherwise. Then the expectation of X_T is equal to that of X₀. Analogous statements hold for supermartingales and submartingales.
Proof. The random variable is well-de ned as a consequence of Theorem 4.6. For every ∈ ℕ de ne: def = min( , ) .
The process ( ) ∈ℕ is a uniformly bounded martingale that converges almost-surely, as well. When the process is a supermartingale or a submartingale one can write an analogous proof.
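As an illustrative sanity check, separate from the paper's argument, the bounded case of the optional stopping theorem can be verified by exhaustive enumeration: a symmetric ±1 random walk is a martingale, and stopping it at the first hit of two thresholds, truncated at a finite horizon, is a bounded stopping time, so the stopped value has the same expectation as X₀. The walk, the thresholds −2 and +3, and the horizon are all invented for this sketch.

```python
from fractions import Fraction
from itertools import product

# Symmetric +-1 random walk: X_0 = 0, each step is +1 or -1 with probability
# 1/2, so (X_n) is a martingale. T = first time the walk hits -2 or +3,
# truncated at the horizon N so that the stopping time is bounded; optional
# stopping then gives E[X_T] = E[X_0] = 0 exactly.

N = 12

def stopped_value(steps):
    x = 0
    for s in steps:
        x += s
        if x in (-2, 3):   # stopping condition reached
            return x
    return x               # T > N: use X_N (truncation at the horizon)

# Enumerate all 2^N equally likely step sequences and average exactly.
expectation = sum(
    Fraction(1, 2 ** N) * stopped_value(steps)
    for steps in product((-1, 1), repeat=N)
)
print(expectation)  # 0
```

Exact rational arithmetic (`Fraction`) is used so the expectation comes out as exactly zero rather than a floating-point approximation.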

Locally Optimal
An action is locally optimal if the average value of the successor states is equal to the value of the current state. Formally, an action a ∈ A(s) is locally optimal in s if val(s) = Σ_{t∈S} p(s, a, t) · val(t). A strategy that only plays locally optimal actions is called locally optimal.
The salient point is the following observation about the process val(S₀), val(S₁), … when the players use locally optimal strategies. Observation 4.10. When Player 1 (respectively Player 2) uses a locally optimal strategy, the process val(S₀), val(S₁), … is a supermartingale (respectively a submartingale).
This observation readily follows from the definition above and the fact that the values are bounded.
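The definition can be made concrete on a small one-player example. The sketch below, with an invented reachability MDP (the payoff is the probability of reaching a goal state, so val(goal) = 1 and val(sink) = 0), computes the values by value iteration and then tests each action for local optimality; all state and action names are hypothetical.

```python
# mdp[state][action] = list of (successor, probability); all states belong
# to the maximizer in this one-player illustration.
mdp = {
    "s0": {"a": [("goal", 0.5), ("sink", 0.5)], "b": [("s1", 1.0)]},
    "s1": {"c": [("goal", 0.9), ("sink", 0.1)]},
    "goal": {}, "sink": {},  # absorbing
}

val = {s: 0.0 for s in mdp}
val["goal"] = 1.0
for _ in range(50):  # value iteration: val(s) = max_a sum_t p(s,a,t) val(t)
    for s, actions in mdp.items():
        if actions:
            val[s] = max(sum(p * val[t] for t, p in succ)
                         for succ in actions.values())

def locally_optimal(s, a):
    """a is locally optimal in s if the average successor value equals val(s)."""
    return abs(val[s] - sum(p * val[t] for t, p in mdp[s][a])) < 1e-9

print(locally_optimal("s0", "b"), locally_optimal("s0", "a"))  # True False
```

Here val(s0) = 0.9, reached through "b", so "a" (which only achieves 0.5) is not locally optimal: a strategy that plays "a" at s0 is not locally optimal even though "a" is a legal action.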
One can get away with playing solely locally optimal actions in games with perfect information. In other words, suppose that the action a₀ ∈ A(s₀) (say, belonging to Player 1) in the game G is not locally optimal, and denote by G′ the same game except that it does not have the action a₀ in the state s₀. We will prove that the values of those two games coincide; this then clearly implies that Player 1 has ε-optimal strategies that are locally optimal as well. The analogous fact for Player 2 can be proved symmetrically.
Player 1 has less choice in G′, so for every s ∈ S we have val(G′)(s) ≤ val(G)(s); hence we only have to prove the converse inequality. Towards this end, we first prove an intermediate claim. Let δ ≔ val(G)(s₀) − Σ_{t∈S} p(s₀, a₀, t) val(G)(t) > 0, and let σ be the strategy that plays according to a strategy σ′ that is ε-optimal in G′ as long as the opponent does not choose the action a₀, in which case it switches definitively to a strategy σ″ which is δ/2-optimal in G. Let E be the event that the action a₀ is never chosen.
Define D(σ) to be the event that the action a₀ is about to be played by the strategy σ. Let ε > 0 and, for any strategy σ, define σ̃ to be the strategy that plays like σ unless the latter is about to play the action a₀ in s₀, in which case it switches to the strategy σ′ which is ε-optimal in G′. Set τ to be the strategy that plays according to some strategy τ′ which is ε-optimal in G′ as long as the opponent does not play the action a₀, and otherwise switches to some strategy that is ε-optimal in G. The bound follows by the definitions of these strategies and (12). The strategies σ and σ̃ on one hand, and τ and τ′ on the other, coincide up to the date when σ is about to play the action a₀; as a consequence of (14), the bound holds for all σ and s. Since this holds for any ε > 0, taking the supremum over all σ proves (13).

Now, by construction of the strategies σ and τ:
We have thus proved that for all ε > 0, both players have strategies that are both locally optimal and ε-optimal.
We gather one more observation about games where at least one of the players uses a locally optimal strategy. In this case, only a stronger type of locally optimal action, the value-conserving ones, can be played infinitely many times. Proposition 4.12. For all strategies σ, τ, at least one of which is locally optimal, and all s ∈ S, we have ℙ_{σ,τ,s}(for all but finitely many n, Aₙ is value-conserving in Sₙ) = 1.
Proof. Fix σ and τ and assume that σ is locally optimal; the other case is symmetrical. Suppose that a₀ ∈ A(s₀) is not value-conserving. It suffices to prove that the event {for infinitely many n, Sₙ = s₀ and Aₙ = a₀} has measure zero. Assume towards a contradiction that the event above has non-zero probability; then the event which says that for infinitely many n we have Sₙ = s₀, Aₙ = a₀ and Sₙ₊₁ = t also has non-zero probability, where t ∈ S is a successor state of s₀ under a₀ that has value strictly smaller than that of s₀ (its existence is guaranteed because a₀ is not value-conserving). This means that there is non-zero probability that for infinitely many n the value strictly decreases by a fixed positive amount, which contradicts Theorem 4.6, since (val(Sₙ)), n ∈ ℕ, is a supermartingale as per Observation 4.10.
Similarly for events such as "there is a (γ, n)-drop" or "two (γ, n)-drops after a given date". Our goal is to prove that for a reset strategy σ̂ that is based on a σ that is both ε-optimal and locally optimal (which exists because of (15)), almost surely there will only be finitely many (γ, n)-drops. To this end fix a γ > 0 and a strategy σ that is both locally optimal and ε-optimal, which allows us to simply say drop instead of (γ, n)-drop. The proof of the goal is relatively lengthy; however, the idea and the plan are simple.
An intermediate fact that we have to prove is that when Player 1 plays with the reset strategy there is some n ∈ ℕ such that the probability that there is a drop after date n is bounded away from 1. This fact is easier to prove if we assume that the adversary is using a locally optimal strategy. Then Proposition 4.12 helps us lift this restriction on the strategies of Player 2. Therefore the plan is to prove this intermediate fact first (1) for locally optimal strategies, then (2) for strategies that are locally optimal after date n, and finally (3) for general strategies. The intermediate fact then finalises the goal of the present section, that is, when Player 1 plays with the reset strategy σ̂, almost surely there will be only finitely many drops.
Lemma 4.13. There exists a δ > 0 such that for all s and locally optimal τ, ℙ_{σ,τ,s}(there is a drop) ≤ 1 − δ. Proof. Let T be the date of the first drop, with the convention that min ∅ = ∞. Notice that T is a stopping time with respect to the process (val(Sₙ)), n ∈ ℕ. Let τ′ be a strategy that plays like τ as long as no drop occurs, and once one does, switches to a strategy τ″ that is an ε/2-optimal response. By construction, τ and τ′ coincide on trajectories without drops. Replacing this inequality in the one above and decomposing the expectation of val(S_T), we conclude that the bound holds for all s, where the expectation of val(S_T) is smaller than val(s) for the following reason.
Since T is a stopping time and τ′ plays like τ before the first drop, hence playing locally optimal actions, the process val(Sₙ), n ∈ ℕ, is a submartingale at least until the first drop, so we can apply Theorem 4.8. Finally, from the inequality above we obtain a uniform bound that does not depend on the choice of τ.
Next we approximate strategies τ by a sequence (τₙ), for every natural n, as follows. The strategy τₙ plays like τ up to date n, and afterwards chooses locally optimal actions; formally, τₙ(s₀ ⋯ s_k) is τ(s₀ ⋯ s_k) if k < n or if τ(s₀ ⋯ s_k) chooses locally optimal actions, and some locally optimal action in s_k otherwise.
Lemma 4.14. There is some δ > 0 such that for all strategies τ, states s and n ∈ ℕ, we have ℙ_{σ̂,τₙ,s}(there is a drop after date n) ≤ 1 − δ.
Proof. For n ∈ ℕ define the stopping time T to be the date of the first drop after the date n, with the convention that min ∅ = ∞, and set T₂ to be the date of the second drop after n. We prove that there is some δ > 0 such that for all n ∈ ℕ, every strategy τ and state s, inequality (16) holds. The statement of the lemma then follows from (16) and the sigma-additivity of measures.
Define H to be the set of finite plays, strictly longer than n, that are drops but that have no prefix longer than n which is a drop. In other words, H contains all the plays up to the first drop after the date n. Then by construction of the reset strategy we obtain an equality in whose last step we have replaced the reset strategy σ̂ by σ, because these two strategies are the same up to the first drop. Since |h| > n, by construction the strategy τₙ[h] is locally optimal; consequently, applying Lemma 4.13 gives a bound which, when plugged into the equation above, proves (16).
In the third lemma there is no restriction on the strategy τ.
Lemma 4.15. For all strategies τ and states s there is some n ∈ ℕ such that ℙ_{σ̂,τ,s}(there is a drop after date n) < 1.
Proof. Fix a strategy τ and a state s. Let T be the stopping time that gives the date of the last action played that is not value-conserving, if it exists; otherwise let it be ∞. Since the strategies τ and τₙ coincide on all paths where the last action that is not value-conserving is played before n (that is, on the event T < n), for all n ∈ ℕ and events E we have: ℙ_{σ̂,τ,s}(E) = ℙ_{σ̂,τ,s}(T < n) · ℙ_{σ̂,τₙ,s}(E ∣ T < n) + ℙ_{σ̂,τ,s}(T ≥ n) · ℙ_{σ̂,τ,s}(E ∣ T ≥ n).
The strategy σ has been assumed to be locally optimal, and therefore the strategy σ̂ is locally optimal as well. As a consequence of Proposition 4.12, T is almost surely finite, hence ℙ_{σ̂,τ,s}(T ≥ n) vanishes as n grows, for any event E. The proof of the lemma now concludes by choosing for E the event "there is a drop after date n", a suitable natural number n, and applying Lemma 4.14.
This lemma makes it possible to prove the third property of the strategy σ, namely (17): for all strategies τ and states s, ℙ_{σ̂,τ,s}(∃n such that there are no drops after date n) = 1.
Let T be the stopping time that gives the date of the last drop, if it exists; otherwise let it be equal to ∞. For a natural n, let Tₙ be the stopping time that gives the date of the first drop after n (the same as in the proof of Lemma 4.14) if it exists; otherwise say that it is equal to ∞.
Fix ε > 0 and choose a strategy τ̃ and a state s̃ accordingly. The relevant random variable is well-defined because we are measuring the infinite plays where T̃ is finite; in the third equality we have used the definition of the reset strategy, and in the last two (in)equalities we have used (18). Since this holds for any ε > 0, (17) follows.

ε-Optimal
The last property that we have to prove is that if σ has the previous properties, namely it is ε-optimal, locally optimal, and has finitely many drops, then the reset strategy σ̂ is ε-optimal as well. So fix an ε > 0 and a strategy σ that is both locally optimal and ε-optimal, and for which (17) holds. We define, for all naturals n, strategies σ̂ₙ that reset only up to date n, and prove first that they are ε-optimal. Define Tₙ to be the function that truncates finite plays to length n. The reset strategy that resets only up to date n is then defined in terms of the truncation Tₙ(s₀ ⋯).
Proof. The proof is by induction on n. The base case is trivial since σ̂₀ = σ; therefore assume that the lemma is true for n − 1, and we prove that it is also true for n. Namely we fix a state s and a strategy τ, where in the second equality we have used the definition of σ̂ₙ and in the inequality the ε-optimality of σ. Define the strategy τ′ to be the strategy that plays like τ except if there is a drop at date n, in which case it resets to an ε/2-response called τ″. Now since the strategies σ̂ₙ₋₁ and σ̂ₙ on one hand, and the strategies τ and τ′ on the other, behave the same on all plays of length smaller than n and on infinite plays where there is no drop at date n, it follows that the right-most terms in the two inequalities above, as well as the factors ℙ_{σ,τ} on the left, are equal. Consequently we can combine the two inequalities above to conclude the claim. This concludes the induction step and the proof of the lemma.
We now prove that σ̂ is ε-optimal as well.

Remark on Optimal Strategies
Martin's theorem (Theorem 2.4) implies that the games that we are interested in have ε-optimal strategies for every ε > 0. We have then proved that there are locally optimal (2) strategies that are also ε-optimal (1). We then showed that for strategies with properties (1) and (2), we can prove that they also possess properties (3) and (4), which respectively state that there are finitely many drops and that the reset strategy is also ε-optimal. By inspection, in the proofs in Section 4.1.2 and Section 4.1.3 respectively, the variable ε need not be strictly positive.
Since optimal strategies are necessarily locally optimal, the following lemma follows from Lemma 4.4. We can summarize our results for ε = 0 or ε > 0 as: Lemma 4.17. Let G be a game equipped with a shift-invariant payoff function. Let ε ≥ 0 be a non-negative real number and σ an ε-optimal strategy in G. Assume that σ is locally optimal. Then the reset strategy σ̂ is 2ε-subgame-perfect in G.

Half-Positional Games
We prove the main theorem: Theorem 1.1. Games equipped with a payoff function that is shift-invariant and submixing are half-positional.
Neither of the conditions in the statement is necessary, as we saw from the examples given in Section 3. Necessary and sufficient conditions for positionality are known for deterministic games [GZ05]. However, the shift-invariance and submixing conditions are general enough to recover several known classical results, and to provide several new examples of games with deterministic stationary optimal strategies. Before we proceed with the proof we remark: Remark 5.1. A proof symmetric to that of Theorem 1.1, the subject of this section, can be used to prove a statement like that of Theorem 1.1, where Player 1 is replaced by Player 2 and submixing is replaced by inverse-submixing. A corollary of this is that games with shift-invariant, submixing and inverse-submixing payoff functions are positional.
Consider a game G fulfilling the conditions of the theorem. The proof proceeds by induction on the number of choices of the maximizer, that is, on the quantity n(G), the total number of extra actions available to the maximizer (the sum over his states s of |A(s)| − 1). It proceeds by removing more and more actions of the maximizer and showing that at every step the value has not decreased, until we are left with a single choice in every state that belongs to the maximizer. The unique choice will then be the positional optimal strategy.
If n(G) = 0 there is no choice for the maximizer, hence he has a deterministic and stationary optimal strategy. If n(G) > 0 there must be a state s̃ ∈ S such that Player 1 has at least two actions in s̃, i.e. A(s̃) has at least two elements. We split the game G in two strictly smaller subgames G₁ and G₂.
Definition 5.2 (Split of a game). Let G be a game with n(G) > 0 and s̃ ∈ S a state controlled by Player 1 in which there are at least two actions available, i.e. A(s̃) has at least two elements. Partition A(s̃) into two non-empty sets A₁ and A₂. Let G₁ and G₂ be the games obtained from G by restricting the actions in the state s̃ to A₁ and A₂ respectively. Then (G₁, G₂) is called a split of G on s̃.
The induction step relies on the two results stated in the next theorem. The first result says that the value of s̃ in the original game cannot be larger than that of the restricted games. The second result shows that Player 1 can play optimally in G by selecting one of the subgames and playing optimally in it. Proof of Theorem 1.1. The proof is by induction on n(G). If n(G) = 0 there is no choice for the maximizer, hence he has a deterministic and stationary optimal strategy. If n(G) > 0 then we choose a split (G₁, G₂) of G on a pivot state s̃. By symmetry, we can choose a split such that val(G₁)(s̃) ≥ val(G₂)(s̃). Then, according to (22) in Theorem 5.3, a strategy for Player 1 which is optimal in G₁ is also optimal in G. By the induction hypothesis, there exists a positional optimal strategy in G₁, thus G is half-positional.
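The inductive skeleton of the proof can be sketched computationally in the one-player case, where the value of a restriction is straightforward to compute. The toy reachability MDP below is invented for the illustration: we repeatedly split the action set of a state with at least two actions, keep the half whose restricted game has the larger value at that state, and end with a positional strategy achieving the original values.

```python
# Invented one-player reachability MDP; value(mdp) computes val by value
# iteration, with val(goal) = 1 and val(sink) = 0.
def value(mdp, iters=100):
    val = {s: 1.0 if s == "goal" else 0.0 for s in mdp}
    for _ in range(iters):
        for s, acts in mdp.items():
            if acts:
                val[s] = max(sum(p * val[t] for t, p in succ)
                             for succ in acts.values())
    return val

mdp = {
    "s0": {"a": [("goal", 0.5), ("sink", 0.5)], "b": [("s1", 1.0)]},
    "s1": {"c": [("goal", 0.9), ("sink", 0.1)], "d": [("sink", 1.0)]},
    "goal": {}, "sink": {},
}

original = value(mdp)
# Induction on n(G): while some state has >= 2 actions, split its action set
# and keep the restricted game with the larger value at the pivot state.
while any(len(acts) > 1 for acts in mdp.values()):
    pivot = next(s for s, acts in mdp.items() if len(acts) > 1)
    a = next(iter(mdp[pivot]))                     # split {a} vs the rest
    g1 = {**mdp, pivot: {a: mdp[pivot][a]}}
    g2 = {**mdp, pivot: {b: mdp[pivot][b] for b in mdp[pivot] if b != a}}
    mdp = g1 if value(g1)[pivot] >= value(g2)[pivot] else g2

# What remains is a positional strategy; its values match the original game.
print({s: next(iter(a), None) for s, a in mdp.items()})
assert all(abs(value(mdp)[s] - original[s]) < 1e-9 for s in mdp)
```

In the two-player setting this "keep the better half" step is exactly what Theorem 5.3 justifies; here it is only demonstrated in the one-player case, where it is elementary.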
The rest of the section is dedicated to the proof of Theorem 5.3. We fix a game G and a split (G₁, G₂) of G on the state s̃. The inequality val(G)(s̃) ≥ max{val(G₁)(s̃), val(G₂)(s̃)} is clear, since Player 1 has more choice in G than he has in G₁ and G₂. We witness the converse inequality with a strategy for Player 2, called the merge strategy, which merges two ε-subgame-perfect strategies in the respective smaller games. This is done in Section 5.3. The definition of the merge strategy hinges on the projection of plays in the main game G to plays in the restricted games G₁ and G₂, which is done in Section 5.1. Then we analyse the two possible outcomes: (a) after some date the play remains only in game G₁ (or only in game G₂); (b) the play switches infinitely often between the two smaller games. This analysis is performed in Sections 5.4 and 5.5. For the latter case (b) we use the submixing property to show that Player 1 cannot get a better payoff by switching between the two smaller games than he could get by staying in one of the subgames.
Figure 1: The play h is the concatenation of finite plays starting in s̃, represented by blocks whose colours depend on the first action played after s̃: blue if the action belongs to A₁ and pink if it belongs to A₂. The projection h₁ = π₁(h) in G₁ is the concatenation of the blue blocks, while h₂ = π₂(h) is the concatenation of the pink blocks. The projections lose some information about the play in G: swapping two contiguous blocks of different colours in h does not modify the projections h₁ and h₂.

Projecting a play in G to a couple of plays in the subgames
There is a natural way to project a play h of the game G starting in s̃ to a couple of plays h₁ and h₂ in the restricted games G₁ and G₂ respectively, starting from s̃ as well. The two projections are computed simultaneously and inductively. Initially, h = s̃ and both projections h₁ and h₂ are also equal to s̃. Each step of the play in G is appended to either h₁ or h₂, depending on the action a played the last time the state s̃ was visited: if a belongs to A₁ then the new step is appended to h₁, otherwise it is appended to h₂. The computation of h₁ and h₂ is illustrated in Figure 1. Formally, we define two maps π₁, π₂ from finite plays in G starting from s̃ to finite plays in G₁ and G₂ respectively, starting from s̃ as well. Let h = s₀ a₀ s₁ … be a finite play in G starting in s̃ and h′ a continuation of h in G with one more transition (s, a, t). Let last(h′) be the action played in h′ after the last visit to s̃. Then π₁(h′) is π₁(h) extended with the transition (s, a, t) if last(h′) ∈ A₁, and π₁(h′) = π₁(h) otherwise; π₂ is defined symmetrically with respect to A₁ and A₂. This definition can be extended to infinite plays in a natural way. Let h = s₀ a₀ s₁ … be an infinite play in G starting in s̃. Then π₁(h) is the limit of the sequence (π₁(s₀ a₀ s₁ … sₙ))ₙ∈ℕ.
The projection π₁(h) can be either finite or infinite, depending on whether the play ultimately stays in G₂ or not. If after some time the last action chosen in s̃ is always in A₂, all subsequent moves in G are appended to the projection in G₂, while the projection to G₁ never gets updated and stays finite.
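The inductive computation of the two projections can be sketched directly. In the code below, a play is a list of (action, state) steps starting from the pivot state; the pivot name, the action sets A1 and A2, and the sample play are all invented for the illustration, and the block structure mirrors Figure 1.

```python
PIVOT = "p"
A1, A2 = {"a1"}, {"a2"}

def project(play):
    """Return (h1, h2): the projections of `play` onto the two subgames.

    Each step is appended to h1 or h2 according to the action played the
    last time the pivot state was visited, as in the inductive definition.
    """
    h1, h2 = [PIVOT], [PIVOT]   # both projections start at the pivot state
    side = None                 # which subgame the current block belongs to
    current = PIVOT
    for action, state in play:
        if current == PIVOT:    # the side is (re)chosen at each pivot visit
            side = 1 if action in A1 else 2
        target = h1 if side == 1 else h2
        target.extend([action, state])
        current = state
    return h1, h2

# A play alternating between the two subgames: a blue block (a1), a pink
# block (a2), then a blue step again.
play = [("a1", "x"), ("b", "p"), ("a2", "y"), ("c", "p"), ("a1", "z")]
h1, h2 = project(play)
print(h1)  # ['p', 'a1', 'x', 'b', 'p', 'a1', 'z']
print(h2)  # ['p', 'a2', 'y', 'c', 'p']
```

Note that h₁ is the concatenation of the blocks opened by an A₁-action at the pivot, and h₂ of the remaining blocks, so the interleaving order of blocks is indeed lost, as the caption of Figure 1 observes.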

Linking the payoff in G to the payoffs in the subgames
The payoff in G can be related to the payoffs in the subgames. We introduce the events Stay_{≥n}(G₂) and Stay(G₂). If Stay_{≥n}(G₂) holds, we say that the play stays in G₂ after step n, whereas if Stay(G₂) holds, we say that the play ultimately stays in G₂.
Those two events can be described equivalently as a non-update of the projection to G₁ after some point. For that, we make use of the random variables Π, Π₁ and Π₂. Recall that Sₙ and Aₙ are the random variables which output respectively the n-th state and action when the play is s₀ a₀ s₁ a₁ ⋯. We see that Π is simply the identity map outputting the play in G, while Πᵢ is essentially equivalent to πᵢ: it is a random variable that maps the infinite play in the game G to its finite or infinite projection in the game Gᵢ. The following lemma shows that the payoff in G is tightly related to the payoffs in the subgames G₁ and G₂.
Proof. Since both projections in G₁ and G₂ cannot be finite at the same time, (Stay(G₁), Stay(G₂), Switch) is a partition of the infinite plays in G. If Π₁ is finite then Π and Π₂ share an infinite suffix and the prefix-independence of f implies (23). The case where Π₂ is finite is symmetric, hence (24). If both Π₁ and Π₂ are infinite then the sequence of actions (last(S₀ … Sₙ₊₁))ₙ∈ℕ switches infinitely often between A₁ and A₂, thus s̃ is visited infinitely often. Moreover, in this case Π is a shuffle of Π₁ and Π₂ and since f is submixing, (25) follows.
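The displayed statement of Lemma 5.4 does not survive in the text above. A hedged reconstruction, consistent with the case analysis in the proof below (f denotes the payoff function and (23)–(25) are the equation numbers referenced there), is:

```latex
% Hedged reconstruction of the three bounds of Lemma 5.4; the exact
% formulation in the original may differ.
\begin{align}
  \text{on } \mathrm{Stay}(G_2)&:\quad f(\Pi) = f(\Pi_2), \tag{23}\\
  \text{on } \mathrm{Stay}(G_1)&:\quad f(\Pi) = f(\Pi_1), \tag{24}\\
  \text{on } \mathrm{Switch}&:\quad  f(\Pi) \le \max\{f(\Pi_1), f(\Pi_2)\}. \tag{25}
\end{align}
```

The two equalities use only prefix-independence (the play eventually coincides with one projection up to a finite prefix), while the inequality on Switch is exactly where the submixing hypothesis is used.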

The Merge Strategy
In light of Lemma 5.4, it is intuitively clear that to play well in G, Player 2 has to play well in both subgames G₁ and G₂. Fix ε > 0. The merge strategy for Player 2 is the composition of two strategies τ♯₁ and τ♯₂ for Player 2 in the subgames G₁ and G₂ respectively. We require τ♯₁ and τ♯₂ to be ε-subgame-perfect in the corresponding subgames; their existence is guaranteed by Theorem 1.2.
Definition 5.5. The merge strategy τ♯ is a strategy in G for Player 2 which ensures that Π₁ is consistent with τ♯₁ and Π₂ is consistent with τ♯₂ when the play starts from s̃. Let h be a finite play in G from s̃ ending in a state controlled by Player 2; then τ♯(h) ≔ τ♯₁(π₁(h)) if last(h) ∈ A₁, and τ♯(h) ≔ τ♯₂(π₂(h)) if last(h) ∈ A₂. The merge strategy is well-defined because if last(h) ∈ A₁ then both h and π₁(h) end with the same state, controlled by Player 2.
In the next two sections, we show that the merge strategy guarantees to Player 2 some upper bounds on the expected payoffs, which reflect the bounds given in Lemma 5.4 for payoffs of individual plays.

On plays consistent with the merge strategy and ultimately staying in G₂
In this section, we show that in case the play ultimately stays in G₂, the expected payoff is upper-bounded by val(G₂)(s̃) + ε. For simplicity, we require ε to be small enough so that τ♯₂ does not select any value-increasing action, in the following sense.
Lemma 5.6. In G₂, fix a state s controlled by Player 2 and an action a available in that state. Denote δ(s, a) ≔ Σ_t p(s, a, t) · val(G₂)(t) − val(G₂)(s). Then δ(s, a) ≥ 0.
In case δ(s, a) > 0, the action a is said to be value-increasing in s. In that case, if moreover ε is strictly smaller than δ(s, a), then τ♯₂ never selects the action a in a play ending in the state s.
Proof. Since the payoff function is prefix-independent and s is controlled by Player 2, δ(s, a) ≥ 0, because after Player 2 chooses a in s, he can proceed with an ε′-optimal strategy from the states t such that p(s, a, t) > 0, for an arbitrary ε′ > 0. Assume ε strictly smaller than δ(s, a). Then τ♯₂ never selects a in the state s, otherwise this would contradict the ε-subgame-perfection of τ♯₂: Player 1 could proceed with some (δ(s, a) − ε)/2-optimal strategy in G₂ and get an expected payoff strictly greater than val(G₂)(s) + ε.
Lemma 5.7. Assume that f is prefix-independent and ε is small enough to guarantee that τ♯₂ never selects any value-increasing action. Let σ be a strategy for Player 1 in G such that ℙ_{σ,τ♯}(Stay(G₂)) > 0. Then, conditionally on Stay(G₂), the expected payoff is at most val(G₂)(s̃) + ε; this is inequality (33). Proof. The first ingredient of the proof is the sequence of random variables (Vₙ)ₙ∈ℕ, where Vₙ denotes the value in G₂ of the last vertex of π₂(S₀ A₀ S₁ ⋯ Sₙ). Since the play starts in the state s̃, V₀ = val(G₂)(s̃). The value of Vₙ does not change unless the projection of the play to G₂ via π₂ does. Since Π₂ is consistent with τ♯₂ and since τ♯₂ never selects any value-increasing action, (Vₙ)ₙ∈ℕ is a supermartingale.
The second ingredient in the proof is a stopping time T, defined as follows. For every finite play h = s₀ a₀ … in G starting in s̃ and consistent with σ and τ♯, denote p(h) = ℙ_{σ,τ♯}(Stay_{≥|h|}(G₂) ∣ h is a prefix of the play).
Fix some ε′ > 0 and denote T the stopping time T ≔ min{n ∈ ℕ ∣ p(S₀ A₀ ⋯ Sₙ) ≥ 1 − ε′}, with the usual convention min ∅ = ∞.
Since (V_min(n,T)) converges almost surely to V_T, 𝔼[V_T] ≤ V₀ = val(G₂)(s̃). Finally, we prove (33). Denote h_T the random variable, defined when T is finite, which outputs the prefix of the play of length T, i.e. h_T = S₀ A₀ … S_T, and let h be such that ℙ_{σ,τ♯}(h_T = h) > 0. Denote s the last state of h. Let σ₀ be the strategy in G₂ which coincides with σ[h] as long as the play stays in G₂. Then: the first equality holds because ℙ_{σ,τ♯}(h_T = h) > 0, thus no strict prefix h′ of h satisfies p(h′) ≥ 1 − ε′, and if h is a prefix of the play then h_T = h. The second equality holds by prefix-independence of f. The first inequality holds because p(h) ≥ 1 − ε′, thus the strategies σ[h] and σ₀ coincide with probability ≥ 1 − ε′, and when they do not, the payoff difference is at most 2‖f‖_∞. The third equality holds because τ♯[h] coincides with τ♯₂[π₂(h)] when the play stays in G₂. The second inequality is by the ε-subgame-perfection of τ♯₂ in G₂. The last equality holds by the definition of V_T. Since this holds for every possible value h of h_T when T < ∞, and there are at most countably many such values, the inequality (33) follows.

On plays consistent with the merge strategy and switching infinitely often between the two subgames
In this section, we provide an upper bound on the payoff of plays which switch infinitely often between G₁ and G₂.
Lemma 5.8. Assume that f is prefix-independent and submixing. For all strategies σ the stated bound holds, where the first and last equalities hold by definition of the probability measure, the first inequality by the induction hypothesis, and the second equality by the definition of π₁. The second inequality (38) holds because the event h₁ ≺ Π₁ is contained in the event h₁ ⪯ Π₁. This inclusion and the corresponding inequality might be strict: for example, if Stay_{≥0}(G₂) holds, i.e. if the play always stays in G₂, then the event s̃ ≺ Π₁ has probability 0 while the event s̃ ⪯ Π₁ has probability 1.
The first equality is by definition of the probability measure. The inequality is by the induction hypothesis. The second equality holds because the event h₁ ⪯ Π₁ is the disjoint union of the events (h′)_{h′∈ℰ(h₁)}: if the projection of an infinite play h to G₁ starts with h₁, then there is a single prefix of this play in ℰ(h₁); this is the shortest (finite) prefix of h whose projection in G₁ is h₁. The last equality holds by a similar argument. The third equality is by definition of τ♯. The fourth equality is by definition of the probability measure. To show the fifth equality, we establish ℰ(h₁) = {h′ ∣ h′ ∈ ℰ(h₁)}. We start with the inclusion {h′ ∣ h′ ∈ ℰ(h₁)} ⊆ ℰ(h₁). Let h′ ∈ ℰ(h₁).
Observe that E is stable under finite disjoint unions, hence E contains all finite disjoint unions of cylinders, which form a boolean algebra. Moreover, E is a monotone class, so we can apply the monotone class theorem (see for example [Bil08, Theorem 3.4]). This implies that E contains the sigma-field generated by the cylinders, which by definition is the set of all measurable events in the game G₁. This completes the proof of (36).
By definition, this last equality is equivalent to (35) with i = 1.

Proof of Theorem 5.3
Proof of Theorem 5.3. To prove the first statement (21) in Theorem 5.3, we combine the two lemmas proved in the two previous sections in order to show (41). The bound (41) can be obtained as follows. According to Lemma 5.4, the three events Stay(G₁), Stay(G₂) and Switch partition the set of infinite plays. In case Stay(G₁) occurs, Lemma 5.7 guarantees that the expected payoff is no more than val(G₁)(s̃) + ε. By symmetry, in case Stay(G₂) occurs, the expected payoff is no more than val(G₂)(s̃) + ε. And in case Switch occurs, Lemma 5.8 guarantees that the payoff is almost surely no more than max{val(G₁)(s̃), val(G₂)(s̃)} + ε. Thus (41) holds. The inequality val(G)(s̃) ≥ max{val(G₁)(s̃), val(G₂)(s̃)} is clear, since Player 1 has more choice in G than he has in G₁ and G₂. And ε can be chosen arbitrarily small in (41), hence the first statement (21) of Theorem 5.3.
According to (21), we already know that this equality holds for s̃, and we shall extend it to all states s ∈ S. Recall that the merge strategy was defined only for plays that start in the state s̃; we enlarge this definition, profiting from the assumption (42). First, extend the definition of last(h) to any play h that has visited s̃ at least once, in which case last(h) denotes the action played right after the last visit of h to s̃. Second, for all finite plays h that end in a state controlled by Player 2, τ♯(h) ≔ τ♯₁(π₁(h)) if h never visited s̃ or last(h) ∈ A₁, and τ♯(h) ≔ τ♯₂(π₂(h)) if h has visited s̃ at least once and last(h) ∈ A₂.
The merge strategy is well-defined because if h never visited s̃ or if last(h) ∈ A₁ then both h and π₁(h) end with the same state, controlled by Player 2. And if h has visited s̃ at least once and last(h) ∈ A₂ then both h and π₂(h) end with the same state, controlled by Player 2.
We prove that τ♯ guarantees a payoff smaller than val(G₁)(s) + ε for every state s. Fix a strategy σ for Player 1 in G, and define σ′ to be the strategy that plays like σ as long as the play does not reach the pivot state s̃. Whenever the pivot state is reached, the strategy σ′ switches definitively to a strategy σ♯₁ that is optimal in the game G₁, whose existence is guaranteed by the induction hypothesis. The plays consistent with σ′ and τ♯ stay in the subgame G₁. Since τ♯ coincides with τ♯₁ on plays staying in G₁, and since τ♯₁ is ε-optimal in G₁, we can write the bound for all s ∈ S. Let h be a finite play consistent with σ and τ♯, whose last state is s̃ and which does not visit s̃ before the last step. The first and third equalities hold because f is prefix-independent. The first inequality holds because the strategy τ♯[h] is ε-optimal from the state s̃, for the following reason.

From One-player Games to Two-player Games
The construction of the merge strategy in the previous section reveals that games equipped with shift-invariant and submixing payoffs have the following interesting property. While they yield very simple optimal strategies for Player 1, they allow his opponent to recombine strategies that work for one-player games (also known as Markov decision processes) and use them in a two-player game. A general result allows one to lift the existence of ε-optimal strategies from a class 𝒞 in one-player games to two-player games.
An arena is said to be fully controlled by the minimizer if all states are controlled by Player 2. Fix a payoff function f that is both shift-invariant and submixing.
De nition 6.1. Let  be a class of strategies for minimizer.
Say that the class  is stable by the reset operation if for every game equipped with and every strategy of minimizer in , if belongs to  then the reset strategŷ belongs to  as well.
Say that the class  is stable by the merge operation if for every game equipped with , for every split ( 1 , 2 ) of and for every strategies ♯ 1 and ♯ 2 in 1 and 2 , if both ♯ 1 and ♯ 2 belong to  then their merge ♯ belongs to  as well. Like in Proposition 7.1, say that the arena  ′ is a restriction of the arena  if one gets  ′ from  by erasing some actions from some states.
Theorem 6.2. Let f be a shift-invariant payoff function, 𝔄 a family of arenas that is closed under restrictions, and 𝒞 a family of strategies for the minimizer that is stable by both the reset and the merge operations.
Assume that in every game (𝒜, f) with 𝒜 ∈ 𝔄 that is fully controlled by the minimizer, for every ε > 0 the minimizer has an ε-optimal strategy that belongs to 𝒞. Then, in every two-player game (𝒜, f) with 𝒜 ∈ 𝔄, for every ε > 0 the minimizer has an ε-subgame-perfect strategy that belongs to 𝒞.
The statement holds for ε = 0 as well, that is: assume that in all games (𝒜, f) with 𝒜 ∈ 𝔄 that are fully controlled by the minimizer, the minimizer has an optimal strategy that belongs to 𝒞. Then, in every two-player game (𝒜, f) with 𝒜 ∈ 𝔄, the minimizer has a subgame-perfect strategy that belongs to 𝒞.
Proof. Let G = (𝒜, f) with 𝒜 ∈ 𝔄. The proof of both statements is by induction on n(G), as in the proof of the main theorem in the previous section. The base of the induction follows from the assumption about games fully controlled by the minimizer, since we can give the minimizer control of the states in which the maximizer has a single action, without changing the value of the game.
When n(G) > 0, the induction step is performed using a split (G₁, G₂) of G on a pivot state s̃. Remark that both arenas belong to 𝔄; therefore the induction hypothesis for the first (resp. the second) statement says that for every ε > 0 there are two strategies τ♯₁ and τ♯₂ in the games G₁ and G₂, respectively, which belong to 𝒞 and are ε-subgame-perfect (resp. subgame-perfect) in their respective subgames. Since 𝒞 is stable by the merge operation, the strategy τ♯ obtained by merging τ♯₁ and τ♯₂ also belongs to 𝒞.
We carry out the induction step for the first (resp. the second) statement. According to Observation 5.9, τ♯ is 2ε-optimal (resp. optimal). We apply Lemma 4.17 to τ♯, which guarantees that the reset strategy obtained from τ♯ is 4ε-subgame-perfect (resp. subgame-perfect). Moreover, by hypothesis this strategy belongs to 𝒞.

The Finite Memory Transfer Theorem
We give the proof of Theorem 1.3 that was announced in the introduction. Theorem 1.3. Let f be a payoff function that is both shift-invariant and submixing.
Assume that in all games equipped with f and fully controlled by the minimizer, for every ε > 0, the minimizer has an ε-optimal strategy with finite memory. Then in every (two-player) game, for every ε > 0, the minimizer has an ε-subgame-perfect strategy that has finite memory.
The statement also holds for ε = 0, that is: if the minimizer has an optimal strategy with finite memory in every game that he fully controls, then in every (two-player) game as well he has a subgame-perfect strategy with finite memory. Theorem 1.3 follows from Theorem 6.2 and the following results, which establish that the class of finite-memory strategies is stable by the reset (Proposition 7.1) and merge (Lemma 7.2) operations.
Proposition 7.1. Let be a family of arenas that are closed under restrictions and a shift-invariant payo function. If for games whose arena is in and whose payo function is , and for every > 0, Player 1 (respectively Player 2) has an -optimal strategies with nite memory, then he also has an -subgame-perfect strategies with nite memory, namely the reset strategieŝ . This holds as well for optimal strategies, i.e. if = 0.
Proof. Let  ∈ be an arena. Remove the actions of Player 1 that are not locally optimal (with respect to the payo function ) to get a restriction  ′ . From the hypothesis, it follows that there are -optimal strategies in ( ′ , ) that have nite memory, and consequently there are -optimal strategies in (, ) that have nite memory and are locally optimal. According to Lemma 4.17, the strategŷ is 2subgame-perfect, and Proposition 7.3 implies that it has nite memory.
Lemma 7.2. Let (Γ₁, Γ₂) be a split of a game Γ on a pivot state s̃. Let σ₁♯ and σ₂♯ be two strategies for Player 2 in Γ₁ and Γ₂, respectively. If both σ₁♯ and σ₂♯ have finite memory, then the merged strategy σ♯ has finite memory as well.
Proof. The finite-memory strategies σ₁♯ and σ₂♯ are given by the transducers (M₁, init₁, up₁, out₁) and (M₂, init₂, up₂, out₂), for Player 2 in Γ₁ and Γ₂ respectively. The strategy σ♯ obtained by merging σ₁♯ and σ₂♯ is also a finite-memory strategy, whose memory is {1, 2} × M₁ × M₂. The initial memory state in state s is (1, init₁(s), init₂(s)). The updates on the components M₁ and M₂ are performed with up₁ and up₂ respectively. The first component is updated only when the play leaves the pivot state s̃; it is switched to 1 or 2 depending on whether Player 1 chooses an action of Γ₁ or of Γ₂. The choice of action, i.e. the output, depends on the first component: in memory state (i, m₁, m₂) the action played by σ♯ is outᵢ(mᵢ).
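The product construction in this proof can be sketched in code as follows. A transducer is represented as a triple (init, up, out) of Python callables; the names `pivot` and `actions1` (the set of actions of the first subgame) are illustrative assumptions, and for simplicity both memory components are updated on every move.

```python
# Sketch of the merged transducer from the proof of Lemma 7.2.
# Memory states are triples (i, m1, m2) with i in {1, 2}.
def merge(t1, t2, pivot, actions1):
    init1, up1, out1 = t1
    init2, up2, out2 = t2

    def init(s):
        # memory {1, 2} x M1 x M2, starting in the first component
        return (1, init1(s), init2(s))

    def up(m, a, s_prev, s_next):
        i, m1, m2 = m
        if s_prev == pivot:
            # the play leaves the pivot state: record which subgame it enters
            i = 1 if a in actions1 else 2
        return (i, up1(m1, a, s_next), up2(m2, a, s_next))

    def out(m):
        i, m1, m2 = m
        # the output depends on the first component
        return out1(m1) if i == 1 else out2(m2)

    return init, up, out
```

If M₁ and M₂ are finite, the merged memory has at most 2·|M₁|·|M₂| states, matching the size stated in the proof.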
The finite-memory property is preserved when passing from σ to σ̂: if the strategy σ has finite memory to begin with, then so does the reset strategy σ̂. First we define precisely what we mean by a finite-memory strategy.
A strategy is said to have finite memory if it is given by a transducer, namely a tuple (M, init, up, out), where M is a finite set of memory states.
The maps init and up are used to initialise the memory and to update it as the game unfolds: after the finite play s₀a₀⋯sₙ has unfolded, the transducer reaches the memory state mₙ ∈ M, defined inductively by m₀ = init(s₀) and mₖ₊₁ = up(mₖ, aₖ, sₖ₊₁). The output function is used to choose the action that the strategy plays, i.e. σ(s₀a₀⋯sₙ) = out(mₙ).
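This definition can be made concrete with a minimal sketch of a transducer-based strategy. The class below and the one-bit "alternate between two actions" example are illustrative assumptions, not constructions from the paper.

```python
# A finite-memory strategy given as a transducer (M, init, up, out).
from dataclasses import dataclass
from typing import Callable, Hashable

State = Hashable
Action = Hashable
Memory = Hashable

@dataclass
class Transducer:
    init: Callable[[State], Memory]                # initial memory for the first state
    up: Callable[[Memory, Action, State], Memory]  # memory update after each move
    out: Callable[[Memory], Action]                # action prescribed by the strategy

    def next_action(self, s0: State, moves) -> Action:
        """Action played after the finite play s0 a0 s1 ... a_{n-1} sn,
        where `moves` is the list of pairs (a_k, s_{k+1})."""
        m = self.init(s0)
        for a, s in moves:
            m = self.up(m, a, s)
        return self.out(m)

# Example: a one-bit memory that alternates between actions "a" and "b".
flip = Transducer(
    init=lambda s: 0,
    up=lambda m, a, s: 1 - m,
    out=lambda m: "a" if m == 0 else "b",
)
```

Here the memory set is {0, 1}, so the strategy has finite memory of size 2 regardless of the length of the play.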
Proposition 7.3. Let ε > 0. If σ is a finite-memory strategy and σ is ε-optimal, then its reset strategy σ̂ has finite memory as well.
Proof. Let σ be a finite-memory strategy given by the tuple (M, init, up, out), and let ε be such that σ is ε-optimal, which fixes a reset strategy σ̂. Without loss of generality we can assume that the memory state of σ identifies the current state of the game; in other words, assume that M can be partitioned as M = ⋃_{s∈S} M_s such that for any finite play s₀⋯sₙ, if m₁, …, mₙ is the sequence of memory states of the transducer of σ during this play, then mₙ ∈ M_{sₙ}.
We gather the subset of memory states where drops occur as follows. For s ∈ S and m ∈ M_s, denote by σ_m the strategy that is the same as σ except that the initial memory state for s is m instead of init(s). Define the subset of memory states where drops occur, D ⊆ M, as D := { m ∈ M_s : s ∈ S and σ_m is not 2ε-optimal from state s }.
Construct the finite-memory strategy σ′ that avoids the memory states in D as follows.
For any ∈ and ∈  ∩ , since is -optimal, ≠ init( ). In the strategy ′ modify the function up in such a way that all the transitions that lead to are redirected to the state init( ) instead (the memory is reset). Do this simultaneously for any pair ( , ) as above. Comparing the de nition of̂ and ′ we conclude that they coincide.
On the size of the memory. How large is the memory M needed by Player 2 to play optimally in some Γ = (A, f)? Every deterministic and stationary strategy σ for Player 1 in Γ induces a game Γ_σ that is controlled by Player 2. Let M be the maximal memory size required by Player 2 to play optimally in the games Γ_σ. According to the proof of the theorem above, the memory needed by Player 2 to play optimally in a game split on a pivot state is of size 2 ⋅ |M₁| ⋅ |M₂|. By induction we derive the bound |M| ≤ (2M)^(2 ∑_{s∈S} |A(s)|).
When M = 1, i.e. when Player 2 has optimal deterministic and stationary strategies in the games he controls, it is shown in [GZ05] that the same holds for two-player games as well; hence the upper bound can be lowered to 1. In the general case where M ≥ 2, we do not have examples where the memory size required by Player 2 to play optimally has the same order of magnitude as the upper bound above.