1 Introduction

The games that we study are played between two players on a finite graph. Every vertex of the graph belongs to one of the players, who decides which edge should be taken next from it. The result of such a play is an infinite path in the graph. The objective of the game is given using a payoff function, which maps infinite paths to real numbers. The maximizer, or Player 1, wants to maximize the payoff, while his adversary (the minimizer) wants the opposite.

The study of such games has been an active area of research for a few decades, in a variety of communities, especially theoretical computer science and economics. They are used to model simplified adversarial (zero-sum) situations. In computer science they are used to verify properties of systems, but also as a powerful theoretical tool in logic and automata theory.

In this paper we consider stochastic games, a more general model where in every step, after an action is chosen, the next vertex is drawn according to a probability distribution on the set of vertices. In this scenario, Player 1 wants to maximize the expected payoff, and his adversary wants to minimize it.

Well-known examples of games played on graphs are the discounted games, mean-payoff games, games equipped with the limsup payoff function, and parity games. These four classes of games share a common property: both players have very simple optimal strategies, namely optimal strategies that are both deterministic and stationary. These strategies guarantee the maximal expected payoff, choose actions deterministically (without randomisation), and make this choice based only on the current vertex (without using memory). Games that admit such strategies for the maximizer are called half-positional; games that admit them for both players are called positional. This property is highly desirable and is often the starting point for further algorithmic analysis.

The broad purpose of the present paper is to identify the common quality of games that makes it possible for them to admit deterministic and stationary optimal strategies.

1.1 Context

There have been numerous papers about the existence of deterministic and stationary optimal strategies in games with different payoff functions. Shapley proved that stochastic games with the discounted payoff function are positional, using an operator approach (Shapley 1953). Derman showed the positionality of one-player games with the expected mean-payoff reward, using an Abelian theorem and a reduction to discounted games (Derman 1962). Gilette extended Derman’s result to two-player games (Gilette 1957), but his proof was found to be wrong and was corrected by Liggett and Lippman (1969). The positionality of one-player parity games was addressed in Courcoubetis and Yannakakis (1990) and later on extended to two-player games in Chatterjee et al. (2003); Zielonka (2004). Counter games were extensively studied in Brázdil et al. (2010), where several examples of positional counter games are given. There are also several examples of one-player and two-player positional games in Gimbert (2007); Zielonka (2010). A whole zoology of half-positional games is presented in Kopczynski (2009), and another example is given by mean-payoff co-Büchi games (Chatterjee et al. 2005). The proofs of these various results are quite heterogeneous, making it difficult to find a common property that explains why these games are positional or half-positional.

Some effort has been made to better understand the conditions that make games (half-)positional, and it has become apparent that payoff functions that are shift-invariant and submixing play a crucial role. Our contributions lie in this direction.

1.2 Contributions

The results of the present paper can be summarised as follows.

First, the main theorem says that a sufficient condition for a game to be half-positional is that its payoff function be shift-invariant and submixing. We give an informal explanation of these two conditions. Payoff functions f map infinite paths of the graph

$$\begin{aligned} s_0s_1s_2s_3\cdots \end{aligned}$$

to real numbers. A payoff function is shift-invariant if it does not depend on finite prefixes, in other words

$$\begin{aligned} f(p\ s_0s_1s_2s_3\cdots ) = f(s_0s_1s_2s_3\cdots ), \end{aligned}$$

for any finite prefix p, i.e. we can shift the trajectory to the left without changing the payoff. A payoff function, on the other hand, is submixing if for any two infinite paths

$$\begin{aligned}&{s_0s_1s_2s_3\cdots }\\&{t_0t_1t_2t_3\cdots } \end{aligned}$$

shuffling (or interleaving) them, for example as

$$\begin{aligned} \begin{array}{cccccc} s_0s_1s_2 &{} &{} s_3s_4 &{} &{} s_5s_6s_7s_8 &{} \cdots \\ &{} t_0t_1 &{} &{} t_2t_3t_4t_5t_6 &{} &{} t_7t_8\cdots \end{array} \end{aligned}$$

does not give better payoff, that is:

$$\begin{aligned} f({s_0s_1s_2}{t_0t_1}{s_3s_4}{t_2t_3t_4t_5t_6}{s_5s_6s_7s_8}{t_7t_8}\cdots ) \le \max \{f({s_0s_1s_2\cdots }),f({t_0t_1t_2\cdots })\}. \end{aligned}$$

Theorem 1.1

Games equipped with a payoff function that is shift-invariant and submixing are half-positional.

As mentioned above, half-positional games are those where the maximizer has an optimal strategy of this simple kind. There is nothing special about this player: if instead of the submixing condition we define an “inverse” submixing condition, namely one that requires that the payoff of the combination is larger than the minimum of the payoffs of the parts, we obtain an analogous theorem that proves the existence of simple optimal strategies for the minimizer. Furthermore, there are payoff functions for which both versions of the submixing condition hold, and for these games the theorem proves positionality. The conditions in the statement of the theorem are not necessary; we will provide examples and discuss this fact.

The conclusion of the theorem, however, cannot be made stronger in the following sense: there are examples of games equipped with shift-invariant and submixing payoff functions where the minimizer does not have simple optimal strategies. For instance, consider the game with only two states s and t, both controlled by the minimizer, and actions such that the underlying graph is a complete directed graph. The minimizer wins the game if and only if both states are visited infinitely often and, furthermore, the blocks of contiguous visits to s have unbounded length. In other words, if the outcome is:

$$\begin{aligned} \underbrace{s\cdots s}_{n_1\text { times }}\ \ \ \underbrace{t\cdots t}_{m_1\text { times }}\ \ \ \underbrace{s\cdots s}_{n_2\text { times }}\cdots , \end{aligned}$$

then the sequence \(n_1,n_2,\ldots\) has to be unbounded for the minimizer to win. The payoff function described above is shift-invariant and submixing, yet an optimal strategy of the minimizer has to use either randomisation, or infinite memory to remember the length of the longest block of contiguous visits to s seen so far.

The proof of Theorem 1.1 is by induction on the number of actions of the maximizer; it uses Lévy’s 0-1 law, as well as the following crucial property of the games under consideration: games equipped with a payoff function that is shift-invariant, bounded and Borel-measurable admit \(\epsilon\)-subgame-perfect strategies, for every \(\epsilon >0\). A proof of this fact can be found in Mashiah-Yaakovi (2015).

A second contribution comes as a corollary of the techniques developed for the main theorem. It is a transfer-type theorem that lifts the existence of optimal finite-memory strategies from one-player games (also known as Markov decision processes, or MDPs) to two-player games.

Theorem 1.2

Let f be a payoff function that is both shift-invariant and submixing. Assume that in all games equipped with f and fully controlled by the minimizer, the minimizer has optimal strategies with finite memory. Then the minimizer has optimal strategies with finite memory in all games equipped with f.

Furthermore, the proof of this theorem effectively constructs the optimal strategy, which calls optimal strategies of one-player games as subroutines. It thereby shows how to build, in two-player games, optimal strategies for the minimizer in the case of submixing payoffs, and for the maximizer in the case of inverse-submixing payoffs, by reusing optimal strategies of Markov decision processes, i.e. one-player games.

1.3 Related work

For one-player games it was proved by the first author that every one-player game equipped with a payoff function that is both shift-invariant and submixing is positional (Gimbert 2007). This result was successfully used in Brázdil et al. (2010) to prove the positionality of counter games. A weaker form of this condition was presented in Gimbert and Zielonka (2004) to prove the positionality of deterministic games (i.e. games where all transition probabilities are equal to 0 or 1). Kopczynski proved that two-player deterministic games equipped with a shift-invariant and submixing payoff function that takes only two values are half-positional (Kopczynski 2006).

A result of Zielonka (2010) provides a necessary and sufficient condition for the positionality of one-player games. The condition is expressed in terms of the existence of particular optimal strategies in multi-armed bandit games. When trying to prove positionality for a particular payoff function, the condition of Zielonka (2010) is harder to check than the submixing property, which is purely syntactic.

Some results on finite-memory determinacy have been obtained in Bouyer et al. (2020), with different requirements: there, the size of the memory should be independent of the arena, whereas in this paper we make no such assumption.

The pre-print version of the present paper (Gimbert and Kelmendi 2014) has already been used in a number of works, mostly in the algorithmic game theory community. We mention the papers that we are aware of. In Chatterjee and Doyen (2016), Chatterjee and Doyen study payoff functions that are conjunctions of mean-payoff objectives, and prove that the corresponding decision problem is in co-NP for finite-memory strategies. They use Theorem 1.1; and for Theorem 4.1 they observe that in the special case of finite-memory strategies there is a simple combinatorial proof, which bypasses the use of martingale theory. In Basset et al. (2018) the authors consider arbitrary boolean combinations of expected mean-payoff objectives; the main theorem of the present paper appears there as Theorem 1 and is the starting point of their further algorithmic analysis. Games played on finite graphs where the information flow is perturbed by non-deterministic signalling delays are considered in Berwanger and van den Bogaard (2015), where submixing and shift-invariant payoff functions play a central rôle. Our result is also used by Mayr, Schewe, Totzke and Wojtczak in their proof of the fact that games with energy-parity objectives and almost-sure semantics lie in NP \(\cap\) co-NP (Mayr et al. 2021).

1.4 Organisation of the paper

We fix the notation and give the relevant definitions in Sect. 2, where one can also find an overview of the proof. We give examples of shift-invariant and submixing payoff functions in Sect. 3, and show how Theorem 1.1 can be used to recover numerous classical determinacy results. In Sect. 4, we define reset strategies as a method of obtaining \(\epsilon\)-subgame-perfect strategies, which exist due to Theorem 4.1. The proof of the main theorem, Theorem 1.1, is given in Sect. 5, and that of the transfer theorem for finite-memory strategies, Theorem 1.2, in Sect. 6.

2 Preliminaries

The purpose of this section is to introduce the basic notions that we need about stochastic games with perfect information, that is, the definitions of games, payoff functions, strategies and values.

2.1 Games

A game is specified by the arena and the payoff function. While the arena determines how the game is played, the payoff function specifies the objectives that the players want to reach.

We use the following notations throughout the paper. Let \(\textbf{S}\) be a finite set. The set of finite (respectively infinite) sequences on \(\textbf{S}\) is denoted \(\textbf{S}^*\) (respectively \(\textbf{S}^\omega\)). A probability distribution on \(\textbf{S}\) is a function \(\delta : \textbf{S}\rightarrow [0,1]\) such that \(\sum _{s\in \textbf{S}} \delta (s) =1\). The set of probability distributions on \(\textbf{S}\) is denoted \(\Delta (\textbf{S})\).

Definition 2.1

(Arena) A stochastic arena with perfect information is a tuple:

$$\begin{aligned} \left( \textbf{S},\ \textbf{S}_1,\ \textbf{S}_2,\ \textbf{A},\ \left( \textbf{A}(s)\right) _{s\in \textbf{S}},\ p\right) \end{aligned}$$

where

  • \(\textbf{S}\) is a finite set of states (that is, nodes of the graph), partitioned into two sets \((\textbf{S}_1,\textbf{S}_2)\),

  • \(\textbf{A}\) is a finite set of actions,

  • for each state \(s\in \textbf{S}\), a non-empty set \(\textbf{A}(s)\subseteq \textbf{A}\) of actions available in s,

  • and transition probabilities \(p:\textbf{S}\times \textbf{A}\rightarrow \Delta (\textbf{S})\).

An arena is fully controlled by the minimizer if \(\textbf{A}(s)\) is a singleton for every \(s\in \textbf{S}_1\).

An infinite play in an arena \(\mathcal {A}\) is an infinite sequence \(p=s_0a_1s_1a_2\cdots \in (\textbf{S}\textbf{A})^\omega\) such that for every \(n\in {\mathbb {N}}\), \(a_{n+1}\in \textbf{A}(s_n)\). A finite play in \(\mathcal {A}\) is a finite sequence in \(\textbf{S}(\textbf{A}\textbf{S})^*\) which is the prefix of an infinite play.
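To fix ideas, here is a minimal Python sketch of these definitions. It is ours and not part of the formal development; the representation choices (dictionaries for \(\textbf{A}(\cdot)\) and for the transition probabilities p) are assumptions made for concreteness.

```python
import random
from dataclasses import dataclass

@dataclass
class Arena:
    """A stochastic arena with perfect information (Definition 2.1)."""
    states_1: frozenset   # states of Player 1 (the maximizer)
    states_2: frozenset   # states of Player 2 (the minimizer)
    actions: dict         # maps each state s to its non-empty set A(s)
    p: dict               # maps (s, a) to a distribution {t: p(s, a)(t)}

    def sample_play(self, s0, choose, steps):
        """Sample a finite play s0 a1 s1 ... a_n s_n, where choose(play)
        picks an available action on behalf of the controlling player."""
        play, s = [s0], s0
        for _ in range(steps):
            a = choose(play)
            assert a in self.actions[s]       # only available actions
            dist = self.p[(s, a)]
            s = random.choices(list(dist), weights=list(dist.values()))[0]
            play += [a, s]
        return play
```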

With each infinite play is associated a payoff computed by a payoff function. Player 1 (the maximizer) wants to maximize the expected payoff while Player 2 (the minimizer) has the exact opposite preference. Formally, a payoff function for the arena \(\mathcal {A}\) is a bounded and Borel-measurable function

$$\begin{aligned} f:(\textbf{S}\textbf{A})^\omega \rightarrow {\mathbb {R}}\end{aligned}$$

which associates with each infinite play h a payoff \(f(h)\).

Definition 2.2

(Stochastic game with perfect information) A stochastic game with perfect information is a pair

$$\begin{aligned} (\mathcal {A},f) \end{aligned}$$

where \(\mathcal {A}\) is an arena and \(f\) a payoff function for the arena \(\mathcal {A}\).

2.2 Strategies

A strategy in an arena \(\mathcal {A}\) for Player 1 is a function

$$\begin{aligned} \sigma \ :\ (\textbf{S}\textbf{A})^*\textbf{S}_1 \rightarrow \Delta (\textbf{A}) \end{aligned}$$

such that for every finite play \(s_0a_1\cdots s_n\) and every action \(a\in \textbf{A}\), if \(\sigma (s_0a_1\cdots s_n)(a) > 0\) then the action a belongs to \(\textbf{A}(s_n)\), i.e. only available actions are played. Strategies for Player 2 are defined similarly and are typically denoted \(\tau\). General strategies may use infinite memory and may randomise among the available actions at every step. We are interested in a very simple sub-class of strategies, namely those that use neither memory nor randomisation.

Definition 2.3

(Deterministic and stationary strategies) A strategy \(\sigma\) for Player 1 is deterministic if for every finite play \(h\in (\textbf{S}\textbf{A})^*\textbf{S}_1\) and action \(a\in \textbf{A}\),

$$\begin{aligned} \sigma (h)(a)>0 \qquad \Leftrightarrow \qquad \sigma (h)(a)=1. \end{aligned}$$

A strategy \(\sigma\) is stationary if \(\sigma (h)\) only depends on the last state of h. In other words \(\sigma\) is stationary if for every state \(t\in \textbf{S}_1\) and for every finite play \(h=s_0a_1\cdots a_k t\),

$$\begin{aligned} \sigma (h)=\sigma (t). \end{aligned}$$
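To make the contrast concrete, the following sketch (ours, over a hypothetical two-state example) represents a general randomised strategy as a function of the whole history, and a deterministic stationary one as a mere table from states to actions.

```python
def uniform_strategy(play, available):
    """A randomised strategy: play uniformly among the actions available
    at the last state of the play; `available` maps states to A(s)."""
    acts = available[play[-1]]
    return {a: 1.0 / len(acts) for a in acts}

# A deterministic and stationary strategy needs neither memory nor
# randomisation: it is a table assigning one available action per state.
sigma_table = {'s': 'a', 't': 'b'}        # hypothetical example

def stationary_strategy(play):
    return {sigma_table[play[-1]]: 1.0}   # Dirac distribution
```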

Given an initial state \(s\in \textbf{S}\) and strategies \(\sigma\) and \(\tau\) for players 1 and 2 respectively, the set of infinite plays that start at state s is naturally equipped with a sigma-field and a probability measure denoted \({\mathbb {P}}_s^{\sigma ,\tau }\), defined as follows. Given a finite play h and an action a, the sets of infinite plays \(h(\textbf{A}\textbf{S})^\omega\) and \(ha(\textbf{S}\textbf{A})^\omega\) are cylinders which we abusively denote h and ha. The sigma-field is the one generated by the cylinders, and \({\mathbb {P}}_s^{\sigma ,\tau }\) is the unique probability measure on the set of infinite plays that start at s such that for every finite play h that ends in state t, for every action \(a\in \textbf{A}\) and state \(r\in \textbf{S}\),

$$\begin{aligned} {\mathbb {P}}_{s}^{\sigma ,\tau }\left( {ha \mid h}\right)&= {\left\{ \begin{array}{ll} \sigma (h)(a)&{} \hbox { if}\ t\in \textbf{S}_1,\\ \tau (h)(a) &{} \hbox { if}\ t\in \textbf{S}_2,\\ \end{array}\right. } \end{aligned}$$
(1)
$$\begin{aligned} {\mathbb {P}}_{s}^{\sigma ,\tau }\left( {har \mid ha}\right)&=p\left( t,a,r\right) . \end{aligned}$$
(2)

For \(n\in {\mathbb {N}}\), we denote by \(S_n\) and \(A_n\) the random variables defined by

$$\begin{aligned} S_n(s_0a_1s_1\cdots )&\mathop {=}\limits ^{\textsf {def}}s_n,\\ A_n(s_0a_1s_1\cdots )&\mathop {=}\limits ^{\textsf {def}}a_n. \end{aligned}$$

2.3 Values and optimal strategies

Let \(\textbf{G}\) be a game with a bounded measurable payoff function \(f\). The expected payoff associated with an initial state s and two strategies \(\sigma\) and \(\tau\) is the expected value of \(f\) under \({\mathbb {P}}_s^{\sigma ,\tau }\), denoted \({\mathbb {E}}_{s}^{\sigma ,\tau }\left[ f\right]\). The maxmin and minmax values of a state \(s\in \textbf{S}\) in the game \(\textbf{G}\) are:

$$\begin{aligned} {{\,\textrm{maxmin}\,}}(\textbf{G})(s)&\mathop {=}\limits ^{\textsf {def}}\sup _\sigma \inf _\tau {\mathbb {E}}_{s}^{\sigma ,\tau }\left[ f\right] ,\\ {{\,\textrm{minmax}\,}}(\textbf{G})(s)&\mathop {=}\limits ^{\textsf {def}}\inf _\tau \sup _\sigma {\mathbb {E}}_{s}^{\sigma ,\tau }\left[ f\right] . \end{aligned}$$

By definition of \({{\,\textrm{maxmin}\,}}\) and \({{\,\textrm{minmax}\,}}\), for every state \(s\in \textbf{S}\), \({{\,\textrm{maxmin}\,}}(\textbf{G})(s) \le {{\,\textrm{minmax}\,}}(\textbf{G})(s)\). As a corollary of Martin’s determinacy theorem for Blackwell games (Martin 1998, Section 1), the converse inequality holds as well:

Theorem 2.4

(Martin’s second determinacy theorem, (Martin 1998, Section 1)) Let \(\textbf{G}\) be a game with a Borel-measurable and bounded payoff function \(f\). Then for every state \(s\in \textbf{S}\):

$$\begin{aligned} {{\,\textrm{val}\,}}(\textbf{G})(s)\mathop {=}\limits ^{\textsf {def}}{{\,\textrm{maxmin}\,}}(\textbf{G})(s)= {{\,\textrm{minmax}\,}}(\textbf{G})(s). \end{aligned}$$

This common value is called the value of state s in the game \(\textbf{G}\) and denoted \({{\,\textrm{val}\,}}(\textbf{G})(s)\).

The existence of a value guarantees the existence of \(\epsilon\)-optimal strategies for both players and every \(\epsilon >0\).

Definition 2.5

(Optimal and \(\epsilon\)-optimal strategies) Let \(\textbf{G}\) be a game, \(\epsilon >0\) and \(\sigma\) a strategy for Player 1. Then \(\sigma\) is \(\epsilon\)-optimal if for every strategy \(\tau\) and every state \(s\in \textbf{S}\),

$$\begin{aligned} {\mathbb {E}}_{s}^{\sigma ,\tau }\left[ f\right] \ge {{\,\textrm{minmax}\,}}(\textbf{G})(s) - \epsilon . \end{aligned}$$

The definition for Player 2 is symmetric. A 0-optimal strategy is simply called optimal.

A stronger class of \(\epsilon\)-optimal strategies are \(\epsilon\)-subgame-perfect strategies, which are strategies that are not only \(\epsilon\)-optimal from the initial state s but stay \(\epsilon\)-optimal throughout the game. More precisely, given a finite play \(h=s_0\cdots s_n\) and a function g whose domain is the set of (in)finite plays, by g[h] we denote the function g shifted by h:

$$\begin{aligned} g[h](t_0a_1t_1\cdots )\mathop {=}\limits ^{\textsf {def}}{\left\{ \begin{array}{ll} g(h a_1t_1\cdots ) &{}\hbox { if}\ s_n=t_0,\\ g(t_0a_1t_1\cdots ) &{}\text { otherwise.} \end{array}\right. } \end{aligned}$$

Definition 2.6

(\(\epsilon\)-Subgame-perfect strategy) Let \(\textbf{G}\) be a game equipped with a payoff function \(f\). A strategy \({\hat{\sigma }}\) for Player 1 is said to be \(\epsilon\)-subgame-perfect if for every finite play \(h:=s_0\cdots s_n\),

$$\begin{aligned} \inf _\tau {\mathbb {E}}_{s_n}^{{\hat{\sigma }}[h],\tau }\left[ f[h]\right] \ge \sup _{\sigma }\inf _\tau {\mathbb {E}}_{s_n}^{\sigma ,\tau }\left[ f[h]\right] -\epsilon . \end{aligned}$$

2.4 Shift-invariant and submixing

Without loss of generality we can assume that there is a finite set \({\textbf{C}}\) (colours assigned to the states of the game) such that the payoff function f is a function

$$\begin{aligned} f\ :\ {\textbf{C}}^\omega \rightarrow {\mathbb {R}}, \end{aligned}$$

that is Borel-measurable and bounded. We define the two conditions with respect to such payoff functions.

Definition 2.7

(Shift-invariant) The payoff function f is shift-invariant if and only if for all finite prefixes \(p\in {\textbf{C}}^*\) and trajectories \(u\in {\textbf{C}}^\omega\),

$$\begin{aligned} f(p\ u)=f(u). \end{aligned}$$

Note that shift-invariance is a stronger condition than the following: if one can obtain \(u'\in {\textbf{C}}^\omega\) from \(u\in {\textbf{C}}^\omega\) by replacing finitely many letters then \(f(u)=f(u')\). Sometimes in the literature shift-invariance is called “prefix-independent” or “tail-measurable”. Intuitively, shift-invariant payoff functions measure only asymptotic properties and do not refer to particular positions in the trajectory.

A factorisation of \(u\in {\textbf{C}}^\omega\) is a sequence \(u_1,u_2,\ldots\) of non-empty finite words (i.e. elements of \({\textbf{C}}^+\)) such that

$$\begin{aligned} u=u_1u_2u_3\cdots . \end{aligned}$$

For \({u},{v}, w\in {\textbf{C}}^\omega\), we say that w is a shuffle of u and v if there are respective factorisations \({ u_1},{u_2},\ldots\), and \({v_1},{ v_2},\ldots\) such that

$$\begin{aligned} w={u_1}\ {v_1}\ {u_2}\ {v_2}\cdots . \end{aligned}$$

Definition 2.8

(Submixing) The payoff function f is submixing if and only if for all \({u}, { v}, w\in {\textbf{C}}^\omega\) such that w is a shuffle of u and v we have

$$\begin{aligned} f(w)\le \max \{f({ u}), f({v})\}. \end{aligned}$$

The submixing condition says that one cannot shuffle two losing trajectories to make a winning one. This requirement simplifies the kind of strategies that the players need.
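As an illustration, the following sketch (ours) builds a shuffle of two ultimately periodic colour sequences and checks the submixing inequality for the mean payoff empirically, approximating \(f_\text {mean}\) by the average of a long finite prefix.

```python
from itertools import count, islice

def word(prefix, cycle):
    """The ultimately periodic infinite word prefix.cycle^omega."""
    yield from prefix
    while True:
        yield from cycle

def shuffle(u, v, blocks):
    """A shuffle of u and v: alternately emit the next block of u and of v,
    with block lengths drawn from `blocks` (cf. Definition 2.8)."""
    for i, n in enumerate(blocks):
        src = u if i % 2 == 0 else v
        for _ in range(n):
            yield next(src)

def mean(w, horizon=100_000):
    """Finite-horizon approximation of the mean payoff f_mean."""
    prefix = list(islice(w, horizon))
    return sum(prefix) / len(prefix)

u, v = word([], [0, 1]), word([5], [1, 1, 0, 0])        # both have mean 1/2
w = shuffle(word([], [0, 1]), word([5], [1, 1, 0, 0]), count(1))
assert mean(w) <= max(mean(u), mean(v)) + 1e-2          # submixing, empirically
```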

The submixing condition is not symmetric with respect to the players, and it implies different results for each of them (notice the difference between Theorems 1.1 and 1.2). We define the inverse-submixing condition, which is its reflection with respect to the players:

Definition 2.9

(Inverse-submixing) The payoff function f is inverse-submixing if and only if for all \({u}, {v}, w\in {\textbf{C}}^\omega\) such that w is a shuffle of u and v we have

$$\begin{aligned} f(w)\ge \min \{f({u}), f({v})\}. \end{aligned}$$

There are payoff functions that are both submixing and inverse-submixing (e.g. the parity function); for such payoffs Theorem 1.1 implies simple optimal strategies for both players, i.e. positionality.

3 Applications and examples

In this section we give a variety of examples of payoff functions that are shift-invariant and submixing, some of them very well-known, others less so. Thus we unify a number of classical positional determinacy results and also sketch how straightforward it is to apply Theorem 1.1 to novel payoff functions. Furthermore, we comment on the hypotheses of Theorem 1.1: are the conditions necessary? What do they imply about the optimal strategies of the minimizer? Under which operations is the class of such payoff functions closed? We start by listing a few well-known examples.

3.1 Unification of classical results

The mean-payoff function was introduced by Gilette (1957). It measures average performance. Each state \(s\in \textbf{S}\) is labeled with an immediate reward \(r(s)\in {\mathbb {R}}\). With an infinite play \(s_0a_1s_1\cdots\) is associated an infinite sequence of rewards \(r_0=r(s_0),r_1=r(s_1),\ldots\) and the payoff is:

$$\begin{aligned} f_\text {mean}(r_0r_1\cdots ) \mathop {=}\limits ^{\textsf {def}}\limsup _{n}\frac{1}{n+1}\sum _{i=0}^n r_i. \end{aligned}$$

The discounted payoff was introduced by Shapley (1953). It measures long-term performance with an inflation rate: immediate rewards are discounted. Each state s is labeled not only with an immediate reward \(r(s)\in {\mathbb {R}}\) but also with a discount factor \(0\le \lambda (s) <1\). With an infinite play h labeled with the sequence \((r_0,\lambda _0)(r_1,\lambda _1)\cdots \in ({\mathbb {R}}\times [0,1))^\omega\) of immediate rewards and discount factors is associated the payoff:

$$\begin{aligned} f_{\text {disc}}\left( (r_0,\lambda _0)(r_1,\lambda _1)\cdots \right) \mathop {=}\limits ^{\textsf {def}}r_0 + \lambda _0 r_1 + \lambda _0\lambda _1 r_2 +\cdots . \end{aligned}$$

The parity condition is used in automata theory and logic (Grädel et al. 2002). Each state s is labeled with some color \(c(s)\in \{0,\ldots ,d\}\). The payoff is 1 if the highest color seen infinitely often is even, and 0 otherwise. For \(c_0c_1\cdots \in \{0,\ldots ,d\}^\omega\),

$$\begin{aligned} f_\text {par}(c_0c_1\cdots )\mathop {=}\limits ^{\textsf {def}}{\left\{ \begin{array}{ll} 1 \text { if }\limsup _{n} c_n \text { is even,}\\ 0 \text { otherwise.} \end{array}\right. } \end{aligned}$$

The limsup payoff function has been used in the theory of gambling games (Maitra and Sudderth 1996). States are labeled with immediate rewards and the payoff is the limit supremum of the rewards:

$$\begin{aligned} f_\text {lsup}(r_0r_1\cdots ) \mathop {=}\limits ^{\textsf {def}}\limsup _n r_n. \end{aligned}$$

The liminf payoff function can be defined similarly.
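For concreteness, here are finite-horizon sketches of these payoff functions (ours; the \(\limsup\) is approximated over a late window, which is exact for ultimately periodic sequences once the horizon is long enough).

```python
def f_mean(rewards):
    """Average of a finite prefix, approximating limsup of the averages."""
    return sum(rewards) / len(rewards)

def f_disc(rewards, discounts):
    """Discounted payoff r0 + l0*r1 + l0*l1*r2 + ... over a finite prefix."""
    total, factor = 0.0, 1.0
    for r, lam in zip(rewards, discounts):
        total += factor * r
        factor *= lam
    return total

def f_par(colors, window=1000):
    """1 if the largest colour of the last `window` steps is even, else 0;
    approximates 'the highest colour seen infinitely often is even'."""
    return 1 if max(colors[-window:]) % 2 == 0 else 0

def f_lsup(rewards, window=1000):
    """Largest reward of the last `window` steps, approximating limsup."""
    return max(rewards[-window:])
```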

The first of the two following propositions is checked directly from the definitions; combined with Theorem 1.1, it yields the second.

Proposition 3.1

The payoff functions \(f_\text {lsup}\), \(f_\text {linf}\), \(f_\text {par}\) and \(f_\text {mean}\) are shift-invariant and submixing. Moreover \(f_\text {lsup}\), \(f_\text {linf}\), and \(f_\text {par}\) are inverse-submixing as well.

Proposition 3.2

In every two-player stochastic game equipped with the parity, limsup, liminf, mean or discounted payoff function, Player 1 has a deterministic and stationary strategy which is optimal. The same is true for Player 2 for the parity, limsup and liminf payoff.

One comment should be made about the discounted payoff function: while it is not shift-invariant, games equipped with this function can be reduced to games with the mean-payoff function by interpreting the discount factors as stopping probabilities, as was done in the seminal paper of Shapley (1953).

Thus we have unified a number of classical results, thereby giving a common reason for the half-positionality of seemingly unrelated games. The approaches found in the literature for proving that these games are (half-)positional are diverse, as one can see, for example, by consulting the papers Courcoubetis and Yannakakis (1990) and Maitra and Sudderth (1996), which show positionality for parity games and limsup games, respectively. The existence of deterministic and stationary optimal strategies in mean-payoff games has a colourful history attached. The first proof was given by Gilette (1957), based on a variant of a theorem of Hardy and Littlewood. Later on, Liggett and Lippman found the variant to be wrong and proposed an alternative proof based on the existence of Blackwell optimal strategies plus a uniform boundedness result of Brown (Liggett and Lippman 1969). For one-player games, Bierth (1987) gave a proof using martingales and elementary linear algebra, while Vrieze et al. (1983) provided a proof based on linear programming; a modern proof, based on a reduction to discounted games and the use of analytical tools, can be found in Neyman and Sorin (2003). For two-player games, a proof based on a transfer theorem from one-player to two-player games can be found in Gimbert (2006); Gimbert and Zielonka (2009, 2016).

3.2 Other examples

We mention a few more recent examples of games.

One-counter stochastic games were introduced in Brázdil et al. (2010); in these games each state \(s\in \textbf{S}\) is labeled by an integer \(c(s)\in {\mathbb {Z}}\). Three different winning conditions were defined and studied in Brázdil et al. (2010):

$$\begin{aligned}&\limsup _n \sum _{0\le i \le n} c_i = + \infty \end{aligned}$$
(3)
$$\begin{aligned}&\limsup _n \sum _{0\le i \le n} c_i = - \infty \end{aligned}$$
(4)
$$\begin{aligned}&f_\text {mean}(c_0c_1\ldots ) > 0 \end{aligned}$$
(5)

The winning conditions given by (3) and (4) are clearly shift-invariant, and it is furthermore plain that they are also submixing. The positive average condition defined by (5) is a variant of the mean-payoff condition, which may be more suitable to model quality-of-service constraints or decision makers with loss aversion. One can naturally define a payoff function \(f_\text {posavg}\) that outputs 1 if the condition holds, and 0 otherwise.

Although \(f_\text {posavg}\) seems similar to the \(f_\text {mean}\) function, maximizing the expected value of \(f_\text {posavg}\) and doing the same for \(f_\text {mean}\) are two different goals. For example, a positive average maximizer prefers seeing the sequence \(1,1,1,\ldots\) for sure rather than seeing, with equal probability \(\frac{1}{2}\), the sequences \(0,0,0,\ldots\) or \(3,3,3,\ldots\), while a mean-value maximizer prefers the second situation to the first one: the lottery has expected mean payoff \(\frac{3}{2}>1\), but its expected positive-average payoff is only \(\frac{1}{2}<1\). To the best knowledge of the authors, the classical techniques developed in Bierth (1987); Neyman and Sorin (2003); Vrieze et al. (1983) cannot be used to prove positionality of games equipped with the positive average condition. However, since \(f_\text {posavg}\) can be defined as the composition of the submixing function \(f_\text {mean}\) with a nondecreasing function, it is submixing itself. As a consequence of the main theorem of the present paper, it then follows that games equipped with \(f_\text {posavg}\) are half-positional.

Further relatively recent examples can be derived as variants of generalized mean-payoff games, which were introduced in Chatterjee et al. (2010). Let us explain them in turn. In mean-payoff co-Büchi games, the states are labeled by immediate rewards, and a distinguished subset of the states are called the Büchi states. The payoff of Player 1 is \(-\infty\) if the Büchi states are visited infinitely often, and the mean-payoff value of the rewards otherwise. One can easily check that such a payoff mapping is shift-invariant and submixing. Although we do not explicitly handle payoff mappings that take infinite values, it is possible to approximate the payoff function by replacing \(-\infty\) with arbitrarily small values, and thus prove half-positionality of mean-payoff co-Büchi games.

Another variant is that of optimistic generalized mean-payoff games. In these games, each state is labeled by a fixed number of immediate rewards \(\left( r^{(1)},\ldots ,r^{(k)}\right)\), which define as many mean payoff conditions \(\left( f_\text {mean}^1,\ldots , f_\text {mean}^k\right)\). The winning condition is:

$$\begin{aligned} \exists i\in \{1,\ldots ,k\}\ \ \ \ f_\text {mean}^i\left( r^{(i)}_0r^{(i)}_1\cdots \right) > 0. \end{aligned}$$
(6)

It is an exercise to show that this winning condition is submixing. More generally, if \(f_1,\ldots , f_n\) are submixing payoff mappings then \(\max \{ f_1,\ldots ,f_n \}\) is submixing as well. As a consequence of this observation and Theorem 1.1, games with the optimistic generalized mean-payoff condition are half-positional. Such games are not positional, however: one can show that the minimizer requires (finite) memory. Intuitively, he needs the memory to remember which dimensions have to be decreased in order to render the condition false.

The generalized mean-payoff games of Chatterjee et al. (2010), where the winning condition is as in (6) but with a universal instead of an existential quantifier, are however not submixing.

One final but interesting example of a payoff function that is shift-invariant, submixing, and even inverse-submixing (hence positional for both players in two-player games) is the positive frequency payoff. Every state is labeled by a color from a finite set C, and each color \(c\in C\) has a payoff u(c). An infinite play generates an infinite word of colors:

$$\begin{aligned} w\mathop {=}\limits ^{\textsf {def}}c_0c_1c_2\cdots . \end{aligned}$$

For a color c and \(n\in {\mathbb {N}}\) define \(\#(c,c_0c_1\cdots c_n)\) to be the number of occurrences of the color c in the prefix \(c_0c_1\cdots c_n\). The frequency of the color c in w is defined as:

$$\begin{aligned} \textrm{freq}(c,w)\mathop {=}\limits ^{\textsf {def}}\limsup _{n\rightarrow \infty } \frac{\#(c,c_0c_1\cdots c_n)}{n}, \end{aligned}$$

and the payoff

$$\begin{aligned} f_{\textrm{freq}}(w)\mathop {=}\limits ^{\textsf {def}}\max \{u(c)\ :\ c\in C,\ \textrm{freq}(c,w)>0\}. \end{aligned}$$
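On ultimately periodic words the frequency of every colour converges to its density in the cycle, so \(f_{\textrm{freq}}\) can be computed exactly there. A small sketch (ours):

```python
def f_freq(cycle, u):
    """Exact positive-frequency payoff of prefix.cycle^omega: a colour has
    positive frequency iff it occurs in the cycle (the prefix is irrelevant),
    and `u` maps colours to payoffs."""
    return max(u[c] for c in set(cycle))

assert f_freq(['a', 'a', 'b'], {'a': 1, 'b': 7, 'c': 100}) == 7
```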

Other examples can be found in Gimbert (2007); Kopczynski (2009); Gimbert (2006), and in the papers cited in the introduction.

3.3 The class of shift-invariant and submixing functions

In this section we have already used two operations under which the class of shift-invariant and submixing functions is closed:

  • If \(f_1,\ldots , f_k\) are shift-invariant and submixing then so is

    $$\begin{aligned} f(w)\mathop {=}\limits ^{\textsf {def}}\max \{f_1(w),\ldots ,f_k(w)\}. \end{aligned}$$
  • If f is shift-invariant and submixing, and g is a nondecreasing function, then

    $$\begin{aligned} g\circ f \end{aligned}$$

    is shift-invariant and submixing.

The proofs are routine.
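For instance, for the first property: let w be a shuffle of u and v and write \(f=\max \{f_1,\ldots ,f_k\}\). Since each \(f_i\) is submixing and \(f_i\le f\) pointwise,

$$\begin{aligned} f_i(w)\le \max \{f_i(u),f_i(v)\}\le \max \{f(u),f(v)\}, \end{aligned}$$

and taking the maximum over i gives \(f(w)\le \max \{f(u),f(v)\}\); shift-invariance of f is inherited from the \(f_i\). The second property follows from the identity \(g(\max \{x,y\})=\max \{g(x),g(y)\}\), valid for nondecreasing g.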

Beyond these, the class of shift-invariant and submixing functions does not seem to have other non-trivial closure properties. For example, even though the class is closed under \(\max\) as above, it is not closed under addition. That is, if \(f_1\) and \(f_2\) are submixing, then \(f(w):=f_1(w)+f_2(w)\) need not be. To see this, consider the example with colors a and b, with \(f_1\) mapping a word to 1 if a occurs infinitely often and to 0 otherwise, and \(f_2\) defined symmetrically: each of \(a^\omega\) and \(b^\omega\) has payoff 1 for f, while their shuffle \(abab\cdots\) has payoff 2.

Furthermore, neither condition of Theorem 1.1 is necessary: discounted games are positional although the discounted payoff is not shift-invariant, and the mean-payoff function with \(\liminf\) instead of \(\limsup\) is positional but not submixing. However, as we have seen, this class contains many interesting payoff functions, and being shift-invariant and submixing is the salient property that allows one to prove the existence of positional optimal strategies. Perhaps even more importantly, it is typically trivial to check whether a given payoff function is shift-invariant and submixing.

4 \(\epsilon\)-Subgame-perfect strategies

The proof of Theorem 1.1 hinges on a crucial property of games with perfect information: when the payoff function is shift-invariant, bounded and Borel-measurable, they admit \(\epsilon\)-subgame-perfect strategies, for all \(\epsilon >0\).

Theorem 4.1

Games equipped with a payoff function that is shift-invariant, bounded and Borel-measurable, admit \(\epsilon\)-subgame-perfect strategies, for every \(\epsilon >0\).

In this section we sketch the proof of this theorem, by giving a construction that takes some \(\epsilon\)-optimal strategy \(\sigma\) and turns it into a \(2\epsilon\)-subgame-perfect strategy \({\hat{\sigma }}\). A full proof can be found in Mashiah-Yaakovi’s paper (Mashiah-Yaakovi 2015, Proposition 11) in a more general setting, in the preprint version of the current paper (Gimbert and Kelmendi 2014, Section 3), as well as in Flesch et al. (2021).

The theorem is completely symmetric with respect to the players, so it is sufficient to show that, say, the maximizer has \(\epsilon\)-subgame-perfect strategies.

An \(\epsilon\)-optimal strategy \(\sigma\) is a strategy that ensures (within \(\epsilon\)) the largest payoff that can be ensured from the starting position of the game. As the game progresses, the adversary may make a non-optimal move at some point that allows one to gain more, but an \(\epsilon\)-optimal strategy does not necessarily take advantage of this slip-up. The \(\epsilon\)-subgame-perfect strategies are the subset of strategies that do take advantage of any such non-optimal action of the adversary. How does one turn an \(\epsilon\)-optimal strategy into an \(\epsilon\)-subgame-perfect one? One simple observation suffices: if, after the non-optimal action is played, \(\sigma\) simply forgets the past, i.e. resets its memory, then the payoff will be larger. We explain why the strategy that resets its memory whenever the adversary plays a non-optimal action is \(2\epsilon\)-subgame-perfect. First, let us make this construction precise.

A strategy \(\sigma\) is not \(2\epsilon\)-subgame-perfect if and only if there exists some finite play \(h:=s_0\cdots s_n\) such that

$$\begin{aligned} \inf _\tau {\mathbb {E}}_{s_n}^{\sigma [h],\tau }\left[ f[h]\right] < \sup _{\sigma '}\inf _{\tau }{\mathbb {E}}_{s_n}^{\sigma ',\tau }\left[ f[h]\right] -2\epsilon ; \end{aligned}$$
(7)

the reset strategy simply resets its memory when this happens. We give the formal definitions.

Definition 4.2

The finite play \(h:=s_0\cdots s_n\) is called an \((\epsilon ,\sigma )\)-drop if (7) holds. We write

$$\begin{aligned} \Delta (\epsilon ,\sigma )(h)\qquad \Leftrightarrow \qquad h\text { is an }(\epsilon ,\sigma )\text {-drop}. \end{aligned}$$

It is plain that any infinite play that has infinitely many drops can be factorised into \(h_1 h_2\cdots\), where each \(h_i\) is an \((\epsilon ,\sigma )\)-drop, but no strict prefix of \(h_i\) is an \((\epsilon ,\sigma )\)-drop. For example:

[Figure a: an infinite play factorised into blocks \(h_1h_2h_3\cdots\), where each \(h_i\) is a minimal \((\epsilon ,\sigma )\)-drop.]

Definition 4.3

We define the date of the most recent (or latest) drop for all \(s_0\cdots s_n\) inductively as:

$$\begin{aligned} \Lambda (\epsilon ,\sigma )(s_0)&\mathop {=}\limits ^{\textsf {def}}0\\ \Lambda (\epsilon ,\sigma )(s_0\cdots s_n)&\mathop {=}\limits ^{\textsf {def}}{\left\{ \begin{array}{ll} n \qquad &{}\text {if }h\text { is an }(\epsilon ,\sigma )\text {-drop}\\ \Lambda (\epsilon ,\sigma )(s_0\cdots s_{n-1})\qquad &{}\text {otherwise}, \end{array}\right. } \end{aligned}$$

where

$$\begin{aligned} h\mathop {=}\limits ^{\textsf {def}}s_{\ell }\cdots s_n,\ \ \ \text {and} \ \ \ \ell \mathop {=}\limits ^{\textsf {def}}\Lambda (\epsilon ,\sigma )(s_0\cdots s_{n-1}). \end{aligned}$$

The date of the most recent drop in the example above looks as follows:

[Figure b: the date of the most recent drop along the play of Figure a.]

The reset strategy resets its memory whenever a drop occurs, i.e. it keeps only the memory accumulated since the most recent drop:

Definition 4.4

(Reset Strategy) For any strategy \(\sigma\) we define the reset strategy \(\hat{\sigma }\) as:

$$\begin{aligned} \hat{\sigma }(s_0\cdots s_n)=\sigma (s_\ell \cdots s_n), \end{aligned}$$

where

$$\begin{aligned} \ell \mathop {=}\limits ^{\textsf {def}}\Lambda (\epsilon ,\sigma )(s_0\cdots s_n). \end{aligned}$$
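The following minimal sketch (ours) renders Definitions 4.3 and 4.4 in code. For brevity a play is encoded as the list of its states, and drop detection is treated as an oracle predicate `is_drop` (evaluating it requires the values of the games, which the sketch does not compute).

```python
def make_reset_strategy(sigma, is_drop):
    """Wrap the strategy `sigma` into its reset strategy (Definition 4.4).
    `is_drop(play)` is an oracle for 'play is an (eps, sigma)-drop'."""

    def latest_drop(play):
        # Date of the most recent drop (Definition 4.3), recomputed from
        # scratch; an implementation would maintain it incrementally.
        ell = 0
        for n in range(1, len(play)):
            if is_drop(play[ell:n + 1]):   # test h = s_ell ... s_n
                ell = n
        return ell

    def sigma_hat(play):
        # Forget everything that happened before the most recent drop.
        return sigma(play[latest_drop(play):])

    return sigma_hat
```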

The crux of the proof of Theorem 4.1 is to show that under the strategy \({\hat{\sigma }}\), almost surely only finitely many resets occur. Intuitively, this is because after every \((\epsilon ,\sigma )\)-drop, by resetting the memory, the maximizer gains at least some amount \(\delta\) that is bounded away from zero. But it is not possible to gain \(\delta\) infinitely many times, because the payoff function is assumed to be bounded.

One way of formally proving this observation is to use martingale theory. We start with a strategy \(\sigma\) that is \(\epsilon\)-optimal. Then one shows that with the strategy \({\hat{\sigma }}\) only finitely many drops occur almost surely, and that furthermore the strategy \({\hat{\sigma }}\) is itself \(\epsilon\)-optimal. For the former, Doob’s optional stopping theorem and the forward convergence theorem (Williams 1991, Theorem 11.5) are useful. Finally one proves that, by construction, if \({\hat{\sigma }}\) is \(\epsilon\)-optimal, then it is also \(2\epsilon\)-subgame-perfect.

There is a technical detail which deserves some comments: to guarantee that \({\hat{\sigma }}\) is \(2\epsilon\)-subgame-perfect, we need the game to be value-preserving, in the following sense.

Definition 4.5

(Value-preserving Game) Let f be a shift-invariant payoff function. A game equipped with f is value-preserving for Player 1 if for every state s and every action a available in s,

$$\begin{aligned} \left( \sum _{t\in S} p(s,a,t) {{\,\textrm{val}\,}}(\textbf{G})(t)\right) \ge {{\,\textrm{val}\,}}(\textbf{G})(s). \end{aligned}$$
(8)

The game is value-preserving for Player 2 if the converse inequality holds from every state of the game.

Since f is shift-invariant, inequality (8) holds when s is controlled by Player 2 (otherwise Player 2 could guarantee the expected payoff from s to be strictly smaller than the value of s). Any game equipped with f can be turned into a value-preserving game by deleting the actions available in Player 1 states that violate (8). This does not change the values of the states of the game, and moreover the \(\epsilon\)-subgame-perfect strategies of the new game are also \(\epsilon\)-subgame-perfect in the original game.

The \(\epsilon\)-optimal strategies that we turn into \(2\epsilon\)-subgame-perfect strategies \({\hat{\sigma }}\) are guaranteed to exist by Martin’s theorem, Theorem 2.4. However, if in some game one of the players happens to have an optimal strategy (i.e. a 0-optimal strategy), then via the construction above one can prove the existence of a subgame-perfect strategy (i.e. a 0-subgame-perfect strategy).

Remark 4.6

In value-preserving games, if \(\sigma\) is \(\epsilon\)-optimal, then the reset strategy \({\hat{\sigma }}\) is \(2\epsilon\)-subgame-perfect. Moreover, if \(\sigma\) is optimal, then \({\hat{\sigma }}\) is subgame-perfect.

Another property that is preserved in passing from \(\sigma\) to \({\hat{\sigma }}\) is that of finite memory: if the strategy \(\sigma\) has finite memory to begin with, then so does the strategy \(\hat{\sigma }\). First we define precisely what we mean by a finite-memory strategy.

A strategy \(\sigma\) is said to have finite memory if it is given by a transducer, namely a tuple:

$$\begin{aligned} \underbrace{\mathcal {M}}_{\text {a finite set}},\qquad \underbrace{{{\,\textrm{init}\,}}\ :\ \textbf{S}\rightarrow \mathcal {M}}_{\text {memory initialiser}},\qquad \underbrace{{{\,\textrm{up}\,}}\ :\ \mathcal {M}\times \textbf{A}\times \textbf{S}\rightarrow \mathcal {M}}_{\text {update function}},\qquad \underbrace{{{\,\textrm{out}\,}}\ :\ \mathcal {M}\rightarrow \Delta (\textbf{A})}_{\text {output function}}. \end{aligned}$$

The maps \({{\,\textrm{init}\,}}\) and \({{\,\textrm{up}\,}}\) are used to initialise the memory and to update it as the game unfolds: after the finite play \(s_0a_1\cdots s_n\) has unfolded, the transducer reaches the memory state \(m_n\in \mathcal {M}\) which is defined inductively as:

$$\begin{aligned} m_0&\mathop {=}\limits ^{\textsf {def}}{{\,\textrm{init}\,}}(s_0), \text {and}\\ m_{k+1}&\mathop {=}\limits ^{\textsf {def}}{{\,\textrm{up}\,}}(m_k,a_{k+1},s_{k+1}). \end{aligned}$$

The output function is used to choose the action that the strategy plays, i.e.

$$\begin{aligned} \sigma (s_0\cdots s_n)={{\,\textrm{out}\,}}(m_n). \end{aligned}$$
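Run as a machine, such a transducer looks as follows (a minimal sketch, ours):

```python
class TransducerStrategy:
    """A finite-memory strategy given by a transducer (M, init, up, out)."""

    def __init__(self, init, up, out):
        self.init, self.up, self.out = init, up, out
        self.m = None                      # current memory state, in M

    def start(self, s0):
        self.m = self.init(s0)             # m_0 = init(s_0)

    def step(self, a, s):
        self.m = self.up(self.m, a, s)     # m_{k+1} = up(m_k, a_{k+1}, s_{k+1})

    def choose(self):
        return self.out(self.m)            # a distribution over actions
```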

Proposition 4.7

If \(\sigma\) is a finite memory strategy, so is \(\hat{\sigma }\).

Proof

The reset strategy is constructed with respect to \(\sigma\) and some \(\epsilon >0\), since it depends on \((\epsilon ,\sigma )\)-drops to reset the memory. We prove the proposition for any \(\epsilon\) such that \(\sigma\) is \(\epsilon\)-optimal.

Let \(\sigma\) be a finite memory strategy, that is given by the tuple

$$\begin{aligned} (\mathcal {M},{{\,\textrm{init}\,}},{{\,\textrm{up}\,}},{{\,\textrm{out}\,}}), \end{aligned}$$

and let \(\epsilon\) be such that \(\sigma\) is \(\epsilon\)-optimal, which fixes a reset strategy \(\hat{\sigma }\).

Without loss of generality we can assume that the strategy is such that its memory state identifies the current state in the game, in other words assume that \(\mathcal {M}\) can be partitioned into:

$$\begin{aligned} \mathcal {M}= \biguplus _{s\in \textbf{S}}\mathcal {M}_s, \end{aligned}$$

such that for any finite play \(s_0\cdots s_n\), if \(m_1,\ldots ,m_n\) is the sequence of memory states of the transducer of \(\sigma\) during this play, then

$$\begin{aligned} m_n\in \mathcal {M}_{s_n}. \end{aligned}$$

We gather the memory states where drops occur as follows. For \(s\in \textbf{S}\) and \(m\in \mathcal {M}_s\), denote by \(\sigma _m\) the strategy that is the same as \(\sigma\) except that the initial memory state for s is m instead of \({{\,\textrm{init}\,}}(s)\). Define the subset \({\mathcal {D}}\subset \mathcal {M}\) of memory states where drops occur as

$$\begin{aligned} {\mathcal {D}}\mathop {=}\limits ^{\textsf {def}}\{m\in \mathcal {M}_s\ :\ s\in \textbf{S}\text { and }\sigma _m\text { is not }2\epsilon \text {-optimal from state }s\}. \end{aligned}$$

Construct the finite-memory strategy \(\sigma '\) that avoids the memory states in \({\mathcal {D}}\) as follows. For any \(s\in \textbf{S}\) and \(m\in \mathcal {M}_s\cap {\mathcal {D}}\), since \(\sigma\) is \(\epsilon\)-optimal, \(m\ne {{\,\textrm{init}\,}}(s)\). In the strategy \(\sigma '\), modify the function \({{\,\textrm{up}\,}}\) in such a way that all the transitions that lead to m are redirected to the memory state \({{\,\textrm{init}\,}}(s)\) instead (the memory is reset). Do this simultaneously for every pair (s, m) as above. Comparing the definitions of \(\hat{\sigma }\) and \(\sigma '\), we conclude that they coincide. \(\square\)

This proposition, together with the salient property of the reset strategy, namely that it is \(2\epsilon\)-subgame-perfect, implies that if there are \(\epsilon\)-optimal strategies with finite memory, then there are also \(2\epsilon\)-subgame-perfect strategies with finite memory. We make this statement more precise.

Proposition 4.8

Let \(\textbf{G}\) be a game and \(\epsilon > 0\). Assume that the game is value-preserving for Player 1 and that Player 1 has an \(\epsilon\)-optimal strategy \(\sigma\) with finite memory. Then Player 1 also has a \(2\epsilon\)-subgame-perfect strategy with finite memory, namely the reset strategy \(\hat{\sigma }\). The same statement holds even if \(\epsilon =0\), i.e. for optimal strategies. The symmetric statements hold for Player 2.

Proof

From the discussion of this section the strategy \(\hat{\sigma }\) is \(2\epsilon\)-subgame-perfect, and Proposition 4.7 implies that it has finite memory. The case \(\epsilon =0\) follows from Remark 4.6. \(\square\)

5 Half-positional games

We prove the main theorem:

Theorem 1.1

Games equipped with a payoff function that is shift-invariant and submixing are half-positional.

Neither of the conditions in the statement is necessary, as we saw from the examples given in Sect. 3. Necessary and sufficient conditions for positionality are known for deterministic games (Gimbert and Zielonka 2005). However, the shift-invariant and submixing conditions are general enough to recover several known classical results, and to provide several new examples of games with deterministic and stationary optimal strategies. Before we proceed with the proof we remark:

Corollary 5.1

Games with payoff functions which are at the same time shift-invariant, submixing and inverse-submixing are positional.

Proof

The proof of Theorem 1.1, the subject of this section, can be mirrored: replacing Player 1 by Player 2 and the submixing condition by the inverse-submixing condition yields the analogous statement for Player 2. For a payoff function satisfying all three conditions, both players thus have deterministic and stationary optimal strategies. \(\square\)

Consider a game \(\textbf{G}\) fulfilling the conditions of the theorem. The proof proceeds by induction on the number of actions of the maximizer, that is, on the quantity

$$\begin{aligned} N(\textbf{G})\mathop {=}\limits ^{\textsf {def}}\sum _{s\in \textbf{S}_1}\left( |\textbf{A}(s)|-1\right) . \end{aligned}$$

It proceeds by removing more and more actions of the maximizer and showing that at every step the value has not decreased, until we are left with a single choice in every state that belongs to the maximizer. These unique choices then define the positional optimal strategy.

If \(N(\textbf{G})=0\) there is no choice for the maximizer, hence he has a deterministic and stationary optimal strategy. If \(N(\textbf{G})>0\) there must be a state \({\tilde{s}}\in \textbf{S}_1\) such that Player 1 has at least two actions in \({\tilde{s}}\), i.e. \(\textbf{A}({\tilde{s}})\) has at least two elements. We split the game \(\textbf{G}\) into two strictly smaller subgames \(\textbf{G}_1\) and \(\textbf{G}_2\).

Definition 5.2

(Split of a game) Let \(\textbf{G}\) be a game with \(N(\textbf{G})>0\) and \({\tilde{s}}\in \textbf{S}\) a state of \(\textbf{G}\) controlled by Player 1 in which there are at least two actions available, i.e. \(\textbf{A}({\tilde{s}})\) has at least two elements. Partition \(\textbf{A}({\tilde{s}})\) into two non-empty sets: \(\textbf{A}_1\) and \(\textbf{A}_2\). Let \(\textbf{G}_1\) and \(\textbf{G}_2\) be the games obtained from \(\textbf{G}\) by restricting the actions in the state \({\tilde{s}}\) to \(\textbf{A}_1\) and \(\textbf{A}_2\) respectively. Then \((\textbf{G}_1,\textbf{G}_2)\) is called a split of \(\textbf{G}\) on \(\tilde{s}\).

The induction step relies on the two results stated in the next theorem. The first says that the value of \({\tilde{s}}\) in the original game cannot be larger than the larger of its values in the restricted games. The second shows that Player 1 can play optimally in \(\textbf{G}\) by selecting one of the subgames and playing optimally in it.

Theorem 5.3

Let \(\textbf{G}\) be a game equipped with a payoff function that is shift-invariant and submixing. Let \((\textbf{G}_1,\textbf{G}_2)\) be a split of \(\textbf{G}\) on \({\tilde{s}}\). Then

$$\begin{aligned} {{\,\textrm{val}\,}}(\textbf{G})({\tilde{s}}) = \max \{{{\,\textrm{val}\,}}(\textbf{G}_1)(\tilde{s}),{{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\}. \end{aligned}$$
(9)

Assume moreover that \({{\,\textrm{val}\,}}(\textbf{G}_1)({\tilde{s}})\ge {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\). Then, for every \(s\in \textbf{S}\),

$$\begin{aligned} {{\,\textrm{val}\,}}(\textbf{G})(s)={{\,\textrm{val}\,}}(\textbf{G}_1)(s). \end{aligned}$$
(10)

Theorem 1.1 is a simple corollary of Theorem 5.3.

Proof of Theorem 1.1

The proof is by induction on \(N(\textbf{G})\). If \(N(\textbf{G})=0\) there is no choice for the maximizer, hence he has a deterministic and stationary optimal strategy. If \(N(\textbf{G})>0\) then we choose a split \((\textbf{G}_1,\textbf{G}_2)\) of \(\textbf{G}\) on a pivot state \({\tilde{s}}\). Without loss of generality, we can choose the split such that \({{\,\textrm{val}\,}}(\textbf{G}_1)(\tilde{s})\ge {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\). Then, according to (10) in Theorem 5.3, a strategy for Player 1 which is optimal in \(\textbf{G}_1\) is also optimal in \(\textbf{G}\). By the induction hypothesis, there exists a positional optimal strategy in \(\textbf{G}_1\), thus \(\textbf{G}\) is half-positional. \(\square\)
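This induction has an algorithmic reading. The following sketch is ours and not part of the paper’s formal development: it assumes a hypothetical game interface with fields `states_1` and `actions` and a method `restrict(state, actions)` implementing Definition 5.2, together with an oracle `val(game, s)` for \({{\,\textrm{val}\,}}(\textbf{G})(s)\), which Theorem 2.4 guarantees to be well defined but which the sketch does not compute.

```python
def positional_strategy(game, val):
    """Skeleton of the induction on N(G): repeatedly split a maximizer state
    with at least two available actions and keep the subgame whose value at
    the pivot is largest (Theorem 5.3), until every maximizer state has a
    single action left; these unique choices form the positional strategy."""
    while True:
        pivot = next((s for s in game.states_1 if len(game.actions[s]) > 1), None)
        if pivot is None:
            # N(G) = 0: the unique remaining action in each state is optimal.
            return {s: next(iter(game.actions[s])) for s in game.states_1}
        acts = list(game.actions[pivot])
        g1 = game.restrict(pivot, {acts[0]})       # the split (G1, G2) on pivot
        g2 = game.restrict(pivot, set(acts[1:]))
        game = g1 if val(g1, pivot) >= val(g2, pivot) else g2
```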

The rest of the section is dedicated to the proof of Theorem 5.3. We fix a game \(\textbf{G}\) and a split \((\textbf{G}_1,\textbf{G}_2)\) of \(\textbf{G}\) on the state \({\tilde{s}}\). The inequality

$$\begin{aligned} {{\,\textrm{val}\,}}(\textbf{G})({\tilde{s}}) \ge \max \{{{\,\textrm{val}\,}}(\textbf{G}_1)(\tilde{s}),{{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\} \end{aligned}$$

is clear, since Player 1 has more choice in \(\textbf{G}\) than he has in \(\textbf{G}_1\) and \(\textbf{G}_2\). We witness the converse inequality with a strategy for Player 2, called the merge strategy, which merges two \(\epsilon\)-subgame-perfect strategies of the respective smaller games. This is done in Sect. 5.3. The definition of the merge strategy hinges on the projection of plays in the main game \(\textbf{G}\) to plays in the restricted games \(\textbf{G}_1\) and \(\textbf{G}_2\), which is done in Sect. 5.1. Then we analyse the two possible outcomes: (a) after some date the play remains only in the game \(\textbf{G}_1\) (or only in the game \(\textbf{G}_2\)); (b) the play switches infinitely often between the two smaller games. This analysis is performed in Sects. 5.4 and 5.5. For the latter case (b) we use the submixing property to show that Player 1 cannot get a better payoff by switching between the two smaller games than he could get by staying in one of them.

5.1 Projecting a play in \(\textbf{G}\) to a couple of plays in the subgames

There is a natural way to project a play h of the game \(\textbf{G}\) starting in \({\tilde{s}}\) to a couple of plays \(h_1\) and \(h_2\) in the restricted games \(\textbf{G}_1\) and \(\textbf{G}_2\) respectively, starting from \({\tilde{s}}\) as well. The two projections are computed simultaneously and inductively. Initially, \(h={\tilde{s}}\) and both projections \(h_1\) and \(h_2\) are also equal to \({\tilde{s}}\). Each step of the play in \(\textbf{G}\) is appended to either \(h_1\) or \(h_2\), depending on the action a played the last time the state \({\tilde{s}}\) was visited: if a belongs to \(\textbf{A}_1\) then the new step is appended to \(h_1\), otherwise it is appended to \(h_2\).

Before giving the formal definition of \(\pi _1\) and \(\pi _2\) below, we provide an example. Consider the game \(\textbf{G}\) with a single state s controlled by Player 1 (s is the pivot state \({\tilde{s}}\)) and the set of actions \(\textbf{A}=\{a_1,a_2\}\) partitioned into \(\textbf{A}_1=\{a_1\}\) and \(\textbf{A}_2=\{a_2\}\). Then \(\pi _1\) simply erases the loops on the action \(a_2\): \(\pi _1(s) = s\), \(\pi _1(sa_2s) = s\), \(\pi _1(sa_2sa_1s) = sa_1s\) and for every finite sequence of integers \(k_0,\ldots , k_m\), \(\pi _1(s(a_2s)^{k_0}a_1s(a_2s)^{k_1}\ldots a_1s(a_2s)^{k_m} ) = s(a_1s)^{m}\). Symmetrically, \(\pi _2\) erases the loops on the action \(a_1\).

Formally, we define two maps \(\pi _1\), \(\pi _2\) from finite plays in \(\textbf{G}\) starting from \({\tilde{s}}\) to finite plays in \(\textbf{G}_1\) and \(\textbf{G}_2\) respectively, starting from \({\tilde{s}}\) as well. Let \(h = s_0a_0s_1\ldots s_n\) be a finite play in \(\textbf{G}\) starting in \({\tilde{s}}\) and let \(has\) be a continuation of h in \(\textbf{G}\), with one more transition \((s_n,a,s)\). Let \({{\,\textrm{last}\,}}(has)\) be the action played in \(has\) after the last visit to \({\tilde{s}}\), i.e.

$$\begin{aligned} {{\,\textrm{last}\,}}(has) = a_{\max \{ j \in 0\ldots n \ \mid \ s_j = {\tilde{s}} \}} = {\left\{ \begin{array}{ll} a &{} \hbox { if}\ s_n={\tilde{s}}\\ {{\,\textrm{last}\,}}(h) &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$

Then

$$\begin{aligned} \pi _1(has) = {\left\{ \begin{array}{ll} \pi _1(h)a s &{} \hbox { if}\ {{\,\textrm{last}\,}}(has ) \in \textbf{A}_1\\ \pi _1(h) &{} \hbox { if}\ {{\,\textrm{last}\,}}(has ) \in \textbf{A}_2. \end{array}\right. } \end{aligned}$$

And \(\pi _2\) is defined symmetrically with respect to \(\textbf{A}_1\) and \(\textbf{A}_2\).

This definition can be extended to infinite plays in a natural way. Let \(h=s_0a_0s_1\ldots\) be an infinite play in \(\textbf{G}\) starting in \({\tilde{s}}\). Then \(\pi _1(h)\) is the limit of the sequence

$$\begin{aligned} \left( \pi _1(s_0a_0s_1\ldots s_n)\right) _{n\in {\mathbb {N}}}. \end{aligned}$$

The projection \(\pi _1(h)\) can be either finite or infinite, depending on whether the play ultimately stays in \(\textbf{G}_2\) or not. If after some time the last action chosen in \({\tilde{s}}\) is always in \(\textbf{A}_2\), all subsequent moves in \(\textbf{G}\) are appended to the projection in \(\textbf{G}_2\), while the projection to \(\textbf{G}_1\) is never again updated and stays finite. In the single-state example above, \(\pi _1(s a_1 s a_2 s a_1 s a_2 \ldots )\) is the single possible infinite play \(sa_1sa_1\ldots\) in the subgame \(\textbf{G}_1\), while \(\pi _1(s a_1 s a_2 s a_2 s a_2 \ldots )\) is the finite play \(sa_1s\) in \(\textbf{G}_1\).
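The projections can be computed online by following the inductive definition. Here is a minimal sketch (ours; the encoding of plays as lists \([s_0, a_0, s_1, a_1, \ldots , s_n]\) is an assumption made for brevity):

```python
def project(play, pivot, A1):
    """Compute pi_1(play) for a finite play [s0, a0, s1, ..., sn] of G
    starting in the pivot state: a step is appended iff the action played
    at the most recent visit to the pivot belongs to A1.  The projection
    pi_2 is obtained by passing A2 instead of A1."""
    h1 = [play[0]]              # the projection also starts at the pivot
    last = None                 # action played at the last visit to the pivot
    for i in range(1, len(play), 2):
        a, s = play[i], play[i + 1]
        if play[i - 1] == pivot:    # a is played from the pivot
            last = a
        if last in A1:
            h1 += [a, s]
    return h1

# The single-state example: pi_1(s a2 s a1 s) = s a1 s.
assert project(['s', 'a2', 's', 'a1', 's'], 's', {'a1'}) == ['s', 'a1', 's']
```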

5.2 Linking the payoff in \(\textbf{G}\) to the payoffs in the subgames

The payoff in \(\textbf{G}\) can be related to the payoffs in the subgames \(\textbf{G}_1\) and \(\textbf{G}_2\). For that we introduce the following events in the game \(\textbf{G}\) (an event is a measurable subset of infinite plays). Recall that \(S_n\) and \(A_n\) are the random variables which output respectively the nth state \(s_n\) and action \(a_n\) when the play is \(s_0a_0s_1a_1\cdots\).

$$\begin{aligned} {{\,\textrm{Stay}\,}}_{\ge n}(\textbf{G}_2)&\mathop {=}\limits ^{\textsf {def}}\{ \forall m\ge n, {{\,\textrm{last}\,}}(S_0A_0\ldots S_mA_{m}S_{m+1})\in \textbf{A}_2 \}\\ {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)&\mathop {=}\limits ^{\textsf {def}}\bigcup _{n\in {\mathbb {N}}} {{\,\textrm{Stay}\,}}_{\ge n}(\textbf{G}_2). \end{aligned}$$

If \({{\,\textrm{Stay}\,}}_{\ge n}(\textbf{G}_2)\) holds, we say that the play stays in \(\textbf{G}_2\) after step n whereas if \({{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)\) holds, we say that the play ultimately stays in \(\textbf{G}_2\).

Those two events can equivalently be described in terms of the projection to \(\textbf{G}_1\) no longer being updated after some point. For that, we make use of the random variables:

$$\begin{aligned} \Pi \mathop {=}\limits ^{\textsf {def}}S_0A_0S_1\cdots \qquad \Pi _1 \mathop {=}\limits ^{\textsf {def}}\pi _1(S_0A_0S_1\cdots ),\qquad \Pi _2 \mathop {=}\limits ^{\textsf {def}}\pi _2(S_0A_0S_1\cdots ). \end{aligned}$$

We see that \(\Pi\) is simply the identity map outputting the play in \(\textbf{G}\), while \(\Pi _i\) is essentially \(\pi _i\) seen as a random variable: it maps the infinite play in game \(\textbf{G}\) to its finite or infinite projection in game \(\textbf{G}_i\). Then

$$\begin{aligned}&{{\,\textrm{Stay}\,}}_{\ge n}(\textbf{G}_2)=\{\Pi _1 = \pi _1(S_0A_0\cdots S_n)\}\\&{{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)=\{\Pi _1 \text { is finite}\} . \end{aligned}$$

The events \({{\,\textrm{Stay}\,}}_{\ge n}(\textbf{G}_1)\) and \({{\,\textrm{Stay}\,}}_\omega (\textbf{G}_1)\) are defined symmetrically. Define the event

$$\begin{aligned} {{\,\textrm{Switch}\,}}\mathop {=}\limits ^{\textsf {def}}\left( \lnot {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_1)\wedge \lnot {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)\right) = \{ \text { both }\Pi _1\text { and }\Pi _2\text { are infinite } \}. \end{aligned}$$

The following lemma shows that the payoff in \(\textbf{G}\) is tightly related to the payoffs in the subgames \(\textbf{G}_1\) and \(\textbf{G}_2\).

Lemma 5.4

Let f be a shift-invariant and submixing payoff function. Every infinite play in \(\textbf{G}\) belongs to exactly one of the three events \(\{{{\,\textrm{Stay}\,}}_\omega (\textbf{G}_1),{{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2),{{\,\textrm{Switch}\,}}\}\). Moreover,

$$\begin{aligned}&\text {if } {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_1)\text { holds then } f(\Pi ) = f(\Pi _1). \end{aligned}$$
(11)
$$\begin{aligned}&\text {If } {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)\text { holds then } f(\Pi ) = f(\Pi _2). \end{aligned}$$
(12)
$$\begin{aligned}&\text {If } {{\,\textrm{Switch}\,}}\text { holds then } f(\Pi ) \le \max (\ f(\Pi _1)\ ,\ f(\Pi _2)\ ) . \end{aligned}$$
(13)

Proof

Since the projections to \(\textbf{G}_1\) and \(\textbf{G}_2\) cannot both be finite, \(({{\,\textrm{Stay}\,}}_\omega (\textbf{G}_1),{{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2),{{\,\textrm{Switch}\,}})\) is a partition of the infinite plays in \(\textbf{G}\). If \(\Pi _1\) is finite then \(\Pi\) and \(\Pi _2\) share an infinite suffix and the shift-invariance of f implies (11). The case where \(\Pi _2\) is finite is symmetric, hence (12). If both \(\Pi _1\) and \(\Pi _2\) are infinite then the sequence of actions \(({{\,\textrm{last}\,}}(S_0\ldots S_nA_{n}S_{n+1}))_{n\in {\mathbb {N}}}\) switches infinitely often between \(\textbf{A}_1\) and \(\textbf{A}_2\) thus \({\tilde{s}}\) is visited infinitely often. Moreover, in this case \(\Pi\) is a shuffle of \(\Pi _1\) and \(\Pi _2\) and since f is submixing, (13) follows. \(\square\)

5.3 The merge strategy

In light of Lemma 5.4, it is intuitively clear that to play well in \(\textbf{G}\), Player 2 has to play well in both subgames \(\textbf{G}_1\) and \(\textbf{G}_2\). Fix \(\epsilon >0\). The merge strategy for Player 2 is the composition of two strategies \(\tau ^\sharp _1\) and \(\tau ^\sharp _2\) for Player 2 in the subgames \(\textbf{G}_1\) and \(\textbf{G}_2\) respectively. We require \(\tau ^\sharp _1\) and \(\tau ^\sharp _2\) to be \(\epsilon\)-subgame-perfect in the corresponding subgames; their existence is guaranteed by Theorem 4.1.

Definition 5.5

The merge strategy \(\tau ^\sharp\) is a strategy in \(\textbf{G}\) for Player 2 which ensures that \(\Pi _1\) is consistent with \(\tau ^\sharp _1\) and \(\Pi _2\) is consistent with \(\tau ^\sharp _2\) when the play starts from \({\tilde{s}}\). Let h be a finite play in \(\textbf{G}\) from \({\tilde{s}}\) ending in a state controlled by Player 2; then

$$\begin{aligned} \tau ^\sharp (h) = {\left\{ \begin{array}{ll} \tau ^\sharp _1(\pi _1(h)) &{} \text { if }{{\,\textrm{last}\,}}(h)\in \textbf{A}_1,\\ \tau ^\sharp _2(\pi _2(h)) &{} \hbox { if }\ {{\,\textrm{last}\,}}(h)\in \textbf{A}_2. \end{array}\right. } \end{aligned}$$

The merge strategy is well-defined because in case \({{\,\textrm{last}\,}}(h)\in \textbf{A}_i\), with \(i\in \{1,2\}\), then both h and \(\pi _i(h)\) end with the same state, controlled by Player 2.
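As an illustration, the dispatch of Definition 5.5 can be sketched as follows, reusing the hypothetical helpers of the previous sketch.

```python
# A sketch of the dispatch in Definition 5.5. tau1 and tau2 stand for the
# epsilon-subgame-perfect strategies of Player 2 in G1 and G2; last_action
# and pi1 are as in the previous sketch, and pi2 denotes the symmetric
# projection (not written out).

def merge_strategy(tau1, tau2, play):
    """Answer according to the subgame Player 1 is currently playing in."""
    if last_action(play) in A1:
        return tau1(pi1(play))
    return tau2(pi2(play))
```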

In the next two sections, we show that the merge strategy guarantees Player 2 some upper bounds on the expected payoff, which reflect the bounds given in Lemma 5.4 for the payoffs of individual plays.

5.4 On plays consistent with the merge strategy and ultimately staying in \(\textbf{G}_2\)

In this section, we show that if the play ultimately stays in \(\textbf{G}_2\), the expected payoff is upper-bounded by \({{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}}) + \epsilon\).

For simplicity, we require \(\epsilon\) to be small enough so that \(\tau ^\sharp _2\) does not select any value-increasing action, in the following sense.

Lemma 5.6

In \(\textbf{G}_2\), fix a state s controlled by Player 2 and an action a available in that state. Denote

$$\begin{aligned} \delta (s,a) = \left( \sum _{t\in S} p(s,a,t) {{\,\textrm{val}\,}}(\textbf{G}_2)(t)\right) - {{\,\textrm{val}\,}}(\textbf{G}_2)(s). \end{aligned}$$

Then \(\delta (s,a)\ge 0\).

If \(\delta (s,a) > 0\), the action a is said to be value-increasing in s. In that case, if moreover \(\epsilon\) is strictly smaller than \(\delta (s,a)\), then \(\tau ^\sharp _2\) never selects the action a in a play ending in state s.

Proof

Since the payoff function is shift-invariant and s is controlled by Player 2, \(\delta (s,a)\ge 0\): after Player 2 chooses a in s, he can proceed with an \(\epsilon '\)-optimal strategy from the states t such that \(p(s,a,t)>0\), for an arbitrary \(\epsilon '>0\). Assume \(\epsilon\) is strictly smaller than \(\delta (s,a)\). Then \(\tau ^\sharp _2\) never selects a in state s, otherwise this would contradict the \(\epsilon\)-subgame-perfection of \(\tau ^\sharp _2\): Player 1 could proceed with some \((\delta (s,a) - \epsilon )/2\)-optimal strategy in \(\textbf{G}_2\) and get an expected payoff strictly greater than \({{\,\textrm{val}\,}}(\textbf{G}_2)(s)+\epsilon\). \(\square\)
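A toy numeric illustration of Lemma 5.6 follows; the values of \({{\,\textrm{val}\,}}(\textbf{G}_2)\) and the transition probabilities below are hypothetical.

```python
# A toy numeric illustration of Lemma 5.6. An action with delta(s, a) > 0
# is value-increasing and is avoided by tau2 as soon as epsilon < delta(s, a).
# All numbers below are hypothetical.

val_G2 = {"s": 0.5, "t": 0.4, "u": 0.9}    # val(G2) at each state
p = {("s", "a"): {"t": 0.5, "u": 0.5}}     # p(s, a, .) for one action

def delta(state, action):
    successors = p[(state, action)]
    return sum(q * val_G2[t] for t, q in successors.items()) - val_G2[state]

print(delta("s", "a"))   # 0.5*0.4 + 0.5*0.9 - 0.5 = 0.15 > 0: value-increasing
```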

Lemma 5.7

Assume that f is shift-invariant and \(\epsilon\) is small enough to guarantee that \(\tau ^\sharp _2\) never selects any value-increasing action. Let \(\sigma\) be a strategy for Player 1 in \(\textbf{G}\) such that \({\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( { {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)}\right) > 0\). Then

$$\begin{aligned} {\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ f \mid {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)\right] \le {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}}) + \epsilon . \end{aligned}$$
(14)

Proof

The first ingredient of the proof is the sequence of random variables \((V_n)_{n\in {\mathbb {N}}}\), where \(V_n\) denotes the value in \(\textbf{G}_2\) of the last state of \(\pi _2(S_0A_0\cdots S_n)\). Since the play starts in state \({\tilde{s}}\),

$$\begin{aligned} V_0 = {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}}). \end{aligned}$$

The value of \(V_n\) does not change unless the projection of the play to \(\textbf{G}_2\) via \(\pi _2\) does. Since \(\Pi _2\) is consistent with \(\tau ^\sharp _2\) and since \(\tau ^\sharp _2\) never selects any value-increasing action,

$$\begin{aligned} (V_n)_{n\in {\mathbb {N}}}\text { is a supermartingale}. \end{aligned}$$

The second ingredient in the proof is a stopping time T, defined as follows. For every finite play \(h=s_0a_0\ldots s_n\) in \(\textbf{G}\) starting in \({\tilde{s}}\) and consistent with \(\sigma\) and \(\tau ^\sharp\), denote

$$\begin{aligned} \phi (h) = {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {{{\,\textrm{Stay}\,}}_{\ge n}(\textbf{G}_2)\mid h\text { is a prefix of the play}}\right) , \end{aligned}$$

in other words, \(\phi (h)\) is the probability that, after the prefix h, the play stays forever in \(\textbf{G}_2\), i.e. the projection \(\pi _1\) is never updated again and stays equal to \(\pi _1(h)\). Fix some \(\epsilon '>0\) and denote by T the stopping time

$$\begin{aligned} T = \min \left\{ n \in {\mathbb {N}}\mid \phi (S_0A_0\ldots S_n) \ge 1 - \epsilon ' \right\} , \end{aligned}$$

with the usual convention \(\min (\emptyset ) = \infty\).

We use the event \(\{T < \infty \}\) as an approximation of the event \({{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)\) by proving

$$\begin{aligned}&{\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( { {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)\mid T < \infty }\right) \ge 1 - \epsilon ' \end{aligned}$$
(15)
$$\begin{aligned}&{\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {T < \infty \mid {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)}\right) = 1 . \end{aligned}$$
(16)

The inequality (15) holds because by definition of \(\phi\), for every \(n\in {\mathbb {N}}\),

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( { {{\,\textrm{Stay}\,}}_{\ge n}(\textbf{G}_2)\mid T = n}\right) \ge 1 - \epsilon ' . \end{aligned}$$

We show (16). Fix \(\epsilon ''>0\). By definition of \({{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)\), there exists \(n_1\in {\mathbb {N}}\) such that

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( { {{\,\textrm{Stay}\,}}_{\ge n_1}(\textbf{G}_2)\mid {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)}\right) \ge 1 - \epsilon ''. \end{aligned}$$
(17)

According to Lévy’s 0-1 law (see e.g. (Williams 1991, Theorem 14.4)), the sequence of random variables \(\left( {\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ \textbf{1}_{{{\,\textrm{Stay}\,}}_{\ge n_1}(\textbf{G}_2)} \mid S_0,\ldots , S_n\right] \right) _{n\in {\mathbb {N}}}\) almost-surely converges to the indicator function \(\textbf{1}_{{{\,\textrm{Stay}\,}}_{\ge n_1}(\textbf{G}_2)}\). Thus,

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( { \exists n_2 \ge n_1, {\mathbb {E}}_{\tilde{s}}^{\sigma ,\tau ^\sharp }\left[ \textbf{1}_{{{\,\textrm{Stay}\,}}_{\ge n_1}(\textbf{G}_2)} \mid S_0,\ldots , S_{n_2}\right] \ge 1 - \epsilon ' \ \vert \ {{\,\textrm{Stay}\,}}_{\ge n_1}(\textbf{G}_2) }\right) =1. \end{aligned}$$

Since \(n_2\ge {n_1}\) implies \({{\,\textrm{Stay}\,}}_{\ge n_2}(\textbf{G}_2)\subseteq {{\,\textrm{Stay}\,}}_{\ge n_1}(\textbf{G}_2)\),

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( { \exists n_2, \phi (S_0A_0\ldots S_{n_2}) \ge 1 - \epsilon '\mid {{\,\textrm{Stay}\,}}_{\ge n_1}(\textbf{G}_2) }\right) =1. \end{aligned}$$

Equivalently,

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {T < \infty \mid {{\,\textrm{Stay}\,}}_{\ge n_1}(\textbf{G}_2)}\right) =1. \end{aligned}$$

Combining this with (17), we get

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {T < \infty \mid {{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)}\right) \ge 1 - \epsilon ''. \end{aligned}$$

This holds for every \(\epsilon ''>0\), hence (16).

Since \(\epsilon '>0\) can be chosen arbitrarily small, according to (15) and (16) it is enough, in order to prove our goal (14), to establish:

$$\begin{aligned} {\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ f \mid T < \infty \right] \le {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}}) + \epsilon + 2\epsilon ' \cdot ||f||_\infty . \end{aligned}$$
(18)

This is well-defined, because (16) ensures \({\mathbb {P}}_{\tilde{s}}^{\sigma ,\tau ^\sharp }\left( {T < \infty }\right) \ge {\mathbb {P}}_{\tilde{s}}^{\sigma ,\tau ^\sharp }\left( {{{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)}\right) >0\), and f is bounded.

Since \((V_n)_{n\in {\mathbb {N}}}\) is a bounded supermartingale, we can deduce from Doob’s Forward Convergence Theorem (Williams 1991, Theorem 11.5) that \((V_n)_{n\in {\mathbb {N}}}\) converges almost-surely. We denote \(V_T\) the random variable equal to \(\lim _n V_{n}\) if \(T=\infty\) and to \(V_n\) if \(T=n\).

We deduce (18) from the following three inequalities:

$$\begin{aligned}&{\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ V_T\right] \le {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}}) \end{aligned}$$
(19)
$$\begin{aligned}&{\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ V_T \mid T = \infty \right] = {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}}) \end{aligned}$$
(20)
$$\begin{aligned}&{\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ f \mid T< \infty \right] \le {\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ V_T \mid T < \infty \right] + \epsilon + 2\epsilon ' \cdot ||f||_\infty . \end{aligned}$$
(21)

Assume (19) and (20) hold. Decomposing \({\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ V_T\right]\) according to whether T is finite or not, and using (20) to handle the case \(T=\infty\), inequality (19) yields \({\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ V_T \mid T < \infty \right] \le {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\). Plugging this inequality into (21), we get (18), and the lemma is proved.

We prove the three inequalities (19)–(21). To obtain inequality (19) we apply Doob’s Optional Stopping Theorem (Theorem 10.10 in Williams (1991)) to the bounded supermartingale \((V_n)_{n\in {\mathbb {N}}}\) and the stopping time \(\min (T,k)\) for an arbitrary (large) \(k\in {\mathbb {N}}\). This implies \({\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ V_{\min (T,k)}\right] \le V_0\). Then (19) follows by taking the limit of this inequality when \(k\rightarrow \infty\): since \(V_{\min (T,k)}\) converges to \(V_T\) and is bounded, the expectations converge, and it remains to use the equality \(V_0 = {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\).

To prove (20), we prove an even stronger statement:

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {V_T = {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}}) \mid T = \infty }\right) = 1. \end{aligned}$$

If \(T=\infty\) then, according to (16), almost-surely the event \({{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)\) does not hold. Thus, according to Lemma 5.4, either \({{\,\textrm{Stay}\,}}_\omega (\textbf{G}_1)\) or \({{\,\textrm{Switch}\,}}\) holds. In the first case, \(\Pi _2\) is finite: the sequence \((\pi _2(S_0A_0\ldots S_n))_n\) is ultimately constant, equal to a play in \(\textbf{G}_2\) ending in \({\tilde{s}}\), and \((V_n)_{n\in {\mathbb {N}}}\) is ultimately constant equal to \({{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\). In the second case, the play \(\Pi _2\) visits \({\tilde{s}}\) infinitely often, so \(V_n\) takes the value \({{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\) infinitely often; since \((V_n)_n\) converges almost-surely to \(V_T\), this forces \(V_T={{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\).

Finally, we prove (21). Denote \(h_T\) the random variable defined when T is finite, which outputs the prefix of the play of length T, i.e.

$$\begin{aligned} h_T = S_0A_0\ldots S_T, \end{aligned}$$

and let h be such that \({\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_T=h}\right) > 0\). Denote t the last state of h. We modify the strategy \(\sigma [h]\) in \(\textbf{G}\) to obtain a strategy \(\sigma _0\) in \(\textbf{G}_2\), in the following way. Every finite play \(h_2\) in \(\textbf{G}_2\) is also a finite play in \(\textbf{G}\). In case the last state of \(h_2\) is the pivot state \({\tilde{s}}\), the lottery \(\sigma [h](h_2)\) selects an action in either \(\textbf{A}_2\) or \(\textbf{A}_1\). In the first case we say that \(\sigma [h]\) keeps playing in \(\textbf{G}_2\), while in the second case we say that \(\sigma [h]\) exits \(\textbf{G}_2\). The strategy \(\sigma _0\) coincides with \(\sigma [h]\) as long as it keeps playing in \(\textbf{G}_2\). Whenever \(\sigma [h]\) exits \(\textbf{G}_2\), \(\sigma _0\) plays arbitrarily in \(\textbf{G}_2\) for the rest of the play. Then:

$$\begin{aligned} {\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ f \mid h_T=h \right]&= {\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ f \mid h \text { is a prefix of the play}\right] \\&= {\mathbb {E}}_{ t}^{\sigma [h],\tau ^\sharp [h]}\left[ f \right] \\&\le {\mathbb {E}}_{ t}^{\sigma _0,\tau ^\sharp [h]}\left[ f \right] + 2\epsilon ' \cdot ||f||_\infty \\&= {\mathbb {E}}_{ t}^{\sigma _0,\tau ^\sharp _2[\pi _2(h)]}\left[ f \right] + 2\epsilon ' \cdot ||f||_\infty \\&\le {{\,\textrm{val}\,}}(\textbf{G}_2)(t) + \epsilon + 2\epsilon ' \cdot ||f||_\infty \\&= {\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ V_T\mid h_T=h\right] + \epsilon + 2\epsilon ' \cdot ||f||_\infty . \end{aligned}$$

The first equality holds because \({\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_T=h}\right) > 0\) thus, by definition of T, no strict prefix \(h'\) of h satisfies \(\phi (h') \ge 1-\epsilon '\), hence h is a prefix of the play if and only if \(h_T=h\). The second equality holds by shift-invariance of f. To prove the first inequality, we denote \(X={{\,\textrm{Stay}\,}}_{\ge 0}(\textbf{G}_2)\). Since \(\sigma [h]\) and \(\sigma _0\) coincide on X, and by definition of \(\phi\), \({\mathbb {P}}_{ t}^{\sigma _0,\tau ^\sharp [h]}\left( {X}\right) = {\mathbb {P}}_{ t}^{\sigma [h],\tau ^\sharp [h]}\left( {X}\right) = \phi (h) \ge 1 - \epsilon '\). As a consequence,

$$\begin{aligned} {\mathbb {E}}_{ t}^{\sigma [h],\tau ^\sharp [h]}\left[ f \right]&\le {\mathbb {E}}_{ t}^{\sigma [h],\tau ^\sharp [h]}\left[ f 1_X\right] + \epsilon ' ||f||_\infty \\&= {\mathbb {E}}_{ t}^{\sigma _0,\tau ^\sharp [h]}\left[ f 1_X\right] + \epsilon ' ||f||_\infty \\&\le {\mathbb {E}}_{ t}^{\sigma _0,\tau ^\sharp [h]}\left[ f\right] + 2\epsilon ' ||f||_\infty . \end{aligned}$$

The third equality holds because \(\sigma _0\) guarantees that the play stays in \(\textbf{G}_2\) (i.e. \({{\,\textrm{Stay}\,}}_{\ge 0}(\textbf{G}_2)\)) and this implies that \(\tau ^\sharp [h]\) coincides with \(\tau ^\sharp _2[\pi _2(h)]\). The second inequality is by \(\epsilon\)-subgame-perfection of \(\tau ^\sharp _2\) in \(\textbf{G}_2\). The last equality holds by definition of \(V_T\).

Since this holds for every possible value h of \(h_T\) when \(T<\infty\), and there are at most countably many such values, the inequality (21) follows. \(\square\)

5.5 On plays consistent with the merge strategy and switching infinitely often between the two subgames

In this section, we provide an upper bound on the payoff of plays which switch infinitely often between \(\textbf{G}_1\) and \(\textbf{G}_2\).

Lemma 5.8

Assume that f is shift-invariant and submixing. For all strategies \(\sigma\),

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {f \le \max \{{{\,\textrm{val}\,}}(\textbf{G}_1)(\tilde{s}),{{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\} + \epsilon \ \vert \ {{\,\textrm{Switch}\,}}}\right) =1. \end{aligned}$$
(22)

Proof

By definition of \({{\,\textrm{Switch}\,}}\), if \({{\,\textrm{Switch}\,}}\) occurs then both projections \(\Pi _1\) and \(\Pi _2\) are infinite and visit \({\tilde{s}}\) infinitely often. According to the inequality (13) in Lemma 5.4, to prove (22) it is enough to show, for every \(i\in \{1,2\}\),

$$\begin{aligned}&{\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {f(\Pi _i) \le {{\,\textrm{val}\,}}(\textbf{G}_i)({\tilde{s}})+ \epsilon \ \vert \ \Pi _i\text { is infinite and reaches }{\tilde{s}}\text { infinitely often}}\right) =1 . \end{aligned}$$
(23)

By symmetry, it is enough to show (23) when \(i=1\). For that, we define a strategy \(\sigma _1\) in \(\textbf{G}_1\) such that for every measurable event \({\mathcal {E}}_1\) in the game \(\textbf{G}_1\),

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma _1,\tau ^\sharp _1}\left( {{\mathcal {E}}_1}\right) \ge {\mathbb {P}}_{\tilde{s}}^{\sigma ,\tau ^\sharp }\left( {\Pi _1\text { is infinite and }\Pi _1\in \mathcal E_1 }\right) . \end{aligned}$$
(24)

The definition of \(\sigma _1\) is a cornerstone of the whole proof of Theorem 5.3; to make it clear, we first give an informal description of \(\sigma _1\) and an example before the formal definition. Assume that after a finite play \(h_1\) in \(\textbf{G}_1\), Player 1 has to choose the next action between two actions a and b. Denote \(p_a(h_1)\) (resp. \(p_b(h_1)\)) the probability that the projection \(\Pi _1\) of the play from \(\textbf{G}\) on \(\textbf{G}_1\) admits \(h_1a\) (resp. \(h_1b\)) as a prefix, when playing \(\sigma\) and \(\tau ^\sharp\). Then \(\sigma _1(h_1)\) selects the action a with probability proportional to \(p_a\), i.e. equal to \(p_a / (p_a+p_b)\), while b is selected with probability \(p_b / (p_a+p_b)\).

As an example, consider a game with a single state s controlled by Player 1 (hence \(\tau ^\sharp\) is trivial) on which there are three loops on actions a, b, c, partitioned into \(\textbf{A}_1 = \{a,b\}\) and \(\textbf{A}_2 = \{c\}\). Consider the strategy \(\sigma\) in \(\textbf{G}\) which plays the uniform lottery on \(\{a,b,c\}\) for the first step, and then repeats the corresponding letter forever. The corresponding strategy \(\sigma _1\) in \(\textbf{G}_1\) plays the uniform lottery on \(\{a,b\}\) and then repeats the corresponding letter forever. Consider a more involved example in the same game but with a different strategy \(\sigma\): this time \(\sigma\) initially plays the uniform lottery on \(\{a,b,c\}\), and if a (resp. b, resp. c) is chosen, then the letter b (resp. c, resp. a) is repeated forever. What is the probability that \(\sigma _1\) plays initially the action a? The letter a is a prefix of the projection \(\Pi _1\) in \(\textbf{G}_1\) in exactly two cases: when in \(\textbf{G}\) either the letter a or the letter c is picked first. Otherwise the projection to \(\textbf{G}_1\) starts with b. Hence \(\sigma _1(s)(a)=2/3\).
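The second example can be checked mechanically. The following sketch enumerates the three equally likely plays under \(\sigma\) (truncated to finitely many steps), projects them onto \(\textbf{G}_1\), and recovers \(\sigma _1(s)(a)=2/3\); the encoding is again a hypothetical illustration.

```python
# A sketch checking the second example: the three equally likely plays
# under sigma are enumerated (20 steps approximating "forever"), projected
# onto G1, and sigma_1(s)(a) is recovered.
from fractions import Fraction

A1, A2 = {"a", "b"}, {"c"}

def project_actions(actions):
    """Actions of the projection to G1: the loops on c are erased."""
    return [x for x in actions if x in A1]

plays = [
    ["a"] + ["b"] * 19,   # a picked first, then b forever
    ["b"] + ["c"] * 19,   # b picked first, then c forever
    ["c"] + ["a"] * 19,   # c picked first, then a forever
]

# Probability that the projection Pi_1 starts with the action a:
prob_a = Fraction(sum(project_actions(p)[:1] == ["a"] for p in plays),
                  len(plays))
assert prob_a == Fraction(2, 3)   # sigma_1(s)(a) = 2/3, as claimed
```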

Formally, the definition of \(\sigma _1\) relies on the prefix relation \(\preceq\) and the strict prefix relation \(\prec\) over finite or infinite plays. Given a finite play \(h_1\) in \(\textbf{G}_1\), we consider the events \(\{h_1\preceq \Pi _1\}\) and \(\{h_1\prec \Pi _1\}\). The event \(h_1 \prec \Pi _1\) means that not only does \(h_1\) appear as a prefix of the projection of the play on \(\textbf{G}_1\), but moreover at least one more action has been played in \(\textbf{G}_1\) after that, i.e. \(\{h_1 \prec \Pi _1\}= \{\exists b \in \textbf{A}, h_1b\preceq \Pi _1\}\). The inclusion \(\{h_1 \prec \Pi _1\}\subseteq \{h_1 \preceq \Pi _1\}\) implies

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1\prec \Pi _1}\right) \le {\mathbb {P}}_{\tilde{s}}^{\sigma ,\tau ^\sharp }\left( {h_1\preceq \Pi _1}\right) . \end{aligned}$$
(25)

This inequality might be strict: for example if \({{\,\textrm{Stay}\,}}_{\ge 0}(\textbf{G}_2)\) holds, i.e. if the play always stays in \(\textbf{G}_2\), then the event \(\{{\tilde{s}}\prec \Pi _1\}\) has probability 0 while the event \(\{{\tilde{s}}\preceq \Pi _1\}\) has probability 1.

The strategy \(\sigma _1\) in \(\textbf{G}_1\) is defined as:

$$\begin{aligned} \sigma _1(h_1)(a) = {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1a\preceq \Pi _1\ \vert \ h_1 \prec \Pi _1}\right) , \end{aligned}$$

if \({\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1 \prec \Pi _1}\right) >0\) and otherwise \(\sigma _1(h_1)\) is chosen arbitrarily. Remark that in general, \(\sigma _1\) is a mixed strategy.

We proceed with the proof of (24). Let \({\mathfrak {E}}\) be the set of measurable events \({\mathcal {E}}_1\) in \(\textbf{G}_1\) for which (24) holds. We prove first that \({\mathfrak {E}}\) contains all cylinders \(h_1(\textbf{S}\textbf{A})^\omega\) of \(\textbf{G}_1\), which relies on the following inequalities:

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {\Pi _1\text { is infinite and }\Pi _1 \in h_1(\textbf{S}\textbf{A})^\omega }\right)&\le {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1\preceq \Pi _1}\right) \nonumber \\&\le {\mathbb {P}}_{{\tilde{s}}}^{\sigma _1,\tau ^\sharp _1}\left( {h_1}\right) . \end{aligned}$$
(26)

We abuse the notation and denote \(h_1\) the event \(\{h_1 \text { is a prefix of the play}\}\). The first inequality is by definition of prefixes. Remark that this inequality might be strict, in case \(\Pi _1\) is finite, i.e. in case the play ultimately stays in \(\textbf{G}_2\). The second inequality (26) is proved by induction on the length of \(h_1\). When \(h_1\) is the single initial state \({\tilde{s}}\), both terms in (26) are equal to 1, and the inequality is an equality. Let \(h_1 a r\) be a finite play in \(\textbf{G}_1\) and assume that (26) holds for \(h_1\). There are two cases, depending on who controls the last state of \(h_1\), denoted t. In case t is controlled by Player 1 then

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma _1,\tau ^\sharp _1}\left( {h_1ar}\right)&= {\mathbb {P}}_{{\tilde{s}}}^{\sigma _1,\tau ^\sharp _1}\left( {h_1}\right) \cdot \sigma _1(h_1)(a)\cdot p\left( t,a,r\right) \nonumber \\&\ge {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1\preceq \Pi _1}\right) \cdot \sigma _1(h_1)(a) \cdot p\left( t,a,r\right) \nonumber \\&= {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1\preceq \Pi _1}\right) \cdot {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1a\preceq \Pi _1\ \vert \ h_1 \prec \Pi _1}\right) \cdot p\left( t,a,r\right) \nonumber \\&\ge {\mathbb {P}}_{\tilde{s}}^{\sigma ,\tau ^\sharp }\left( {h_1\prec \Pi _1}\right) \cdot {\mathbb {P}}_{\tilde{s}}^{\sigma ,\tau ^\sharp }\left( {h_1a\preceq \Pi _1\ \vert \ h_1 \prec \Pi _1}\right) \cdot p\left( t,a,r\right) \nonumber \\&= {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1a\preceq \Pi _1}\right) \cdot p\left( t,a,r\right) \nonumber \\&= {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1ar\preceq \Pi _1}\right) , \end{aligned}$$
(27)

where the first and last equalities hold by definition of the probability measure, the first inequality by induction hypothesis and the second equality is by definition of \(\sigma _1\). The second inequality (27) holds because of the inclusion \(\{h_1 \prec \Pi _1\}\subseteq \{h_1 \preceq \Pi _1\}\) already discussed previously [see (25)].

Now we prove the inequality (26), in case t is controlled by Player 2. For every finite play \(h'_1\) in \(\textbf{G}_1\), denote \(C(h'_1)\) the set of finite plays \(h'\) in \(\textbf{G}\) starting in \({\tilde{s}}\) and such that \(\pi _1\) projects \(h'\) on \(h'_1\), but no strict prefix of \(h'\) is projected on \(h'_1\).

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma _1,\tau ^\sharp _1}\left( {h_1ar}\right)&= {\mathbb {P}}_{{\tilde{s}}}^{\sigma _1,\tau ^\sharp _1}\left( {h_1}\right) \cdot \tau ^\sharp _1(h_1)(a)\cdot p\left( t,a,r\right) \nonumber \\&\ge {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1\preceq \Pi _1}\right) \cdot \tau ^\sharp _1(h_1)(a) \cdot p\left( t,a,r\right) \nonumber \\&= \sum _{h' \in C(h_1)} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h'}\right) \cdot \tau ^\sharp _1(h_1)(a) \cdot p\left( t,a,r\right) \nonumber \\&= \sum _{h' \in C(h_1)} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h'}\right) \cdot \tau ^\sharp (h')(a) \cdot p\left( t,a,r\right) \nonumber \\&= \sum _{h' \in C(h_1)} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h'ar}\right) \nonumber \\&\ge \sum _{h'' \in C(h_1ar)} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h''}\right) \nonumber \\&= {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {h_1ar\preceq \Pi _1}\right) . \end{aligned}$$
(28)

The first equality is by definition of the probability measure. The first inequality is by induction hypothesis. The second equality holds because the event \(h_1\preceq \Pi _1\) is the disjoint union of the events \((h')_{h'\in C(h_1)}\): if the projection of an infinite play h to \(\textbf{G}_1\) starts with \(h_1\), then there is a single prefix of this play in \(C(h_1)\), namely the shortest (finite) prefix of h whose projection in \(\textbf{G}_1\) is \(h_1\). The last equality holds by a similar argument. The third equality is by definition of \(\tau ^\sharp\), since \(\forall h' \in C(h_1), \pi _1(h')=h_1\). The fourth equality is by definition of the probability measure. The second inequality (28) relies on the inclusion \(C(h_1ar) \subseteq \{h'ar\mid h'\in C(h_1)\}\) (the converse inclusion actually holds as well but there is no need to prove it). Take \(h''\in C(h_1ar)\) and write \(h''=h'a'r'\), where \(a'\) and \(r'\) are the last action and state of \(h''\) and \(h'\) is the remaining prefix. We prove first that \(\pi _1(h')=h_1\). Denote b the action played in \(h''\) after the last visit to \({\tilde{s}}\). Then b belongs to either \(\textbf{A}_1\) or \(\textbf{A}_2\) and by definition of \(\pi _1\), in the first case \(\pi _1(h'')=\pi _1(h')a'r'\) while in the second case \(\pi _1(h'')=\pi _1(h')\). By minimality of \(h''\) among plays projecting on \(h_1ar\), we can rule out the second case, hence \(\pi _1(h'')=\pi _1(h')a'r'\). Since \(\pi _1(h'')= h_1 ar\), it follows that \(\pi _1(h')=h_1\), \(a=a'\) and \(r=r'\). We prove by contradiction that \(h'\in C(h_1)\). Assume otherwise, let \(h'_\ell \in C(h_1)\) be the shortest prefix of \(h'\) which projects on \(h_1\) as well, and let b and q be an action and a state such that \(h'_\ell b q \preceq h'\). Remark that the action after the last visit to \({\tilde{s}}\) in the play \(h'_\ell b q\) is in \(\textbf{A}_2\) (because \(\pi _1(h'_\ell b q)=\pi _1(h'_\ell )\)) while in the play \(h'_\ell\) it is in \(\textbf{A}_1\) (by minimality of \(h'_\ell\)). As a consequence, this last visit of the play \(h'_\ell b q\) to \({\tilde{s}}\) occurs at the end of \(h'_\ell\) and \(t = {\tilde{s}}\), a contradiction since t is controlled by Player 2. Thus \(h'\in C(h_1)\). This holds for every \(h''\in C(h_1ar)\), thus we have established the inclusion \(C(h_1ar) \subseteq \{h'ar\mid h'\in C(h_1)\}\), which completes the proof of the inequality (28).

This completes the proof of the inequality (26) in the second and last case, hence (24) holds when \(\mathcal E_1\) is a cylinder.

Observe that \({\mathfrak {E}}\) is stable under finite disjoint unions, hence \({\mathfrak {E}}\) contains all finite disjoint unions of cylinders, which form a Boolean algebra. Moreover \({\mathfrak {E}}\) is a monotone class, so we can apply the monotone class theorem (see for example (Billingsley 2008, Theorem 3.4)). This implies that \({\mathfrak {E}}\) contains the sigma-field generated by the cylinders, which by definition is the set of all measurable events in the game \(\textbf{G}_1\). This completes the proof of (24).

Next we prove that

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma _1,\tau ^\sharp _1}\left( {f \le \liminf _n {{\,\textrm{val}\,}}(\textbf{G}_1)(S_n)+ \epsilon }\right) =1. \end{aligned}$$
(29)

Since \(\tau ^\sharp _1\) is \(\epsilon\)-subgame-perfect and f is shift-invariant, for all \(n\in {\mathbb {N}}\),

$$\begin{aligned} {\mathbb {E}}_{{\tilde{s}}}^{\sigma _1,\tau ^\sharp _1}\left[ f\ \vert \ S_0,A_0,\ldots ,S_n\right] ={\mathbb {E}}_{S_n}^{\sigma _1[S_0A_0\cdots S_n],\tau ^\sharp _1[S_0A_0\cdots S_n]}\left[ f\right] \le {{\,\textrm{val}\,}}(\textbf{G}_1)(S_n)+\epsilon , \end{aligned}$$

and as a consequence,

$$\begin{aligned} \liminf _n {\mathbb {E}}_{{\tilde{s}}}^{\sigma _1,\tau ^\sharp _1}\left[ f\ \vert \ S_0,A_0,\ldots ,S_n\right] \le \liminf _n {{\,\textrm{val}\,}}(\textbf{G}_1)(S_n)+\epsilon . \end{aligned}$$
(30)

According to Lévy’s 0-1 law (see e.g. (Williams 1991, Theorem 14.4)), the sequence of random variables \(({\mathbb {E}}_{\tilde{s}}^{\sigma _1,\tau ^\sharp _1}\left[ f\ \vert \ S_0,A_0,\ldots ,S_n\right] )_{n\in {\mathbb {N}}}\) converges almost-surely to the random variable \(f(S_0A_0S_1\cdots )\). As a consequence the left-hand side of (30) is almost-surely equal to f and we get (29).

Denote \({\mathcal {E}}_1\) the event

$$\begin{aligned} {\mathcal {E}}_1 =\{ f > {{\,\textrm{val}\,}}(\textbf{G}_1)({\tilde{s}})+ \epsilon \text { and }{\tilde{s}}\text { is reached infinitely often} \}. \end{aligned}$$

According to (29), \({\mathbb {P}}_{\tilde{s}}^{\sigma _1,\tau ^\sharp _1}\left( {{\mathcal {E}}_1}\right) =0\). We apply (24) to \({\mathcal {E}}_1\) and get

$$\begin{aligned} {\mathbb {P}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left( {\Pi _1\text { is infinite and }\Pi _1\in {\mathcal {E}}_1 }\right) =0. \end{aligned}$$

By definition of \({\mathcal {E}}_1\), this last equality is equivalent to (23) with \(i=1\). The proof for the case \(i=2\) is symmetric. \(\square\)

5.6 Proof of Theorem 5.3

Proof of Theorem 5.3

To prove the first statement (9) in Theorem 5.3, we combine the two lemmas proved in the two previous sections in order to show:

$$\begin{aligned} \forall \sigma , {\mathbb {E}}_{{\tilde{s}}}^{\sigma ,\tau ^\sharp }\left[ f\right] \le \max \{{{\,\textrm{val}\,}}(\textbf{G}_1)({\tilde{s}}),{{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\}+ \epsilon . \end{aligned}$$
(31)

The bound (31) can be obtained as follows. According to Lemma 5.4, the three events \({{\,\textrm{Stay}\,}}_\omega (\textbf{G}_1),{{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)\) and \({{\,\textrm{Switch}\,}}\) partition the set of infinite plays. In case \({{\,\textrm{Stay}\,}}_\omega (\textbf{G}_2)\) occurs, Lemma 5.7 guarantees that the expected payoff is no more than \({{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})+\epsilon\). By symmetry, in case \({{\,\textrm{Stay}\,}}_\omega (\textbf{G}_1)\) occurs, the expected payoff is no more than \({{\,\textrm{val}\,}}(\textbf{G}_1)({\tilde{s}})+\epsilon\). And in case \({{\,\textrm{Switch}\,}}\) occurs, Lemma 5.8 guarantees that the payoff is almost-surely no more than \(\max \{{{\,\textrm{val}\,}}(\textbf{G}_1)({\tilde{s}}), {{\,\textrm{val}\,}}(\textbf{G}_2)(\tilde{s})\} + \epsilon\). Thus (31) holds. The inequality

$$\begin{aligned} {{\,\textrm{val}\,}}(\textbf{G})({\tilde{s}}) \ge \max \{{{\,\textrm{val}\,}}(\textbf{G}_1)(\tilde{s}),{{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}})\} \end{aligned}$$

is clear, since Player 1 has more choices in \(\textbf{G}\) than he has in \(\textbf{G}_1\) and \(\textbf{G}_2\). Since \(\epsilon\) can be chosen arbitrarily small in (31), the first statement (9) of Theorem 5.3 follows.

We proceed with the second statement of Theorem 5.3. Assume that

$$\begin{aligned} {{\,\textrm{val}\,}}(\textbf{G}_1)({\tilde{s}})\ge {{\,\textrm{val}\,}}(\textbf{G}_2)({\tilde{s}}). \end{aligned}$$
(32)

We have to show (10), i.e.

$$\begin{aligned} \forall s \in \textbf{S}, {{\,\textrm{val}\,}}(\textbf{G})( s)={{\,\textrm{val}\,}}(\textbf{G}_1)( s). \end{aligned}$$

According to (9), we already know that this equality holds for \({\tilde{s}}\), and we shall extend it to all states \(s\in \textbf{S}\).

Recall that the merge strategy was defined only for plays that start in state \({\tilde{s}}\); we now enlarge this definition, taking advantage of the assumption (32). First, extend the definition of \({{\,\textrm{last}\,}}(h)\) to any play h that has visited \({\tilde{s}}\) at least once, in which case \({{\,\textrm{last}\,}}(h)\) denotes the action played right after the last visit of h to \({\tilde{s}}\). Second, for all finite plays h that end in a state controlled by Player 2,

$$\begin{aligned} \tau ^\sharp (h) \mathop {=}\limits ^{\textsf {def}}{\left\{ \begin{array}{ll} \tau ^\sharp _1(\pi _1(h)) &{} \text { if }h\text { never visited }{\tilde{s}}\text { or }{{\,\textrm{last}\,}}(h)\in \textbf{A}_1 \\ \tau ^\sharp _{2}(\pi _{2}(h)) &{} \text { if }h\text { has visited }\tilde{s}\text { at least once and }{{\,\textrm{last}\,}}(h)\in \textbf{A}_{2}. \end{array}\right. } \end{aligned}$$

The merge strategy is well-defined because if h never visited \({\tilde{s}}\) or if \({{\,\textrm{last}\,}}(h)\in \textbf{A}_1\) then both h and \(\pi _1(h)\) end with the same state, controlled by Player 2. And if h has visited \({\tilde{s}}\) at least once and \({{\,\textrm{last}\,}}(h)\in \textbf{A}_2\) then both h and \(\pi _2(h)\) end with the same state, controlled by Player 2.
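Under the assumption (32), the enlarged dispatch can be sketched as follows, again with the hypothetical helpers of the earlier sketches.

```python
# A sketch of the enlarged merge strategy: before the first visit to the
# pivot, tau1 is followed (on the projection to G1); afterwards the
# dispatch of Definition 5.5 applies.

def extended_merge(tau1, tau2, play):
    visited_pivot = PIVOT in play[::2]     # states sit at even positions
    if not visited_pivot or last_action(play) in A1:
        return tau1(pi1(play))
    return tau2(pi2(play))
```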

We prove that \(\tau ^\sharp\) guarantees an expected payoff of at most \({{\,\textrm{val}\,}}(\textbf{G}_1)(s)+2\epsilon\) from every state s. Fix \(\sigma\) a strategy for Player 1 in \(\textbf{G}\), and define \(\sigma '\) to be the strategy that plays like \(\sigma\) as long as the play does not reach the pivot state \({\tilde{s}}\). As soon as the pivot state is reached, the strategy \(\sigma '\) switches definitively to a strategy \(\sigma ^\sharp _1\) that is optimal in the game \(\textbf{G}_1\), whose existence is guaranteed by the induction hypothesis. We shall prove that, for every \(s\in S\),

$$\begin{aligned} {\mathbb {E}}_{s}^{\sigma ,\tau ^\sharp }\left[ f\right] \le {\mathbb {E}}_{s}^{\sigma ',\tau ^\sharp }\left[ f\right] +\epsilon . \end{aligned}$$
(33)

Since the strategies \(\sigma\) and \(\sigma '\) coincide on those plays that never reach \({\tilde{s}}\), to get (33) it is enough to prove that for every finite play h starting from s and reaching \({\tilde{s}}\) for the first time at the end of h,

$$\begin{aligned} {\mathbb {E}}_{s}^{\sigma ,\tau ^\sharp }\left[ f\ \vert \ h\right] \le {\mathbb {E}}_{s}^{\sigma ',\tau ^\sharp }\left[ f\ \vert \ h\right] +\epsilon . \end{aligned}$$
(34)

The inequality (34) holds because

$$\begin{aligned} {\mathbb {E}}_{s}^{\sigma ,\tau ^\sharp }\left[ f\ \vert \ h\right]&= {\mathbb {E}}_{{\tilde{s}}}^{\sigma [h],\tau ^\sharp [h]}\left[ f\right] \\&\le {{\,\textrm{val}\,}}(\textbf{G}_1)({\tilde{s}})+\epsilon \\&\le {\mathbb {E}}_{{\tilde{s}}}^{\sigma ^\sharp _1,\tau ^\sharp _1[h]}\left[ f\right] +\epsilon \\&={\mathbb {E}}_{{\tilde{s}}}^{\sigma '[h],\tau ^\sharp _1[h]}\left[ f\right] +\epsilon \\&= {\mathbb {E}}_{s}^{\sigma ',\tau ^\sharp }\left[ f\ \vert \ h\right] +\epsilon . \end{aligned}$$

The first and third equalities hold because f is shift-invariant. The first inequality holds because the strategy \(\tau ^\sharp [h]\) is \(\epsilon\)-optimal from state \({\tilde{s}}\), for the following reason. The strategy \(\tau ^\sharp [h]\) coincides with the strategy obtained by merging \(\tau ^\sharp _1[h]\) and \(\tau ^\sharp _2\) on the pivot state \({\tilde{s}}\), both of which are \(\epsilon\)-subgame-perfect in the respective subgames. Since (31) was proved for any merge of two \(\epsilon\)-subgame-perfect strategies, we can apply (31) to the strategy \(\tau ^\sharp [h]\), and conclude that the latter is \(\epsilon\)-optimal from state \({\tilde{s}}\). The second inequality holds because \(\sigma ^\sharp _1\) is optimal in \(\textbf{G}_1\). The second equality holds because \(\sigma '[h]=\sigma ^\sharp _1\). Finally we have proved (33).

The plays consistent with \(\sigma '\) and \(\tau ^\sharp\) stay in the subgame \(\textbf{G}_1\). Since \(\tau ^\sharp\) coincides with \(\tau ^\sharp _1\) on plays staying in \(\textbf{G}_1\), and since \(\tau ^\sharp _1\) is \(\epsilon\)-optimal in \(\textbf{G}_1\), we can write for all \(s\in S\):

$$\begin{aligned} {\mathbb {E}}_{s}^{\sigma ',\tau ^\sharp }\left[ f\right] ={\mathbb {E}}_{s}^{\sigma ',\tau ^\sharp _1}\left[ f\right] \le {{\,\textrm{val}\,}}(\textbf{G}_1)(s) +\epsilon . \end{aligned}$$
(35)

With (33) this shows that for all s,

$$\begin{aligned} {\mathbb {E}}_{s}^{\sigma ,\tau ^\sharp }\left[ f\right] \le {{\,\textrm{val}\,}}(\textbf{G}_1)(s)+2\epsilon . \end{aligned}$$
(36)

This holds for every strategy \(\sigma\) and for \(\epsilon >0\) arbitrarily small, thus \({{\,\textrm{val}\,}}(\textbf{G})(s)\le {{\,\textrm{val}\,}}(\textbf{G}_1)(s)\), and moreover \(\tau ^\sharp\) is \(2\epsilon\)-optimal in \(\textbf{G}\) from every state s. The converse inequality is obvious, because Player 1 has more freedom in \(\textbf{G}\) than in \(\textbf{G}_1\), hence the second statement (10) of Theorem 5.3. \(\square\)

5.7 Remarks about the merge strategy

We observe a byproduct of inequality (36) in the proof of Theorem 5.3.

Observation 5.9

The merge strategy \(\tau ^\sharp\) constructed with \(\epsilon\)-subgame-perfect pieces is \(2\epsilon\)-optimal in the game \(\textbf{G}\).

After this observation, since the merge strategy is obtained by merging two \(\epsilon\)-subgame-perfect strategies, a natural question is whether \(\tau ^\sharp\) is \(2\epsilon\)-subgame-perfect in the game \(\textbf{G}\). The answer is negative; consider the following simple example:

[Figure c: the game described below, with a blue state s controlled by Player 1 and red states t and u controlled by Player 2; deterministic actions \(s\rightarrow t\), \(s\rightarrow u\), \(t\rightarrow s\), \(u\rightarrow s\), and a self-loop on u.]

The goal of Player 1 is to visit the state t infinitely often (say that if he achieves this goal he receives payoff 1, and otherwise 0), and every action is deterministic. The blue state is controlled by Player 1, and the red ones by his opponent. In the subgame \(\textbf{G}_1\) we remove the action \(s\rightarrow t\). In particular, in the game \(\textbf{G}_1\) the positional strategy \(\tau ^\sharp _1\) which chooses \(u\rightarrow s\) and \(t\rightarrow s\) is subgame-perfect. We can therefore use it to construct a merge strategy \(\tau ^\sharp\). However this merge strategy is not \(2\epsilon\)-subgame-perfect: in case Player 1 uses the sub-optimal action \(s\rightarrow u\), his opponent, following \(\tau ^\sharp _1\), answers \(u\rightarrow s\) and does not profit by taking the self-loop forever.

6 Finite memory transfer theorem

The construction of the merge strategy in the previous section reveals that games equipped with shift-invariant and submixing payoffs have the following interesting property. While they yield very simple optimal strategies for Player 1, they allow his opponent to recombine strategies that work for one-player games (also known as Markov decision processes) and use them in a two-player game! We give the proof of this theorem, which was announced in the “Introduction”:

Theorem 1.2

Let f be a payoff function that is both shift-invariant and submixing. Assume that in all games equipped with f and fully controlled by the minimizer, the minimizer has optimal strategies with finite memory. Then the minimizer has optimal strategies with finite memory in all games equipped with f.

We prove a slightly stronger theorem and derive Theorem 1.2 as a corollary. An arena is said to be controlled by Player 2 if in every state that belongs to his opponent there is only one action available. (In other words these arenas are one-player games, or Markov decision processes).

Theorem 6.1

Let f be a shift-invariant and submixing payoff function. If for all \(\epsilon >0\), Player 2 has a finite-memory \(\epsilon\)-optimal strategy in every game controlled by himself, then for all \(\epsilon >0\), in every (two-player) game he has a finite-memory \(\epsilon\)-subgame-perfect strategy.

The statement also holds for \(\epsilon =0\), that is: if Player 2 has a finite-memory optimal strategy in every game controlled by himself, then in every (two-player) game he has a subgame-perfect strategy with finite memory.

Proof

The proof is by induction on the number of actions of \(\textbf{G}\), defining the smaller games \(\textbf{G}_1\) and \(\textbf{G}_2\) as in the proof of the main theorem in the previous section.

We assume w.l.o.g. that \(\textbf{G}\) is value-preserving for Player 2, in the sense of Definition 4.5. If this is not the case initially then we simply remove from \(\textbf{G}\) the actions available in the states of Player 2 which are not value-preserving, which does not change the value of the states of the game. The \(\epsilon\)-subgame-perfect strategies of Player 2 in the value-preserving game are \(\epsilon\)-subgame-perfect in the original game.

The base of the induction follows from the assumption of the theorem. The induction hypothesis provides two finite-memory \(\epsilon\)-subgame-perfect strategies \(\tau ^\sharp _1\) and \(\tau ^\sharp _2\), given by the transducers:

$$\begin{aligned} (\mathcal {M}_1,{{\,\textrm{init}\,}}_1,{{\,\textrm{up}\,}}_1,{{\,\textrm{out}\,}}_1)\qquad \text {and}\qquad (\mathcal {M}_2,{{\,\textrm{init}\,}}_2,{{\,\textrm{up}\,}}_2,{{\,\textrm{out}\,}}_2), \end{aligned}$$

for Player 2 in \(\textbf{G}_1\) and \(\textbf{G}_2\) respectively. The strategy \(\tau ^\sharp\) obtained by merging \(\tau ^\sharp _1\) and \(\tau ^\sharp _2\) is also a finite-memory strategy, whose memory is

$$\begin{aligned} \mathcal {M}\mathop {=}\limits ^{\textsf {def}}\{1,2\}\times \mathcal {M}_1\times \mathcal {M}_2. \end{aligned}$$

The initial memory state in state s is \((1,{{\,\textrm{init}\,}}_1(s),{{\,\textrm{init}\,}}_2(s))\). The updates on the components \(\mathcal {M}_1\) and \(\mathcal {M}_2\) are performed with \({{\,\textrm{up}\,}}_1\) and \({{\,\textrm{up}\,}}_2\) respectively. The first component is updated only when the play leaves the pivot state \({\tilde{s}}\); it is switched to 1 or 2 depending on whether Player 1 chooses an action in \(\textbf{A}_1\) or \(\textbf{A}_2\). The choice of action, or the output, depends on the first component: in memory state \((b,m_1,m_2)\) the action played by the strategy is \({{\,\textrm{out}\,}}_b(m_b)\).
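One way to realize this construction is sketched below; the transducer interface and the choice to update only the component of the currently played subgame (so that \(m_b\) tracks the projection of the play onto the subgame of index b) are our own reading of the construction above.

```python
# A sketch of the finite-memory merge strategy as a product transducer.
# T1 and T2 are assumed to implement init(state), up(memory, state, action,
# next_state) and out(memory, state) for tau1 and tau2; PIVOT and A1 are
# the hypothetical names of the earlier sketches.

class MergeTransducer:
    def __init__(self, T1, T2, start_state):
        self.T1, self.T2 = T1, T2
        # First component: the current subgame (1 before visiting the pivot).
        self.mem = (1, T1.init(start_state), T2.init(start_state))

    def update(self, state, action, next_state):
        b, m1, m2 = self.mem
        if state == PIVOT:
            # The play leaves the pivot: record which subgame Player 1 chose.
            b = 1 if action in A1 else 2
        # Only the projection onto the current subgame grows, so only the
        # corresponding memory component is updated.
        if b == 1:
            m1 = self.T1.up(m1, state, action, next_state)
        else:
            m2 = self.T2.up(m2, state, action, next_state)
        self.mem = (b, m1, m2)

    def output(self, state):
        b, m1, m2 = self.mem
        return self.T1.out(m1, state) if b == 1 else self.T2.out(m2, state)
```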

According to Observation 5.9, \(\tau ^\sharp\) is \(2\epsilon\)-optimal. According to Proposition 4.8, there exists a finite-memory \(4\epsilon\)-subgame-perfect strategy in \(\textbf{G}\). Since this holds for any \(\epsilon >0\), this concludes the induction step.

The second part of the theorem follows similarly thanks to Proposition 4.8. \(\square\)

6.1 On the size of the memory

How large is the memory \(\mathcal {M}_\textbf{G}\) needed by Player 2 to play optimally in some \(\textbf{G}=({\mathcal {A}},f)\)? Every deterministic and stationary strategy \(\sigma :S_1 \rightarrow A\) for Player 1 in \(\textbf{G}\) induces a game \(\textbf{G}_\sigma\) that is controlled by Player 2. Let \({\mathfrak {M}}\) be the maximal memory size required by Player 2 to play optimally in the games \(\textbf{G}_\sigma\). According to the proof of the theorem above, the memory \(\mathcal {M}_\textbf{G}\) needed by Player 2 to play optimally in \(\textbf{G}\) is of size \(2\cdot |\mathcal {M}_{\textbf{G}_1}| \cdot |\mathcal {M}_{\textbf{G}_2}|\). We can iterate the construction on the same state \({\tilde{s}}\) and partition \(A({\tilde{s}})\) until all action sets are singletons. For every action \(a \in A({\tilde{s}} )\) there is a subgame \(\textbf{G}_a\) where Player 1 always plays action a when reaching \({\tilde{s}}\). Denote \(\mathcal {M}_{\textbf{G}_a}\) the memory needed by Player 2 to play optimally in the subgame \(\textbf{G}_a\). To play optimally in \(\textbf{G}\), it is enough for Player 2 to remember which action in \(A({\tilde{s}})\) was chosen by Player 1 the last time the play reached \({\tilde{s}}\), plus the memory states of his finite-memory strategies in the subgames \(\textbf{G}_a, a \in A({\tilde{s}})\), which can be implemented with the memory states

$$\begin{aligned} \mathcal {M}_\textbf{G}= A({\tilde{s}}) \times \prod _{a \in A({\tilde{s}})} \mathcal {M}_{\textbf{G}_a}. \end{aligned}$$

Inductively, Player 2 can play optimally by remembering the last action choice in every state controlled by Player 1, as a mapping \(\sigma :S_1 \rightarrow A\) such that \(\forall s\in S_1, \sigma (s) \in A(s)\), together with, for every such mapping \(\sigma\), the memory state in \(1\ldots {\mathfrak {M}}\) of an optimal strategy in the one-player game \(\textbf{G}_\sigma\). This leads to the following bound on the memory of Player 2. Enumerating \(S_1\) as \(s_1,s_2,\ldots , s_m\), we get

$$\begin{aligned} |\mathcal {M}_\textbf{G}| \le | A(s_1)|\cdot | A(s_2) | \cdots | A(s_m) |\cdot | {{\mathfrak {M}}}^{ A(s_1)\times A(s_2)\times \cdots \times A(s_m)} | \le | A|^{|S_1|} \cdot {{\mathfrak {M}}}^{|A|^{|S_1|}} . \end{aligned}$$
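As a toy instantiation of this bound (all sizes below are hypothetical):

```python
# A toy instantiation of the memory bound: two Player-1 states with two
# actions each, and one-player games solvable with M = 3 memory states.

action_sizes = [2, 2]            # |A(s1)|, |A(s2)|
M = 3                            # the one-player memory bound

product = 1
for size in action_sizes:
    product *= size              # |A(s1)| * |A(s2)| = 4

bound = product * M ** product   # 4 * 3**4 = 324 memory states suffice
print(bound)
```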

When \({\mathfrak {M}}=1\), i.e. when Player 2 has deterministic and stationary optimal strategies in the games he controls, it is shown in Gimbert and Zielonka (2005) that the same holds for two-player games as well, hence the upper bound can be improved from \(| A|^{|S_1|}\) to 1. In the general case where \({\mathfrak {M}}\ge 2\), we do not have examples where the memory size required by Player 2 to play optimally has the same order of magnitude as the upper bound above.