1 Introduction

Turn-based two-player win/lose (stochastic) games on finite graphs have been intensively studied in the context of model checking in a broad sense [1, 19]. These games behave well regarding optimality in various settings. Most importantly for this paper, [14] proved the following results for finite turn-based stochastic games with prefix-independent objectives: (1) every game has deterministic optimal strategies; (2) from every value-1 state, there is an optimal, i.e. almost-surely winning, strategy; (3) if from every value-1 state of every game there is an optimal strategy using some fixed amount of memory, then every game has an optimal strategy using this amount of memory. These results are of one of the following generic forms:

  • In all games, (from all nice states) there is a nice strategy.

  • If from all nice states of all games there is a nice strategy, then the same holds from all states.

The concurrent version of these turn-based (stochastic) games has higher modeling power than the turn-based version: this is really useful in practice since real-world systems are intrinsically concurrent [15]. These games are played on a finite graph as follows: at each player state, the two players stochastically and independently choose one among finitely many actions. This yields a Nature state, which stochastically draws the next player state, from which each player chooses an action again, and so on. Each player state is labelled by a color, and who wins depends on the infinite sequence of colors underlying the (stochastically) generated infinite sequence of player states. Unfortunately, these concurrent games do not behave well in general, even for simple winning conditions and on graph structures as simple as finite graphs:

  • Reachability objectives: there is a game without optimal strategies [13];

  • Büchi objectives: there is a game with value 1 while all finite-memory strategies have value 0 [12];

  • Co-Büchi objectives: although there are always positional \(\varepsilon \)-optimal strategies [8], there is a game with optimal strategies but without finite-memory optimal strategies [4];

  • Parity [12] and mean-payoff [10] objectives: there is a game with subgame almost-surely-winning strategies, but where all finite-memory strategies have value 0.

In this paper, we focus on concurrent stochastic finite games. Therefore, the generic forms of our results will be more complex, in order to take into account the above-mentioned discrepancies. They will roughly be given as generic statements of the following forms:

  • Every game that has a nice strategy also has a nicer one.

  • If all special games that have a nice strategy also have a nicer one, then so do all games.

Much of the difficulty consists in fine-tuning the strength of “nice”, “nicer” and “special” above. We present below our main contributions on finite two-player win/lose concurrent stochastic games with prefix-independent objectives:

  1. We provide a characterization of subgame optimal strategies, which are strategies that are optimal after every history (Theorem 1): a Player \(\textsf{A} \) strategy is subgame optimal iff 1) it is locally optimal and 2) for every Player \(\textsf{B} \) deterministic strategy, after every history, if the states visited from some point on all have the same positive value, Player \(\textsf{A} \) wins with probability 1. This characterization is used to prove all the results below.

  2. We prove memory transfer results from subgame almost-surely winning strategies to subgame optimal strategies:

    (a) Theorem 2: If every game that has a subgame almost-surely winning strategy also has a positional one, then every game that has a subgame optimal strategy also has a positional one.

    (b) Corollary 1: every Büchi or co-Büchi game that has a subgame optimal strategy has a positional one. (Whereas parity games may require infinite memory [12].)

    Note that the transfer result 2a can be generalized from positional to finite-memory strategies.

  3. We say that a strategy has finite choice if it uses only finitely many action distributions. Note that finite-memory (resp. deterministic) strategies clearly have finite choice.

    (a) Theorem 4: In a given game, if there is a finite-choice optimal strategy, there is a finite-choice subgame optimal strategy.

    (b) Theorem 5: Assume that the objective has a neutral color. If every turn-based game that has a subgame almost-surely winning strategy also has a positional one, then every game that has a finite-choice subgame optimal strategy also has a positional one.

    (c) Corollary 2: every parity or mean-payoff game that has a finite-memory subgame optimal strategy also has a positional one.

    Note that 3a and 3b are false if the finite-choice requirement is removed [4]. The proof of 3b invokes 3a. The flavor (and the proofs) of 3b and 2a are similar, but both the premise and the conclusion are weakened in 3b, as emphasized.

Related works. A large part of this paper is dedicated to the extension to concurrent games of the results from [14] regarding the transfer of memory from almost-surely winning strategies to optimal strategies in turn-based games. Note that the proof technique used in [14] is different and could not be adapted to our more general setting. In their proof, both players agree on a preference over Nature states and play according to this preference. In our proof, we slice the graph into value areas (that is, sets of states with the same value), and show that it is sufficient to play an almost-sure winning strategy in each slice; we then glue these (partial) strategies together to get a subgame-optimal strategy over the whole graph.

The slicing technique was already used in the context of concurrent games in [8]. The authors focus on parity objectives and establish a memory transfer result from limit-sure winning strategies to almost-optimal strategies. As an application, they show that, for co-Büchi objectives, since positional strategies are sufficient to win limit-surely, they also suffice to win almost-optimally. Their construction makes heavy use of the specific nature of parity objectives.

We also mention [6], where the focus is also on concurrent games with prefix-independent objectives. In particular, the authors establish a (very useful) result: if all states have positive values, then they all have value 1. (A strengthening of this result is presented in this paper (Theorem 3), which can also be seen as an adaptation of a result proved in [14].) This result is then used in [6] in the context of non-zero-sum games.

Finally, some recent work on concurrent games has been done in [2,3,4], with the following goal: local interactions of the two players at a player state are given by bi-dimensional tables; those tables can be abstracted as game forms, where (output) variables stand for the outcomes of the local interaction (possibly several outcomes are labelled by the same variable). The goal of this series of works is to give (intrinsic) properties of these game forms ensuring that, when they are used in a graph game, the existence of optimal strategies is guaranteed. For instance, in [3], a property of game forms, called RM, is given, which ensures that, if one only uses RM game forms in a graph, then for every reachability objective, Player \(\textsf{A} \) always has an optimal strategy. This property is a characterization of well-behaved game forms regarding reachability objectives, since every game form that is not RM can be embedded into a (small) graph game in such a way that Player \(\textsf{A} \) does not have an optimal strategy. This line of work differs substantially from the target of the current paper.

Structure of the paper. Section 2 presents notations, Section 3 recalls the notion of game forms, Section 4 introduces our formalism, Section 5 exhibits a necessary and sufficient pair of conditions for subgame optimality, Section 6 shows a memory transfer from subgame almost-surely winning to subgame optimal in concurrent games, and Section 7 adapts the results of the previous section to the case of the existence of a subgame finite-choice strategy.

Detailed proofs and additional formal definitions are available in [5].

2 Preliminaries

Consider a non-empty set Q. We denote by \(Q^*\), \(Q^+\) and \(Q^\omega \) the set of finite sequences, non-empty finite sequences and infinite sequences of elements of Q respectively. For \(n \in \mathbb {N}\), we denote by \(Q^n\) (resp. \(Q^{\le n}\)) the set of sequences of (resp. at most) n elements of Q. For all \(\rho = q_1 \cdots q_n \in Q^n\) and \(i \le n\), we denote by \(\rho _i\) the element \(q_i \in Q\) and by \(\rho _{\le i} \in Q^i\) the finite sequence \(q_1 \cdots q_i\). For a subset \(S \subseteq Q\), we denote by \(Q^* \cdot S^\omega \subseteq Q^\omega \) the set of infinite paths that eventually settle in S and by \((Q^* \cdot S)^\omega \subseteq Q^\omega \) the set of infinite paths visiting infinitely often the set S.

A discrete probability distribution over a non-empty finite set Q is a function \(\mu : Q \rightarrow [0,1]\) such that \(\sum _{x \in Q} \mu (x) = 1\). The support \(\textsf{Supp} (\mu )\) of a probability distribution \(\mu : Q \rightarrow [0,1]\) is the set of non-zeros of the distribution: \(\textsf{Supp} (\mu ) = \{ q \in Q \mid \mu (q) \in (0,1] \}\). The set of all distributions over Q is denoted \(\mathcal {D} (Q)\).
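As a concrete illustration (a minimal Python sketch, not part of the formal development), distributions can be represented as dictionaries, a representation we reuse in the later snippets:

```python
def is_distribution(mu, Q):
    """Check that mu: Q -> [0, 1] is a probability distribution over Q."""
    return set(mu) <= set(Q) and all(0 <= p <= 1 for p in mu.values()) \
        and abs(sum(mu.values()) - 1.0) < 1e-9

def support(mu):
    """Supp(mu): the set of non-zeros of the distribution mu."""
    return {q for q, p in mu.items() if p > 0}

mu = {"q0": 0.25, "q1": 0.75}            # a distribution in D({q0, q1})
assert is_distribution(mu, {"q0", "q1"}) and support(mu) == {"q0", "q1"}
```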

3 Game forms

We recall the definition of game forms – informally, bi-dimensional tables with variables – and of games in normal form – game forms whose outcomes are values between 0 and 1.

Definition 1 (Game form and game in normal form)

A game form (GF for short) is a tuple \(\mathcal {F}= \langle \textsf{Act}_\textsf{A} ,\textsf{Act}_\textsf{B} ,\textsf{O},\varrho \rangle \) where \(\textsf{Act}_\textsf{A} \) (resp. \(\textsf{Act}_\textsf{B} \)) is the non-empty finite set of actions available to Player \(\textsf{A} \) (resp. \(\textsf{B} \)), \(\textsf{O}\) is a non-empty set of outcomes, and \(\varrho : \textsf{Act}_\textsf{A} \times \textsf{Act}_\textsf{B} \rightarrow \textsf{O}\) is a function that associates an outcome to each pair of actions. When the set of outcomes \(\textsf{O}\) is equal to [0, 1], we say that \(\mathcal {F}\) is a game in normal form. For a valuation \(v \in [0,1]^\textsf{O}\) of the outcomes, the notation \(\langle \mathcal {F},v \rangle \) refers to the game in normal form \(\langle \textsf{Act}_\textsf{A} ,\textsf{Act}_\textsf{B} ,[0,1],v \circ \varrho \rangle \).

We use game forms to represent interactions between two players. The strategies available to Player \(\textsf{A} \) (resp. \(\textsf{B} \)) are convex combinations of actions given as the rows (resp. columns) of the table. In a game in normal form, Player \(\textsf{A} \) tries to maximize the outcome, whereas Player \(\textsf{B} \) tries to minimize it.

Definition 2 (Outcome of a game in normal form)

Consider a game in normal form \(\mathcal {F}= \langle \textsf{Act}_\textsf{A} ,\textsf{Act}_\textsf{B} ,[0,1],\varrho \rangle \). The set \(\mathcal {D} (\textsf{Act}_\textsf{A} )\) (resp. \(\mathcal {D} (\textsf{Act}_\textsf{B} )\)) is the set of strategies available to Player \(\textsf{A} \) (resp. \(\textsf{B} \)). For a pair of strategies \((\sigma _\textsf{A} ,\sigma _\textsf{B} ) \in \mathcal {D} (\textsf{Act}_\textsf{A} ) \times \mathcal {D} (\textsf{Act}_\textsf{B} )\), the outcome \(\textsf{out}_\mathcal {F}(\sigma _\textsf{A} ,\sigma _\textsf{B} )\) in \(\mathcal {F}\) of the strategies \((\sigma _\textsf{A} ,\sigma _\textsf{B} )\) is defined as: \(\textsf{out}_\mathcal {F}(\sigma _\textsf{A} ,\sigma _\textsf{B} ) := \sum _{a \in \textsf{Act}_\textsf{A} } \sum _{b \in \textsf{Act}_\textsf{B} } \sigma _\textsf{A} (a) \cdot \sigma _\textsf{B} (b) \cdot \varrho (a,b) \in [0,1]\).
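Concretely, the outcome is a bilinear form; a minimal Python sketch (assuming numpy, with the table \(\varrho \) given as a matrix and \(\textsf{GF}\)-strategies as probability vectors over rows and columns):

```python
import numpy as np

def outcome(M, sigma_A, sigma_B):
    """out_F(sigma_A, sigma_B) = sum_{a, b} sigma_A(a) * sigma_B(b) * rho(a, b)
    for a game in normal form given as a matrix M with entries in [0, 1]."""
    return float(np.asarray(sigma_A) @ np.asarray(M) @ np.asarray(sigma_B))

# Matching pennies rescaled to [0, 1]: uniform strategies yield outcome 1/2.
M = [[1.0, 0.0], [0.0, 1.0]]
assert outcome(M, [0.5, 0.5], [0.5, 0.5]) == 0.5
```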

Definition 3 (Value of a game in normal form and optimal strategies)

Consider a game in normal form \(\mathcal {F}= \langle \textsf{Act}_\textsf{A} ,\textsf{Act}_\textsf{B} ,[0,1],\varrho \rangle \) and a strategy \(\sigma _\textsf{A} \in \mathcal {D} (\textsf{Act}_\textsf{A} )\) for Player \(\textsf{A} \). The value of the strategy \(\sigma _\textsf{A} \), denoted \(\textsf{val}_\mathcal {F}(\sigma _\textsf{A} )\), is equal to: \(\textsf{val}_\mathcal {F}(\sigma _\textsf{A} ) := \inf _{\sigma _\textsf{B} \in \mathcal {D} (\textsf{Act}_\textsf{B} )} \textsf{out}_{\mathcal {F}}(\sigma _\textsf{A} ,\sigma _\textsf{B} )\), and analogously for Player \(\textsf{B} \), with a \(\sup \) instead of an \(\inf \). When \(\sup _{\sigma _\textsf{A} \in \mathcal {D} (\textsf{Act}_\textsf{A} )} \textsf{val}_\mathcal {F}(\sigma _\textsf{A} ) = \inf _{\sigma _\textsf{B} \in \mathcal {D} (\textsf{Act}_\textsf{B} )} \textsf{val}_\mathcal {F}(\sigma _\textsf{B} )\), this common quantity defines the value of the game \(\mathcal {F}\), denoted \(\textsf{val}_\mathcal {F}\).

A strategy \(\sigma _\textsf{A} \in \mathcal {D} (\textsf{Act}_\textsf{A} )\) ensuring \(\textsf{val}_\mathcal {F}= \textsf{val}_\mathcal {F}(\sigma _\textsf{A} )\) is called optimal. The set of all optimal strategies for Player \(\textsf{A} \) is denoted \(\textsf{Opt}_\textsf{A} (\mathcal {F}) \subseteq \mathcal {D} (\textsf{Act}_\textsf{A} )\), and analogously for Player \(\textsf{B} \). Von Neumann’s minimax theorem [20] ensures the existence of optimal strategies (for both players).
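Optimal strategies can be computed with the standard linear program for matrix games; the following Python sketch (assuming scipy; one possible implementation, not the paper's) returns the value and an optimal strategy for the maximizer:

```python
import numpy as np
from scipy.optimize import linprog

def game_value(M):
    """Value and an optimal mixed strategy for the row player of the matrix
    game M (entries in [0, 1]): maximize v subject to
    sum_a x_a * M[a, b] >= v for every column b, with x a distribution."""
    n, k = M.shape
    c = np.zeros(n + 1)
    c[-1] = -1.0                                # linprog minimizes: use -v
    A_ub = np.hstack([-M.T, np.ones((k, 1))])   # v - sum_a M[a, b] x_a <= 0
    A_eq = np.zeros((1, n + 1))
    A_eq[0, :n] = 1.0                           # sum_a x_a = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * (n + 1))
    return res.x[-1], res.x[:n]

# Matching pennies rescaled to [0, 1]: value 1/2, optimal strategy (1/2, 1/2).
v, sigma = game_value(np.array([[1.0, 0.0], [0.0, 1.0]]))
```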

In the following, strategies in games in normal form will be called \(\textsf{GF}\)-strategies, in order not to confuse them with strategies in concurrent (graph) games.

4 Concurrent games and optimal strategies

4.1 Concurrent arenas and strategies

We introduce the definition of concurrent arenas played on a finite graph.

Definition 4 (Finite stochastic concurrent arena)

A colored concurrent arena \(\mathcal {C} \) is a tuple \(\langle Q,(A_q)_{q \in Q},(B_q)_{q \in Q},\textsf{D} ,\delta ,\textsf{dist} ,\textsf{K} ,\textsf{col} \rangle \) where Q is the non-empty finite set of states, for all \(q \in Q\), \(A_q\) (resp. \(B_q\)) is the non-empty finite set of actions available to Player \(\textsf{A} \) (resp. \(\textsf{B} \)) at state q, \(\textsf{D} \) is the finite set of Nature states, \(\delta : \bigcup _{q \in Q} (\{ q \} \times A_q \times B_q) \rightarrow \textsf{D} \) is the transition function, \(\textsf{dist} : \textsf{D} \rightarrow \mathcal {D} (Q)\) is the distribution function. Furthermore, \(\textsf{K} \) is the non-empty finite set of colors and \(\textsf{col} : Q \rightarrow \textsf{K} \) is the coloring function.

In the following, the arena \(\mathcal {C} \) will refer to the tuple \(\langle Q,(A_q)_{q \in Q},(B_q)_{q \in Q},\textsf{D} ,\delta ,\textsf{dist} ,\textsf{K} ,\textsf{col} \rangle \), unless otherwise stated. A concurrent game is obtained from a concurrent arena by adding a winning condition: the set of infinite paths winning for Player \(\textsf{A} \) (and losing for Player \(\textsf{B} \)).

Definition 5 (Finite stochastic concurrent game)

A finite concurrent game is a pair \(\langle \mathcal {C} ,W \rangle \) where \(\mathcal {C} \) is a finite concurrent colored arena and \(W \subseteq \textsf{K} ^\omega \) is Borel. The set W is called the objective, as it corresponds to the set of colored paths winning for Player \(\textsf{A} \).

In this paper, we only consider a specific kind of objectives: prefix-independent ones. Informally, they correspond to objectives W such that an infinite path \(\rho \) is in W if and only if any of its suffixes is in W. More formally:

Definition 6 (Prefix-independent objectives)

For a non-empty finite set of colors \(\textsf{K} \) and \(W \subseteq \textsf{K} ^\omega \), W is said to be prefix-independent (PI for short) if, for all \(\rho \in \textsf{K} ^\omega \) and \(i \ge 0\), \(\rho \in W \Leftrightarrow \rho _{\ge i} \in W\).

In the following, we refer to concurrent games with prefix-independent objectives as PI concurrent games.

Definition 7 (Parity, Büchi, co-Büchi objectives)

Let \(\textsf{K} \subset \mathbb {N}\) be a finite non-empty set of integers. Consider a concurrent arena \(\mathcal {C} \) with \(\textsf{K} \) as set of colors. For an infinite path \(\rho \in Q^\omega \), we denote by \(\textsf{col} (\rho )_\infty \subseteq \mathbb {N}\) the set of colors seen infinitely often in \(\rho \): \(\textsf{col} (\rho )_\infty := \{ n \in \mathbb {N}\mid \forall i \in \mathbb {N},\; \exists j \ge i,\; \textsf{col} (\rho _j) = n \}\). Then, the parity objective w.r.t. \(\textsf{col} \) is the set \(W^{\textsf{Parity}}(\textsf{col} ) := \{ \rho \in Q^\omega \mid \max \textsf{col} (\rho )_\infty \text { is even } \}\). The Büchi (resp. co-Büchi) objective corresponds to the parity objective with \(\textsf{K} := \{ 1,2 \}\) (resp. \(\textsf{K} := \{ 0,1 \}\)).
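For an ultimately periodic sequence of colors, the colors seen infinitely often are exactly those on the cycle, so the parity condition depends only on the cycle; a small Python sketch (an illustration under this ultimately-periodic assumption):

```python
def wins_parity(prefix, cycle):
    """Parity objective on the ultimately periodic word prefix . cycle^omega.
    The prefix is irrelevant, reflecting prefix-independence."""
    assert cycle, "the cycle must be non-empty"
    return max(cycle) % 2 == 0

assert wins_parity([1, 1], [2])           # Büchi: color 2 seen infinitely often
assert not wins_parity([0, 0], [1, 0])    # co-Büchi: color 1 recurs
```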

Strategies are then defined as functions that, given the history of the game (i.e. the sequence of states already seen), associate a distribution over the actions available to the player.

Definition 8 (Strategies)

Consider a concurrent game \(\mathcal {C} \). A strategy for Player \(\textsf{A} \) is a function \(\textsf{s}_\textsf{A} : Q^+ \rightarrow \mathcal {D} (A)\) with \(A := \bigcup _{q \in Q} A_q\) such that, for all \(\rho = q_0 \cdots q_n \in Q^+\), we have \(\textsf{s}_\textsf{A} (\rho ) \in \mathcal {D} (A_{q_n})\). We denote by \(\textsf{S}_{\mathcal {C} }^{\textsf{A} } \) the set of all strategies in arena \(\mathcal {C} \) for Player \(\textsf{A} \). This is analogous for Player \(\textsf{B} \).

Given two strategies \(\textsf{s}_\textsf{A} ,\textsf{s}_\textsf{B} \) for both players in an arena \(\mathcal {C} \) from a starting state \(q_0\), we define in the usual manner the probability \(\mathbb {P}^{\mathcal {C} ,q_0}_{\textsf{s}_\textsf{A} ,\textsf{s}_\textsf{B} } \) of a finite path, which in turn induces the probability of an arbitrary Borel subset of infinite paths. Values of strategies and of the game are defined below.

Definition 9 (Value of strategies and of the game)

Let \(\mathcal {G} = \langle \mathcal {C} ,W \rangle \) be a PI concurrent game and consider a strategy \(\textsf{s}_\textsf{A} \in \textsf{S}_{\mathcal {C} }^{\textsf{A} } \) for Player \(\textsf{A} \). The function \(\chi _{\mathcal {G} } [\textsf{s}_\textsf{A} ]: Q \rightarrow [0,1]\) giving the value of the strategy \(\textsf{s}_\textsf{A} \) is such that, for all \(q_0 \in Q\), we have \(\chi _{\mathcal {G} } [\textsf{s}_\textsf{A} ](q_0) := \inf _{\textsf{s}_\textsf{B} \in \textsf{S}_{\mathcal {C} }^{\textsf{B} } } \mathbb {P}^{\mathcal {C} ,q_0}_{\textsf{s}_\textsf{A} ,\textsf{s}_\textsf{B} } [W]\). The function \(\chi _{\mathcal {G} } [\textsf{A} ]: Q \rightarrow [0,1]\) giving the value of the game for Player \(\textsf{A} \) is such that, for all \(q_0 \in Q\), we have \(\chi _{\mathcal {G} } [\textsf{A} ](q_0) := \sup _{\textsf{s}_\textsf{A} \in \textsf{S}_{\mathcal {C} }^{\textsf{A} } } \chi _{\mathcal {G} } [\textsf{s}_\textsf{A} ](q_0)\). The function \(\chi _{\mathcal {G} } [\textsf{B} ]: Q \rightarrow [0,1]\) giving the value of the game for Player \(\textsf{B} \) is defined similarly by reversing the supremum and infimum.

By Martin’s result on the determinacy of Blackwell games [17], for all concurrent games \(\mathcal {G} = \langle \mathcal {C} ,W \rangle \), the value functions for both players are equal; this defines the value function \(\chi _{\mathcal {G} } : Q \rightarrow [0,1]\) of the game: \(\chi _{\mathcal {G} } := \chi _{\mathcal {G} } [\textsf{A} ] = \chi _{\mathcal {G} } [\textsf{B} ]\).

We define value areas: subsets of states whose values are the same.

Definition 10 (Value area)

In a PI concurrent game \(\mathcal {G} \), \(V_\mathcal {G}\) refers to the set of values appearing in the game: \(V_\mathcal {G} := \{ \chi _{\mathcal {G} } (q) \mid q \in Q \}\). Furthermore, for all \(u \in V_\mathcal {G}\), \(Q_u \subseteq Q\) refers to the set of states whose value is u w.r.t. \(\chi _{\mathcal {G} } \): \(Q_u := \{ q \in Q \mid \chi _{\mathcal {G} } (q) = u \}\).
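Computing the value areas from the value function amounts to grouping states by value; a minimal sketch, continuing the dictionary representation of the earlier snippets:

```python
def value_areas(chi):
    """Partition the states into value areas Q_u = {q | chi(q) = u},
    for the value function chi given as a dictionary from states to [0, 1]."""
    areas = {}
    for q, u in chi.items():
        areas.setdefault(u, set()).add(q)
    return areas

# Example: V_G = {0, 1/2, 1} with Q_{1/2} = {q1, q2}.
assert value_areas({"q0": 1.0, "q1": 0.5, "q2": 0.5, "q3": 0.0})[0.5] == {"q1", "q2"}
```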

In concurrent games, game forms appear at each state and describe the interactions of the players at that state. Furthermore, the valuation mapping each state to its value in the game can be lifted, via a convex combination, into a valuation of the Nature states. This, in turn, induces a natural way to define the game in normal form appearing at each state.

Definition 11 (Local interactions, Lifting valuations)

In a PI concurrent game \(\mathcal {G} \) where the valuation \(\chi _{\mathcal {G} } : Q \rightarrow [0,1]\) gives the values of the game, the lift \(\nu _{\mathcal {G} } : \textsf{D} \rightarrow [0,1]\) is such that, for all \(d \in \textsf{D} \), we have \(\nu _{\mathcal {G} } (d) := \sum _{q \in Q} \chi _{\mathcal {G} } (q) \cdot \textsf{dist} (d)(q)\) (recall that \(\textsf{dist} : \textsf{D} \rightarrow \mathcal {D} (Q)\) is the distribution function).

Let \(q \in Q\). The local interaction at state q is the game form \(\mathcal {F}_q = \langle A_q,B_q,\textsf{D} ,\delta (q,\cdot ,\cdot ) \rangle \). The game in normal form at state q is then \(\mathcal {F}^\textsf{nf}_q := \langle \mathcal {F}_q,\nu _{\mathcal {G} } \rangle \).
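The lift is a plain expectation over the successor distribution; a minimal sketch, with \(\textsf{dist} (d)\) represented as a dictionary from states to probabilities:

```python
def lift(chi, dist_d):
    """nu(d) = sum_q chi(q) * dist(d)(q): the expected value of the
    player state drawn at the Nature state d."""
    return sum(chi[q] * p for q, p in dist_d.items())

# A Nature state leading equiprobably to a value-1 and a value-0 state.
assert lift({"q0": 1.0, "q1": 0.0}, {"q0": 0.5, "q1": 0.5}) == 0.5
```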

The values of the game in normal form \(\mathcal {F}^\textsf{nf}_q\) and of the state q are equal.

Proposition 1

In a PI concurrent game \(\mathcal {G} \), for all states \(q \in Q\), we have \(\chi _{\mathcal {G} } (q) = \textsf{val}_{\mathcal {F}^\textsf{nf}_q}\).

4.2 More on strategies

In this subsection, we define several kinds of strategies. Let us fix a PI concurrent game \(\mathcal {G} \) for the rest of this section. First, we consider optimal strategies, i.e. strategies realizing the value of the game. Strategies are positively-optimal if their value is positive from all states whose value is positive.

Definition 12 ((Positively-) optimal strategies)

A Player \(\textsf{A} \) strategy \(\textsf{s}_\textsf{A} \in \textsf{S}_{\mathcal {C} }^{\textsf{A} } \) is (resp. positively-) optimal from a state \(q \in Q\) if \(\chi _{\mathcal {G} } (q) = \chi _{\mathcal {G} } [\textsf{s}_\textsf{A} ](q)\) (resp. if \(\chi _{\mathcal {G} } (q)> 0 \Rightarrow \chi _{\mathcal {G} } [\textsf{s}_\textsf{A} ](q) > 0\)). It is (resp. positively-) optimal if this holds from all states \(q \in Q\).

Note that the definition of optimal strategies we consider is sometimes referred to as uniform optimality, as it holds from every state of the game. However, it does not say anything about what happens once some sequence of states has been seen. We would now like to define a notion of strategy that is optimal from any point that can occur after any finite sequence of states has been seen. This corresponds to subgame optimal strategies. To define them, we need to introduce the notion of residual strategy.

Definition 13 (Residual and Subgame Optimal Strategies)

For all finite sequences \(\rho \in Q^+\), the residual strategy \(\textsf{s}_\textsf{A} ^\rho \) of a Player \(\textsf{A} \) strategy \(\textsf{s}_\textsf{A} \) is the strategy \(\textsf{s}_\textsf{A} ^\rho : Q^+ \rightarrow \mathcal {D} (A)\) such that, for all \(\pi \in Q^+\), we have \(\textsf{s}_\textsf{A} ^\rho (\pi ) := \textsf{s}_\textsf{A} (\rho \cdot \pi )\).

The Player \(\textsf{A} \) strategy \(\textsf{s}_\textsf{A} \) is subgame optimal if, for all \(\rho = \rho ' \cdot q \in Q^+\), the residual strategy \(\textsf{s}_\textsf{A} ^\rho \) is optimal from q, i.e. \(\chi _{\mathcal {G} } [\textsf{s}_\textsf{A} ^\rho ](q) = \chi _{\mathcal {G} } (q)\).
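In code, a residual strategy is just a shift of the history; a minimal sketch, with histories represented as tuples of states:

```python
def residual(s, rho):
    """The residual strategy s^rho of Definition 13: s^rho(pi) = s(rho . pi)."""
    return lambda pi: s(rho + pi)
```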

Note that, in particular, subgame optimal strategies are optimal. When such strategies do exist, we want them to be as simple as possible; for instance, we want them to be positional, that is, to depend only on the current state of the game.

As for Player \(\textsf{B} \), we will consider a specific kind of strategies, namely deterministic strategies. This is because, once a Player \(\textsf{A} \) strategy is fixed, we obtain an (infinite) MDP. In such a context, \(\varepsilon \)-optimal strategies can be chosen among deterministic strategies (see for instance the explanation in [9, Thm. 1]).

Definition 14 (Positional, Deterministic strategies)

A Player \(\textsf{A} \) strategy \(\textsf{s}_\textsf{A} \) is positional if, for all states \(q \in Q\) and paths \(\rho \in Q^+\) we have \(\textsf{s}_\textsf{A} (\rho \cdot q) = \textsf{s}_\textsf{A} (q)\).

A Player \(\textsf{B} \) strategy \(\textsf{s}_\textsf{B} \) is deterministic if, for all finite sequences \(\rho \cdot q \in Q^+\), there exists \(b \in B_q\) such that \(\textsf{s}_\textsf{B} (\rho \cdot q)(b) = 1\).

5 Necessary and sufficient condition for subgame optimality

In this section, we present a necessary and sufficient pair of conditions for a Player \(\textsf{A} \) strategy to be subgame optimal, formally stated in Theorem 1. The arguments given here are somewhat similar to the ones given in Section 4 of [4], which deals with the same question restricted to positional strategies.

The first condition is local: it specifies how a strategy behaves in the games in normal form at each local interaction of the game. As mentioned in Proposition 1, at each state q, the value of the game in normal form \(\mathcal {F}^\textsf{nf}_q\) is equal to the value of the state q (given by the valuation \(\chi _{\mathcal {G} } \in [0,1]^Q\)). This suggests that, for all finite sequences of states \(\rho \in Q^+\) ending at that state q, the \(\textsf{GF}\)-strategy \(\textsf{s}_\textsf{A} (\rho )\) needs to be optimal in the game in normal form \(\mathcal {F}^\textsf{nf}_q\) for the residual strategy \(\textsf{s}_\textsf{A} ^\rho \) to be optimal from q. Strategies with this property are called locally optimal. This is a necessary condition for subgame optimality. (However, it is neither a necessary nor a sufficient condition for optimality, as argued in Section 6.)

Definition 15 (Locally optimal strategies)

Consider a PI concurrent game \(\mathcal {G} \). A Player \(\textsf{A} \) strategy \(\textsf{s}_\textsf{A} \) is locally optimal if, for all \(\rho = \rho ' \cdot q \in Q^+\), the \(\textsf{GF}\)-strategy \(\textsf{s}_\textsf{A} (\rho )\) is optimal in the game in normal form \(\mathcal {F}^\textsf{nf}_q\). That is – recalling that \(\nu _{\mathcal {G} } \in [0,1]^\textsf{D} \) lifts the valuation \(\chi _{\mathcal {G} } \in [0,1]^Q\) to the Nature states – for all \(b \in B_q\): \(\chi _{\mathcal {G} } (q) \le \sum _{a \in A_q} \textsf{s}_\textsf{A} (\rho )(a) \cdot \nu _{\mathcal {G} } \circ \delta (q,a,b) = \textsf{out}_{\mathcal {F}^\textsf{nf}_q}(\textsf{s}_\textsf{A} (\rho ),b)\).
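Checked pointwise, local optimality is a finite set of inequalities; a minimal sketch in the dictionary representation of the earlier snippets (delta maps a triple (q, a, b) to a Nature state, nu and chi are the valuations above):

```python
def locally_optimal_at(sigma, q, B_q, delta, nu, chi):
    """Local optimality of the GF-strategy sigma = s_A(rho) at the state q
    ending rho: against every Player B action b, the expected lifted value
    must be at least chi(q)."""
    return all(
        sum(p * nu[delta[(q, a, b)]] for a, p in sigma.items()) >= chi[q]
        for b in B_q
    )
```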

Lemma 1

In a PI concurrent game, subgame optimal strategies are locally optimal.

Note that this was already shown for positional strategies in [4].

Local optimality does not ensure subgame optimality in general. However, it does ensure that, for all Player \(\textsf{B} \) deterministic strategies, the game almost-surely eventually settles in a value area, i.e. in some \(Q_u\) for some \(u \in V_\mathcal {G} \).

Lemma 2

Consider a PI concurrent game \(\mathcal {G} \) and a Player \(\textsf{A} \) locally optimal strategy \(\textsf{s}_\textsf{A} \). For all Player \(\textsf{B} \) deterministic strategies, almost-surely the states seen infinitely often have the same value. That is: \(\mathbb {P}^{\textsf{s}_\textsf{A} ,\textsf{s}_\textsf{B} }[\bigcup _{u \in V_{\mathcal {G} }} Q^* \cdot (Q_u)^\omega ] = 1\).

Proof (Sketch)

First, if a state of value 1 is reached (i.e. a state in \(Q_1\)), then all states that can be seen with positive probability have value 1 (i.e. are in \(Q_1\)), since the strategy \(\textsf{s}_\textsf{A} \) is locally optimal. Let now \(u \in V_{\mathcal {G} }\) be the highest value in \(V_{\mathcal {G} }\) that is not 1 and consider the set of infinite paths such that the set \(Q_u\) is seen infinitely often but the game does not settle in it, i.e. the set \((Q^* \cdot (Q \setminus Q_u))^\omega \cap (Q^* \cdot Q_u)^\omega \subseteq Q^\omega \). Since the strategy \(\textsf{s}_\textsf{A} \) is locally optimal (and since \(V_{\mathcal {G} }\) is finite), one can show that there is a positive probability \(p > 0\) such that the conditional probability of reaching \(Q_1\), given that \(Q_u\) is left, is at least p. Hence, if \(Q_u\) is left infinitely often, almost-surely the set \(Q_1\) is seen (and never left). It follows that the probability of the event \((Q^* \cdot (Q \setminus Q_u))^\omega \cap (Q^* \cdot Q_u)^\omega \) is 0. This implies that, almost-surely, if the set \(Q_u\) is seen infinitely often, then at some point it is never left. The same arguments can then be used with the highest value in \(V_{\mathcal {G} }\) that is less than u, and so on. Overall, we obtain that, for all \(u \in V_{\mathcal {G} }\), if the set \(Q_u\) is seen infinitely often, it is eventually never left almost-surely.

Local optimality ensures that, at each step, the expected value of the states reached does not decrease (and may even increase if Player \(\textsf{B} \) does not play optimally). By propagating this property, we obtain that, given a Player \(\textsf{A} \) locally optimal strategy and a Player \(\textsf{B} \) deterministic strategy, from any state q, the convex combination of the values u in \(V_\mathcal {G} \) weighted by the probability of settling in the value area \(Q_u\) is at least the value \(\chi _{\mathcal {G} } (q)\). This is stated in Lemma 3 below.

Lemma 3

For a PI concurrent game \(\mathcal {G} \), a Player \(\textsf{A} \) locally optimal strategy \(\textsf{s}_\textsf{A} \), a Player \(\textsf{B} \) deterministic strategy \(\textsf{s}_\textsf{B} \) and a state \(q \in Q\): \(\chi _{\mathcal {G} } (q) \le \sum _{u \in V_{\mathcal {G} }} u \cdot \mathbb {P}^{\textsf{s}_\textsf{A} ,\textsf{s}_\textsf{B} }_q[Q^* \cdot (Q_u)^\omega ]\).

Note that if Player \(\textsf{B} \) plays subgame optimally, then this inequality is an equality.

Proof (Sketch)

First, let us denote \(\mathbb {P}^{\textsf{s}_\textsf{A} ,\textsf{s}_\textsf{B} }_q\) by \(\mathbb {P}\). It can be shown by induction that, for all \(i \in \mathbb {N}^*\), we have the property \(\mathcal {P}(i): \chi _{\mathcal {G} } (q) \le \sum _{\pi \cdot q' \in q \cdot Q^i} \chi _{\mathcal {G} } (q') \cdot \mathbb {P}(\pi \cdot q') = \sum _{u \in V_\mathcal {G} \setminus \{ 0 \}} u \cdot \mathbb {P}[q \cdot Q^{i-1} \cdot Q_u]\). Furthermore, since by Lemma 2, the game almost-surely settles in a value area, it can be shown that for n large enough, the probability of being in \(Q_u\) after n steps (i.e. \(\mathbb {P}[q \cdot Q^{n-1} \cdot Q_u]\)) is arbitrarily close to the probability of eventually settling in \(Q_u\) (i.e. \(\mathbb {P} [Q^* \cdot (Q_u)^\omega ]\)). We can then apply \(\mathcal {P}(n)\) to obtain the desired inequality.

Recall that we are considering a pair of conditions characterizing subgame optimal strategies. The first condition is local optimality. To summarize, we have seen that local optimality ensures that, from any state q, the expected value of the value area where the game settles is at least \(\chi _{\mathcal {G} } (q)\). However, local optimality does not ensure anything as to the probability of W given that the game settles in a specific value area. This is where the second condition comes into play. For the explanations regarding this condition, we will need Lemma 4 below, a consequence of Lévy’s 0-1 law.

Lemma 4

Let \(\mathcal {M}\) be a countable Markov chain with state space Q and a PI objective. If there is a \(q \in Q\) such that \(\chi _{\mathcal {M}} (q) < 1\), then \(\inf _{q' \in Q} \chi _{\mathcal {M}} (q') = 0\).

Consider now a Player \(\textsf{A} \) subgame optimal strategy \(\textsf{s}_\textsf{A} \) and a Player \(\textsf{B} \) deterministic strategy. Let us consider what happens if the game eventually settles in \(Q_u\) for some \(u \in V_\mathcal {G} \setminus \{ 0 \}\). Assume towards a contradiction that there is a finite path after which the probability of W, given that the play eventually settles in \(Q_u\), is less than 1. Then, there is a continuation of this path ending in \(Q_u\) for which this probability of W is less than u. Indeed, once the strategies of both players are fixed, we obtain a countable Markov chain, and Lemma 4 states that, for a PI objective, if some state of a countable Markov chain has value less than 1, then the infimum of the values in the Markov chain is 0. Under our assumption, there would thus be a finite path from which the Player \(\textsf{A} \) strategy \(\textsf{s}_\textsf{A} \) is not optimal. This contradicts the fact that it is subgame optimal. Hence, a second necessary condition – in addition to the local optimality assumption – for subgame optimality is: from all finite paths, for all Player \(\textsf{B} \) deterministic strategies, for all positive values \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\), the probability of W and eventually settling in \(Q_u\) is equal to the probability of eventually settling in \(Q_u\). We obtain the theorem below.

Theorem 1

Consider a concurrent game \(\mathcal {G} \) with a PI objective W and a Player \(\textsf{A} \) strategy \(\textsf{s}_\textsf{A} \in \textsf{S}_{\mathcal {C} }^{\textsf{A} } \). The strategy \(\textsf{s}_\textsf{A} \) is subgame optimal if and only if:

  • it is locally optimal;

  • for all \(\rho \in Q^+\), for all Player \(\textsf{B} \) deterministic strategies \(\textsf{s}_\textsf{B} \), for all values \(u \in V_\mathcal {G} \setminus \{ 0 \}\), we have \(\mathbb {P}_\rho ^{\textsf{s}_\textsf{A} ^\rho ,\textsf{s}_\textsf{B} ^\rho }[W \cap Q^* \cdot (Q_u)^\omega ] = \mathbb {P}_\rho ^{\textsf{s}_\textsf{A} ^\rho ,\textsf{s}_\textsf{B} ^\rho }[Q^* \cdot (Q_u)^\omega ]\).

Proof (Sketch)

Lemma 1 states that local optimality is necessary, and we have informally argued above why the second condition is also necessary for subgame optimality. As for the fact that these conditions are sufficient, this is a direct consequence of Lemmas 2 and 3 and of the fact that deterministic strategies can achieve the same values as arbitrary strategies in MDPs (which we obtain once a Player \(\textsf{A} \) strategy is fixed), as cited in Subsection 4.2.

One may ask what happens in the special case where the strategy \(\textsf{s}_\textsf{A} \) considered is positional. As mentioned above, such a characterization was already presented in [4]. Overall, we obtain a similar result, except that the second condition is replaced by a condition on the game restricted to the end components of the Markov decision process induced by the positional strategy \(\textsf{s}_\textsf{A} \).

6 From subgame almost-surely winning to subgame optimality

Fig. 1. A co-Büchi game.

Fig. 2. A parity game.

Fig. 3. A concurrent game with \(A_{q_0} = \{ a_1,a_2 \}\).

In [14, Thm. 4.5], the authors proved a transfer result in PI turn-based games: the amount of memory sufficient to play optimally in every state of value 1 of every game is also sufficient to play optimally in every game. This result does not hold for concurrent games as is. First, although there are always optimal strategies in PI turn-based games (as proved in the same paper [14, Thm. 4.3]), there are PI concurrent games without optimal strategies. Second, infinite memory may be required to play optimally in co-Büchi concurrent games, whereas almost-surely winning strategies can be found among positional strategies in the turn-based setting. This can be seen in the game of Figure 1 with \(\textsf{col} (q_0) = 0\) and \(\textsf{col} (q_1) = \textsf{col} (q_1') = 1\). The green values in the local interaction at state \(q_0\) are the values of the game if they are reached (the game then ends immediately). If a green value is not reached, the objective of Player \(\textsf{A} \) is to see the states \(q_1\) and \(q_1'\) only finitely often. It has already been argued in [4] that the value of this game is 1/2 and that there is an optimal strategy for Player \(\textsf{A} \), but it requires infinite memory. To play optimally, Player \(\textsf{A} \) must play the top row with probability \(1 - \varepsilon _k\) and the middle row with probability \(\varepsilon _k\), for \(\varepsilon _k > 0\) that goes (fast) to 0 when k goes to \(\infty \) (where k denotes the number of steps). The \(\varepsilon _k\) must be chosen so that, if Player \(\textsf{B} \) always plays the left column with probability 1, then the state \(q_1\) is seen finitely often with probability 1. Furthermore, as soon as the state \(q_1'\) is visited, Player \(\textsf{A} \) switches to a positional strategy playing the bottom row with probability \(\varepsilon _k'\) small enough (where k denotes the number of steps before the state \(q_1'\) was seen) and each of the two top rows with probability \((1 - \varepsilon _k')/2\).

Hence, the transfer of memory from almost-surely winning to optimal does not hold in concurrent games, even if it is assumed that optimal strategies exist. However, one can note that although the strategy described above is optimal, it is not subgame optimal. Indeed, when the strategy switches, the value of the residual strategy is \(1/2 - \varepsilon _k' < 1/2\). In fact, there is no subgame optimal strategy in that game. Actually, if we assume that not only optimal but subgame optimal strategies exist, then the transfer of memory does hold.

The aim of this section is twofold: first, we identify a necessary and sufficient condition for the existence of subgame optimal strategies. Second, we establish the above-mentioned memory transfer relating the amount of memory needed to play subgame optimally and to win almost-surely. Before stating the main theorem of this section, let us first introduce the definition of positionally subgame almost-surely winnable objectives, i.e. objectives for which subgame almost-surely winning strategies can be found among positional strategies.

Definition 16 (Positionally subgame almost-surely winnable objective)

Consider a PI objective \(W \subseteq \textsf{K} ^\omega \). It is said to be a positionally subgame almost-surely winnable objective (PSAW for short) if the following holds: in all concurrent games \(\mathcal {G} = \langle \mathcal {C} ,W \rangle \) where there is a subgame almost-surely winning strategy, there is a positional one.

Theorem 2

Consider a non-empty finite set of colors \(\textsf{K} \) and a PI objective \(\emptyset \subsetneq W \subseteq \textsf{K} ^\omega \). Consider a concurrent game \(\mathcal {G} \) with objective W. Then, the following three assertions are equivalent:

  a. there exists a subgame optimal strategy;

  b. there exists an optimal strategy that is locally optimal;

  c. there exists a positively-optimal strategy that is locally optimal.

Furthermore, if this holds and if the objective W is PSAW, then there exists a subgame optimal positional strategy.

First, note that the equivalence is stated in terms of the existence of strategies, not of the strategies themselves. In particular, any subgame optimal strategy is both optimal and locally optimal; however, an optimal strategy that is locally optimal is not necessarily subgame optimal. Second, it is straightforward that point a implies point b (from Theorem 1) and that point b implies point c (by definition of positively-optimal strategies). In the remainder of this section, we informally explain the constructions leading to the proof of this theorem, i.e. to the proof that point c implies point a. The transfer of memory is a direct consequence of the way this theorem is proven. We fix a PI concurrent game \(\mathcal {G} = \langle \mathcal {C} ,W \rangle \) for the rest of the section.

The idea is as follows. As stated in Theorem 1, subgame optimal strategies are locally optimal and win the game almost-surely if the game settles in a value area \(Q_u\) for some positive \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\). Our idea is therefore to consider subgame almost-surely winning strategies in the derived game \(\mathcal {G} _u\): a “restriction” of the game \(\mathcal {G} \) to \(Q_u\) (more details will be given later). We can then glue together these subgame almost-surely winning strategies – defined for all \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\) – into a subgame optimal strategy. However, there are some issues:

  1. the state values in the game \(\mathcal {G} _u\) should all be equal to 1;

  2. furthermore, there must exist a subgame almost-surely winning strategy in \(\mathcal {G} _u\);

  3. this subgame almost-surely winning strategy in \(\mathcal {G} _u\) should be locally optimal when considered in the whole game \(\mathcal {G} \).

Note that the method we use here is different from what the authors of [14] did to prove the transfer of memory in turn-based games.

Let us first deal with issue 3. One can ensure that the almost-surely winning strategies in the game \(\mathcal {G} _u\) are all locally optimal in \(\mathcal {G} \) by properly defining the game \(\mathcal {G} _u\). More specifically, this is done by enforcing that the only Player \(\textsf{A} \) strategies available in \(\mathcal {G} _u\) are locally optimal in the game \(\mathcal {G} \). To do so, we construct the game \(\mathcal {G} _u\) whose state space is \(Q_u\) (plus gadget states) but whose set of actions \(A_{\mathcal {F}^\textsf{nf}_q}\), at a state \(q \in Q_u\), is such that the set of strategies \(\mathcal {D} (A_{\mathcal {F}^\textsf{nf}_q})\) corresponds exactly to the set of optimal strategies in the original game in normal form \(\mathcal {F}^\textsf{nf}_q\), while keeping the set of actions \(A_{\mathcal {F}^\textsf{nf}_q}\) for Player \(\textsf{A} \) finite. This is possible thanks to Proposition 2 below: in every game in normal form \(\mathcal {F}^\textsf{nf}_q\) at state \(q \in Q_u\), there exists a finite set \(A_{\mathcal {F}^\textsf{nf}_q}\) of optimal strategies such that the optimal strategies in \(\mathcal {F}^\textsf{nf}_q\) are exactly the convex combinations of strategies in \(A_{\mathcal {F}^\textsf{nf}_q}\). This is a well-known result, argued for instance in [18].

Proposition 2

Consider a game in normal form \(\mathcal {F}^\textsf{nf}= \langle A,B,[0,1],\delta \rangle \) with \(|A| = n\) and \(|B| = k\). There exists a set \(A_{\mathcal {F}^\textsf{nf}} \subseteq \textsf{Opt}_\textsf{A} (\mathcal {F}^\textsf{nf})\) of optimal strategies such that \(|A_{\mathcal {F}^\textsf{nf}}| \le n+k\) and \(\mathcal {D} (A_{\mathcal {F}^\textsf{nf}}) = \textsf{Opt}_\textsf{A} (\mathcal {F}^\textsf{nf})\).

Proof (Sketch)

One can write a system of \(n+k\) inequalities (with some additional equalities) whose set of solutions is exactly the set of optimal \(\textsf{GF}\)-strategies \(\textsf{Opt}_\textsf{A} (\mathcal {F}^\textsf{nf})\). The result then follows from standard arguments on systems of inequalities, as the space of solutions is in fact a polytope with at most \(n+k\) vertices.
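One half of Proposition 2 is easy to check numerically: a convex combination of optimal strategies is again optimal, since the value of a mixed row strategy is its worst case against pure columns. A sketch (assuming numpy; the game below has value 1/2):

```python
import numpy as np

def strategy_value(M, x):
    """val_F(x): the worst-case outcome of the row strategy x, attained
    against a pure column (the inf over D(Act_B) is reached at a vertex)."""
    return float((np.asarray(x) @ np.asarray(M)).min())

M = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
x1 = np.array([0.5, 0.5, 0.0])            # two optimal vertex strategies
x2 = np.array([0.0, 0.0, 1.0])
x = 0.5 * x1 + 0.5 * x2                   # a convex combination of them
assert strategy_value(M, x1) == strategy_value(M, x2) == strategy_value(M, x) == 0.5
```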

We illustrate this construction: a part of a concurrent game is depicted in Figure 3 and the change of the interaction of the players at state \(q_0\) is depicted in Figures 4, 5, 6 and 7.

Fig. 4. The local interaction \(\mathcal {F}_{q_0}\) at state \(q_0\).

Fig. 5. The game in normal form \(\mathcal {F}^\textsf{nf}_{q_0}\) at the state \(q_0\).

Fig. 6. The game \(\mathcal {F}^{\textsf{opt},\textsf{nf}}_{q_0}\) with only optimal strategies.

Fig. 7. The game form \(\mathcal {F}^{\textsf{opt}}_{q_0}\) with only optimal strategies.

The game \(\mathcal {G} _u\) has the same objective W as the game \(\mathcal {G} \). Since we want all the states to have value 1 in \(\mathcal {G} _u\) (recall issue 1), we will build the game \(\mathcal {G} _u\) such that any edge leading to a state not in \(Q_u\) in \(\mathcal {G} \) now leads to a PI concurrent game \(\mathcal {G} _W\) (with the same objective W) where all states have value 1. The game \(\mathcal {G} _W\) is (for instance) a clique over all colors in \(\textsf{K} \) where Player \(\textsf{A} \) plays alone.

Fig. 8. The depiction of a PI concurrent game with its value areas.

Fig. 9. The PI concurrent game after the modifications described above.

An illustration of this construction can be found in Figures 8 and 9. The blue dotted arrows are the ones that need to be redirected when the game is changed. With such a definition, we have made some progress w.r.t. issue 1 (regarding the values being equal to 1): the values of all states of the game \(\mathcal {G} _u\) are positive (for positive u).

Lemma 5

Consider the game \(\mathcal {G} _u\) for some positive \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\) and assume that, in \(\mathcal {G} \), there exists a positively-optimal strategy that is locally optimal. Then, for all states q in \(\mathcal {G} _u\), the value of the state q in \(\mathcal {G} _u\) is positive: \(\chi _{\mathcal {G} _u} (q) > 0\).

Proof (Sketch)

Consider a state \(q \in Q_u\) and a Player \(\textsf{A} \) locally optimal strategy \(\textsf{s}_\textsf{A} \) in \(\mathcal {G} \) that is positively-optimal from q. Then, the strategy \(\textsf{s}_\textsf{A} \) (restricted to \(Q_u^+\)) can be seen as a strategy in \(\mathcal {G} _u\) (it also has to be defined in \(\mathcal {G} _W\), but this can be done straightforwardly). Note that this is only possible because the strategy \(\textsf{s}_\textsf{A} \) is locally optimal (due to the definition of \(\mathcal {G} _u\)). For a Player \(\textsf{B} \) strategy \(\textsf{s}_\textsf{B} \) in \(\mathcal {G} _u\), consider what happens with the strategies \(\textsf{s}_\textsf{A} \) and \(\textsf{s}_\textsf{B} \) in both games \(\mathcal {G} _u\) and \(\mathcal {G} \). Either the game stays indefinitely in \(Q_u\), and what happens in \(\mathcal {G} _u\) and \(\mathcal {G} \) is identical; or it eventually leaves \(Q_u\), leading to states of value 1 in \(\mathcal {G} _u\). Hence, the value of the game \(\mathcal {G} _u\) from q with the strategies \(\textsf{s}_\textsf{A} \) and \(\textsf{s}_\textsf{B} \) is at least the value of the game \(\mathcal {G} \) from q with the same strategies. Thus, the value of the state q is positive in \(\mathcal {G} _u\).

As it turns out, Lemma 5 suffices to deal with both issues 1 and 2 at the same time. Indeed, as stated in Theorem 3 below, it is a general result that, in a PI concurrent game, if all states have positive values, then all states have value 1 and there is a subgame almost-surely winning strategy.

Theorem 3

Consider a PI concurrent game \(\mathcal {G} \) and assume that all state values are greater than or equal to \(c > 0\), i.e. for all \(q \in Q\), \(\chi _{\mathcal {G} } (q) \ge c\). Then, there is a subgame almost-surely winning strategy in \(\mathcal {G} \).

Remark 1

This theorem can be seen as a strengthening of Theorem 1 from [6]. Indeed, that theorem states that if all states have positive values, then they all have value 1 (this is then generalized to games with countably many states). Theorem 3 is stronger since it ensures the existence of (subgame) almost-surely winning strategies. Although a detailed proof is provided in the complete version of this paper [5], note that this theorem was already stated and proven in [14] in the context of PI turn-based games; their arguments could have been used verbatim for concurrent games as well. In [5], we give a proof using the same construction (namely, reset strategies) but we argue differently why the construction proves the theorem.

We can now glue together the pieces of strategies \(\textsf{s}_\textsf{A} ^u\) defined in the games \(\mathcal {G} _u\) into a single strategy \(\textsf{s}_\textsf{A} [(\textsf{s}_\textsf{A} ^u)_{u \in V_{\mathcal {G} } \setminus \{ 0 \}}]\). Informally, the glued strategy mimics the strategy \(\textsf{s}_\textsf{A} ^u\) on \(Q_u^+\) and switches strategies when a value area is left and another one is reached.

Definition 17 (Gluing strategies)

Consider a PI concurrent game \(\mathcal {G} \) and for all values \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\), a strategy \(\textsf{s}_\textsf{A} ^u\) in the game \(\mathcal {G} _u\). Then, we glue these strategies into the strategy \(\textsf{s}_\textsf{A} [(\textsf{s}_\textsf{A} ^u)_{u \in V_{\mathcal {G} } \setminus \{ 0 \}}]: Q^+ \rightarrow \mathcal {D} (A)\) simply written \(\textsf{s}_\textsf{A} \) such that, for all \(\rho \) ending at state \(q \in Q\):

$$\begin{aligned} \textsf{s}_\textsf{A} (\rho ) := {\left\{ \begin{array}{ll} \textsf{s}_\textsf{A} ^u(\pi ) &{} \text {if } u = \chi _{\mathcal {G} } (q) > 0, \text { for } \pi \text { the longest suffix of } \rho \text { in } Q_u^+ \\ \text {arbitrary} &{} \text {if } \chi _{\mathcal {G} } (q) = 0 \end{array}\right. } \end{aligned}$$
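A direct Python sketch of the gluing (histories as tuples of states; strategies[u] stands for the strategy \(\textsf{s}_\textsf{A} ^u\) built for \(\mathcal {G} _u\), and arbitrary is any fixed choice on the value-0 area):

```python
def glue(strategies, chi, arbitrary):
    """Glued strategy of Definition 17: on a history rho ending in q with
    chi(q) = u > 0, mimic s_A^u on the longest suffix of rho inside Q_u."""
    def s(rho):
        q = rho[-1]
        u = chi[q]
        if u == 0:
            return arbitrary(q)
        i = len(rho)
        while i > 0 and chi[rho[i - 1]] == u:
            i -= 1                # extend the suffix while it stays in Q_u
        return strategies[u](rho[i:])
    return s
```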

As stated in Lemma 6 below, the construction described in Definition 17 transfers almost-surely winning strategies in \(\mathcal {G} _u\) into a subgame optimal strategy in \(\mathcal {G} \).

Lemma 6

For all \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\), let \(\textsf{s}_\textsf{A} ^u\) be a subgame almost-surely winning strategy in \(\mathcal {G} _u\). The glued strategy \(\textsf{s}_\textsf{A} [(\textsf{s}_\textsf{A} ^u)_{u \in V_\mathcal {G} \setminus \{ 0 \}}]\), denoted \(\textsf{s}_\textsf{A} \), is subgame optimal in \(\mathcal {G} \).

Proof (Sketch)

We apply Theorem 1. First, the strategy \(\textsf{s}_\textsf{A} \) is locally optimal in all \(Q_u\) for \(u > 0\) by the strategy restriction done to define the game \(\mathcal {G} _u\) (only optimal strategies are available in each game in normal form \(\mathcal {F}^\textsf{nf}_q\) at states \(q \in Q_u\)). Furthermore, any strategy is optimal in a game in normal form of value 0 (which is the case of the games in normal form at states in \(Q_0\)). Second, if the game eventually settles in a value area \(Q_u\) for some \(u > 0\), from then on the strategy \(\textsf{s}_\textsf{A} \) mimics the strategy \(\textsf{s}_\textsf{A} ^u\), which is subgame almost-surely winning in \(\mathcal {G} _u\). Hence, the probability of W given that the game eventually settles in \(Q_u\) is 1. This holds for all \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\), so the second condition of Theorem 1 holds.

We now have all the ingredients to prove Theorem 2.

Proof

(Of Theorem 2). We consider the PI concurrent game \(\mathcal {G} \) and assume that there is a positively-optimal strategy that is locally optimal. Then, by Lemma 5, for all positive values \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\), all states in \(\mathcal {G} _u\) have positive values. It follows, by Theorem 3, that there exists a subgame almost-surely winning strategy in every game \(\mathcal {G} _u\) for \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\). We then obtain a subgame optimal strategy by gluing these strategies together, as given by Lemma 6.

The second part of the theorem, dealing with transfer of positionality from subgame almost-surely winning to subgame optimal follows from the fact that if all strategies \(\textsf{s}_\textsf{A} ^u\) are positional for all \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\), then so is the glued strategy \(\textsf{s}_\textsf{A} [(\textsf{s}_\textsf{A} ^u)_{u \in V_{\mathcal {G} } \setminus \{ 0 \}}]\).

We now apply the result of Theorem 2 to two specific classes of objectives: Büchi and co-Büchi objectives. Note that this result was already known for Büchi objectives, as proven in [4].

Corollary 1

Consider a concurrent game with a Büchi (resp. co-Büchi) objective and assume that there is a positively-optimal strategy that is locally optimal. Then there is a subgame optimal positional strategy.

Note that it is also possible to prove a memory transfer from subgame almost-surely winning to subgame optimal for an arbitrary memory skeleton, instead of only positional strategies. This adds only a few minor difficulties.

Application to the turn-based setting. The aim of Section 6 was to extend an already existing result on turn-based games to the context of concurrent games. This required an adaptation of the assumptions. However, it is in fact possible to retrieve the original result on turn-based games from Theorem 2 in a fairly straightforward manner. It amounts to showing that, in all finite turn-based games \(\mathcal {G} \), for all values \(u \in V_{\mathcal {G} } \setminus \{ 0 \}\), there is a locally optimal strategy that is positively-optimal from all states in \(Q_u\).

7 Finite-choice strategies

In this section, we introduce a new kind of strategies, namely finite-choice strategies. Let us first motivate why we consider such strategies. Consider again the co-Büchi game of Figure 1. Recall that the optimal strategy we described first plays the top row with increasing probability and the middle row with decreasing probability and then, once Player \(\textsf{B} \) plays the second column, switches to a positional strategy playing the bottom row with a positive, yet small enough probability. Note that switching strategies is essential. Indeed, if Player \(\textsf{A} \) does not switch, Player \(\textsf{B} \) could at some point opt for the middle column and see the state \(q_1'\) indefinitely with very high probability. In fact, what happens in that case is rather counter-intuitive: once Player \(\textsf{B} \) switches, there is infinitely often a positive probability to reach the outcome of value 1. However, the probability of ever reaching this outcome can be arbitrarily small if Player \(\textsf{B} \) waits long enough before playing the middle column. This happens because the probability \(\varepsilon _k\) to visit that outcome goes (fast) to 0 when k goes to \(\infty \). In fact, such an optimal strategy has “infinite choice” in the sense that it may prescribe infinitely many different probability distributions.

In this section, we consider finite-choice strategies, i.e. strategies that can use only finitely many \(\textsf{GF}\)-strategies at each state.

Definition 18 (Finite-choice strategy)

Let \(\mathcal {G} \) be a concurrent game. A Player \(\textsf{A} \) strategy \(\textsf{s}_\textsf{A} \) in \(\mathcal {G} \) has finite choice if, for all \(q \in Q\), the set \(S^{\textsf{s}_\textsf{A} }_q := \{ \textsf{s}_\textsf{A} (\rho \cdot q) \mid \rho \in Q^+ \} \subseteq \mathcal {D} (A_q)\) is finite.

Note that positional (even finite-memory) and deterministic strategies are examples of finite-choice strategies.
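To see why, note that a finite-memory strategy is induced by a finite (Mealy-style) memory skeleton, so it can output at most one distribution per (memory, state) pair; a minimal sketch, where out is the hypothetical output function of the skeleton and distributions are assumed hashable (e.g. tuples of (action, probability) pairs):

```python
def choice_set(memory_states, out, q):
    """S_q for a finite-memory strategy: the distributions playable at q
    are among {out(m, q) : m in memory_states}, a finite set."""
    return {out(m, q) for m in memory_states}
```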

Interestingly, we can link finite-choice strategies with the existence of subgame optimal strategies. In general, it does not hold that if there is an optimal strategy, then there is a subgame optimal strategy (as exemplified in the game of Figure 1). However, in Theorem 4 below, we state that if we additionally assume that the optimal strategy considered has finite choice, then there is a subgame optimal strategy (that also has finite choice).

Theorem 4

Consider a PI concurrent game \(\mathcal {G} \). If there is a finite-choice optimal strategy, then there is a finite-choice subgame optimal strategy.

Proof (Sketch)

Consider such an optimal finite-choice strategy \(\textsf{s}_\textsf{A} \). In particular, note that there is a constant \(c > 0\) such that for all \(\rho \cdot q \in Q^+\), for all \(a \in A_q\) we have: \(\textsf{s}_\textsf{A} (\rho \cdot q)(a) > 0 \Rightarrow \textsf{s}_\textsf{A} (\rho \cdot q)(a) \ge c\). We build a subgame optimal strategy \(\textsf{s}_\textsf{A} '\) in the following way: for all \(\rho = \rho ' \cdot q \in Q^+\), if the residual strategy \(\textsf{s}_\textsf{A} ^\rho \) is optimal, then \(\textsf{s}_\textsf{A} '(\rho ) := \textsf{s}_\textsf{A} (\rho )\), otherwise \(\textsf{s}_\textsf{A} '(\rho ) := \textsf{s}_\textsf{A} (q)\) (i.e. we reset the strategy). Straightforwardly, the strategy \(\textsf{s}_\textsf{A} '\) has finite choice. We want to apply Theorem 1 to prove that it is subgame optimal. One can see that it is locally optimal (by the criterion chosen for resetting the strategy). Consider now some \(\rho \in Q^+\) ending at a state \(q \in Q\) and another state \(q' \in Q\). Assume that the residual strategy \(\textsf{s}_\textsf{A} ^\rho \) is optimal but that the residual strategy \(\textsf{s}_\textsf{A} ^{\rho \cdot q'}\) is not. Then, similarly to why local optimality is necessary for subgame optimality (Lemma 1), one can show that any Player \(\textsf{B} \) action b leading to \(q'\) from \(\rho \) with positive probability is such that \(\chi _{\mathcal {G} } (q) < \textsf{out}_{\mathcal {F}^\textsf{nf}_q}(\textsf{s}_\textsf{A} (\rho ),b)\). Hence, there is a positive probability from \(\rho \), if Player \(\textsf{B} \) opts for the action b, to reach a state of value different from \(u = \chi _{\mathcal {G} } (q)\). And if this happens infinitely often, a state of value different from u will be reached almost-surely. In other words, if a value area is never left, almost-surely, the strategy \(\textsf{s}_\textsf{A} '\) only resets finitely often.

Consider now some \(\rho \in Q^+\), a Player \(\textsf{B} \) deterministic strategy \(\textsf{s}_\textsf{B} \) and a value \(u \in V_\mathcal {G} \setminus \{ 0 \}\). From what we argued above, the probability of the event \(Q^* \cdot (Q_u)^\omega \) (resp. \(W \cap Q^* \cdot (Q_u)^\omega \)) is unchanged if we intersect it with the event that the strategy \(\textsf{s}_\textsf{A} '\) only resets finitely often. Furthermore, if the strategy does not reset anymore from some point on, and all states have the same value \(u > 0\), then the probability of W is 1 (since W is PI). We can then conclude by applying Theorem 1.
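The reset construction itself is tiny; a minimal sketch, where optimal_residual is an assumed oracle deciding whether the residual strategy after a history is optimal (it is well-defined mathematically, but we do not claim it is computable):

```python
def reset_strategy(s, optimal_residual):
    """Reset construction from the proof of Theorem 4: keep playing s while
    its residual stays optimal, and otherwise restart s from the current
    state. It only uses distributions that s uses, so it has finite choice."""
    def s_prime(rho):
        return s(rho) if optimal_residual(rho) else s(rho[-1:])
    return s_prime
```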

Finite-choice strategies are interesting for another reason. In the previous section, we applied the memory transfer from Theorem 2 to the Büchi and co-Büchi objectives. We did not apply it to other objectives – in particular to the parity objective. Indeed, in general, contrary to the case of turn-based games, infinite memory is necessary to win almost-surely in parity games. This happens in Figure 2 (already described in [12]), where the objective of Player \(\textsf{A} \) is to see \(q_1\) infinitely often, while seeing \(q_2\) only finitely often. Let us describe a Player \(\textsf{A} \) subgame almost-surely winning strategy. The top row is played with probability \(1 - \varepsilon _k\) and the bottom row is played with probability \(\varepsilon _k > 0\), with \(\varepsilon _k\) going to 0 when k goes to \(\infty \) (the sequence \((\varepsilon _k)\) used in the game of Figure 1 works here as well), where k denotes the number of times the state \(q_0\) has been seen. Such a strategy is subgame almost-surely winning and does not have finite choice. In fact, it can be shown that all Player \(\textsf{A} \) finite-choice strategies have value 0 in that game.

Interestingly, the transfer of memory of Theorem 2 is adapted in Theorem 5 with the memory that is sufficient in turn-based games – for those PI objectives that have a “neutral color” – if we additionally assume that the subgame optimal strategy considered has finite choice. First, let us define what is meant by “neutral color”; then we define the turn-based version of PSAW.

Definition 19 (Objective with a neutral color)

Consider a set of colors \(\textsf{K} \) and a PI objective \(W \subseteq \textsf{K} ^\omega \). It has a neutral color if there is some (neutral) color \(k \in \textsf{K} \) such that, for all \(\rho = \rho _0 \cdot \rho _1 \cdots \in \textsf{K} ^\omega \), we have \(\rho \in W \Leftrightarrow \rho _0 \cdot k \cdot \rho _1 \cdot k \cdots \in W\).

Definition 20 (PSAW objective in turn-based games)

Consider a PI objective \(W \subseteq \textsf{K} ^\omega \). It is positionally subgame almost-surely winnable in turn-based games (PSAWT for short) if, in all turn-based games \(\mathcal {G} = \langle \mathcal {C} ,W \rangle \) where there is a subgame almost-surely winning strategy, there is a positional one.

Theorem 5

Consider a PSAWT PI objective \(W \subseteq \textsf{K} ^\omega \) with a neutral color and a concurrent game \(\mathcal {G} \) with objective W. Assume there is a subgame optimal strategy that has finite choice. Then, there is a positional one.

Proof (Sketch)

A finite-choice strategy \(\textsf{s}_\textsf{A} \) plays only among a finite number of \(\textsf{GF}\)-strategies at each state. The idea is therefore to modify the game \(\mathcal {G} _u\) of the previous section into a game \(\mathcal {G} '_u\) by transforming it into a (finite) turn-based game. At each state, Player \(\textsf{A} \) chooses her \(\textsf{GF}\)-strategy first. She can choose among only a finite number of them: she has at her disposal, at a state q, only the optimal \(\textsf{GF}\)-strategies in \(S^{\textsf{s}_\textsf{A} }_q\) (recall Definition 18). We consider the objective W in that new arena, where Player \(\textsf{B} \) states are colored with a neutral color. The existence, in \(\mathcal {G} \), of a subgame optimal strategy that has finite choice ensures that all states in \(\mathcal {G} _u'\) have positive values. We can then conclude as for Theorem 2: a subgame optimal strategy can be obtained by gluing together subgame almost-surely winning strategies in the (turn-based) games \(\mathcal {G} _u'\) (which can be chosen positional by assumption).

As an application, one can check that the parity, mean-payoff and generalized Büchi objectives have a neutral color and are PSAWT [7, 11, 16]. Hence, for these objectives, if there exists an optimal strategy that has finite choice, then there is one that is positional.

Corollary 2

Consider a concurrent game \(\mathcal {G} \) with a parity (resp. mean-payoff, resp. generalized Büchi) objective. Assume that there is an optimal strategy that has finite choice in \(\mathcal {G} \). Then, there is a positional one.