Automatic Verification of Concurrent Stochastic Systems

Automated verification techniques for stochastic games allow formal reasoning about systems that feature competitive or collaborative behaviour among rational agents in uncertain or probabilistic settings. Existing tools and techniques focus on turn-based games, where each state of the game is controlled by a single player, and on zero-sum properties, where two players or coalitions have directly opposing objectives. In this paper, we present automated verification techniques for concurrent stochastic games (CSGs), which provide a more natural model of concurrent decision making and interaction. We also consider (social welfare) Nash equilibria, to formally identify scenarios where two players or coalitions with distinct goals can collaborate to optimise their joint performance. We propose an extension of the temporal logic rPATL for specifying quantitative properties in this setting and present corresponding algorithms for verification and strategy synthesis for a variant of stopping games. For finite-horizon properties the computation is exact, while for infinite-horizon it is approximate using value iteration. For zero-sum properties it requires solving matrix games via linear programming, and for equilibria-based properties we find social welfare or social cost Nash equilibria of bimatrix games via the method of labelled polytopes through an SMT encoding. We implement this approach in PRISM-games, which required extending the tool's modelling language for CSGs, and apply it to case studies from domains including robotics, computer security and computer networks, explicitly demonstrating the benefits of both CSGs and equilibria-based properties.


Introduction
Stochastic multi-player games are a versatile modelling framework for systems that exhibit cooperative or competitive behaviour in the presence of adversarial or uncertain environments. They can be viewed as a collection of players (agents) with strategies for determining their actions based on the execution so far. These models combine nondeterminism, representing the adversarial, cooperative and competitive choices, stochasticity, modelling uncertainty due to noise, failures or randomness, and concurrency, representing simultaneous execution of interacting agents. Examples of such systems appear in many domains, from robotics and autonomous transport, to security and computer networks. A game-theoretic approach also facilitates the design of protocols that use penalties or incentives to ensure robustness against selfish participants. However, the complex interactions involved in such systems make their correct construction a challenge.
Formal verification for stochastic games provides a means of producing quantitative guarantees on the correctness of these systems (e.g. "the control software can always safely stop the vehicle with probability at least 0.99, regardless of the actions of other road users"), where the required behavioural properties are specified precisely in quantitative extensions of temporal logic. The closely related problem of strategy synthesis constructs an optimal strategy for a player, or coalition of players, which guarantees that such a property is satisfied.
A variety of verification algorithms for stochastic games have been devised, e.g., [17,77,3,4,18]. In recent years, further progress has been made: verification and strategy synthesis algorithms have been developed for various temporal logics [23,9,45,42] and implemented in the PRISM-games tool [51], an extension of the PRISM probabilistic model checker [44]. This has allowed modelling and verification of stochastic games to be used for a variety of non-trivial applications, in which competitive or collaborative behaviour between entities is a crucial ingredient, including computer security and energy management.
A limitation of the techniques implemented in PRISM-games to date is that they focus on turn-based stochastic multi-player games (TSGs), whose states are partitioned among a set of players, with exactly one player taking control of each state. In this paper, we propose and implement techniques for concurrent stochastic multi-player games (CSGs), which generalise TSGs by permitting players to choose their actions simultaneously in each state. This provides a more realistic model of interactive agents operating concurrently, and making action choices without already knowing the actions being taken by other agents. Although algorithms for CSGs have been known for some time (e.g., [3,4,18]), their implementation and application to real-world examples has been lacking.
A further limitation of existing work is that it focuses on zero-sum properties, in which one player (or a coalition of players) aims to optimise some objective, while the remaining players have the directly opposing goal. In PRISM-games, properties are specified in the logic rPATL (probabilistic alternating-time temporal logic with rewards) [23], a quantitative extension of the game logic ATL [5]. This allows us to specify that a coalition of players can achieve a high-level objective, regarding the probability of an event's occurrence or the expectation of a reward measure, irrespective of the other players' strategies. Extensions have allowed players to optimise multiple objectives [24,9], but again in a zero-sum fashion.
In this work, we move beyond zero-sum properties and consider situations where two players (or two coalitions of players) in a CSG have distinct objectives to be maximised or minimised. The goals of the players (or coalitions) are not necessarily directly opposing, and so it may be beneficial for players to collaborate. For these nonzero-sum scenarios, we use the well studied notion of Nash equilibria (NE), where it is not beneficial for any player to unilaterally change their strategy. In particular, we use subgame-perfect NE [63], where this equilibrium criterion holds in every state of the game, and we focus on two specific variants of equilibria: social welfare and social cost NE, which maximise and minimise, respectively, the sum of the objectives of the players.
We propose an extension of the rPATL logic which adds the ability to express quantitative nonzero-sum properties based on these notions of equilibria, for example "the two robots have navigation strategies which form a (social cost) Nash equilibrium, and under which the combined expected energy usage until completing their tasks is below k". We also include some additional reward properties that have proved to be useful when applying our methods to various case studies.
We provide a formal semantics for the new logic and propose algorithms for CSG verification and strategy synthesis for a variant of stopping games, including both zero-sum and nonzero-sum properties. Our algorithms extend the existing approaches for rPATL model checking, and employ a combination of exact computation through backward induction for finite-horizon properties and approximate computation through value iteration for infinite-horizon properties. Both approaches require the solution of games for each state of the model in each iteration of the computation: we solve matrix games for the zero-sum case and find optimal NE for bimatrix games for the nonzero-sum case. The former can be done with linear programming; we perform the latter using labelled polytopes [52] and a reduction to SMT.
We have implemented our verification and strategy synthesis algorithms in a new release, version 3.0, of PRISM-games [48], extending both the modelling and property specification languages to support CSGs and nonzero-sum properties. In order to investigate the performance, scalability and applicability of our techniques, we have developed a large selection of case studies taken from a diverse set of application domains including: finance, computer security, computer networks, communication systems, robot navigation and power control.
These illustrate examples of systems whose modelling and analysis requires stochasticity and competitive or collaborative behaviour between concurrent components or agents. We demonstrate that our CSG modelling and verification techniques facilitate insightful analysis of quantitative aspects of such systems. Specifically, we show cases where CSGs allow more accurate modelling of concurrent behaviour than their turn-based counterparts and where our equilibria-based extension of rPATL allows us to synthesise better performing strategies for collaborating agents than can be achieved using the zero-sum version.
The paper combines and extends the conference papers [45] and [46]. In particular, we: (i) introduce the definition of social cost Nash equilibria for CSGs and model checking algorithms for verifying temporal logic specifications using this definition; (ii) provide additional details and proofs of model checking algorithms, for example for combinations of finite- and infinite-horizon objectives; (iii) present an expanded experimental evaluation, including a wider range of properties, extended analysis of the case studies and a more detailed evaluation of performance, including efficiency improvements with respect to [45,46].
Related work. Various verification algorithms have been proposed for CSGs, e.g. [3,4,18], but without implementations, tool support or case studies. PRISM-games 2.0 [51], which we have built upon in this work, provided modelling and verification for a wide range of properties of stochastic multi-player games, including those in the logic rPATL, and multi-objective extensions of it, but focusing purely on the turn-based variant of the model (TSGs) in the context of two-coalitional zero-sum properties. GIST [21] allows the analysis of ω-regular properties on probabilistic games, but again focuses on turn-based, not concurrent, games. GAVS+ [25] is a general-purpose tool for algorithmic game solving, supporting TSGs and (non-stochastic) concurrent games, but not CSGs. Three further tools, PRALINE [14], EAGLE [76] and EVE [34], support the computation of NE [58] for the restricted class of (non-stochastic) concurrent games. In addition, EVE has recently been extended to verify if an LTL property holds on some or all NE [35]. Computing NE is also supported by MCMAS-SLK [16] via strategy logic and general purpose tools such as Gambit [57] can compute a variety of equilibria but, again, not for stochastic games.
Work concerning nonzero-sum properties includes [22,77], in which the existence of and the complexity of finding NE for stochastic games is studied, but without practical algorithms. The complexity of finding subgame-perfect NE for quantitative reachability properties is studied in [15], while [33] considers the complexity of equilibrium design for temporal logic properties and lists social welfare requirements and implementation as future work. In [67], a learning-based algorithm for finding NE for discounted properties of CSGs is presented and evaluated. Similarly, [53] studies NE for discounted properties and introduces iterative algorithms for strategy synthesis. A theoretical framework for price-taking equilibria of CSGs is given in [6], where players try to minimise their costs which include a price common to all players and dependent on the decisions of all players. A notion of strong NE for a restricted class of CSGs is formalised in [27] and an approximation algorithm for checking the existence of such NE for discounted properties is introduced and evaluated. The existence of stochastic equilibria with imprecise deviations for CSGs and a PSPACE algorithm to compute such equilibria is considered in [12]. Finally, we mention the fact that the concept of equilibrium is used to analyze different applications such as cooperation among agents in stochastic games [39] and to design protocols based on quantum secret sharing [69].

Preliminaries
We begin with some basic background from game theory, and then describe CSGs, illustrating each with examples. For any finite set $X$, we write $\mathit{Dist}(X)$ for the set of probability distributions over $X$ and, for any vector $v \in \mathbb{Q}^n$ with $n \in \mathbb{N}$, we use $v(i)$ to denote the $i$th entry of the vector.

Game Theory Concepts
We first introduce normal form games, which are simple one-shot games where players make their choices concurrently.
Definition 1 (Normal form game) A (finite, $n$-person) normal form game (NFG) is a tuple $\mathsf{N} = (N, A, u)$ where:
- $N = \{1, \dots, n\}$ is a finite set of players;
- $A = A_1 \times \cdots \times A_n$ and $A_i$ is a finite set of actions available to player $i \in N$;
- $u = (u_1, \dots, u_n)$ and $u_i : A \to \mathbb{Q}$ is a utility function for player $i \in N$.
In a game $\mathsf{N}$, players select actions simultaneously, with player $i \in N$ choosing from the action set $A_i$. If each player $i$ selects action $a_i$, then player $j$ receives the utility $u_j(a_1, \dots, a_n)$.
Definition 2 (Strategies and strategy profile) A (mixed) strategy $\sigma_i$ for player $i$ in an NFG $\mathsf{N}$ is a distribution over its action set, i.e., $\sigma_i \in \mathit{Dist}(A_i)$. We let $\Sigma^i_{\mathsf{N}}$ denote the set of all strategies for player $i$. A strategy profile (or just profile) $\sigma = (\sigma_1, \dots, \sigma_n)$ is a tuple of strategies for each player.
A two-player NFG is also called a bimatrix game as it can be represented by two distinct matrices $Z_1, Z_2 \in \mathbb{Q}^{l \times m}$ where $A_1 = \{a_1, \dots, a_l\}$, $A_2 = \{b_1, \dots, b_m\}$, $z^1_{ij} = u_1(a_i, b_j)$ and $z^2_{ij} = u_2(a_i, b_j)$. A two-player NFG is constant-sum if there exists $c \in \mathbb{Q}$ such that $u_1(\alpha) + u_2(\alpha) = c$ for all $\alpha \in A$ and zero-sum if $c = 0$. A zero-sum, two-player NFG is often called a matrix game as it can be represented by a single matrix $Z \in \mathbb{Q}^{l \times m}$ where $A_1 = \{a_1, \dots, a_l\}$, $A_2 = \{b_1, \dots, b_m\}$ and $z_{ij} = u_1(a_i, b_j) = -u_2(a_i, b_j)$. For zero-sum, two-player NFGs, in the bimatrix game representation we have $Z_1 = -Z_2$.
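The bimatrix representation lends itself to a direct computational reading. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, the 2×2 game (matching pennies) is chosen purely for illustration, and the expected utility of a mixed profile is assumed to be the usual bilinear form $x^T Z y$.

```python
from fractions import Fraction

def expected_utility(Z, x, y):
    """Expected utility x^T Z y when player 1 mixes with x and player 2 with y."""
    return sum(x[i] * Z[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def constant_sum(Z1, Z2):
    """Return c if u1 + u2 = c for every action pair, otherwise None."""
    sums = {Z1[i][j] + Z2[i][j]
            for i in range(len(Z1)) for j in range(len(Z1[0]))}
    return sums.pop() if len(sums) == 1 else None

# Hypothetical 2x2 bimatrix game (matching pennies), for illustration only.
Z1 = [[1, -1], [-1, 1]]
Z2 = [[-1, 1], [1, -1]]

half = Fraction(1, 2)
x = [half, half]
y = [half, half]

c = constant_sum(Z1, Z2)          # 0 here, so the game is zero-sum (Z1 = -Z2)
u1 = expected_utility(Z1, x, y)   # expected utility of player 1
u2 = expected_utility(Z2, x, y)   # expected utility of player 2
```

Exact rationals are used so that the check for a constant sum is not affected by floating-point rounding.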

Matrix Games
We require the following classical result concerning matrix games, which introduces the notion of the value of a matrix game (and zero-sum NFG).
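Theorem 1 (Minimax theorem [59,60]) For any zero-sum NFG $\mathsf{N} = (N, A, u)$ and corresponding matrix game $Z$, there exists $v^\star \in \mathbb{Q}$, called the value of the game and denoted $\mathit{val}(Z)$, such that:
- there is a strategy $\sigma_1^\star$ for player 1, called an optimal strategy of player 1, such that under this strategy the player's expected utility is at least $v^\star$ regardless of the strategy of player 2, i.e. $\inf_{\sigma_2 \in \Sigma^2_{\mathsf{N}}} u_1(\sigma_1^\star, \sigma_2) \geq v^\star$;
- there is a strategy $\sigma_2^\star$ for player 2, called an optimal strategy of player 2, such that under this strategy the player's expected utility is at least $-v^\star$ regardless of the strategy of player 1, i.e. $\inf_{\sigma_1 \in \Sigma^1_{\mathsf{N}}} u_2(\sigma_1, \sigma_2^\star) \geq -v^\star$.
Here $u_i(\sigma)$ denotes the expected utility of player $i$ under the profile $\sigma$.
The value of a matrix game $Z \in \mathbb{Q}^{l \times m}$ can be found by solving the following linear programming (LP) problem [59,60]. Maximise $v$ subject to the constraints:
$$x_1 z_{1j} + \cdots + x_l z_{lj} \geq v \ \text{ for all } 1 \leq j \leq m, \qquad x_i \geq 0 \ \text{ for all } 1 \leq i \leq l, \qquad x_1 + \cdots + x_l = 1.$$
In addition, the solution for $(x_1, \dots, x_l)$ yields an optimal strategy for player 1.
The value of the game can also be found by solving the following dual LP problem. Minimise $v$ subject to the constraints:
$$y_1 z_{i1} + \cdots + y_m z_{im} \leq v \ \text{ for all } 1 \leq i \leq l, \qquad y_j \geq 0 \ \text{ for all } 1 \leq j \leq m, \qquad y_1 + \cdots + y_m = 1,$$
and in this case the solution $(y_1, \dots, y_m)$ yields an optimal strategy for player 2.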
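Example 1. Consider the (zero-sum) NFG corresponding to the well known rock-paper-scissors game, where each player $i \in \{1, 2\}$ chooses "rock" ($r_i$), "paper" ($p_i$) or "scissors" ($s_i$). With rows $r_1, p_1, s_1$ and columns $r_2, p_2, s_2$, the matrix game representation is:
$$Z = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}$$
where the utilities for winning, losing and drawing are 1, −1 and 0 respectively. The value for this matrix game is the solution to the following LP problem. Maximise $v$ subject to the constraints:
$$x_2 - x_3 \geq v, \qquad x_3 - x_1 \geq v, \qquad x_1 - x_2 \geq v, \qquad x_1, x_2, x_3 \geq 0, \qquad x_1 + x_2 + x_3 = 1,$$
which yields the value $v^\star = 0$ with optimal strategy $\sigma_1^\star = (1/3, 1/3, 1/3)$ for player 1 (the optimal strategy for player 2 is the same).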
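The claimed value and optimal strategies can be checked without an LP solver: a strategy for player 1 that guarantees at least $v$, together with a strategy for player 2 that concedes at most $v$, certifies that $v$ is the value. The sketch below is illustrative only; the function names are hypothetical and the matrix is written out from the stated utilities (1 for a win, −1 for a loss, 0 for a draw).

```python
from fractions import Fraction

# Utilities of player 1: rows r1, p1, s1 and columns r2, p2, s2.
Z = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def guaranteed_by_row_player(Z, x):
    """Worst-case expected utility of player 1 when playing mixed strategy x."""
    return min(sum(x[i] * Z[i][j] for i in range(len(Z)))
               for j in range(len(Z[0])))

def conceded_by_column_player(Z, y):
    """Largest expected utility player 1 can obtain against mixed strategy y."""
    return max(sum(Z[i][j] * y[j] for j in range(len(Z[0])))
               for i in range(len(Z)))

third = Fraction(1, 3)
uniform = [third, third, third]

lower = guaranteed_by_row_player(Z, uniform)    # player 1 secures at least this
upper = conceded_by_column_player(Z, uniform)   # player 2 concedes at most this

# A non-optimal strategy for comparison: always playing rock can be exploited.
always_rock = guaranteed_by_row_player(Z, [1, 0, 0])
```

Since the lower and upper bounds coincide, the uniform strategies are optimal for both players; the pure strategy, by contrast, guarantees strictly less.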
The notion of SWNE is standard [61] and corresponds to the case where utility values represent profits or rewards. We introduce the dual notion of SCNE for the case where utility values correspond to losses or costs. In our experience of modelling with stochastic games, such situations are common: example objectives in this category include minimising the probability of a fault occurring or minimising the expected time to complete a task. Representing SCNE directly is a more natural approach than the alternative of simply negating utilities, as above.
The following demonstrates the relationship between SWNE and SCNE.
Lemma 1 can be used to reduce the computation of SCNE profiles and values to those of SWNE profiles and values (or vice versa). This is achieved by negating all utilities in the NFG or bimatrix game, computing an SWNE profile and corresponding SWNE values, and then negating the SWNE values to obtain an SCNE profile and corresponding SCNE values for the original NFG or bimatrix game.
Finding NE and NE values in bimatrix games is in the class of linear complementarity problems (LCPs). More precisely, $(\sigma_1, \sigma_2)$ is an NE profile and $(u, v)$ are the corresponding NE values of the bimatrix game $Z_1, Z_2 \in \mathbb{Q}^{l \times m}$, where $A_1 = \{a_1, \dots, a_l\}$ and $A_2 = \{b_1, \dots, b_m\}$, if and only if for the column vectors $x \in \mathbb{Q}^l$ and $y \in \mathbb{Q}^m$, where $x_i = \sigma_1(a_i)$ and $y_j = \sigma_2(b_j)$ for $1 \leq i \leq l$ and $1 \leq j \leq m$, we have: and 0 and 1 are vectors or matrices with all components 0 and 1, respectively.
Example 2. We consider a nonzero-sum stag hunt game [64] where, if players decide to cooperate, this can yield a large utility, but if the others do not, then the cooperating player gets nothing while the remaining players get a small utility. A scenario with 3 players, where two form a coalition (assuming the role of player 2), yields a bimatrix game: where $nc_i$ and $c_i$ represent player 1 and coalition 2 not cooperating and cooperating, respectively, and $hc_2$ represents half the players in the coalition cooperating. A strategy profile $\sigma^* = ((x_1, x_2, x_3), (y_1, y_2))$ is an NE and $(u, v)$ the corresponding NE values of the game if and only if, from Eqn (1) and (2): and, from Eqn (3) and (4): There are three solutions to this LCP problem which correspond to the following NE profiles:
- player 1 and the coalition pick $nc_1$ and $nc_2$, respectively, with NE values (2, 4);
- player 1 selects $nc_1$ and $c_1$ with probabilities 5/9 and 4/9 and the coalition selects $nc_2$ and $c_2$ with probabilities 2/3 and 1/3, with NE values (2, 4);
- player 1 and the coalition select $c_1$ and $c_2$, respectively, with NE values (6, 9).
For instance, in the first case, neither player 1 nor the coalition believes the other will cooperate: the best they can do is act alone. The third maximises the joint utility and is the only SWNE profile, with corresponding SWNE values (6, 9).
To find SCNE profiles and SCNE values for the same set of utility functions, using Lemma 1 we can negate all the utilities of the players in the game and look for NE profiles in the resulting bimatrix game; again, there are three:
- player 1 and the coalition select $c_1$ and $nc_2$, respectively, with NE values (0, −4);
- player 1 selects $nc_1$ and $c_1$ with probabilities 1/2 and 1/2 and the coalition selects $nc_2$ and $hc_2$ with probabilities 1/2 and 1/2, with NE values (−2, −4);
- player 1 and the coalition select $nc_1$ and $c_2$, respectively, with NE values (−2, 0).
The third is the only SCNE profile, with corresponding SCNE values (2, 0).
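The equilibria listed in Example 2, and the negation step used for SCNE, can be checked mechanically. The sketch below is illustrative only. The utility matrices are an assumption: their entries were chosen to agree with the NE profiles and values quoted in the example and are not claimed to be the paper's own. The helper names are likewise hypothetical.

```python
from fractions import Fraction as F

# ASSUMPTION: illustrative utilities consistent with the NE values quoted in
# Example 2; rows are (nc1, c1) and columns are (nc2, hc2, c2).
Z1 = [[2, 2, 2], [0, 4, 6]]
Z2 = [[4, 2, 0], [4, 6, 9]]

def payoffs(Z1, Z2, x, y):
    """Expected utilities (u, v) of the two players under the profile (x, y)."""
    cells = [(i, j) for i in range(len(x)) for j in range(len(y))]
    u = sum(x[i] * Z1[i][j] * y[j] for i, j in cells)
    v = sum(x[i] * Z2[i][j] * y[j] for i, j in cells)
    return u, v

def is_nash(Z1, Z2, x, y):
    """True if neither player gains by deviating to any pure action."""
    u, v = payoffs(Z1, Z2, x, y)
    rows = [sum(Z1[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    cols = [sum(x[i] * Z2[i][j] for i in range(len(x))) for j in range(len(y))]
    return max(rows) <= u and max(cols) <= v

def negate(Z):
    return [[-z for z in row] for row in Z]

# The three profiles listed for the original game.
candidates = [
    ([F(1), F(0)], [F(1), F(0), F(0)]),
    ([F(5, 9), F(4, 9)], [F(2, 3), F(0), F(1, 3)]),
    ([F(0), F(1)], [F(0), F(0), F(1)]),
]
equilibria = [(x, y) for x, y in candidates if is_nash(Z1, Z2, x, y)]
# SWNE values: those of an equilibrium maximising the sum of the two values.
swne_values = max((payoffs(Z1, Z2, x, y) for x, y in equilibria), key=sum)

# SCNE via negation: equilibria of (-Z1, -Z2), then negate the best values back.
N1, N2 = negate(Z1), negate(Z2)
neg_candidates = [
    ([F(0), F(1)], [F(1), F(0), F(0)]),
    ([F(1, 2), F(1, 2)], [F(1, 2), F(1, 2), F(0)]),
    ([F(1), F(0)], [F(0), F(0), F(1)]),
]
neg_equilibria = [(x, y) for x, y in neg_candidates if is_nash(N1, N2, x, y)]
best = max((payoffs(N1, N2, x, y) for x, y in neg_equilibria), key=sum)
scne_values = tuple(-w for w in best)
```

Note that this only verifies given candidate profiles; it does not enumerate equilibria, which is the role of the labelled-polytope method.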
In this work, we compute the SWNE values for a bimatrix game (or, via Lemma 1, the SCNE values) by first identifying all the NE values of the game. For this, we use the Lemke-Howson algorithm [52], which is based on the method of labelled polytopes [61]. Other well-known methods include those based on support enumeration [66] and regret minimisation [71].
Given a bimatrix game $Z_1, Z_2 \in \mathbb{Q}^{l \times m}$, we denote the sets of deterministic strategies of players 1 and 2 by $I = \{1, \dots, l\}$ and $M = \{1, \dots, m\}$ and define $J = \{l+1, \dots, l+m\}$ by mapping $j \in M$ to $l+j \in J$. A label is then defined as an element of $I \cup J$. The sets of strategies for players 1 and 2 can be represented by:
$$X = \{x \in \mathbb{Q}^l \mid \mathbf{1}^T x = 1 \wedge x \geq \mathbf{0}\} \qquad Y = \{y \in \mathbb{Q}^m \mid \mathbf{1}^T y = 1 \wedge y \geq \mathbf{0}\}.$$
The strategy set $Y$ is then divided into regions $Y(i)$ and $Y(j)$ (polytopes) for $i \in I$ and $j \in J$ such that $Y(i)$ contains strategies for which the deterministic strategy $i$ of player 1 is a best response and $Y(j)$ contains strategies which choose action $j$ with probability 0:
$$Y(i) = \{y \in Y \mid \forall k \in I.\ Z_1(i,:)\,y \geq Z_1(k,:)\,y\} \qquad Y(j) = \{y \in Y \mid y_{j-l} = 0\}$$
where $Z_1(i,:)$ is the $i$th row vector of $Z_1$. A vector $y$ is then said to have label $k$ if $y \in Y(k)$, for $k \in I \cup J$. The strategy set $X$ is divided analogously into regions $X(j)$ and $X(i)$ for $j \in J$ and $i \in I$ and a vector $x$ has label $k$ if $x \in X(k)$, for $k \in I \cup J$. A pair of vectors $(x, y) \in X \times Y$ is completely labelled if the union of the labels of $x$ and $y$ equals $I \cup J$.
The NE profiles of the game are the vector pairs that are completely labelled [52,74]. The corresponding NE values can be computed through matrix-vector multiplication. An SWNE profile and corresponding SWNE values can then be found through an NE profile with NE values that maximise the sum.
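The labelling condition can be made concrete with a short check. The sketch below computes the labels of a pair of mixed strategies and tests whether the pair is completely labelled; it does not perform the Lemke-Howson pivoting or the SMT encoding. The function names and the 2×2 coordination game are hypothetical and used only for illustration.

```python
from fractions import Fraction as F

def labels_of_y(Z1, y):
    """Labels of y: best-response rows i in I, plus l+j for unplayed columns j."""
    l = len(Z1)
    rows = [sum(Z1[i][j] * y[j] for j in range(len(y))) for i in range(l)]
    best = {i + 1 for i in range(l) if rows[i] == max(rows)}
    unused = {l + j + 1 for j in range(len(y)) if y[j] == 0}
    return best | unused

def labels_of_x(Z2, x):
    """Labels of x: unplayed rows i in I, plus l+j for best-response columns j."""
    l, m = len(Z2), len(Z2[0])
    cols = [sum(x[i] * Z2[i][j] for i in range(l)) for j in range(m)]
    best = {l + j + 1 for j in range(m) if cols[j] == max(cols)}
    unused = {i + 1 for i in range(l) if x[i] == 0}
    return best | unused

def completely_labelled(Z1, Z2, x, y):
    """True if the labels of x and y together cover I ∪ J = {1, ..., l+m}."""
    l, m = len(Z1), len(Z1[0])
    return (labels_of_x(Z2, x) | labels_of_y(Z1, y)) == set(range(1, l + m + 1))

# Hypothetical 2x2 coordination game, for illustration only.
Z1 = [[2, 0], [0, 1]]
Z2 = [[1, 0], [0, 2]]

pure_ne = completely_labelled(Z1, Z2, [F(1), F(0)], [F(1), F(0)])
mixed_ne = completely_labelled(Z1, Z2, [F(2, 3), F(1, 3)], [F(1, 3), F(2, 3)])
not_ne = completely_labelled(Z1, Z2, [F(1), F(0)], [F(0), F(1)])
```

In this game the two coordinated pure profiles and one mixed profile are completely labelled, while the miscoordinated pure profile is not.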

Concurrent Stochastic Games
We now define concurrent stochastic games [73], where players repeatedly make simultaneous choices over actions that update the game state probabilistically. A concurrent stochastic multi-player game (CSG) is a tuple $\mathsf{G} = (N, S, \bar{S}, A, \Delta, \delta, AP, L)$ where:
- $N = \{1, \dots, n\}$ is a finite set of players;
- $S$ is a finite set of states and $\bar{S} \subseteq S$ is a set of initial states;
- $A = (A_1 \cup \{\bot\}) \times \cdots \times (A_n \cup \{\bot\})$ where $A_i$ is a finite set of actions available to player $i \in N$ and $\bot$ is an idle action disjoint from the set $\cup_{i=1}^n A_i$;
- $\Delta : S \to 2^{\cup_{i=1}^n A_i}$ is an action assignment function;
- $\delta : S \times A \to \mathit{Dist}(S)$ is a probabilistic transition function;
- $AP$ is a set of atomic propositions and $L : S \to 2^{AP}$ is a labelling function.
A CSG $\mathsf{G}$ starts in an initial state $\bar{s} \in \bar{S}$ and, when in state $s$, each player $i \in N$ selects an action from its available actions $A_i(s) \stackrel{\mathrm{def}}{=} \Delta(s) \cap A_i$ if this set is non-empty, and from $\{\bot\}$ otherwise. Supposing each player $i$ selects action $a_i$, the state of the game is updated according to the distribution $\delta(s, (a_1, \dots, a_n))$. A CSG is a turn-based stochastic multi-player game (TSG) if for any state $s$ there is precisely one player $i$ for which $A_i(s) \neq \{\bot\}$. Furthermore, a CSG is a Markov decision process (MDP) if there is precisely one player $i$ such that $A_i(s) \neq \{\bot\}$ for all states $s$.
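To make the roles of the action assignment function and the idle action concrete, the following sketch encodes a CSG as a small Python data structure. It is an illustration under stated assumptions, not the representation used in PRISM-games: the class, field names and the two example games are hypothetical, and the turn-based condition is read as "exactly one player has a choice other than the idle action in each state".

```python
from dataclasses import dataclass
from fractions import Fraction

IDLE = "idle"  # stands for the idle action

@dataclass
class CSG:
    players: list   # player identifiers
    actions: dict   # player -> set of actions A_i
    enabled: dict   # state -> set of enabled actions Delta(s)
    delta: dict     # (state, joint action) -> {successor: probability}

    def available(self, player, state):
        """A_i(s): enabled actions of the player, or the idle action if none."""
        choices = self.enabled[state] & self.actions[player]
        return choices if choices else {IDLE}

    def is_turn_based(self):
        """True if exactly one player has a non-idle choice in every state."""
        return all(
            sum(self.available(p, s) != {IDLE} for p in self.players) == 1
            for s in self.enabled
        )

half = Fraction(1, 2)
actions = {1: {"a1", "b1"}, 2: {"a2"}}

# Hypothetical game in which both players choose simultaneously in s0.
concurrent = CSG(
    players=[1, 2],
    actions=actions,
    enabled={"s0": {"a1", "b1", "a2"}, "s1": {"a2"}},
    delta={
        ("s0", ("a1", "a2")): {"s0": half, "s1": half},
        ("s0", ("b1", "a2")): {"s1": Fraction(1)},
        ("s1", (IDLE, "a2")): {"s1": Fraction(1)},
    },
)

# Variant in which only player 1 acts in s0 and only player 2 acts in s1.
turn_based = CSG(
    players=[1, 2],
    actions=actions,
    enabled={"s0": {"a1", "b1"}, "s1": {"a2"}},
    delta={
        ("s0", ("a1", IDLE)): {"s0": half, "s1": half},
        ("s0", ("b1", IDLE)): {"s1": Fraction(1)},
        ("s1", (IDLE, "a2")): {"s1": Fraction(1)},
    },
)
```

Atomic propositions, labelling and initial states are omitted from the sketch since they play no part in the turn-based check.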
A path $\pi$ of $\mathsf{G}$ is a sequence $\pi = s_0 \xrightarrow{\alpha_0} s_1 \xrightarrow{\alpha_1} \cdots$ where $s_i \in S$, $\alpha_i \in A$ and $\delta(s_i, \alpha_i)(s_{i+1}) > 0$ for all $i \geq 0$. We denote by $\pi(i)$ the $(i+1)$th state of $\pi$, $\pi[i]$ the action associated with the $(i+1)$th transition and, if $\pi$ is finite, $\mathit{last}(\pi)$ the final state. The length of a path $\pi$, denoted $|\pi|$, is the number of transitions appearing in the path. Let $\mathit{FPaths}_{\mathsf{G}}$ and $\mathit{IPaths}_{\mathsf{G}}$ ($\mathit{FPaths}_{\mathsf{G},s}$ and $\mathit{IPaths}_{\mathsf{G},s}$) be the sets of finite and infinite paths (starting in state $s$).
We augment CSGs with reward structures of the form r = (r A , r S ), where r A : S×A → Q is an action reward function (which maps each state and action tuple pair to a rational value that is accumulated when the action tuple is selected in the state) and r S : S → Q is a state reward function (which maps each state to a rational value that is incurred when the state is reached). We allow both positive and negative rewards; however, we will later impose certain restrictions to ensure the correctness of our model checking algorithms.
A strategy for a player in a CSG resolves the player's choices in each state. These choices can depend on the history of the CSG's execution and can be randomised. Formally, we have the following definition.
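A strategy for player $i$ in a CSG $\mathsf{G}$ is a function $\sigma_i : \mathit{FPaths}_{\mathsf{G}} \to \mathit{Dist}(A_i \cup \{\bot\})$ such that, if $\sigma_i(\pi)(a_i) > 0$, then $a_i \in A_i(\mathit{last}(\pi))$. We denote by $\Sigma^i_{\mathsf{G}}$ the set of all strategies for player $i$. As for NFGs, a strategy profile for $\mathsf{G}$ is a tuple $\sigma = (\sigma_1, \dots, \sigma_n)$ of strategies for all players and, for player $i$ and strategy $\sigma'_i$, we define the sequence $\sigma_{-i}$ and profile $\sigma_{-i}[\sigma'_i]$ in the same way. For strategy profile $\sigma = (\sigma_1, \dots, \sigma_n)$ and state $s$, we let $\mathit{FPaths}^{\sigma}_{\mathsf{G},s}$ and $\mathit{IPaths}^{\sigma}_{\mathsf{G},s}$ denote the finite and infinite paths from $s$ under the choices of $\sigma$.
We can define a probability measure $\mathit{Prob}^{\sigma}_{\mathsf{G},s}$ over the infinite paths $\mathit{IPaths}^{\sigma}_{\mathsf{G},s}$ [43]. This construction is based on first defining the probabilities for finite paths from the probabilistic transition function and choices of the strategies in the profile. More precisely, for a finite path $\pi = s_0 \xrightarrow{\alpha_0} s_1 \xrightarrow{\alpha_1} \cdots \xrightarrow{\alpha_{m-1}} s_m$ where $s_0 = s$ and $\alpha_j = (a^j_1, \dots, a^j_n)$, the probability of $\pi$ under the profile $\sigma$ is defined by:
$$P^{\sigma}(\pi) = \prod_{j=0}^{m-1} \Big( \prod_{i=1}^{n} \sigma_i\big(s_0 \xrightarrow{\alpha_0} \cdots \xrightarrow{\alpha_{j-1}} s_j\big)(a^j_i) \Big) \cdot \delta(s_j, \alpha_j)(s_{j+1}).$$
Next, for each finite path $\pi$, we define the basic cylinder $C^{\sigma}(\pi)$ that consists of all infinite paths in $\mathit{IPaths}^{\sigma}_{\mathsf{G},s}$ that have $\pi$ as a prefix. Finally, using properties of cylinders, we can then construct the probability space $(\mathit{IPaths}^{\sigma}_{\mathsf{G},s}, \mathcal{F}^{\sigma}_s, \mathit{Prob}^{\sigma}_{\mathsf{G},s})$, where $\mathcal{F}^{\sigma}_s$ is the smallest σ-algebra generated by the set of basic cylinders $\{C^{\sigma}(\pi) \mid \pi \in \mathit{FPaths}^{\sigma}_{\mathsf{G},s}\}$ and $\mathit{Prob}^{\sigma}_{\mathsf{G},s}$ is the unique measure such that $\mathit{Prob}^{\sigma}_{\mathsf{G},s}(C^{\sigma}(\pi)) = P^{\sigma}(\pi)$ for all $\pi \in \mathit{FPaths}^{\sigma}_{\mathsf{G},s}$.
For random variable $X : \mathit{IPaths}_{\mathsf{G}} \to \mathbb{Q}$, we can then define for any profile $\sigma$ and state $s$ the expected value $\mathbb{E}^{\sigma}_{\mathsf{G},s}(X)$ of $X$ in $s$ with respect to $\sigma$. These random variables $X$ represent an objective (or utility function) for a player, which includes both finite-horizon and infinite-horizon properties. Examples of finite-horizon properties include the probability of reaching a set of target states $T$ within $k$ steps or the expected reward accumulated over $k$ steps. These properties can be expressed by the random variables:
$$X(\pi) = \begin{cases} 1 & \text{if } \pi(j) \in T \text{ for some } j \leq k \\ 0 & \text{otherwise} \end{cases} \qquad Y(\pi) = \sum_{i=0}^{k-1} \big( r_A(\pi(i), \pi[i]) + r_S(\pi(i)) \big)$$
respectively.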
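The probability of a finite path is a product of the players' choice probabilities and the transition probabilities along the path. The sketch below computes this product under a simplifying assumption: strategies are memoryless (they depend only on the current state), whereas strategies in a CSG may in general depend on the whole history. The function name, data layout and example game are hypothetical.

```python
from fractions import Fraction as F

def path_probability(states, joint_actions, profile, delta):
    """Probability of a finite path under a profile of memoryless strategies.

    states        -- [s_0, ..., s_m]
    joint_actions -- [alpha_0, ..., alpha_{m-1}], each a tuple of player actions
    profile       -- one dict per player: state -> {action: probability}
    delta         -- (state, joint action) -> {successor: probability}
    """
    prob = F(1)
    for j, alpha in enumerate(joint_actions):
        s, t = states[j], states[j + 1]
        for i, a in enumerate(alpha):
            prob *= profile[i][s].get(a, F(0))    # player i picks a in s
        prob *= delta[(s, alpha)].get(t, F(0))    # transition from s to t
    return prob

# Hypothetical two-player game: player 1 mixes in s0, player 2 always plays c.
delta = {
    ("s0", ("a", "c")): {"s0": F(1, 2), "s1": F(1, 2)},
    ("s0", ("b", "c")): {"s1": F(1)},
}
profile = [
    {"s0": {"a": F(1, 2), "b": F(1, 2)}},
    {"s0": {"c": F(1)}},
]

# Path s0 --(a,c)--> s0 --(b,c)--> s1.
p = path_probability(["s0", "s0", "s1"], [("a", "c"), ("b", "c")], profile, delta)
```

Each step contributes the probability that every player selects its component of the joint action, multiplied by the probability of the resulting transition.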
Examples of infinite-horizon properties include the probability of reaching a target set $T$ and the expected cumulative reward until reaching a target set $T$ (where paths that never reach the target have infinite reward), which can be expressed by the random variables:
$$X(\pi) = \begin{cases} 1 & \text{if } \pi(j) \in T \text{ for some } j \in \mathbb{N} \\ 0 & \text{otherwise} \end{cases} \qquad Y(\pi) = \begin{cases} \infty & \text{if } \pi(j) \notin T \text{ for all } j \in \mathbb{N} \\ \sum_{i=0}^{k_{\min}-1} \big( r_A(\pi(i), \pi[i]) + r_S(\pi(i)) \big) & \text{otherwise} \end{cases}$$
where $k_{\min} = \min\{j \in \mathbb{N} \mid \pi(j) \in T\}$.
Let us first focus on zero-sum games, which are by definition two-player games. As for NFGs (see Definition 1), for a two-player CSG $\mathsf{G}$ and a given objective $X$, we can consider the case where player 1 tries to maximise the expected value of $X$, while player 2 tries to minimise it. The above definition yields the value of $\mathsf{G}$ with respect to $X$ if it is determined, i.e., if the maximum value that player 1 can ensure equals the minimum value that player 2 can ensure. Since the CSGs we consider are finite state and finitely-branching, it follows that they are determined for all the objectives we consider [55]. Formally we have the following.
labelled with atomic propositions corresponding to when a player wins or there is a draw in a round of the rock-paper-scissors game.
For the zero-sum objective to maximise the probability of reaching $s_1$ before $s_2$, i.e. player 1 winning a round of the game before player 2, the value of the game is 1/2 and the optimal strategy of each player $i$ is to choose $r_i$, $p_i$ and $s_i$, each with probability 1/3, in state $s_0$ and $t_i$ otherwise.
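The value 1/2 can be reproduced numerically by evaluating a fixed pair of strategies. The sketch below is a policy evaluation under loudly stated assumptions, not the value iteration algorithm of the paper: it assumes a four-state reading of the game in which $s_0$ is where a round is played, $s_1$ and $s_2$ record a win for player 1 and player 2, $s_3$ records a draw, and a draw leads back to $s_0$. All names are hypothetical.

```python
from fractions import Fraction as F

# ASSUMPTION: s0 is the state where a round is played, s1 / s2 mean player 1 /
# player 2 won the round, s3 is a draw, and a draw returns to s0.
BEATS = {"r": "s", "p": "r", "s": "p"}   # key beats value

def round_outcome(a1, a2):
    """State reached from s0 when the players choose a1 and a2."""
    if a1 == a2:
        return "s3"
    return "s1" if BEATS[a1] == a2 else "s2"

def win_first_probability(sigma1, sigma2, iterations=60):
    """Approximate Pr(reach s1 before s2) from s0 under memoryless strategies."""
    outcome = {"s1": F(0), "s2": F(0), "s3": F(0)}
    for a1, p1 in sigma1.items():
        for a2, p2 in sigma2.items():
            outcome[round_outcome(a1, a2)] += p1 * p2
    p = F(0)  # lower bound on the probability, refined in each iteration
    for _ in range(iterations):
        p = outcome["s1"] + outcome["s3"] * p   # a draw restarts the round
    return p

third = F(1, 3)
uniform = {"r": third, "p": third, "s": third}

p_uniform = win_first_probability(uniform, uniform)
# Deviating to a pure strategy against the uniform opponent does not help.
p_rock = win_first_probability({"r": F(1)}, uniform)
```

The iteration converges from below to 1/2, and a unilateral deviation by player 1 leaves the probability unchanged, in line with the uniform strategy being optimal.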
For nonzero-sum CSGs, with an objective X i for each player i, we will use NE, which can be defined as for NFGs (see Definition 4). In line with the definition of zero-sum optimality above (and because the model checking algorithms we will later introduce are based on backward induction [72,60]), we restrict our attention to subgame-perfect NE [63], which are NE in every state of the CSG.
Furthermore, because we use a variety of objectives, including infinite-horizon objectives, where the existence of NE is an open problem [11], we will in some cases use ε-NE, which do exist for any ε > 0 for all the properties we consider.
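Definition 11 (Subgame-perfect ε-NE) For CSG $\mathsf{G}$ and $\varepsilon > 0$, a strategy profile $\sigma^\star$ is a subgame-perfect ε-Nash equilibrium for objectives $\langle X_i \rangle_{i \in N}$ if and only if $\mathbb{E}^{\sigma^\star}_{\mathsf{G},s}(X_i) \geq \sup_{\sigma_i \in \Sigma^i_{\mathsf{G}}} \mathbb{E}^{\sigma^\star_{-i}[\sigma_i]}_{\mathsf{G},s}(X_i) - \varepsilon$ for all $i \in N$ and $s \in S$.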
Example 4. In [14] a non-probabilistic concurrent game is used to model medium access control. Two users with limited energy share a wireless channel and choose to transmit ($t_i$) or wait ($w_i$) and, if both transmit, the transmissions fail due to interference. We extend this to a CSG by assuming that transmissions succeed with probability $q$ if both transmit. Figure 2 presents a CSG model of the protocol where each user has enough energy for one transmission. The states are labelled with the status of each user, where the first value represents whether user $i$ has transmitted their message or not ($tr_i$ and $nt_i$ respectively) and the second whether there is sufficient energy to transmit or not (1 and 0 respectively). If the objectives are to maximise the probability of a successful transmission, there are two subgame-perfect SWNE profiles, one when user 1 waits for user 2 to transmit before transmitting and another when user 2 waits for user 1 to transmit before transmitting. Under both profiles, both users successfully transmit with probability 1. If the objectives are to maximise the probability of being one of the first to transmit, then there is only one SWNE profile corresponding to both users immediately trying to transmit. In this case the probability of each user successfully transmitting is $q$.

Property Specification: Extending the Logic rPATL
In order to formalise properties of CSGs, we propose an extension of the logic rPATL, previously defined for zero-sum properties of TSGs [23]. In particular, we add operators to specify nonzero-sum properties, using (social welfare or social cost) Nash equilibria, and provide a semantics for this extended logic on CSGs.
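Definition 12 (Extended rPATL syntax) The syntax of our extended version of rPATL is given by the grammar:
$$\begin{array}{rcl}
\phi & := & \mathtt{true} \mid \mathsf{a} \mid \neg\phi \mid \phi \wedge \phi \mid \langle\langle C \rangle\rangle \mathtt{P}_{\sim q}[\,\psi\,] \mid \langle\langle C \rangle\rangle \mathtt{R}^{r}_{\sim x}[\,\rho\,] \mid \langle\langle C{:}C' \rangle\rangle_{\mathrm{opt} \sim x}(\theta) \\
\theta & := & \mathtt{P}[\,\psi\,] + \mathtt{P}[\,\psi\,] \mid \mathtt{R}^{r}[\,\rho\,] + \mathtt{R}^{r}[\,\rho\,] \\
\psi & := & \mathtt{X}\,\phi \mid \phi\ \mathtt{U}^{\leq k}\ \phi \mid \phi\ \mathtt{U}\ \phi \\
\rho & := & \mathtt{I}^{=k} \mid \mathtt{C}^{\leq k} \mid \mathtt{F}\,\phi
\end{array}$$
where $\mathsf{a}$ is an atomic proposition, $C$ and $C'$ are coalitions of players such that $C' = N \setminus C$, $\mathrm{opt} \in \{\min, \max\}$, $\sim\ \in \{<, \leq, \geq, >\}$, $q \in \mathbb{Q} \cap [0, 1]$, $x \in \mathbb{Q}$, $r$ is a reward structure and $k \in \mathbb{N}$.
rPATL is a branching-time temporal logic for stochastic games, which combines the probabilistic operator $\mathtt{P}$ of PCTL [38], PRISM's reward operator $\mathtt{R}$ [44], and the coalition operator $\langle\langle C \rangle\rangle$ of ATL [5]. The syntax distinguishes between state ($\phi$), path ($\psi$) and reward ($\rho$) formulae. State formulae are evaluated over states of a CSG, while path and reward formulae are both evaluated over paths. The core operators from the existing version of rPATL [23] are $\langle\langle C \rangle\rangle \mathtt{P}_{\sim q}[\,\psi\,]$ and $\langle\langle C \rangle\rangle \mathtt{R}^{r}_{\sim x}[\,\rho\,]$. A state satisfies a formula $\langle\langle C \rangle\rangle \mathtt{P}_{\sim q}[\,\psi\,]$ if the coalition of players $C$ can ensure that the probability of the path formula $\psi$ being satisfied is $\sim q$, regardless of the actions of the other players ($N \setminus C$) in the game. A state satisfies a formula $\langle\langle C \rangle\rangle \mathtt{R}^{r}_{\sim x}[\,\rho\,]$ if the players in $C$ can ensure that the expected value of the reward formula $\rho$ for reward structure $r$ is $\sim x$, whatever the other players do. Such properties are inherently zero-sum in nature as one coalition tries to maximise an objective (e.g., the probability of $\psi$) and the other tries to minimise it; hence, we call these zero-sum formulae.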
The most significant extension we make to the rPATL logic is the addition of nonzero-sum formulae. These take the form $\langle\langle C{:}C' \rangle\rangle_{\mathrm{opt} \sim x}(\theta)$, where $C$ and $C'$ are two coalitions that represent a partition of the set of players $N$, and $\theta$ is the sum of either two probabilistic or two reward objectives. Their meaning is as follows:
- $\langle\langle C{:}C' \rangle\rangle_{\max \sim x}(\theta)$ is satisfied if there exists a subgame-perfect SWNE profile between coalitions $C$ and $C'$ under which the sum of the objectives of $C$ and $C'$ in $\theta$ is $\sim x$;
- $\langle\langle C{:}C' \rangle\rangle_{\min \sim x}(\theta)$ is satisfied if there exists a subgame-perfect SCNE profile between coalitions $C$ and $C'$ under which the sum of the objectives of $C$ and $C'$ in $\theta$ is $\sim x$.
Like the existing zero-sum formulae, the new nonzero-sum formulae still split the players into just two coalitions, $C$ and $C' = N \setminus C$. This means that the model checking algorithm (see Section 4) reduces to finding equilibria in two-player CSGs, which is more tractable than for larger numbers of players. Technically, therefore, we could remove the second coalition $C'$ from the syntax. However, we retain it for clarity about which coalition corresponds to each of the two objectives, and to allow a later extension to more than two coalitions [47].
Both types of formula, zero-sum and nonzero-sum, are composed of path ($\psi$) and reward ($\rho$) formulae, used in probabilistic and reward objectives included within $\mathtt{P}$ and $\mathtt{R}$ operators, respectively. For path formulae, we follow the existing rPATL syntax from [23] and allow next ($\mathtt{X}\,\phi$), bounded until ($\phi\ \mathtt{U}^{\leq k}\ \phi$) and unbounded until ($\phi\ \mathtt{U}\ \phi$). We also allow the usual equivalences such as $\mathtt{F}\,\phi \equiv \mathtt{true}\ \mathtt{U}\ \phi$ (i.e., probabilistic reachability) and $\mathtt{F}^{\leq k}\,\phi \equiv \mathtt{true}\ \mathtt{U}^{\leq k}\ \phi$ (i.e., bounded probabilistic reachability).
For reward formulae, we introduce some differences with respect to [23]. We allow instantaneous (state) reward at the $k$th step (instantaneous reward $\mathtt{I}^{=k}$), reward accumulated over $k$ steps (bounded cumulative reward $\mathtt{C}^{\leq k}$), and reward accumulated until a formula $\phi$ is satisfied (expected reachability $\mathtt{F}\,\phi$). The first two, adapted from the property specification language of PRISM [44], were not previously included in rPATL, but proved to be useful for the case studies we present later in Section 7.2. For the third ($\mathtt{F}\,\phi$), [23] defines several variants, which differ in the treatment of paths that never reach a state satisfying $\phi$. We restrict our attention to the most commonly used one, which is the default in PRISM, where paths that never satisfy $\phi$ have infinite reward. In the case of zero-sum formulae, adding the additional variants is straightforward based on the algorithm of [23]. On the other hand, for nonzero-sum formulae, currently no algorithms exist for these variants.
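The three reward formulae can be read as functions of a path. The sketch below evaluates them on a finite path prefix under assumptions that are not taken from the text above: the instantaneous reward is the state reward of the state reached after $k$ steps, and cumulative rewards add the state and action rewards of each step taken (excluding the state in which accumulation stops). Function names and the example path are hypothetical.

```python
def instantaneous(states, state_reward, k):
    """Instantaneous reward: state reward of the state reached after k steps."""
    return state_reward(states[k])

def bounded_cumulative(states, actions, state_reward, action_reward, k):
    """Bounded cumulative reward: reward accumulated over the first k steps."""
    return sum(action_reward(states[i], actions[i]) + state_reward(states[i])
               for i in range(k))

def reach_reward(states, actions, state_reward, action_reward, target):
    """Reward accumulated until the first state in target.

    Returns None if the prefix does not reach the target: the prefix alone
    cannot tell whether the full path reaches it later or has infinite reward.
    """
    total = 0
    for i, s in enumerate(states):
        if s in target:
            return total
        if i < len(actions):
            total += action_reward(s, actions[i]) + state_reward(s)
    return None

# Hypothetical path prefix s0 -a-> s1 -b-> s2 -a-> s3.
states = ["s0", "s1", "s2", "s3"]
actions = ["a", "b", "a"]

def state_reward(s):
    return {"s1": 5}.get(s, 0)   # only s1 carries a state reward

def action_reward(s, a):
    return 1                     # every step costs one unit

inst = instantaneous(states, state_reward, 1)
cumul = bounded_cumulative(states, actions, state_reward, action_reward, 2)
reach = reach_reward(states, actions, state_reward, action_reward, {"s2"})
never = reach_reward(states, actions, state_reward, action_reward, {"s9"})
```

Returning `None` rather than infinity for an unreached target is a deliberate choice in this sketch, since a finite prefix cannot establish that the target is never reached.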
As for other probabilistic temporal logics, it is useful to consider numerical queries, which represent the value of an objective, rather than checking whether it is above or below some threshold. In the case of zero-sum formulae, these take the form ⟨⟨C⟩⟩P_{min=?}[ ψ ], ⟨⟨C⟩⟩P_{max=?}[ ψ ], ⟨⟨C⟩⟩R^r_{min=?}[ ρ ] and ⟨⟨C⟩⟩R^r_{max=?}[ ρ ].

Example 5. Consider a scenario in which two robots (rbt_1 and rbt_2) move concurrently over a square grid of cells, where each is trying to reach their individual goal location. Each step of the robot involves transitioning to an adjacent cell, possibly stochastically. Examples of zero-sum formulae, where crash, goal_1, goal_2 denote the obvious atomic propositions labelling states, include:
- ⟨⟨rbt_1⟩⟩P_{max=?}[ ¬crash U^{≤10} goal_1 ] asks what is the maximum probability with which the first robot can ensure that it reaches its goal location within 10 steps and without crashing, no matter how the second robot behaves;
- ⟨⟨rbt_2⟩⟩R^{r_crash}_{≤1.5}[ F goal_2 ] states that, no matter the behaviour of the first robot, the second robot can ensure the expected number of times it crashes before reaching its goal is less than or equal to 1.5 (r_crash is a reward structure that assigns 1 to states labelled crash and 0 to all other states).
Examples of nonzero-sum formulae include:
- ⟨⟨rbt_1:rbt_2⟩⟩_{max≥2}(P[ F goal_1 ] + P[ ¬crash U^{≤10} goal_2 ]) states the robots can collaborate so that both reach their goal with probability 1, with the additional condition that the second has to reach its goal within 10 steps without crashing;
- ⟨⟨rbt_1:rbt_2⟩⟩_{min=?}(R^{r_steps}[ F goal_1 ] + R^{r_steps}[ F goal_2 ]) asks what is the sum of expected reachability values when the robots collaborate and each wants to minimise their expected steps to reach their goal (r_steps is a reward structure that assigns 1 to all state and action tuple pairs).
Examples of more complex nested formulae for this scenario include the following, where r_steps is as above:
- ⟨⟨rbt_1⟩⟩P_{max=?}[ F ⟨⟨rbt_2⟩⟩R^{r_steps}_{≥10}[ F goal_2 ] ] asks what is the maximum probability with which the first robot can get to a state where the expected time for the second robot to reach their goal is at least 10 steps;
- ⟨⟨rbt_1,rbt_2⟩⟩P_{≥0.75}[ F ⟨⟨rbt_1:rbt_2⟩⟩_{min≤5}(R^{r_steps}[ F goal_1 ] + R^{r_steps}[ F goal_2 ]) ] states the robots can collaborate to reach, with probability at least 0.75, a state where the sum of the expected time for the robots to reach their goals is at most 5.
Before giving the semantics of the logic, we define coalition games which, for a CSG G and coalition (set of players) C ⊆ N , reduce G to a two-player CSG G C , with one player representing C and the other N \C. Without loss of generality we assume the coalition of players is of the form C = {1, . . . , n ′ }.
is a two-player CSG where:
- a^C_1 = (a_1, . . . , a_m) ∈ ∆^C(s) if and only if either ∆(s) ∩ A_j = ∅ and a_j = ⊥ or a_j ∈ ∆(s) for all 1 ≤ j ≤ m, and a^C_2 = (a_{m+1}, . . . , a_n) ∈ ∆^C(s) if and only if either ∆(s) ∩ A_j = ∅ and a_j = ⊥ or a_j ∈ ∆(s) for all m + 1 ≤ j ≤ n, for s ∈ S;
- for any s ∈ S, a^C …

Furthermore, for a reward structure r = (r_A, r_S) of G, by abuse of notation we also use r for the corresponding reward structure r = (r^C …

Our logic includes both finite-horizon (X, U^{≤k}, I^{=k}, C^{≤k}) and infinite-horizon (U, F) temporal operators. For the latter, the existence of SWNE or SCNE profiles is an open problem [11], but we can check for ε-SWNE or ε-SCNE profiles for any ε. Hence, we define the semantics of the logic in the context of a particular ε.
Definition 14 (Extended rPATL semantics) For a CSG G, ε > 0 and a formula φ in our rPATL extension, we define the satisfaction relation |= inductively over the structure of φ. The propositional logic fragment (true, a, ¬, ∧) is defined in the usual way.

For a zero-sum formula and state s ∈ S of CSG G, we have: …

For a nonzero-sum formula and state s ∈ S of CSG G, we have: … , and path π ∈ IPaths_{G^C,s}: …

For a temporal formula and path π ∈ IPaths_{G^C,s}: …

For a reward structure r, reward formula and path π ∈ IPaths_{G^C,s}: …

Using the notation above, we can also define the numerical queries mentioned previously. For example, for state s we have: …

As the zero-sum objectives appearing in the logic are either finite-horizon or infinite-horizon and correspond to either probabilistic until or expected reachability formulae, we have that CSGs are determined (see Definition 9) with respect to these objectives [55], and therefore values exist. More precisely, for any CSG G, coalition C, state s, path formula ψ, reward structure r and reward formula ρ, the values val_{G^C}(s, X^ψ) and val_{G^C}(s, X^{r,ρ}) of the game G^C in state s with respect to the objectives X^ψ and X^{r,ρ} are well defined. This determinacy result also yields the following equivalences: …

Also, as for other probabilistic temporal logics, we can represent negated path formulae by inverting the probability threshold, e.g., ⟨⟨C⟩⟩P_{≥q}[ ¬ψ ] ≡ ⟨⟨C⟩⟩P_{≤1−q}[ ψ ], notably allowing the 'globally' operator G φ ≡ ¬(F ¬φ) to be defined.

Model Checking for Extended rPATL against CSGs
We now present model checking algorithms for the extended rPATL logic, introduced in the previous section, on a CSG G. Since rPATL is a branching-time logic, this works by recursively computing the set Sat (φ) of states satisfying formula φ over the structure of φ, as is done for rPATL on TSGs [23].
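If φ is a zero-sum formula of the form ⟨⟨C⟩⟩P_{∼q}[ ψ ] or ⟨⟨C⟩⟩R^r_{∼x}[ ρ ], this reduces to computing values for a two-player CSG (either G^C or G^{N\C}) with respect to X^ψ or X^{r,ρ}. In particular, for ∼ ∈ {≥, >} and s ∈ S we have:

$$ s \in Sat(\langle\langle C \rangle\rangle P_{\sim q}[\,\psi\,]) \iff val_{G^{C}}(s, X^{\psi}) \sim q, \qquad s \in Sat(\langle\langle C \rangle\rangle R^{r}_{\sim x}[\,\rho\,]) \iff val_{G^{C}}(s, X^{r,\rho}) \sim x $$

and, since CSGs are determined for the zero-sum properties we consider, for ∼ ∈ {<, ≤} we have:

$$ s \in Sat(\langle\langle C \rangle\rangle P_{\sim q}[\,\psi\,]) \iff val_{G^{N \setminus C}}(s, X^{\psi}) \sim q, \qquad s \in Sat(\langle\langle C \rangle\rangle R^{r}_{\sim x}[\,\rho\,]) \iff val_{G^{N \setminus C}}(s, X^{r,\rho}) \sim x. $$

Without loss of generality, for such formulae we focus on computing val_{G^C}(s, X^ψ) and val_{G^C}(s, X^{r,ρ}) and, to simplify the presentation, we denote these values by V_{G^C}(s, ψ) and V_{G^C}(s, r, ρ) respectively.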
If, on the other hand, φ is a nonzero-sum formula of the form ⟨⟨C:C′⟩⟩_{opt∼x}(θ) then, from the semantics for ⟨⟨C:C′⟩⟩_{opt∼x}(θ) (see Definition 14), computing Sat(φ) reduces to the computation of subgame-perfect SWNE or SCNE values for the objectives (X^{θ_1}, X^{θ_2}) and a comparison of their sum to the threshold x. Again, to simplify the presentation, we will use the notation V_{G^C}(s, θ) for the SWNE values of the objectives (X^{θ_1}, X^{θ_2}) in state s of G^C.

For the remainder of this section, we fix a CSG G = (N, S, S̄, A, ∆, δ, AP, L) and coalition C of players and assume that the available actions of players 1 and 2 of the (two-player) CSG G^C in a state s are {a_1, . . . , a_l} and {b_1, . . . , b_m}, respectively. We also fix a value ε > 0 which, as discussed in Section 3, is needed to define the semantics of our logic, in particular for infinite-horizon objectives where we need to consider ε-SWNE profiles.
Assumptions. Our model checking algorithms require several assumptions on CSGs, depending on the operators that appear in the formula φ. These can all be checked using standard graph algorithms [2]. In the diverse set of model checking case studies that we later present in Section 7.2, these assumptions have not limited the practical applicability of our model checking algorithms.
For zero-sum formulae, the only restriction is for infinite-horizon reward properties on CSGs with both positive and negative reward values.
Assumption 1 For a zero-sum formula of the form ⟨⟨C⟩⟩R^r_{∼x}[ F φ ], from any state s where r_S(s) < 0 or r_A(s, a) < 0 for some action a, under all profiles of G, with probability 1 we reach either a state satisfying φ or a state where all rewards are zero and which cannot be left with probability 1 under all profiles.
Without this assumption, the values computed during value iteration can oscillate, and therefore fail to converge (see Appendix A). This restriction is not applied in the existing rPATL model checking algorithms for TSGs [23] since that work assumes that all rewards are non-negative.
The remaining two assumptions concern nonzero-sum formulae that contain infinite-horizon objectives. We restrict our attention to a class of CSGs that can be seen as a variant of stopping games [24], as used for multi-objective TSGs. Compared to [24], we use a weaker, objective-dependent assumption, which ensures that, under all profiles, with probability 1, eventually the outcome of each player's objective does not change by continuing. As for Assumption 1, without this restriction value iteration may not converge, since values can oscillate (see Appendices B and C). Notice that Assumption 1 is not required for nonzero-sum properties containing negative rewards, since Assumption 3 is itself a stronger restriction.

Model Checking Zero-Sum Properties
In this section, we present algorithms for zero-sum properties, i.e., for computing the values V G C (s, ψ) or V G C (s, r, ρ) for path formulae ψ or reward formulae ρ in all states s of G C . We split the presentation into finite-horizon properties, which can be solved exactly using backward induction [72,60], and infinite-horizon properties, for which we approximate values using value iteration [70,19]. Both cases require the solution of matrix games, for which we rely on the linear programming approach presented in Section 2.1.1.
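To make the matrix game step concrete, the sketch below computes the value and an optimal strategy for player 1 by solving the usual linear program (maximise v subject to Σ_i x_i z_{i,j} ≥ v for every column j, with x a probability distribution). It is an illustration under stated assumptions rather than the implementation used in the tool: instead of calling an LP solver, it enumerates the vertices of the feasible region with exact rational arithmetic, which is only practical for very small games. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def solve_linear(A, b):
    """Solve the square system A x = b exactly; return None if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[n] for row in M]

def matrix_game_value(Z):
    """Return (value, optimal mixed strategy of the row player) of matrix game Z."""
    l, m = len(Z), len(Z[0])
    # Each inequality is stored as a row c with c . (x_1, ..., x_l, v) >= 0.
    ineqs = [[Z[i][j] for i in range(l)] + [-1] for j in range(m)]   # payoff >= v
    ineqs += [[int(i == k) for i in range(l)] + [0] for k in range(l)]  # x_k >= 0
    sum_to_one = [1] * l + [0]
    best = None
    for tight in combinations(ineqs, l):  # candidate vertices of the LP
        sol = solve_linear([sum_to_one] + list(tight), [1] + [0] * l)
        if sol is None:
            continue
        feasible = all(sum(c * s for c, s in zip(row, sol)) >= 0 for row in ineqs)
        if feasible and (best is None or sol[-1] > best[-1]):
            best = sol
    return best[-1], best[:-1]
```

For matching pennies, `matrix_game_value([[1, -1], [-1, 1]])` yields value 0 with the uniform strategy. A production implementation would hand the same LP to a dedicated solver instead of enumerating vertices.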

Computing the Values of Zero-Sum Finite-Horizon Formulae
Finite-horizon properties are defined over a bounded number of steps: the next or bounded until operators for probabilistic formulae, and the instantaneous or bounded cumulative reward operators. Computation of the values V G C (s, ψ) or V G C (s, r, ρ) for these is done recursively, based on the step bound, using backward induction and solving matrix games in each state at each iteration. The actions of each matrix game correspond to the actions available in that state; the utilities are constructed from the transition probabilities δ C of the game G C , the reward structure r (in the case of reward formulae) and the values already computed recursively for successor states.
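Next. This is the simplest operator, over just one step, and so in fact requires no recursion, just solution of a matrix game for each state. If ψ = X φ, then for any state s we have that V_{G^C}(s, ψ) = val(Z) where Z ∈ Q^{l×m} is the matrix game with:

$$ z_{i,j} = \sum_{s' \in Sat(\varphi)} \delta^{C}(s,(a_i,b_j))(s'). $$

Bounded Until. If ψ = φ_1 U^{≤k} φ_2, we compute the values for the path formulae ψ_n = φ_1 U^{≤n} φ_2 for 0 ≤ n ≤ k recursively. For any state s:

$$ V_{G^{C}}(s,\psi_n) = \begin{cases} 1 & \text{if } s \in Sat(\varphi_2) \\ 0 & \text{else if } n = 0 \text{ or } s \notin Sat(\varphi_1) \\ val(Z) & \text{otherwise} \end{cases} $$

where val(Z) equals the value of the matrix game Z ∈ Q^{l×m} with:

$$ z_{i,j} = \sum_{s' \in S} \delta^{C}(s,(a_i,b_j))(s') \cdot V_{G^{C}}(s',\psi_{n-1}). $$

Instantaneous Rewards. If ρ = I^{=k}, then for the reward structure r we compute the values for the reward formulae ρ_n = I^{=n} for 0 ≤ n ≤ k recursively. For any state s:

$$ V_{G^{C}}(s,r,\rho_n) = \begin{cases} r_S(s) & \text{if } n = 0 \\ val(Z) & \text{otherwise} \end{cases} $$

where val(Z) equals the value of the matrix game Z ∈ Q^{l×m} with:

$$ z_{i,j} = \sum_{s' \in S} \delta^{C}(s,(a_i,b_j))(s') \cdot V_{G^{C}}(s',r,\rho_{n-1}). $$

Bounded Cumulative Rewards. If ρ = C^{≤k}, then for the reward structure r we compute the values for the reward formulae ρ_n = C^{≤n} for 0 ≤ n ≤ k recursively. For any state s:

$$ V_{G^{C}}(s,r,\rho_n) = \begin{cases} 0 & \text{if } n = 0 \\ val(Z) & \text{otherwise} \end{cases} $$

where val(Z) equals the value of the matrix game Z ∈ Q^{l×m} with:

$$ z_{i,j} = r_A(s,(a_i,b_j)) + r_S(s) + \sum_{s' \in S} \delta^{C}(s,(a_i,b_j))(s') \cdot V_{G^{C}}(s',r,\rho_{n-1}). $$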
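The backward induction for bounded until can be sketched on a toy game. The code below is illustrative only and makes several assumptions that are not part of the paper: the transition function is a nested dictionary `delta[state][(a, b)] -> {successor: probability}`, the recurrence used is the standard one (value 1 in φ_2-states, 0 when the bound is exhausted or φ_1 fails, otherwise the value of a matrix game over successor values), and the helper `val` only handles games with a pure saddle point or of size 2×2, where a closed form exists. The example game and all names are hypothetical.

```python
from fractions import Fraction

def val(Z):
    """Value of a matrix game with a pure saddle point, or of any 2x2 game."""
    maximin = max(min(row) for row in Z)
    minimax = min(max(col) for col in zip(*Z))
    if maximin == minimax:
        return maximin
    if len(Z) != 2 or len(Z[0]) != 2:
        raise NotImplementedError("sketch: larger games need an LP solver")
    (a, b), (c, d) = Z
    return (a * d - b * c) / (a + d - b - c)  # closed form, no saddle point

def bounded_until(delta, sat1, sat2, k):
    """Values of phi1 U<=k phi2 for every state, by backward induction."""
    V = {s: Fraction(1 if s in sat2 else 0) for s in delta}  # step bound 0
    for _ in range(k):
        new = {}
        for s, moves in delta.items():
            if s in sat2:
                new[s] = Fraction(1)
            elif s not in sat1:
                new[s] = Fraction(0)
            else:
                rows = sorted({a for a, _ in moves})
                cols = sorted({b for _, b in moves})
                Z = [[sum(p * V[t] for t, p in moves[(a, b)].items())
                      for b in cols] for a in rows]
                new[s] = val(Z)
        V = new
    return V

# Hypothetical game: in s0 matching actions reach the goal, (0, 1) crashes,
# and (1, 0) stays in s0; the goal and crash states are absorbing.
delta = {
    "s0": {(0, 0): {"goal": 1}, (0, 1): {"crash": 1},
           (1, 0): {"s0": 1}, (1, 1): {"goal": 1}},
    "goal": {(0, 0): {"goal": 1}},
    "crash": {(0, 0): {"crash": 1}},
}
not_crash = {"s0", "goal"}
```

In this toy game the value in s0 of ¬crash U^{≤n} goal works out to n/(n+1), so it increases with the step bound and is exact at every step, in line with the finite-horizon case being solved exactly.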

Computing the Values of Zero-Sum Infinite Horizon Formulae
We now discuss how to compute the values V_{G^C}(s, ψ) and V_{G^C}(s, r, ρ) for infinite-horizon properties, i.e., when the path formula ψ is an until operator, or for the expected reachability variant of the reward formulae ρ. In both cases, we approximate these values using value iteration, adopting a similar recursive computation to the finite-horizon cases above, solving matrix games in each state and at each iteration, which converges in the limit to the desired values.
Following the approach typically taken in probabilistic model checking tools to implement value iteration, we estimate convergence of the iterative computation by checking the maximum relative difference between successive iterations. However, it is known [36] that, even for simpler probabilistic models such as MDPs, this convergence criterion cannot be used to guarantee that the final computed values are accurate to within a specified error bound. Alternative approaches that resolve this by computing lower and upper bounds for each state have been proposed for MDPs (e.g. [36,13]) and extended to both single- and multi-objective solution of TSGs [42,7]; extensions could be investigated for CSGs. Another possibility is to use policy iteration (see, e.g., [18]).
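The convergence test and its limitation can be seen on a tiny example. The sketch below is a hypothetical illustration: `step` stands for one sweep of value iteration (here a single-state chain that reaches its goal with probability 0.001 per step, so the true value is 1), and the threshold and names are assumptions rather than settings taken from the tool.

```python
def max_relative_difference(old, new):
    """Largest relative change between two successive value vectors."""
    diffs = []
    for s in new:
        if new[s] == old[s]:
            diffs.append(0.0)
        elif new[s] == 0:
            diffs.append(float("inf"))
        else:
            diffs.append(abs(new[s] - old[s]) / abs(new[s]))
    return max(diffs)

def iterate_until_converged(step, values, epsilon, max_iters=1_000_000):
    """Apply `step` until successive vectors differ by less than epsilon."""
    for n in range(1, max_iters + 1):
        new = step(values)
        if max_relative_difference(values, new) < epsilon:
            return new, n
        values = new
    raise RuntimeError("no convergence within max_iters")

def slow_chain(values):
    """One sweep for a chain that moves to the goal with probability 0.001."""
    return {"s": 0.999 * values["s"] + 0.001 * values["goal"], "goal": 1.0}

result, iterations = iterate_until_converged(
    slow_chain, {"s": 0.0, "goal": 1.0}, epsilon=1e-3
)
```

With this threshold the loop stops after several hundred iterations at a value near 0.5, although the true value is 1. This is the kind of behaviour that motivates the bound-based alternatives cited above.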
Until. If ψ = φ_1 U φ_2, the probability values can be approximated through value iteration using the fact that

$$ V_{G^{C}}(s, \varphi_1\,U\,\varphi_2) = \lim_{k \to \infty} V_{G^{C}}(s, \varphi_1\,U^{\leq k}\,\varphi_2). $$

We can therefore compute the bounded until values for increasingly large k and estimate convergence as described above, based on the difference between values in successive iterations. However, we can potentially speed up convergence by first precomputing the set of states S^ψ_0 for which the value of the zero-sum objective X^ψ is 0 and the set of states S^ψ_1 for which the value is 1 using standard graph algorithms [2]. We can then apply value iteration to approximate V_{G^C}(s, ψ) = lim_{n→∞} V_{G^C}(s, ψ, n), where:

$$ V_{G^{C}}(s,\psi,n) = \begin{cases} 1 & \text{if } s \in S^{\psi}_{1} \\ 0 & \text{else if } s \in S^{\psi}_{0} \text{ or } n = 0 \\ val(Z) & \text{otherwise} \end{cases} $$

where val(Z) equals the value of the matrix game Z ∈ Q^{l×m} with:

$$ z_{i,j} = \sum_{s' \in S} \delta^{C}(s,(a_i,b_j))(s') \cdot V_{G^{C}}(s',\psi,n-1). $$

Expected Reachability. If ρ = F φ and the reward structure is r, then we first make all states of G^C satisfying φ absorbing, i.e., we remove all outgoing transitions from such states. Second, we find the set of states S^ρ_∞ for which the reward is infinite; as in [23], this involves finding the set of states satisfying the formula ⟨⟨C⟩⟩P_{<1}[ F φ ] and we can use the graph algorithms of [2] to find these states. Again following [23], to deal with zero-reward cycles we need to use value iteration to compute a greatest fixed point. This involves first computing upper bounds on the actual values, by changing all zero reward values to some value γ > 0 to construct the reward structure r_γ = (r^γ_A, r^γ_S) and then applying value iteration to approximate V_{G^C}(s, r_γ, ρ) = lim_{n→∞} V_{G^C}(s, r_γ, ρ_n), where:

$$ V_{G^{C}}(s,r_\gamma,\rho_n) = \begin{cases} 0 & \text{if } s \in Sat(\varphi) \\ \infty & \text{else if } s \in S^{\rho}_{\infty} \\ 0 & \text{else if } n = 0 \\ val(Z) & \text{otherwise} \end{cases} $$

where val(Z) equals the value of the matrix game Z ∈ Q^{l×m} with:

$$ z_{i,j} = r^{\gamma}_A(s,(a_i,b_j)) + r^{\gamma}_S(s) + \sum_{s' \in S} \delta^{C}(s,(a_i,b_j))(s') \cdot V_{G^{C}}(s',r_\gamma,\rho_{n-1}). $$

Finally, using these upper bounds as the initial values we again perform value iteration as above, except now using the original reward structure r, i.e., to approximate V_{G^C}(s, r, ρ) = lim_{k→∞} V_{G^C}(s, r, ρ_k). The choice of γ can influence value iteration computations in opposing ways: increasing γ can speed up convergence when computing over-approximations, while potentially slowing it down when computing the actual values.
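The need for the two-phase computation with γ can be seen on a deliberately degenerate example. The sketch below assumes a single non-target state in which only the minimising player has a choice, so each matrix game collapses to a minimum: action `loop` stays put with reward 0 (a zero-reward cycle) and action `go` reaches the target with reward 1. Looping forever never reaches the target and so has infinite reward, hence the correct value is 1. The names and the choice γ = 0.5 are hypothetical.

```python
def iterate(rewards, start, steps=100):
    """Value iteration for the single state; the target has value 0."""
    v = start
    for _ in range(steps):
        v = min(rewards["loop"] + v, rewards["go"] + 0.0)
    return v

def two_phase(rewards, gamma, steps=100):
    """Upper bounds with zero rewards replaced by gamma, then the real rewards."""
    r_gamma = {a: (r if r != 0 else gamma) for a, r in rewards.items()}
    upper = iterate(r_gamma, 0.0, steps)   # phase 1: over-approximation
    return iterate(rewards, upper, steps)  # phase 2: start from upper bounds

rewards = {"loop": 0.0, "go": 1.0}
naive = iterate(rewards, 0.0)          # stuck at the least fixed point
correct = two_phase(rewards, gamma=0.5)
```

Starting from 0 with the original rewards, the iteration never leaves 0, because the zero-reward loop makes 0 a fixed point. Starting from the upper bound obtained with γ, it settles on the greatest fixed point, which is the intended value.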

Model Checking Nonzero-Sum Properties
Next, we show how to compute subgame-perfect SWNE and SCNE values for the two objectives corresponding to a nonzero-sum formula. As for the zero-sum case, the approach taken depends on whether the formula contains finite-horizon or infinite-horizon objectives. We now have three cases:
1. when both objectives are finite-horizon, we use backward induction [72,60] to compute (precise) subgame-perfect SWNE and SCNE values;
2. when both objectives are infinite-horizon, we use value iteration [70,19] to approximate the values;
3. when there is a mix of the two types of objectives, we convert the problem to two infinite-horizon objectives on an augmented model.
In a similar style to the algorithms for zero-sum properties, in all three cases the computation is an iterative process that analyses a two-player game for each state at each step. However, this now requires finding SWNE or SCNE values of a bimatrix game, rather than solving a matrix game as in the zero-sum case. We solve bimatrix games using the approach presented in Section 2.1.2 (see also the more detailed discussion of its implementation in Section 6.2).
Another important aspect of our algorithms is that, for efficiency, if we reach a state where the value of one player's objective cannot change (e.g., the goal of that player is reached or can no longer be reached), then we switch to the simpler problem of solving an MDP to find the optimal value for the other player in that state. This is possible since the only SWNE profile in that state corresponds to maximising the objective of the other player. More precisely:
- the first player (whose objective cannot change) is indifferent, since its value will not be affected by the choices of either player;
- the second player cannot do better than the optimal value of its objective in the corresponding MDP where both players collaborate;
- for any NE profile, the value of the first player is fixed and the value of the second is less than or equal to the optimal value of its objective in the MDP.
We use the notation P max G,s (ψ) and R max G,s (r, ρ) for the maximum probability of satisfying the path formula ψ and the maximum expected reward for the random variable rew (r, ρ), respectively, when the players collaborate in state s. These values can be computed through standard MDP model checking [10,1].

Computing SWNE Values of Finite-Horizon Nonzero-Sum Formulae
As for the zero-sum case, for a finite-horizon nonzero-sum formula θ, we compute the SWNE values V G C (s, θ) for all states s of G C in a recursive fashion based on the step bound. We now solve bimatrix games at each step, which are defined in a similar manner to the matrix games for zero-sum properties: the actions of each bimatrix game correspond to the actions available in that state and the utilities are constructed from the transition probabilities δ C of the game G C , the reward structure (in the case of reward formulae) and the values already computed recursively for successor states.
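As a small illustration of the bimatrix step, the sketch below finds the social welfare optimal Nash equilibrium values of a 2×2 bimatrix game. It is not the labelled-polytopes method used in this work: it assumes a nondegenerate 2×2 game, for which every equilibrium is either pure or fully mixed, and simply enumerates those candidates. Function names and the example payoffs are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected(Z, x, y):
    """Expected utility under mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * y[j] * Z[i][j] for i in range(2) for j in range(2))

def nash_equilibria_2x2(Z1, Z2):
    """Pure and fully mixed equilibria of a nondegenerate 2x2 bimatrix game."""
    eqs = []
    for i, j in product(range(2), repeat=2):
        # Pure profile (i, j): neither player gains by deviating.
        if Z1[i][j] >= Z1[1 - i][j] and Z2[i][j] >= Z2[i][1 - j]:
            x = [Fraction(int(r == i)) for r in range(2)]
            y = [Fraction(int(c == j)) for c in range(2)]
            eqs.append((x, y))
    d1 = Z1[0][0] - Z1[0][1] - Z1[1][0] + Z1[1][1]
    d2 = Z2[0][0] - Z2[0][1] - Z2[1][0] + Z2[1][1]
    if d1 != 0 and d2 != 0:
        p = Fraction(Z2[1][1] - Z2[1][0]) / Fraction(d2)  # makes player 2 indifferent
        q = Fraction(Z1[1][1] - Z1[0][1]) / Fraction(d1)  # makes player 1 indifferent
        if 0 < p < 1 and 0 < q < 1:
            eqs.append(([p, 1 - p], [q, 1 - q]))
    return eqs

def swne_values(Z1, Z2):
    """Equilibrium values (u1, u2) maximising the sum u1 + u2."""
    eqs = nash_equilibria_2x2(Z1, Z2)
    return max(((expected(Z1, x, y), expected(Z2, x, y)) for x, y in eqs), key=sum)

# Hypothetical stag-hunt style utilities.
Z1 = [[4, 0], [3, 3]]
Z2 = [[4, 3], [0, 3]]
```

This game has three equilibria, with values (4, 4), (3, 3) and, for the mixed one, (3, 3); the first maximises the sum and is therefore selected. In the backward induction, the entries of Z1 and Z2 would be built from transition probabilities, rewards and the values already computed for successor states.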
For any state formula φ and state s we let η φ (s) equal 1 if s ∈ Sat (φ) and 0 otherwise. Recall that probability and reward values of the form P max G,s (ψ) and R max G,s (r, ρ), respectively, are computed through standard MDP verification. Below, we explain the computation for both types of finite-horizon probabilistic objectives (next and bounded until) and reward objectives (instantaneous and bounded cumulative), as well as combinations of each type.
Next. Again, since next is a 1-step property, no recursion is required.

Bounded Until. If θ = P[ φ^1_1 U^{≤k_1} φ^1_2 ] + P[ φ^2_1 U^{≤k_2} φ^2_2 ], we compute SWNE values of the objectives for the nonzero-sum formulae θ_{n+n_1,n+n_2} = P[ φ^1_1 U^{≤n+n_1} φ^1_2 ] + P[ φ^2_1 U^{≤n+n_2} φ^2_2 ] for 0 ≤ n ≤ k recursively, where k = min{k_1, k_2}, n_1 = k_1−k and n_2 = k_2−k. In this case, there are three situations in which the value of the objective of one of the players cannot change, and hence we can switch to MDP verification. The first is when the step bound is zero for only one of the corresponding objectives, the second is when a state satisfying φ^i_2 is reached by only one player i (and therefore the objective is satisfied by that state) and the third is when a state satisfying ¬φ^i_1 ∧ ¬φ^i_2 is reached by only one player i (and therefore the objective is not satisfied by that state). For any state s, if n = 0, then: … On the other hand, if n > 0, then: …

Next and Bounded Until. In this case, since the value for objectives corresponding to next formulae cannot change after the first step, we can always switch to MDP verification after this step. The symmetric case is similar.

Instantaneous Rewards. If θ = R^{r_1}[ I^{=k_1} ] + R^{r_2}[ I^{=k_2} ], we compute SWNE values of the objectives for the nonzero-sum formulae θ_{n+n_1,n+n_2} = R^{r_1}[ I^{=n+n_1} ] + R^{r_2}[ I^{=n+n_2} ] for 0 ≤ n ≤ k recursively, where k = min{k_1, k_2}, n_1 = k_1−k and n_2 = k_2−k. Here, there is only one situation in which the value of the objective of one of the players cannot change: when one of the step bounds equals zero. Hence, this is the only time we switch to MDP verification. For any state s, if n = 0, then: … On the other hand, if n > 0, then V_{G^C}(s, θ_{n+n_1,n+n_2}) equals SWNE values of the bimatrix game (Z_1, Z_2) ∈ Q^{l×m} where: …

Bounded Cumulative Rewards. If θ = R^{r_1}[ C^{≤k_1} ] + R^{r_2}[ C^{≤k_2} ], we compute values of the objectives for the formulae θ_{n+n_1,n+n_2} = R^{r_1}[ C^{≤n+n_1} ] + R^{r_2}[ C^{≤n+n_2} ] for 0 ≤ n ≤ k recursively, where k = min{k_1, k_2}, n_1 = k_1−k and n_2 = k_2−k. As for instantaneous rewards, the only time we can switch to MDP verification is when one of the step bounds equals zero. For state s, if n = 0: …

Bounded Instantaneous and Cumulative Rewards.

Computing SWNE Values of Infinite-Horizon Nonzero-Sum Formulae
We next show how to compute SWNE values V_{G^C}(s, θ) for infinite-horizon nonzero-sum formulae θ in all states s of G^C. As for the zero-sum case, we approximate these using a value iteration approach. Each step of this computation is similar in nature to the algorithms in the previous section, where a bimatrix game is solved for each state, and a reduction to solving an MDP is used once one of the players' objectives can no longer change.
A key aspect of the value iteration algorithm is that, while the SWNE (or SCNE) values take the form of a pair, with one value for each player, convergence is defined over the sum of the two values. This is because there is not necessarily a unique pair of such values, but the maximum (or minimum) of the sum of NE values is uniquely defined. Convergence of value iteration is estimated in the same way as for the zero-sum computation (see Section 4.1.2), by comparing values in successive iterations. As previously, this means that we are not able to guarantee that the computed values are within a particular error bound of the exact values.
Below, we give the algorithms for the cases of two infinite-horizon objectives. The notation used is as in the previous section: for any state formula φ and state s we let η φ (s) equal 1 if s ∈ Sat (φ) and 0 otherwise; and values of the form P max G,s (ψ) and R max G,s (r, ρ) are computed through standard MDP verification.
Until. If θ = P[ φ^1_1 U φ^1_2 ] + P[ φ^2_1 U φ^2_2 ], values for any state s can be computed through value iteration as the limit V_{G^C}(s, θ) = lim_{n→∞} V_{G^C}(s, θ, n) where: … and (v^{s′,1}_{n−1}, v^{s′,2}_{n−1}) = V_{G^C}(s′, θ, n−1) for all s′ ∈ S. As can be seen, there are two situations in which we switch to MDP verification. These correspond to the two cases where the value of the objective of one of the players cannot change: when a state satisfying φ^i_2 is reached for only one player i (and therefore the objective is satisfied by that state) and when a state satisfying ¬φ^i_1 ∧ ¬φ^i_2 is reached for only one player i (and therefore the objective is not satisfied by that state).
Expected Reachability. If θ = R^{r_1}[ F φ^1 ] + R^{r_2}[ F φ^2 ], values can be computed through value iteration as the limit V_{G^C}(s, θ) = lim_{n→∞} V_{G^C}(s, θ, n) where: … where val(Z_1, Z_2) equals SWNE values of the bimatrix game (Z_1, Z_2) ∈ Q^{l×m}: … for all s′ ∈ S. In this case, the only situation in which the value of the objective of one of the players cannot change is when only one of their goals is reached, i.e., when a state satisfying φ^i is reached for only one player i. This is therefore the only time we switch to MDP verification.

Computing SWNE Values of Mixed Nonzero-Sum Formulae
We now present the algorithms for computing SWNE values of nonzero-sum formulae containing a mixture of both finite- and infinite-horizon objectives. This is achieved by finding values for a sum of two modified (infinite-horizon) objectives θ′ on a modified game G′ using the algorithms presented in Section 4.2.2. This approach is based on the standard construction for converting the verification of finite-horizon properties to infinite-horizon properties [68]. We consider the cases when the first objective is finite-horizon and the second infinite-horizon; the symmetric cases follow similarly. In each case, the modified game has states of the form (s, n), where s is a state of G^C, n ∈ N and the SWNE values V_{G^C}(s, θ) are given by the SWNE values V_{G′}((s, 0), θ′).
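The idea of encoding a step bound in the state space can be sketched as follows. This is an illustration of the generic step-counter construction only, under assumptions that are not taken from the paper: the transition function is a nested dictionary `delta[state][joint_action] -> {successor: probability}`, the counter is capped at k+1 once the bound has been exceeded, and a bounded target is recovered as the set of augmented states whose counter is still within the bound. All names are hypothetical.

```python
def add_step_counter(delta, k):
    """Augment every state with a step counter n in {0, ..., k+1}."""
    augmented = {}
    for s, moves in delta.items():
        for n in range(k + 2):
            nxt = min(n + 1, k + 1)  # the counter saturates past the bound
            augmented[(s, n)] = {
                act: {(t, nxt): p for t, p in dist.items()}
                for act, dist in moves.items()
            }
    return augmented

def bounded_targets(augmented, sat, k):
    """Augmented states that satisfy the target within the step bound k."""
    return {(s, n) for (s, n) in augmented if s in sat and n <= k}

# Hypothetical two-state game with a single joint action.
delta = {
    "s0": {("a", "b"): {"s0": 0.5, "s1": 0.5}},
    "s1": {("a", "b"): {"s1": 1.0}},
}
augmented = add_step_counter(delta, k=2)
targets = bounded_targets(augmented, {"s1"}, k=2)
```

On the augmented game, reaching `targets` without a step bound corresponds to reaching s1 within 2 steps in the original game, which is what allows the infinite-horizon algorithms to be reused.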
Next and Unbounded Until.

and compute the SWNE values of
Bounded Cumulative and Expected Rewards.

Computing SCNE Values of Nonzero-Sum Formulae
The case for SCNE values follows similarly to the SWNE case using backward induction for finite-horizon properties and value iteration for infinite-horizon properties. There are two differences in the computation. First, when solving MDPs, we find the minimum probability of satisfying path formulae and the minimum expected reward for reward formulae. Second, when solving the bimatrix games constructed during backward induction and value iteration, we find SCNE rather than SWNE values; this is achieved through Lemma 1. More precisely, we negate all the utilities in the game, find the SWNE values of this modified game, then negate these values to obtain SCNE values of the original bimatrix game.
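The negation step can be sketched in a few lines. The code below is a simplified illustration: the SWNE routine it uses only searches pure equilibria (an assumption made to keep the example short, whereas the method described here also handles mixed equilibria), and the wrapper `scne_values` shows the negate, solve, negate pattern. Names and the example costs are hypothetical.

```python
def pure_swne(Z1, Z2):
    """SWNE values restricted to pure equilibria (simplifying assumption)."""
    rows, cols = range(len(Z1)), range(len(Z1[0]))
    equilibria = [
        (Z1[i][j], Z2[i][j])
        for i in rows for j in cols
        if all(Z1[i][j] >= Z1[k][j] for k in rows)      # player 1 cannot improve
        and all(Z2[i][j] >= Z2[i][k] for k in cols)     # player 2 cannot improve
    ]
    if not equilibria:
        raise ValueError("no pure equilibrium; a full solver is needed")
    return max(equilibria, key=sum)

def scne_values(C1, C2, swne=pure_swne):
    """SCNE values of a cost game: negate utilities, take SWNE, negate back."""
    neg1 = [[-c for c in row] for row in C1]
    neg2 = [[-c for c in row] for row in C2]
    u1, u2 = swne(neg1, neg2)
    return (-u1, -u2)

# Hypothetical cost matrices for the two players.
C1 = [[1, 5], [5, 2]]
C2 = [[1, 5], [5, 2]]
```

Here `scne_values(C1, C2)` selects the equilibrium with costs (1, 1), the one with the smallest total cost. Any SWNE solver with the same interface could be passed as the `swne` argument.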

Strategy Synthesis
In addition to verifying formulae in our extension of rPATL, it is typically also very useful to perform strategy synthesis, i.e., to construct a witness to the satisfaction of a property. For each zero-sum formula C P∼q[ ψ ] or C R r ∼x [ ρ ] appearing as a sub-formula, this comprises optimal strategies for the players in coalition C (or, equivalently, for player 1 in the coalition game G C ) for the objective X ψ or X r,ρ . For each nonzero-sum formula C:C ′ opt∼x(θ) appearing as a sub-formula, this is a subgame-perfect SWNE/SCNE profile for the objectives (X θ 1 , X θ 2 ) in the coalition game G C .
We can perform strategy synthesis by adapting the model checking algorithms described in the previous sections, which compute the values of zero-sum objectives and SWNE or SCNE values of nonzero-sum objectives. The type of strategy needed (deterministic or randomised; memoryless or finite-memory) depends on the types of objectives. As discussed previously (in Sections 4.1.2 and 4.2.2), for infinite-horizon objectives our use of value iteration means we cannot guarantee that the values computed are within a particular error bound of the actual values; so, the same will be true of the optimal strategy that we synthesise for such a formula.
Zero-sum properties. For zero-sum formulae, all strategies synthesised are randomised; this is in contrast to checking the equivalent properties against TSGs [23], where deterministic strategies are sufficient. For infinite-horizon objectives, we synthesise memoryless strategies, i.e., a distribution over actions for each state of the game. For finite-horizon objectives, strategies are finite-memory, with a separate distribution required for each state and each time step.
For both types of objectives, we synthesise the strategies whilst computing values using the approach presented in Section 4.1: from the matrix game solved for each state, we extract not just the value of the game, but also an optimal (randomised) strategy for player 1 of G C in that state. It is also possible to extract the optimal strategy for player 2 in the state by solving the dual LP problem for the matrix game (see Section 2.1.1). For finite-horizon objectives, we retain the choices for all steps; for infinite-horizon objectives, just those from the final step of value iteration are needed.
Nonzero-sum properties. In the case of a nonzero-sum formula, randomisation is again needed for all types of objectives. Similarly to zero-sum formulae above, strategies are generated whilst computing SWNE or SCNE values, using the algorithms presented in Section 4.2. Now, we do this in two distinct ways:
- when solving bimatrix games in each state, we also extract an SWNE/SCNE profile, comprising the distributions over actions for each player of G^C in that state;
- when solving MDPs, we also synthesise an optimal strategy for the MDP [49], which is equivalent to a strategy profile for G^C (in fact, randomisation is not needed for this part).
The final synthesised profile is then constructed by initially following the ones generated when solving bimatrix games, and then switching to the MDP strategies if we reach a state where the value of one player's objective cannot change. This means that all strategies synthesised for nonzero-sum formulae may need memory. As for the zero-sum case, finite-horizon strategies are finite-memory since separate player choices are stored for each state and each time step. But, in addition, for both finite- and infinite-horizon objectives, one bit of memory is required to record that a switch is made to the strategy extracted when solving the MDP.
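The role of the extra bit of memory can be sketched with a small class. This is a hypothetical illustration for the infinite-horizon case, where the choices before the switch depend only on the current state: `game_choice` holds the distributions extracted from the bimatrix games, `mdp_choice` the deterministic choices from the MDP, and `is_fixed` is an assumed predicate identifying states where one player's objective can no longer change.

```python
class SwitchingStrategy:
    """Follow the bimatrix-game choices, then switch for good to the MDP ones."""

    def __init__(self, game_choice, mdp_choice, is_fixed):
        self.game_choice = game_choice  # state -> {action: probability}
        self.mdp_choice = mdp_choice    # state -> action
        self.is_fixed = is_fixed        # state -> bool
        self.switched = False           # the one bit of memory

    def next_distribution(self, state):
        """Return the distribution over actions to play in `state`."""
        if not self.switched and self.is_fixed(state):
            self.switched = True
        if self.switched:
            return {self.mdp_choice[state]: 1.0}
        return self.game_choice[state]

# Hypothetical data: the objective of one player is settled once s1 is visited.
strategy = SwitchingStrategy(
    game_choice={"s0": {"a": 0.5, "b": 0.5}, "s1": {"a": 1.0}},
    mdp_choice={"s0": "b", "s1": "a"},
    is_fixed=lambda s: s == "s1",
)
first = strategy.next_distribution("s0")   # before the switch
second = strategy.next_distribution("s1")  # triggers the switch
third = strategy.next_distribution("s0")   # same state as first, different choice
```

The first and third calls are made in the same state but return different distributions, which is why the profile is not memoryless even for infinite-horizon objectives.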

Complexity
Due to its overall recursive nature, the complexity of our model checking algorithms is linear in the size of the formula φ. In terms of the problems solved for each subformula, finding zero-sum values of a 2-player CSG is in PSPACE [20] and finding subgame-perfect NE for reachability objectives of a 2-player CSG is PSPACE-complete [15]. In practice, our algorithms are iterative, so the complexity depends on the number of iterations required, the number of states in the CSG and the problems solved for each state and in each step. For finite-horizon objectives, the number of iterations is equal to the step bound in the formula. For infinite-horizon objectives, the number of iterations depends on the convergence criterion used. For zero-sum properties, an exponential lower bound has been shown for the worst-case number of iterations required for a non-trivial approximation [37]. We report on efficiency in practice in Section 7.1.
In the case of zero-sum properties, for each state, at each iteration, we need to solve an LP problem of size |A|. Such problems can be solved using the simplex algorithm, which is PSPACE-complete [29], but performs well on average [75]. Alternatively, Karmarkar's algorithm [41] could be used, which is PTIME.
For nonzero-sum properties, in each state, at each iteration, we need to find all solutions to an LCP problem of size |A|. Papadimitriou established the complexity of solving the class of LCPs we encounter to be in PPAD (polynomial parity argument in a directed graph) [65] and, to the best of our knowledge, there is still no polynomial algorithm for solving such problems. More closely related to finding all solutions, it has been shown that determining if there exists an equilibrium in a bimatrix game for which each player obtains a utility of a given bound is NP-complete [32]. Also, it is demonstrated in [8] that bimatrix games may have a number of NE that is exponential with respect to the size of the game, and thus any method that relies on finding all NE in the worst case cannot be expected to perform in a running time that is polynomial with respect to the size of the game.

Correctness of the Model Checking Algorithms
The overall (recursive) approach and the reduction to solution of a two-player game is essentially the same as for TSGs [23], and therefore the same correctness arguments apply. In the case of zero-sum formulae, the correctness of value iteration for infinite-horizon properties follows from [70] and for finite-horizon properties from Definition 14 and the solution of matrix games (see Section 2). Below, we show the correctness of the model checking algorithms for nonzero-sum formulae.

Nonzero-Sum Formulae
We fix a game G and a nonzero-sum formula ⟨⟨C:C′⟩⟩_{opt∼x}(θ). For the case of finite-horizon nonzero-sum formulae, the correctness of the model checking algorithms follows from the fact that we use backward induction [72,60]. For infinite-horizon nonzero-sum formulae, the proof is based on showing that the values computed during value iteration correspond to subgame-perfect SWNE values of finite game trees, and the values of these game trees converge uniformly and are bounded from above by the actual values of G^C.
The fact that we use MDP model checking when the goal of one of the players is reached means that the values computed during value iteration are not finite approximations for the values of G^C. Therefore we must also show that the values computed during value iteration are bounded from below by finite approximations for the values of G^C. We first consider the case when both the objectives in the sum θ are infinite-horizon objectives. Below we assume opt = max; the case when opt = min follows similarly. For any ( …

The following lemma follows by definition of subgame-perfect SWNE values.

… 2 ) (0, 0);
- if s |= φ_1 ∧ ¬φ_2, then (0, R^max_{G,s}(r_2, F φ_2)) are the unique subgame-perfect SWNE values for state s and (v σ,s , 0) are the unique subgame-perfect SWNE values for state s and (v σ,s …

Next we require the following objectives of G^C.
Definition 15 For any sum of two probabilistic or reward objectives θ, 1 ≤ i ≤ 2 and n ∈ N, let X^θ_{i,n} be the objective where for any path π of G^C: …

The following lemma demonstrates that, for a fixed strategy profile and state, the values of these objectives are non-decreasing and converge uniformly to the values of θ.
Lemma 3 For any sum of two probabilistic or reward objectives θ and ε > 0, there exists N ∈ N such that, for any n ≥ N, s ∈ S, σ ∈ Σ^1_{G^C}×Σ^2_{G^C} and 1 ≤ i ≤ 2: …

Proof Consider any sum of two probabilistic or reward objectives θ, state s and 1 ≤ i ≤ 2. Using Assumption 3 we have that, for subformulae R^r[ F φ^i ], the set Sat(φ^i) is reached with probability 1 from all states of G under all profiles, and therefore E^σ_{G^C,s}(X^θ_i) is finite. Furthermore, for any n ≥ N, by Definitions 14 and 15 we have that E^σ_{G^C,s}(X^θ_{i,n}) is the value of state s for the nth iteration of value iteration [19] when computing E^σ_{G^C,s}(X^θ_i) in the DTMC obtained from G^C by following the strategy σ, and the sequence is both non-decreasing and converges. The fact that we can choose an N independent of the strategy profile for uniform convergence follows from Assumptions 2 and 3.

□
In the proof of correctness we will use the fact that n iterations of value iteration is equivalent to performing backward induction on the following game trees.
Definition 16 For any state s and n ∈ N, let G C n,s be the game tree corresponding to playing G C for n steps when starting from state s and then terminating.
We can map any strategy profile σ of G C to a strategy profile of G C n,s by only considering the choices of the profile over the first n steps when starting from state s. This mapping is clearly surjective, i.e., we can generate all profiles of G C n,s , but is not injective. We also need the following objectives corresponding to the values computed during value iteration for the game trees of Definition 16.
Definition 17 For any sum of two probabilistic or reward objectives θ, s ∈ S, n ∈ N, 1 ≤ i ≤ 2 and j = i+1 mod 2, let Y θ i be the objective where, for any path π of G C n,s :
Similarly to Lemma 3, the lemma below demonstrates, for a fixed strategy profile and state s of G C , that the values for the objectives given in Definition 17 when played on the game trees G C n,s are non-decreasing and converge uniformly. As with Lemma 3, the result follows from Assumptions 2 and 3.
Lemma 4 For any sum of two probabilistic or reward objectives θ and ε > 0, there exists N ∈ N such that for any m ≥ n ≥ N , σ ∈ Σ 1 G C ×Σ 2 G C , s ∈ S and 1 ≤ i ≤ 2:
We require the following lemma relating the values of the objectives X θ i,n , Y θ i and X θ i for 1 ≤ i ≤ 2.
Lemma 5 For any sum of two probabilistic or reward objectives θ, state s of G C , strategy profile σ such that when one of the targets of the objectives of θ is reached, the profile then collaborates to maximise the value of the other objective, n ∈ N and 1 ≤ i ≤ 2:
Proof Consider any strategy profile σ, n ∈ N and 1 ≤ i ≤ 2. By Definitions 15 and 17 it follows that: Furthermore, if we restrict the profile σ such that, when one of the targets of the objectives of θ is reached, the profile then collaborates to maximise the value of the other objective, then by Definitions 17 and 14: Combining these results with Lemma 2, we have: as required.

□
We now define the strategy profiles synthesised during value iteration.
Definition 18 For any n ∈ N and s ∈ S, let σ n,s be the strategy profile generated for the game tree G C n,s (when considering value iteration as backward induction) and σ n,⋆ be the synthesised strategy profile for G C after n iterations.
Before giving the proof of correctness we require the following results.
Lemma 6 For any state s of G C , sum of two probabilistic or reward objectives θ and n ∈ N we have that σ n,s is a subgame-perfect SWNE profile of the CSG G C n,s for the objectives (Y θ1 , Y θ2 ).
Proof The result follows from the fact that value iteration selects SWNE profiles, value iteration corresponds to performing backward induction for the objectives (Y θ1 , Y θ2 ) and backward induction returns a subgame-perfect NE [72,60].

□
The following proposition demonstrates that value iteration converges; the result depends on Assumptions 2 and 3. Without these assumptions, convergence cannot be guaranteed, as demonstrated by the counterexamples in Appendix B and Appendix C. Although value iteration converges, unlike value iteration for MDPs or zero-sum games, the generated sequence of values is not necessarily non-decreasing.
Proposition 1 For any sum of two probabilistic or reward objectives θ and state s, the sequence ⟨V G C (s, θ, n)⟩ n∈N converges.
Proof For any state s and n ∈ N we can consider G C n,s as a two-player infinite-action NFG N n,s where for 1 ≤ i ≤ 2:
- the set of actions of player i equals the set of strategies of player i in G C ;
- for the action pair (σ 1 , σ 2 ), the utility function for player i returns E σ
The correctness of this construction relies on the mapping of strategy profiles from the game G C to G C n,s being surjective. Using Lemma 4, we have that the sequence ⟨N n,s ⟩ n∈N of NFGs converges uniformly, and therefore, since V G C (s, θ, n) are subgame-perfect SWNE values of G C n,s (see Lemma 6), the sequence ⟨V G C (s, θ, n)⟩ n∈N also converges.

□
A similar convergence result to Proposition 1 has been shown for the simpler case of discounted properties in [30].
Lemma 7 For any ε > 0, there exists N ∈ N such that for any s ∈ S and 1 ≤ i ≤ 2:
Proof Using Lemma 4 and Proposition 1, we can choose N such that the choices of the profile σ n,s agree with those of σ n,⋆ for a sufficient number of steps such that the inequality holds.

□
Theorem 2 For a given sum of two probabilistic or reward objectives θ and ε > 0, there exists N ∈ N such that for any n ≥ N the strategy profile σ n,⋆ is a subgame-perfect ε-SWNE profile of G C and the objectives (X θ1 , X θ2 ).
Proof Consider any ε > 0. From Lemma 7 there exists N 1 ∈ N such that for any s ∈ S and n ≥ N 1 : For any m ∈ N and s ∈ S, using Lemma 6 we have that σ m,s is a NE of G C m,s , and therefore for any m ∈ N, s ∈ S and 1 ≤ i ≤ 2: From Lemma 3 there exists N 2 ∈ N such that for any n ≥ N 2 , s ∈ S and 1 ≤ i ≤ 2: By construction, σ n,⋆ is a profile for which, if one of the targets of the objectives of θ is reached, the profile maximises the value of the objective. We can thus rearrange (7) and apply Lemma 5 to yield for any n ≥ N 2 , s ∈ S and 1 ≤ i ≤ 2: Letting N = max{N 1 , N 2 }, for any n ≥ N , s ∈ S and 1 ≤ i ≤ 2: and hence, since ε > 0, s ∈ S and 1 ≤ i ≤ 2 were arbitrary, σ n,⋆ is a subgame-perfect ε-NE. It remains to show that the strategy profile is a subgame-perfect social welfare optimal ε-NE, which follows from the fact that, when solving the bimatrix games during value iteration, social welfare optimal NE are returned. □
It remains to consider the model checking algorithms for nonzero-sum properties for which the sum of objectives contains both a finite-horizon and an infinite-horizon objective. In this case (see Section 4.2.3), for a given game G C and sum of objectives θ, the algorithms first build a modified game G ′ with states S ′ ⊆ S×N and sum of infinite-horizon objectives θ ′ and then compute SWNE/SCNE values of θ ′ in G ′ . The correctness of these algorithms follows by first showing there exists a bijection between the profiles of G C and G ′ and then that, for any profile σ of G C and σ ′ , the corresponding profile of G ′ under this bijection, we have: for all states s of G C and 1 ≤ i ≤ 2. This result follows from the fact that in Section 4.2.3 we used a standard construction for converting the verification of finite-horizon properties to infinite-horizon properties.
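The idea behind a step-counting construction over S×N can be illustrated on a much simpler model. The sketch below is not the algorithm of Section 4.2.3: it assumes a small hypothetical Markov chain (no players or nondeterminism), and the names `bounded_reach`, `unfold` and `unbounded_reach` are introduced here purely for illustration. It checks that a k-step bounded reachability probability coincides with an unbounded reachability probability in a chain whose states are pairs (s, n) recording the number of steps taken.

```python
from fractions import Fraction


def bounded_reach(trans, target, s, k):
    """Probability of reaching `target` from `s` within `k` steps (direct recursion)."""
    if s in target:
        return Fraction(1)
    if k == 0:
        return Fraction(0)
    return sum(p * bounded_reach(trans, target, t, k - 1) for t, p in trans[s].items())


def unfold(trans, k):
    """Step-counting chain over pairs (state, steps_taken) with steps_taken in 0..k.

    Once the bound is exhausted, the pair (s, k) becomes absorbing.
    """
    unfolded = {}
    for s in trans:
        for n in range(k + 1):
            if n < k:
                unfolded[(s, n)] = {(t, n + 1): p for t, p in trans[s].items()}
            else:
                unfolded[(s, n)] = {(s, n): Fraction(1)}
    return unfolded


def unbounded_reach(trans, target, start, iterations):
    """Value iteration for (unbounded) reachability of `target`."""
    values = {s: Fraction(1) if s in target else Fraction(0) for s in trans}
    for _ in range(iterations):
        values = {
            s: Fraction(1) if s in target
            else sum(p * values[t] for t, p in trans[s].items())
            for s in trans
        }
    return values[start]


# Hypothetical three-state chain used only for this illustration.
CHAIN = {
    "a": {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    "b": {"a": Fraction(1, 4), "goal": Fraction(3, 4)},
    "goal": {"goal": Fraction(1)},
}
TARGET = {"goal"}
K = 3

direct = bounded_reach(CHAIN, TARGET, "a", K)
unfolded = unfold(CHAIN, K)
unfolded_target = {(s, n) for (s, n) in unfolded if s in TARGET}
# The unfolded chain is acyclic apart from absorbing states, so K iterations suffice.
via_unfolding = unbounded_reach(unfolded, unfolded_target, ("a", 0), K)
```

Both computations give the same value for the start state, which is the property the bijection argument above relies on in the game setting.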

Implementation and Tool Support
We have implemented support for modelling and automated verification of CSGs in PRISM-games 3.0 [48], which previously only handled TSGs and zero-sum objectives [51]. The PRISM-games tool is available from [80] and the files for the case studies, described in the next section, are available from [81].

Modelling
We extended the PRISM-games modelling language to support specification of CSGs. The language allows multiple parallel components, called modules, operating both asynchronously and synchronously. Each module's state is defined by a number of finite-valued variables, and its behaviour is defined using probabilistic guarded commands of the form [a] g → u, where a is an action label, g is a guard (a predicate over the variables of all modules) and u is a probabilistic state update. If the guard is satisfied then the command is enabled, and the module can (probabilistically) update its variables according to u. The language also allows for the specification of cost or reward structures. These are defined in a similar fashion to the guarded commands, taking the form [a] g : v (for action rewards) and g : v (for state rewards), where a is an action label, g is a guard and v is a real-valued expression over variables.
For CSGs, we assign modules to players and, in every state of the model, each player can choose between the enabled commands of the corresponding modules (or, if no command is enabled, the player idles). In contrast to the usual behaviour of PRISM, where modules synchronise on common actions, in CSGs action labels are distinct for each player and the players move concurrently. To allow the updates of variables to depend on the choices of other players, we extend the language by allowing commands to be labelled with lists of actions [a 1 , . . . , a n ]. Moreover, updates to variables can be dependent on the new values of other variables being updated in the same concurrent transition, provided there are no cyclic dependencies. This ensures that variables of different players are updated according to a joint probability distribution. Another addition is the possibility of specifying "independent" modules, that is, modules not associated with a specific player, which do not feature nondeterminism and update their own variables when synchronising with other players' actions. Reward definitions are also extended to use action lists, similarly to commands, so that an action reward can depend on the choices taken by multiple players. For further details of the new PRISM-games modelling language, we refer the reader to the tool documentation [80].
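To make the role of action lists concrete, the following Python sketch mimics how the effect of a transition can depend on the actions chosen concurrently by two players. It is not PRISM-games syntax and not taken from the tool: the two-user scenario, the action names (`w1`, `t1`, `w2`, `t2`), the success probability `q` and the functions `add` and `joint_update` are all hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical actions: each player either waits ("w") or transmits ("t").
ACTIONS_P1 = ["w1", "t1"]
ACTIONS_P2 = ["w2", "t2"]


def add(dist, state, prob):
    """Accumulate probability mass, so that coinciding successors are merged."""
    dist[state] = dist.get(state, Fraction(0)) + prob


def joint_update(state, a1, a2, q):
    """Distribution over next states for the joint action (a1, a2).

    A state is a pair (sent1, sent2). A lone transmission succeeds with
    probability q; simultaneous transmissions collide and change nothing.
    """
    sent1, sent2 = state
    dist = {}
    if a1 == "t1" and a2 == "t2":
        add(dist, state, Fraction(1))          # collision
    elif a1 == "t1":
        add(dist, (True, sent2), q)            # player 1 transmits alone
        add(dist, state, 1 - q)
    elif a2 == "t2":
        add(dist, (sent1, True), q)            # player 2 transmits alone
        add(dist, state, 1 - q)
    else:
        add(dist, state, Fraction(1))          # both wait
    return dist


Q = Fraction(3, 4)
INITIAL = (False, False)
# One distribution per joint action, analogous to one command per action list.
TRANSITIONS = {
    (a1, a2): joint_update(INITIAL, a1, a2, Q)
    for a1, a2 in product(ACTIONS_P1, ACTIONS_P2)
}
```

The point of the sketch is only that the successor distribution is indexed by the pair of actions rather than by a single player's action, which is what labelling commands with action lists achieves in the modelling language.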

Implementation
PRISM-games constructs a CSG from a given model specification and implements the rPATL model checking and strategy synthesis algorithms from Section 4. We extend existing functionality within the tool, such as modelling and property language parsers, the simulator and basic model checking functionality. We build, store and verify CSGs using an extension of PRISM's 'explicit' model checking engine, which is based on sparse matrices and implemented in Java. For strategy synthesis we have included the option to export the generated strategies to a graphical representation using the Dot language [31].
Computing values (and optimal strategies) of matrix games (see Section 2.1.1), as required for zero-sum formulae, is performed using the LPSolve library [54] via linear programming. This library is based on the revised simplex and branch-and-bound methods. Computing SWNE or SCNE values (and SWNE or SCNE strategies) of bimatrix games (see Section 2.1.2) for nonzero-sum formulae is performed via labelled polytopes through a reduction to SMT. Currently, we implement this in both Z3 [26] and Yices [28]. As an optimised precomputation step, when possible we also search for and filter out dominated strategies, which speeds up computation and reduces calls to the solver.
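As an illustration of the kind of precomputation mentioned above, the following sketch removes strictly dominated pure strategies from a bimatrix game. It is a minimal example under stated assumptions, not the filter implemented in PRISM-games: it only considers strict dominance by another pure strategy (which never removes a Nash equilibrium), and the function names are hypothetical.

```python
def strictly_dominated_rows(z1, rows, cols):
    """Remaining rows (player 1 actions) strictly dominated by another remaining row."""
    return [
        r for r in rows
        if any(all(z1[o][c] > z1[r][c] for c in cols) for o in rows if o != r)
    ]


def strictly_dominated_cols(z2, rows, cols):
    """Remaining columns (player 2 actions) strictly dominated by another remaining column."""
    return [
        c for c in cols
        if any(all(z2[r][o] > z2[r][c] for r in rows) for o in cols if o != c)
    ]


def eliminate(z1, z2):
    """Iteratively remove strictly dominated actions; returns surviving row and column indices."""
    rows = list(range(len(z1)))
    cols = list(range(len(z1[0])))
    changed = True
    while changed:
        changed = False
        bad_rows = strictly_dominated_rows(z1, rows, cols)
        if bad_rows:
            rows = [r for r in rows if r not in bad_rows]
            changed = True
        bad_cols = strictly_dominated_cols(z2, rows, cols)
        if bad_cols:
            cols = [c for c in cols if c not in bad_cols]
            changed = True
    return rows, cols


# Hypothetical example: a prisoner's dilemma (action 0 = cooperate, 1 = defect).
PD_1 = [[3, 0], [5, 1]]   # utilities of player 1 (row player)
PD_2 = [[3, 5], [0, 1]]   # utilities of player 2 (column player)
surviving = eliminate(PD_1, PD_2)
```

In this example only the pair of defecting actions survives, so the solver would be called on a 1×1 game instead of a 2×2 game.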
Since bimatrix games can have multiple SWNE values, when selecting SWNE values of such games we choose the SWNE values for which the value of player 1 is maximal. In case player 1 is indifferent, i.e., their utility is the same for all pairs, we choose the SWNE values which maximise the value of player 2. If both players are indifferent, an arbitrary pair of SWNE values is selected.
Table 1 presents experimental results for the time to solve bimatrix games using the Yices and Z3 solvers, as the numbers of actions of the individual games vary. The table also shows the number of NE in each game N, as found when determining the SWNE values, and also the number of NE in N − , as found when determining the SCNE values (see Lemma 1). These games were generated using GAMUT (a suite of game generators) [62] and a time-out of 2 hours was used for the experiments. The results show Yices to be the faster implementation and that the difference in solution time grows as the number of actions increases. Therefore, in our experimental results in the next section, all verification runs use the Yices implementation. The results in Table 1 also demonstrate that the solution time for either solver can vary widely and depends on both the number of NE that need to be found and the structure of the game. For example, when solving the dispersion games, the differences in the solution times for SWNE and SCNE seem to correspond to the differences in the number of NE that need to be found. On the other hand, there is no such correspondence for the differences in the solution times for the covariant games.
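The selection rule for SWNE values described above can be sketched as follows. This is only an illustration, assuming the NE values of a bimatrix game have already been computed and are given as exact numbers; the function name `select_swne` and the example values are hypothetical.

```python
def select_swne(ne_values):
    """Select SWNE values from a non-empty list of NE value pairs (u1, u2).

    SWNE values maximise the sum u1 + u2; among those, prefer the pair with
    the largest value for player 1, then the largest value for player 2.
    """
    best_sum = max(u1 + u2 for u1, u2 in ne_values)
    swne = [(u1, u2) for u1, u2 in ne_values if u1 + u2 == best_sum]
    return max(swne, key=lambda v: (v[0], v[1]))


# Hypothetical NE values of some bimatrix game.
selected = select_swne([(1, 3), (2, 2), (3, 1), (1, 1)])
```

A floating-point implementation would need a tolerance when comparing sums; the exact comparison here is a simplification.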
Regarding the complexity of solving bimatrix games, if each player has n actions, then the number of possible assignments to the supports of the strategy profiles (i.e., the action tuples that are chosen with nonzero probability) is $(2^n - 1)^2$, which therefore grows exponentially with the number of actions, surpassing 4.2 billion when each player has 16 actions. This particularly affects performance in cases where one or both players are indifferent with respect to a given support. More precisely, in such cases, if there is an equilibrium including pure strategies over these supports, then there are also equilibria including mixed strategies over these supports, as the indifferent player would get the same utility for any convex combination of pure strategies.
Example 6. Consider the following bimatrix game: Since the entries in the rows for the utility matrix for player 1 are the same and the columns are the same for player 2, it is easy to see that both players are indifferent with respect to their actions. As can be seen in Table 2, all $(2^2 - 1)^2 = 9$ possible support assignments lead to an equilibrium.
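The counting argument and the effect of indifference can be checked with a short script. The matrices below are hypothetical (they are not the matrices of Example 6) and are chosen so that player 1's utility does not depend on its own action and likewise for player 2; the function names are also introduced only for this sketch. For each pair of non-empty supports, the script tests whether the uniform mixed strategies over those supports form a Nash equilibrium.

```python
from fractions import Fraction
from itertools import combinations


def support_assignments(n):
    """Number of pairs of non-empty supports when each player has n actions."""
    return (2 ** n - 1) ** 2


def nonempty_supports(n):
    """All non-empty subsets of {0, ..., n-1}, as tuples."""
    return [c for size in range(1, n + 1) for c in combinations(range(n), size)]


def uniform(support, n):
    """Uniform mixed strategy over the given support."""
    return [Fraction(1, len(support)) if a in support else Fraction(0) for a in range(n)]


def is_nash(z1, z2, x, y):
    """Check that neither player can gain by a pure deviation from (x, y)."""
    n, m = len(z1), len(z1[0])
    u1 = sum(x[i] * y[j] * z1[i][j] for i in range(n) for j in range(m))
    u2 = sum(x[i] * y[j] * z2[i][j] for i in range(n) for j in range(m))
    best1 = max(sum(y[j] * z1[i][j] for j in range(m)) for i in range(n))
    best2 = max(sum(x[i] * z2[i][j] for i in range(n)) for j in range(m))
    return best1 <= u1 and best2 <= u2


# Hypothetical game in which both players are indifferent between their actions.
Z1 = [[2, 1], [2, 1]]   # player 1: identical rows
Z2 = [[1, 1], [3, 3]]   # player 2: identical columns

equilibrium_supports = [
    (sx, sy)
    for sx in nonempty_supports(2)
    for sy in nonempty_supports(2)
    if is_nash(Z1, Z2, uniform(sx, 2), uniform(sy, 2))
]
```

For this game every one of the `support_assignments(2)` support pairs admits an equilibrium, which is the situation that makes enumeration over supports expensive.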
For the task of computing non-optimal NE values, the large number of supports can be somewhat mitigated by eliminating weakly dominated strategies [61]. However, removing such strategies is not a straightforward task when computing SWNE or SCNE values, since it can lead to the elimination of SWNE or SCNE profiles, and hence also SWNE or SCNE values. For example, if we removed the row corresponding to action a 2 or the column corresponding to action b 1 from the matrices in Example 6 above, then we eliminate a SWNE profile. As the number of actions for each player increases, the number of NE profiles also tends to increase and so does the likelihood of indifference. Naturally, the number of actions also affects the number of variables that have to be allocated, and the number and complexity of assertions passed to the SMT solver. As our method is based on the progressive elimination of support assignments that lead to NE, it takes longer to find SWNE and SCNE values as the number of possible supports grows and further constraints are added each time an equilibrium is found.

Case Studies and Experimental Results
To demonstrate the applicability and benefits of our techniques, and to evaluate their performance, we now present results from a variety of case studies. Supporting material for these examples (models and properties) is available from [81]. These can be run with PRISM-games 3.0 [48].

Efficiency and Scalability
We begin by presenting a selection of results illustrating the performance of our implementation. The experiments were run on a 2.10 GHz Intel Xeon with 16GB of JVM memory. In Table 3, we present the model statistics for the examples used: the number of players, states, transitions and model construction times (details of the case studies themselves follow in the next section). Due to improvements in the modelling language and the model building procedure, some of the model statistics differ from those presented in [45,46]. The main reason is that the earlier version of the implementation did not allow for variables of different players to be updated following a joint probability distribution, which made it necessary to introduce intermediate states in order to specify some of the behaviour. Also, some model statistics differ from [45] since models were modified to meet Assumptions 2 and 3 to enable the analysis of nonzero-sum properties. Tables 4 and 5 present the model checking statistics when analysing zero-sum and nonzero-sum properties, respectively. In both tables, this includes the maximum and average number of actions of each coalition in the matrix/bimatrix games solved at each step of value iteration and the number of iterations performed. In the case of zero-sum properties including reward formulae of the form F φ, value iteration is performed twice (see Section 4.1.2), and therefore the number of iterations for each stage are presented (and separated by a semi-colon). For zero-sum properties, the timing statistics are divided into the time for qualitative (column 'Qual.') and quantitative verification, which includes solving matrix games (column 'Quant.'). For nonzero-sum properties we divide the timing statistics into the time for CSG verification, which includes solving bimatrix games (column 'CSG'), and the instances of MDP verification (column 'MDP'). 
In the case of mixed nonzero-sum properties, i.e., properties including both finite and infinite horizon objectives, we must first build a new game (see Section 4.2.3); the statistics for these CSGs (number of players, states and transitions) are presented in Table 6. Finally, Table 7 presents the timing results for three nested properties. Here we give the time required for verifying the inner and outer formula separately, as well as the number of iterations for value iteration at each stage.  Our results demonstrate significant gains in efficiency with respect to those presented for zero-sum properties in [45] and nonzero-sum properties in [46] (for the latter, a direct comparison with the published results is possible since it uses an identical experimental setup). The gains are primarily due to faster SMT solving and reductions in CSG size as a result of modelling improvements, and specifically the removal of intermediate states as discussed above.
The implementation can analyse models with over 3 million states and almost 18 million transitions; all are solved in under 2 hours and most are considerably quicker. The majority of the time is spent solving matrix or bimatrix games.
To study the benefits of nonzero-sum properties, we compare the results with corresponding zero-sum properties. For example, for a nonzero-sum formula of the form $\langle\langle C{:}C' \rangle\rangle_{\max=?}(\mathsf{P}[\,\mathsf{F}\ \phi_1\,]+\mathsf{P}[\,\mathsf{F}\ \phi_2\,])$, we compute the value and an optimal strategy $\sigma^\star_C$ for coalition C of the formula $\langle\langle C \rangle\rangle \mathsf{P}_{\max=?}[\,\mathsf{F}\ \phi_1\,]$, and then find the value of an optimal strategy for the coalition C ′ for $\mathsf{P}_{\min=?}[\,\mathsf{F}\ \phi_2\,]$ and $\mathsf{P}_{\max=?}[\,\mathsf{F}\ \phi_2\,]$ in the MDP induced by the CSG when C follows $\sigma^\star_C$. The aim is to showcase the advantages of cooperation since, in many real-world applications, agents' goals are not strictly opposed and adopting a strategy that assumes antagonistic behaviour can have a negative impact from both individual and collective standpoints.
Table 7: Statistics for verification of nested properties for CSGs.
As will be seen, our results demonstrate that, by using nonzero-sum properties, at least one of the players gains and in almost all cases neither player loses (in the one case study where this is not the case, the gains far outweigh the losses). The individual SWNE/SCNE values for players need not be unique and, for all case studies (except Aloha and medium access in which the players are not symmetric), the values can be swapped to give alternative SWNE/SCNE values.
Finally, we note that, for infinite-horizon nonzero-sum properties, we compute the value of ε for the synthesised ε-NE and find that ε = 0 in all cases.
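For intuition about what the value of ε measures, the following sketch computes it for a strategy profile of a one-shot bimatrix game: ε is the largest gain any player can obtain by deviating unilaterally. This is a simplification introduced here for illustration (the synthesised profiles above are profiles of CSGs, not of single bimatrix games), and the function names and example game are hypothetical.

```python
def expected(z, x, y):
    """Expected utility under mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * y[j] * z[i][j] for i in range(len(x)) for j in range(len(y)))


def epsilon_of_profile(z1, z2, x, y):
    """Smallest eps for which (x, y) is an eps-NE of the bimatrix game (z1, z2)."""
    u1 = expected(z1, x, y)
    u2 = expected(z2, x, y)
    # It suffices to consider pure deviations, since a best response can always be pure.
    best1 = max(sum(y[j] * z1[i][j] for j in range(len(y))) for i in range(len(x)))
    best2 = max(sum(x[i] * z2[i][j] for i in range(len(x))) for j in range(len(y)))
    return max(best1 - u1, best2 - u2)


# Hypothetical example: a prisoner's dilemma (action 0 = cooperate, 1 = defect).
PD_1 = [[3, 0], [5, 1]]
PD_2 = [[3, 5], [0, 1]]
eps_defect = epsilon_of_profile(PD_1, PD_2, [0, 1], [0, 1])
eps_cooperate = epsilon_of_profile(PD_1, PD_2, [1, 0], [1, 0])
```

A profile with ε = 0, such as mutual defection in this example, is an exact NE.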
Robot Coordination. Our first case study concerns a scenario in which two robots move concurrently over a grid of size l×l, briefly discussed in Example 5. The robots start in diagonally opposite corners and try to reach the corner from which the other starts. A robot can move either diagonally, horizontally or vertically towards its goal. Obstacles which hinder the robots as they move from location to location are modelled stochastically according to a parameter q (which we set to 0.25): when a robot moves, there is a probability that it instead moves in an adjacent direction, e.g., if it tries to move north west, then with probability q/2 it will instead move north and with the same probability west.
Fig. 3: Robot coordination on a 3×3 grid: probabilistic choices for one pair of action choices in the initial state. Solid lines indicate movement in the intended direction, dotted lines where there is deviation due to obstacles.
We can model this scenario as a two-player CSG, where the players correspond to the robots (rbt 1 and rbt 2 ) and the states of the game represent their positions on the grid. In states where a robot has not reached its goal, it can choose between actions that move either diagonally, horizontally or vertically towards its goal (under the restriction that it remains in the grid after this move). For i ∈ {1, 2}, we let goal i be the atomic proposition labelling those states of the game in which rbt i has reached its goal and crash the atomic proposition labelling the states in which the robots have crashed, i.e., are in the same grid location. In Figure 3, we present the states that can be reached from the initial state of the game when l = 3, when the robot in the south west corner tries to move north and the robot in the north east corner tries to move south west. As can be seen, there are six different outcomes and the probability of the robots crashing is (q/2)·(1−q).
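The six outcomes and the crash probability for this pair of moves can be reproduced with a short enumeration. The sketch rests on two assumptions that are not spelled out in the text: a deviation that would leave the grid is not possible and its probability stays with the intended move, and the crash probability is read as (q/2)·(1−q). Coordinates, direction encoding and function names are hypothetical.

```python
from fractions import Fraction

# Eight compass directions in clockwise order, as (dx, dy) offsets; (0, 1) is north.
DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def move_distribution(pos, direction, q, size):
    """Distribution over next cells when a robot at `pos` tries to move in `direction`.

    With probability q/2 each, the robot deviates to one of the two adjacent
    directions. Assumption: deviations leaving the grid are dropped and their
    probability remains with the intended move.
    """
    def step(d):
        return (pos[0] + d[0], pos[1] + d[1])

    def inside(cell):
        return 0 <= cell[0] < size and 0 <= cell[1] < size

    idx = DIRS.index(direction)
    intended = step(direction)
    dist = {intended: Fraction(1)}
    for neighbour in (DIRS[(idx - 1) % 8], DIRS[(idx + 1) % 8]):
        cell = step(neighbour)
        if inside(cell):
            dist[cell] = q / 2
            dist[intended] -= q / 2
    return dist


def joint_outcomes(dist1, dist2):
    """Joint distribution over pairs of cells for two independently moving robots."""
    return {(c1, c2): p1 * p2 for c1, p1 in dist1.items() for c2, p2 in dist2.items()}


Q = Fraction(1, 4)
rbt1 = move_distribution((0, 0), (0, 1), Q, 3)      # south west corner, moving north
rbt2 = move_distribution((2, 2), (-1, -1), Q, 3)    # north east corner, moving south west
outcomes = joint_outcomes(rbt1, rbt2)
crash = sum(p for (c1, c2), p in outcomes.items() if c1 == c2)
```

Under these assumptions the only crash occurs when the first robot deviates to the centre cell while the second robot moves there as intended.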
We first investigate the probability of the robots eventually reaching their goals without crashing for different size grids. In the zero-sum case, we find the values for the formula $\langle\langle rbt_1 \rangle\rangle \mathsf{P}_{\max=?}[\,\neg crash\ \mathsf{U}\ goal_1\,]$ converge to 1 as l increases; for example, the values for this formula in the initial state of the game when l = 5, 10 and 20 are approximately 0.9116, 0.9392 and 0.9581, respectively. On the other hand, in the nonzero-sum case, considering SWNE values for the formula $\langle\langle rbt_1{:}rbt_2 \rangle\rangle_{\max=?}(\mathsf{P}[\,\neg crash\ \mathsf{U}\ goal_1\,]+\mathsf{P}[\,\neg crash\ \mathsf{U}\ goal_2\,])$ and l ≥ 4, we find that each robot can reach its goal with probability 1 (since time is not an issue, they can collaborate to avoid crashing).
We next consider the probability of the robots reaching their targets without crashing within a bounded number of steps. Figure 4 presents both the value for the (zero-sum) formula $\langle\langle rbt_1 \rangle\rangle \mathsf{P}_{\max=?}[\,\neg crash\ \mathsf{U}^{\leq k}\ goal_1\,]$ and SWNE values for the formula $\langle\langle rbt_1{:}rbt_2 \rangle\rangle_{\max \geq 2}(\mathsf{P}[\,\neg crash\ \mathsf{U}^{\leq k}\ goal_1\,]+\mathsf{P}[\,\neg crash\ \mathsf{U}^{\leq k}\ goal_2\,])$, for a range of step bounds and grid sizes. When there is only one route to each goal within the bound (along the diagonal), i.e., when k = l−1, in the SWNE profile both robots take this route. In odd grids, there is a high chance of crashing, but also a chance one will deviate and the other reaches its goal. Initially, as the bound k increases, for odd grids the SWNE values for the robots are not equal (see Figure 4 right).
Here, both robots following the diagonal does not yield a NE profile. First, the chance of crashing is high, and therefore the probability of the robots satisfying their objectives is low. Therefore it is advantageous for a robot to switch to a longer route as this will increase the probability of satisfying its objective, even taking into account that there is a greater chance it will run out of steps and that changing its route will increase the probability of the other robot satisfying its objective by a greater amount (as the other robot will still be following the diagonal). Dually, both robots taking a longer route is not a NE profile, since if one robot switches to the diagonal route, then the probability of satisfying its objective will increase. It follows that, in a SWNE profile, one robot has to follow the diagonal and the other take a longer route. As expected, if we compare the results, we see that the robots can improve their chances of reaching their goals by collaborating.
The next properties we consider concern the minimum expected number of steps for the robots to reach their goal. In Figure 5 we have plotted the values corresponding to the formula $\langle\langle rbt_2 \rangle\rangle \mathsf{R}^{r_{steps}}_{\min=?}[\,\mathsf{F}\ goal_2\,]$ and SCNE values for the individual players for $\langle\langle rbt_1{:}rbt_2 \rangle\rangle_{\min=?}(\mathsf{R}^{r_{steps}}[\,\mathsf{F}\ goal_1\,]+\mathsf{R}^{r_{steps}}[\,\mathsf{F}\ goal_2\,])$ as the grid size l varies. The results again demonstrate that the players can gain by collaborating.
Futures market investors. This case study is a model of a futures market investor [56], which represents the interactions between investors and a stock market. For the TSG model of [56], in successive months, a single investor chooses whether to invest, next the market decides whether to bar the investor, with the restriction that the investor cannot be barred two months in a row or in the first month, and then the values of shares and a cap on values are updated probabilistically.
We have built and analysed several CSG variants of the model, analysing optimal strategies for investors under adversarial conditions. First, we made a single investor and market take their decisions concurrently, and verified that this yielded no additional gain for the investor (see [81]). This is because the market and investor have the same information, and so the market knows when it is optimal for the investor to invest without needing to see its decision. We next modelled two competing investors who simultaneously decide whether to invest (and, as above, the market simultaneously decides which investors to bar). If the two investors cash in their shares in the same month, then their profits are reduced. We also consider several distinct profit models: 'normal market', 'later cash-ins', 'later cash-ins with fluctuation' and 'early cash-ins'. The first is from [56] and the remaining reward models favour either postponing cashing in shares or the early cashing in of shares (see [81] for details). The CSG has 3 players: one for each investor and one representing the market, which decides on the barring of investors. We study both the maximum profit of one investor and the maximum combined profit of both investors. For comparison, we also build a TSG model in which the investors first take turns to decide whether to invest (the ordering decided by the market) and then the market decides on whether to bar any of the investors. Figure 6 shows the maximum expected value over a fixed number of months under the 'normal market' for both the profit of the first investor and the combined profit of the two investors. Figure 7 shows the maximum expected combined profit for the other two profit models.
When investors cooperate to maximise the profit of the first, results for the CSG and TSG models coincide. This follows from the discussion above since all the second investor can do is make sure it does not invest at the same time as the first. For the remaining cases and given sufficient months, there is always a strategy in the concurrent setting that outperforms all turn-based strategies. The increase in profit for a single investor in the CSG model is due to the fact that, as the investors' decisions are concurrent, the second cannot ensure that it invests at the same time as the first and thereby decreases the profit of the first. In the case of combined profit, the difference arises because, although the market knows when it is optimal for one investor to invest, in the CSG model the market does not know which one will, and therefore may choose the wrong investor to bar.
We performed strategy synthesis to study the optimal actions of investors. By way of example, consider maximising the profit of the first investor i 1 under the normal market (see Figure 6 left). The optimal TSG strategy for the first investor is to invest in the first month (which the market cannot bar) ensuring an expected profit of 3.75. The optimal (randomised) CSG strategy is to invest:
- in the first month with probability ∼0.4949;
- in the second month with probability 1, if the second investor has cashed in;
- in the second month with probability ∼0.9649, if the second investor did not cash in at the end of the first month and the shares went up;
- in the second month with probability ∼0.9540, if the second investor did not cash in at the end of the first month and the shares went down;
- in the third month with probability 1 (this is the last month to invest).
Following this strategy, the first investor ensures an expected profit of ∼4.33.
We now make the market probabilistic, where the probability the market bars an individual investor equals $p_{bar}$, and consider nonzero-sum properties of the form $\langle\langle i_1{:}i_2 \rangle\rangle_{\max=?}(\mathsf{R}^{profit_1}[\,\mathsf{F}\ cashed\_in_1\,]+\mathsf{R}^{profit_2}[\,\mathsf{F}\ cashed\_in_2\,])$, in which each investor tries to maximise their individual profit, for different reward structures. In Figures 8 and 9 we have plotted the results for the investors where the profit models of the investors follow a normal profile and where the profit models of the investors differ ('later cash-ins' for the first investor and 'early cash-ins' for the second), when $p_{bar}$ equals 0.1 and 0.5 respectively. The results demonstrate that, given more time and a more predictable market, i.e., when $p_{bar}$ is lower, the players can collaborate to increase their profits.
Performing strategy synthesis, we find that, in the model with mixed profit profiles, the strategy for the investor with an 'early cash-ins' profit model is to invest as soon as possible, i.e., it tries to invest in the first month and, if this fails because it is barred, it will be able to invest in the second. On the other hand, the investor with the 'later cash-ins' profile will delay investing until the chances of the shares failing start to increase or the month before last is reached, and then invest (if the investor is barred in this month, they will be able to invest in the final month).
Trust models for user-centric networks. Trust models for user-centric networks were analysed previously using TSGs in [50]. The analysis considered the impact of different parameters on the effectiveness of cooperation mechanisms between service providers. The providers share information on the measure of trust for users in a reputation-based setting. Each measure of trust is based on the service's previous interactions with the user (which previous services they paid for), and providers use this measure to block or allow the user to obtain services.
In the original TSG model, a single user can either make a request to one of three service providers or buy the service directly by paying maximum price. If the user makes a request to a service provider, then the provider decides to accept or deny the request based on the user's trust measure. If the request was accepted, the provider would next decide on the price again based on the trust measure, and the user would then decide whether to pay for the service and finally the provider would update its trust measure based on whether there was a payment. This sequence of steps would have to take place before any other interactions occurred between the user and other providers. Here we consider CSG models allowing the user to make requests and pay different service providers simultaneously and for the different providers to execute requests concurrently. There are 7 players: one for the user's interaction with each service provider, one for the user buying services directly and one for each of the 3 service providers. Three trust models were considered. In the first, the trust level was decremented by 1 (td = 1) when the user does not pay, decremented by 2 in the second (td = 2) and reset to 0 in the third (td = inf ). Figure 10 presents results for the maximum fraction and number of unpaid services the user can ensure for each trust model, corresponding to the formulae $\langle\langle usr \rangle\rangle \mathsf{R}^{ratio^-}_{\min=?}[\,\mathsf{F}\ finished\,]$ and $\langle\langle usr \rangle\rangle \mathsf{R}^{unpaid^-}_{\min=?}[\,\mathsf{F}\ finished\,]$ (to prevent not requesting any services and obtaining an infinite reward being the optimal choice of the user, we negate all rewards and find the minimum expected reward the user can ensure). The results for the original TSG model are included as dashed lines. The results demonstrate that the user can take advantage of the fact that in the CSG model it can request multiple services at the same time, and obtain more services without paying before the different providers get a chance to inform each other about nonpayment.
In addition, the results show that imposing a more severe penalty on the trust measure for non-payment reduces the number of services the user can obtain without paying.
Aloha. This case study concerns three users trying to send packets using the slotted ALOHA protocol. In a time slot, if a single user tries to send a packet, the packet is sent with probability $q$; as more users try to send, the probability of success decreases. If sending a packet fails, the number of slots a user waits before resending is set according to an exponential backoff scheme. More precisely, each user maintains a backoff counter, which it increases each time there is a failure (up to bmax) and, if the counter equals $k$, randomly chooses the number of slots to wait from $\{0, 1, \dots, 2^k - 1\}$.
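The backoff scheme can be sketched in a few lines. This is a minimal illustration, not the model used in the case study: the function names and the cap of 3 are hypothetical, and a uniform choice over the waiting set is assumed, since the text says only that the number of slots is chosen randomly.

```python
import random


def record_failure(counter: int, bmax: int) -> int:
    """Increase the backoff counter after a failed send, capped at bmax."""
    return min(counter + 1, bmax)


def slots_to_wait(counter: int, rng: random.Random) -> int:
    """Pick the number of slots to wait from {0, ..., 2^counter - 1}.

    Assumption: the choice is uniform over this set.
    """
    return rng.randrange(2 ** counter)


rng = random.Random(0)  # fixed seed so that a run is repeatable
counter = 0
bmax = 3  # hypothetical cap, for illustration only
for _ in range(5):
    counter = record_failure(counter, bmax)
    wait = slots_to_wait(counter, rng)
    print(f"counter={counter}, wait={wait} (drawn from 0..{2 ** counter - 1})")
```

Once the counter reaches the cap, the waiting set stops growing, so the maximum delay after a failure is bounded by $2^{bmax} - 1$ slots.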
We suppose that the three users are each trying to maximise the probability of sending their packet before a deadline $D$, with users 2 and 3 forming a coalition, which corresponds to the formula $\langle\!\langle \mathit{usr}_1{:}\mathit{usr}_2, \mathit{usr}_3 \rangle\!\rangle_{\max=?}(\mathsf{P}[\,\mathsf{F}\ (\mathit{sent}_1 \wedge t \leq D)\,] + \mathsf{P}[\,\mathsf{F}\ (\mathit{sent}_2 \wedge \mathit{sent}_3 \wedge t \leq D)\,])$. Figure 11 presents total values as $D$ varies (left) and individual values as $q$ varies (right). Through synthesis, we find that the collaboration is dependent on $D$ and $q$. Given more time, there is a greater chance for the users to collaborate by sending in different slots, while if $q$ is large it is unlikely that users need to send repeatedly, so again they can send in different slots. As the coalition has more messages to send, its probabilities are lower. However, for the scenario with two users, the probabilities of the two users would still be different. In this case, although it is advantageous to initially collaborate and allow one user to try to send its first message, if the sending fails then, given there is a bound on the time for the users to send, both users will try to send at this point, as this is the best option for their individual goals.
We have also considered the case where the users try to minimise the expected time before their packets are sent, with users 2 and 3 again forming a coalition, represented by the formula $\langle\!\langle \mathit{usr}_1{:}\mathit{usr}_2, \mathit{usr}_3 \rangle\!\rangle_{\min=?}(\mathsf{R}^{\mathit{time}}[\,\mathsf{F}\ \mathit{sent}_1\,] + \mathsf{R}^{\mathit{time}}[\,\mathsf{F}\ (\mathit{sent}_2 \wedge \mathit{sent}_3)\,])$. When synthesising the strategies, we see that the players collaborate, with the coalition of users 2 and 3 letting user 1 try to send before sending their own messages. However, if user 1 fails to send, then the coalition lets user 1 try again if that user can do so immediately, and otherwise attempts to send its own messages.
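Finally, we have analysed when the players collaborate to maximise the probability of reaching a state where they can then send their messages with probability 1 within $D$ time units (with users 2 and 3 in coalition), which is represented by the formula $\langle\!\langle \mathit{usr}_1, \mathit{usr}_2, \mathit{usr}_3 \rangle\!\rangle \mathsf{P}_{\max=?}[\,\mathsf{F}\ \langle\!\langle \mathit{usr}_1{:}\mathit{usr}_2, \mathit{usr}_3 \rangle\!\rangle_{\min \geq 2}(\mathsf{P}[\,\mathsf{F}\ (\mathit{sent}_1 \wedge t \leq D)\,] + \mathsf{P}[\,\mathsf{F}\ (\mathit{sent}_2 \wedge \mathit{sent}_3 \wedge t \leq D)\,])\,]$.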
Intrusion detection policies. In [78], CSGs are used to model the interaction between an intrusion detection policy and an attacker. The policy has a number of libraries it can use to detect attacks, and the attacker has a number of different attacks, which can incur different levels of damage if not detected. Furthermore, each library can only detect certain attacks. In the model, in each round the policy chooses a library to deploy and the attacker chooses an attack. A reward structure is specified representing the level of damage when an attack is not detected. The goal is to find optimal intrusion detection policies, which corresponds to finding a strategy for the policy that minimises damage, i.e., synthesising a strategy for the formula $\langle\!\langle \mathit{policy} \rangle\!\rangle \mathsf{R}^{\mathit{damage}}_{\min=?}[\,\mathsf{F}\ (r = \mathit{rounds})\,]$. We have constructed CSG models with two players (representing the policy and the attacker) for the two scenarios outlined in [78].
Jamming multi-channel radio systems. A CSG model for jamming multi-channel cognitive radio systems is presented in [79]. The system consists of a number of channels (chans), which can be in an occupied or idle state. The state of each channel remains fixed within a time slot and between slots is Markovian (i.e., the state changes randomly based only on the state of the channel in the previous slot). A secondary user has a subset of available channels and at each time slot must decide which to use. There is a single attacker, which again has a subset of available channels and at each time slot decides to send a jamming signal over one of them. The CSG has two players: one representing the secondary user and the other representing the attacker. Through the zero-sum property $\langle\!\langle \mathit{user} \rangle\!\rangle \mathsf{P}_{\max=?}[\,\mathsf{F}\ (\mathit{sent} \geq \mathit{slots}/2)\,]$ we find the optimal strategy for the secondary user to maximize the probability that at least half their messages are sent against any possible attack.
Medium Access Control. This case study extends the CSG model from Example 4 to three users and assumes that the probability of a successful transmission is dependent on the number of users that try to send ($q_1 = 0.95$, $q_2 = 0.75$ and $q_3 = 0.5$). The energy of each user is bounded by emax. We suppose the first user acts in isolation and the remaining users form a coalition. The first nonzero-sum property we consider is $\langle\!\langle p_1{:}p_2, p_3 \rangle\!\rangle_{\max=?}(\mathsf{R}^{\mathit{sent}_1}[\,\mathsf{C}^{\leq k}\,] + \mathsf{R}^{\mathit{sent}_{2,3}}[\,\mathsf{C}^{\leq k}\,])$, which corresponds to each coalition trying to maximise the expected number of messages it sends over a bounded number of steps. The second property is $\langle\!\langle p_1{:}p_2, p_3 \rangle\!\rangle_{\max=?}(\mathsf{P}[\,\mathsf{F}^{\leq k}\ (\mathit{mess}_1 = \mathit{smax})\,] + \mathsf{P}[\,\mathsf{F}\ (\mathit{mess}_2 + \mathit{mess}_3 = 2 \cdot \mathit{smax})\,])$; here the coalitions try to maximise the probability of successfully transmitting a certain number of messages (smax for the first user and 2·smax for the coalition of the second and third users), where in addition the first user has to do this in a bounded number of steps ($k$).
Power Control. Our final case study is based on a model of power control in cellular networks from [14]. In the model, phones emit signals over a cellular network and the signals can be strengthened by increasing the power level up to a bound ($\mathit{pow}_{max}$). A stronger signal can improve transmission quality, but uses more energy and lowers the quality of other transmissions due to interference. We extend this model by adding a failure probability ($q_{fail}$) when a power level is increased and assume each phone has a limited battery capacity (emax). Based on [14], we associate a reward structure with each phone representing transmission quality dependent both on its power level and that of other phones due to interference. We consider the nonzero-sum property $\langle\!\langle p_1{:}p_2 \rangle\!\rangle_{\max=?}(\mathsf{R}^{r_1}[\,\mathsf{F}\ (e_1 = 0)\,] + \mathsf{R}^{r_2}[\,\mathsf{F}\ (e_2 = 0)\,])$, where each user tries to maximise their expected reward before their phone's battery is empty. We have also analysed the property $\langle\!\langle p_1{:}p_2 \rangle\!\rangle_{\max=?}(\mathsf{R}^{r_1}[\,\mathsf{F}\ (e_1 = 0)\,] + \mathsf{R}^{r_2}[\,\mathsf{C}^{\leq k}\,])$, where the objective of the second user is to instead maximise their expected reward over a bounded number of steps ($k$).

Conclusions
In this paper, we have designed and implemented an approach for the automatic verification of a large subclass of CSGs. We have extended the temporal logic rPATL to allow for the specification of equilibria-based (nonzero-sum) properties, where two players or coalitions with distinct goals can collaborate. We have then proposed and implemented algorithms for verification and strategy synthesis using this extended logic, including both zero-sum and nonzero-sum properties, in the PRISM-games model checker. In the case of finite-horizon properties the algorithms are exact, while for infinite-horizon they are approximate using value iteration. We have also extended the PRISM-games modelling language, adding new features tailored to CSGs. Finally, we have evaluated the approach on a range of case studies that have demonstrated the benefits of CSG models compared to TSGs and of nonzero-sum properties as a means to synthesise strategies that are collectively more beneficial for all players in a game.
The main challenge in implementing the model checking algorithms is efficiently solving matrix and bimatrix games at each state in each step of value iteration for zero-sum and nonzero-sum properties, respectively, which are non-trivial optimisation problems. For bimatrix games, this furthermore requires finding an optimal equilibrium, which currently relies on iteratively restricting the solution search space. Solution methods can be sensitive to floating-point arithmetic issues, particularly for bimatrix games; arbitrary precision representations may help here to alleviate these problems.
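To make concrete what solving a matrix game at a single state involves, the following sketch computes the value of a 2×2 zero-sum game in closed form. It is an illustration only: the approach in this paper solves general matrix games via linear programming, whereas the closed form below applies to the 2×2 case alone, and the function name `solve_2x2_zero_sum` is hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used here as one simple way to avoid floating-point issues.

```python
from fractions import Fraction


def solve_2x2_zero_sum(m):
    """Return (value, p) for a 2x2 zero-sum game.

    m[i][j] is the payoff to the row player (the maximiser) when the row
    player picks row i and the column player picks column j; p is the
    probability with which an optimal row strategy plays row 0.
    """
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in m]
    maximin = max(min(a, b), min(c, d))  # best payoff a pure row guarantees
    minimax = min(max(a, c), max(b, d))  # best cap a pure column guarantees
    if maximin == minimax:
        # Saddle point: a pure strategy is optimal.
        p = Fraction(1) if min(a, b) >= min(c, d) else Fraction(0)
        return maximin, p
    # No saddle point: the optimal mix makes the opponent indifferent.
    denom = a + d - b - c
    return (a * d - b * c) / denom, (d - c) / denom


value, p = solve_2x2_zero_sum([[1, -1], [-1, 1]])  # matching pennies
print(value, p)  # prints "0 1/2"
```

In value iteration for a zero-sum property, a computation of this kind is repeated for every state at every step, with the matrix entries built from the current value estimates of the successor states.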
There are a number of directions for future work. First, we plan to consider additional properties such as multi-objective queries. We are also working on extending the implementation to consider alternative solution methods (e.g., policy iteration and using CPLEX [40] to solve matrix games), a symbolic (binary decision diagram based) implementation, and other techniques for Nash equilibria synthesis, such as an MILP-based solution using regret minimisation. Lastly, we are considering extending the approach to partially observable strategies, multicoalitional games, building on [47], and mechanism design.

A Convergence of Zero-Sum Reachability Reward Formulae
In this appendix we give a witness to the failure of convergence for value iteration when verifying zero-sum formulae with an infinite-horizon reward objective if Assumption 1 does not hold. Consider the CSG in Figure 12 with players $p_1$ and $p_2$ and the zero-sum state formula $\phi = \langle\!\langle p_1, p_2 \rangle\!\rangle \mathsf{R}^{r}_{\max=?}[\,\mathsf{F}\ a\,]$, where $a$ is the atomic proposition satisfied only by state $t$. Clearly, state $s_1$ does not reach either the target of the formula or an absorbing state with probability 1 under all strategy profiles, while the reward for the state-action pair $(s_1, (a_1, a_2))$ is negative. Applying the value iteration algorithm of Section 4, we see that the values for state $s_1$ oscillate between 0 and −1, while the values for state $s_2$ oscillate between 0 and 1.
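The oscillation can be reproduced with a few lines of code. The sketch below does not encode the CSG of Figure 12. It uses a hypothetical two-state cycle, chosen only because it shows the same pattern: $s_1$ moves to $s_2$ with reward −1 and $s_2$ moves back to $s_1$ with reward 1. The update is a plain Bellman-style iteration with a single choice per state, a simplification of the algorithm of Section 4.

```python
def value_iteration(steps: int):
    """Iterate x_{n+1}(s) = reward(s) + x_n(successor(s)) from x_0 = 0.

    Hypothetical two-state cycle: s1 moves to s2 with reward -1 and s2 moves
    back to s1 with reward +1, so neither state ever reaches a target.
    """
    reward = {"s1": -1, "s2": 1}
    successor = {"s1": "s2", "s2": "s1"}
    values = {"s1": 0, "s2": 0}
    history = [dict(values)]
    for _ in range(steps):
        # Synchronous update: every state reads the previous iterate.
        values = {s: reward[s] + values[successor[s]] for s in values}
        history.append(dict(values))
    return history


for n, v in enumerate(value_iteration(4)):
    print(n, v["s1"], v["s2"])
```

The iterates alternate between (0, 0) and (−1, 1) and never settle, so no stopping criterion based on the difference between successive iterates can be met.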