Combining quantitative and qualitative reasoning in concurrent multi-player games

We propose a general framework for modelling and formal reasoning about multi-agent systems and, in particular, multi-stage games where both quantitative and qualitative objectives and constraints are involved. Our models enrich concurrent game models with payoffs and guards on actions associated with each state of the model, and we propose a quantitative extension of the logic ATL* that enables the combination of quantitative and qualitative reasoning. We illustrate the framework with some detailed examples. Finally, we consider the model-checking problems arising in our framework and establish some general undecidability and decidability results for them.


Introduction
Quantitative and qualitative reasoning about agents and multi-agent systems is pervasive in many areas of AI and game theory, including multi-agent planning and intelligent robotics. In particular, the studies of cooperative and non-cooperative multi-player games deal with both aspects of strategic abilities of agents, but usually separately. Quantitative reasoning studies the abilities of agents to achieve quantitative objectives, such as optimizing payoffs (e.g., maximizing rewards or minimizing cost) or, more generally, preferences on outcomes. This tradition comes from game theory and economics and usually studies one-shot normal form games, their (finitely or infinitely) repeated versions, and extensive form games. On the other hand, qualitative reasoning, coming mainly from logic and computer science, is about strategic abilities of players for achieving qualitative objectives: reaching or maintaining states with desired properties, e.g., winning states or safe states.
Put as a slogan, quantitative reasoning is concerned with how players can become maximally rich, or how to pay as little cost as possible, while qualitative reasoning is about how players can achieve a state of 'happiness', e.g., winning, or how to avoid reaching a state of 'unhappiness' (losing) in the game.
The most essential technical difference between qualitative and quantitative objectives is that the former are typically expressed by temporal patterns over Boolean properties of game states on a given play in a finite state space and their verification requires limited memory, whereas the satisfaction of the latter depends on numerical data associated with the history of the play (accumulated utilities) or even with the whole play (average payoffs and their limit, or discounted accumulated utilities) and therefore generally requires larger, or even unbounded, memory. It is thus generally computationally more demanding and costly to design or verify strategies satisfying quantitative objectives than qualitative ones. More generally, decision theory and game theory study rational behaviour of players aiming at optimising their performance in accordance with their preferences between outcomes. Preferences can be regarded as both qualitative and quantitative objectives and, if they are equipped with a suitable mechanism for preference aggregation over a series of outcomes accumulated in the course of the play, our work presented here (based on quantitative payoffs in naturally ordered numerical domains) can be suitably generalised to that setting.
Often both types of reasoning about multi-agent systems are essential and must be explored interactively. For instance, in multi-agent planning and robotics it is important to achieve the agents' qualitative goals while satisfying various quantitative constraints on time and resource consumption. This motivates the need for developing a modelling framework for combining qualitative and quantitative reasoning, which is the main objective of the present paper.
Our contribution. Here we introduce a general framework for combined qualitative and quantitative reasoning, by enriching the arguably most studied models in the qualitative reasoning tradition, viz. concurrent game models, cf. [6, 50], with a quantitative dimension as follows. The concurrent game models are multi-agent transition systems where transitions are determined by simultaneous collective actions taken by all players. States are labelled with various atomic propositions describing their important features (e.g., winning state, safe state, etc.) and enabling qualitative reasoning in the system. In the enriched models proposed here agents are associated with accumulating utilities (e.g., resources) and the state transitions determine utility payoffs to each player according to payoff tables for the one-shot normal form games associated with all possible tuples of actions that can be applied at the states. Thus, the combination of quantitative game-theoretic reasoning with qualitative logical reasoning is enabled. The resulting models can also be regarded as multi-stage games, see [35], with additional qualitative objectives. Again, put as a slogan, our framework allows, for instance, reasoning about whether and how a player can reach or maintain a state of 'happiness' while becoming or remaining as rich as desired, or paying an explicitly limited price on the way.
We illustrate the framework with two detailed running examples. The first one is of a more abstract, game-theoretic nature, where two players play an infinite-round combination of 3 well-known normal form games (Prisoners Dilemma, Battle of the Sexes, and Coordination Game) associated with the 3 states of the model, and the transitions between these games are determined by the action profiles applied at each round, while the players accumulate utilities in the process of the plays. The second example is of a more concrete nature, illustrating resource-bounded reasoning by modelling a scenario where a team of 3 robots has to accomplish a certain mission determined by a qualitative objective while satisfying some quantitative resource constraints (maintaining energy levels required for the execution of the required actions) throughout the operation.
To enable combined qualitative and quantitative logical reasoning we introduce a quantitative extension of the logic ATL*, introduced in [6], provide formal semantics for it in concurrent game models enriched with payoffs and guards, and show how it can be used for specifying properties of the running examples combining qualitative and quantitative objectives. We then study the model checking problems arising in our framework and establish some general undecidability and decidability results.
Structure of the paper. In the preliminary Sect. 2 we present in detail the purely qualitative concurrent game models and the associated logic for strategic abilities ATL*, as well as basic concepts needed to introduce quantitative constraints in the models and the logical language. In Sect. 3 we present the new modelling framework based on concurrent game models with payoffs and guards and provide detailed examples. In Sect. 4 we introduce a multi-agent quantitative extension of the logic ATL* and provide semantics for it in concurrent game models with payoffs and guards. In Sect. 5 we establish some general decidability and undecidability results for the model-checking problems in fragments of that extension. We end with a concluding Sect. 6 discussing perspectives for further study, followed by a short technical appendix.
Related work. As mentioned above, the two traditions of quantitative and qualitative reasoning have followed rather separate developments with generally quite different agendas, methods and results. Still, some ideas, approaches and techniques can converge to enable the study of multi-agent systems and games combining features from both. A non-exhaustive overview of the main research developments in the area, with an inevitably incomplete list of references, is given below. Our framework shares various common or similar conceptual and technical features with some of these works, yet there are essential differences with each of them, which justify the originality of our framework and are briefly discussed here.
- Purely qualitative logics of games and multi-agent systems, such as the Coalition logic CL [50], the Alternating-time temporal logic ATL* [6], and some extensions and variations of it, including [21, 42, 43, 56], formalizing and studying qualitative reasoning in concurrent game models. This is the logic-based framework that is conceptually and technically closest to ours; our framework builds on it by adding quantitative features based on payoffs and guards.
- Resource-bounded models and logics [1-5, 7, 8, 19, 46, 48], endowing concurrent game models with some quantitative aspects by considering the cost of agents' actions and reasoning about what players with bounded resources can achieve. These are both technically quite close and conceptually related to the present work, so we provide more detailed parallels between them and our framework further below.
- Extensions of qualitative reasoning (e.g., reachability and Büchi objectives) in multi-player concurrent games with some quantitative aspects by considering a preference preorder on the set of qualitative objectives, see e.g. [14, 15], thereby adding payoff-maximizing objectives and thus creating a setting where traditional game-theoretic issues such as game value problems and Nash equilibria become relevant. Our framework is technically related, though richer in its quantitative features and the logical language, and more widely applicable than these.
- Essentially related in spirit to our work are also stochastic games, introduced by Shapley in 1953 (see [47, 54]) and, in particular, stochastic games with quantitative objectives, such as energy games and discounted and mean-payoff games, see e.g. [51], and in particular the more recent works [58] (on multi-mean-payoff and multi-energy games) and [16] (on average-energy games). Technically, our models build on deterministic analogues of stochastic games and extend them with the qualitative aspects and the associated logical language.
- Deterministic or stochastic infinite games on graphs with qualitative objectives: typically reachability and, more generally, parity objectives or ω-regular objectives, see e.g. [25, 28-30]. Our framework is again technically related, but richer in its multi-agent aspects, quantitative features, and the logical language, and with wider modelling scope and potential applications.
- Purely quantitative repeated games, much studied in game theory (see e.g. [35, 49]), which can be naturally regarded as a quite special case of our framework, viz. one-state concurrent game models with accumulating payoffs paid to each player after every round.
- Conceptually different, but technically quite relevant to the purely operational models which our framework generates, are studies of counter automata, Petri nets, vector addition systems (VAS, introduced in [44]), also VAS extended with states (VASS) [11, 12, 38, 41], etc. These are essentially a study of the purely quantitative single-agent case of concurrent game models (see e.g. [11, 32]), where only accumulated utilities but no qualitative objectives are taken into account, and a typical problem is to decide reachability, from a given initial payoff configuration, of payoff configurations satisfying formally specified arithmetic constraints. More recently, two-player games (between controller and environment) on VAS and VASS have been studied, e.g. in [9, 17].
- There have also been several recent threads of research on combinations of qualitative and quantitative game analysis, coming closer in spirit to the present work, such as [59], which considers infinite 2-player turn-based games where every move is associated with a 'reward' (e.g., a priority in parity games) and the payoffs are eventually determined by the resulting infinite sequence of rewards. Also, there is active research on mean-payoff and energy parity games, including [23, 24, 26, 27] and [13], combining parity objectives with quantitative requirements on mean payoffs or maintaining non-negative energy. Our framework is again technically related, but richer in its multi-agent aspects, quantitative features, the logical language, and modelling scope.
- Other relevant references discussing the interaction between qualitative and quantitative reasoning in multi-player games include [52] and [37].
As noted above, resource-bounded models and logics are more closely related, both technically and conceptually, to our framework than most of the other research areas discussed, so we provide here a more detailed comparison with these. First, we note that resource-bounded models and logics are more restricted in scope, as they focus only on the resource-based interpretation of the payoffs. That is an essential conceptual difference with our framework, which models a much more general scenario, where individual agents stand to gain or lose (value or resources) as a result of their collective actions, which also have the qualitative effect of determining the subsequent 'games' to play. Thus, the payoffs in our framework can be regarded not only as resource consumption and generation, but also as gains, incentives, rewards, etc. That, in particular, also leads to some technical differences between the above-mentioned works and ours, reflected in the logical language, formal semantics, and model-checking problems and procedures. More specifically, many of these frameworks assume only consumption of resources in the transitions, and a typical model-checking problem asks whether the proponent coalition has a resource-bounded strategy to achieve its objective with a given initial resource budget. Some of the frameworks mentioned above also consider increase of resources within a fixed total amount (e.g., money), which has a similar technical effect of enabling decidability of the model checking, at the price of limiting the applicability. Yet others, including [1, 2], also allow production of resources but impose various limitations, e.g. on the number of resource units, or on the possible range of the resources [2, 7, 8], etc.
More importantly, however, the consumption and production of resources in these works typically depends only on the individual agents' actions or on the joint actions of the proponent coalition. While this makes very good sense in various real scenarios, and also often results in decidable (and sometimes even tractable) model checking, it does not cover many situations (for instance, related to multi-agent teams) where agents not only consume, but also generate resources in a way which is determined by the actions of all agents, not only those in the proponent coalition, as enabled in our framework and exemplified here in Example 2. Furthermore, our framework essentially involves the use of guards which determine the actions available to the individual agents depending on their current accumulated payoff, respectively resource availability. These make substantial technical differences and, as shown in the paper, can easily (and not surprisingly) lead to undecidable model checking, thus making the problem of constructing desirable strategies for the agents considerably more challenging.
In summary, the framework presented in this work shares and combines features of several previous developments in a way that we believe to be conceptually natural, simple, elegant and uniform, while technically very rich and with a wide range of potential applications. In particular, we emphasise that the design of this framework was not driven by the aim of ensuring decidability results, but by its naturalness and intended scope of applicability.

Preliminaries
Concurrent game models [6, 50] can be regarded as multi-stage combinations of normal form games, as they incorporate a whole family of such games, each associated with a state of a given transition system. However, in concurrent game models the outcomes of a normal form game associated with a given state are simply the possible successor states with their associated games, whereas no payoffs, or even preferences on outcomes, are assigned. Thus, a play in a concurrent game model consists of a sequence of (generally different) one-shot normal form games played in succession. All that is taken into account in the purely logical framework are the properties, expressed by formulae of a logical language, of the states occurring in the play. Concurrent game models can also be viewed as a generalisation of (possibly infinite) extensive form games where cycles and simultaneous moves of different players are allowed, but no payoffs are assigned.
A concurrent game model (CGM) is a tuple S = (Ag, St, {Act_a}_{a∈Ag}, {act_a}_{a∈Ag}, out, Prop, L) comprising:
- a non-empty, fixed set of players (agents) Ag = {1, …, k} and a set of actions Act_a ≠ ∅ for each a ∈ Ag. For any A ⊆ Ag we will denote Act_A := ∏_{a∈A} Act_a and will use α_A to denote a tuple from Act_A. In particular, Act_Ag is the set of all possible action profiles in S.
- a non-empty set of game states St.
- for each a ∈ Ag, a map act_a : St → P(Act_a) setting for each state s the actions available to a at s.
- a transition function out : St × Act_Ag → St that assigns to every state q and action profile α_Ag = ⟨α_1, …, α_k⟩ such that α_a ∈ act_a(q) for every a ∈ Ag (i.e., every α_a can be executed by player a in state q) the (deterministic) successor (outcome) state out(q, α_Ag).
- a set of atomic propositions Prop and a labelling function L : St → P(Prop).
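The components of a concurrent game model listed above can be mirrored in a small data structure. The following is only an illustrative sketch: the class name `CGM`, its field names, and the two-player `toy` model are assumptions introduced here and do not come from the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

State = str
Action = str
Profile = Tuple[Action, ...]  # one action per agent, listed in agent order


@dataclass
class CGM:
    agents: List[int]                                # players 1, ..., k
    states: Set[State]                               # game states
    available: Callable[[int, State], Set[Action]]   # actions available to an agent at a state
    out: Callable[[State, Profile], State]           # deterministic transition function
    label: Dict[State, FrozenSet[str]]               # atomic propositions true at each state

    def profiles(self, s: State) -> List[Profile]:
        """All action profiles whose components are available at state s."""
        choices = [sorted(self.available(a, s)) for a in self.agents]
        return list(product(*choices))

    def successors(self, s: State) -> Set[State]:
        """States reachable from s in one synchronous round."""
        return {self.out(s, p) for p in self.profiles(s)}


# Hypothetical toy model: two players, matching actions lead to "win".
toy = CGM(
    agents=[1, 2],
    states={"start", "win"},
    available=lambda a, s: {"a", "b"},
    out=lambda s, p: "win" if s == "win" or p[0] == p[1] else "start",
    label={"start": frozenset(), "win": frozenset({"winning"})},
)
```

In this toy model `toy.successors("start")` contains both states, since the players may or may not coordinate, while `"win"` is absorbing.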
Thus, all players in a CGM execute their actions synchronously and the combination of these actions, together with the current state, determines the transition to a (unique) successor state in the CGM. A play in a CGM M is an infinite sequence of such successive states. For further technical details refer to [6, 21] and [31, Ch. 9].
The logic ATL*. The logic of strategic abilities ATL* is a multimodal logic extending the linear-time temporal logic LTL, which comprises the temporal operators X ("at the next state"), G ("always from now on") and U ("until"), with strategic path quantifiers ⟨⟨C⟩⟩ indexed with coalitions C of players. There are two types of formulae of ATL*: state formulae, which constitute the logic and are evaluated at game states, and path formulae, which are evaluated on game plays. These are defined by mutual recursion with the following grammars, where C ⊆ Ag and p ∈ Prop:
- state formulae are defined by φ ::= p | ¬φ | (φ ∧ φ) | ⟨⟨C⟩⟩γ,
- and path formulae by γ ::= φ | ¬γ | (γ ∧ γ) | Xγ | Gγ | (γ U γ).
ATL* is very expressive and that comes at a high computational price: both model checking and satisfiability are 2ExpTime-complete [6, 53]. A computationally better behaved fragment is the logic ATL, which is the multi-agent analogue of CTL, only involving state formulae defined by the following grammar, for C ⊆ Ag and p ∈ Prop:
φ ::= p | ¬φ | (φ ∧ φ) | ⟨⟨C⟩⟩Xφ | ⟨⟨C⟩⟩Gφ | ⟨⟨C⟩⟩(φ U φ).
For this logic model checking and satisfiability are P-complete and ExpTime-complete, respectively [6, 36]. We will, however, build our extended logical formalism on ATL* and will essentially use its path-based semantics; the reduction to ATL-based versions is straightforward.
Payoffs. Payoffs usually have numerical values. The essential divide is between integer values and arbitrary real values, because every finite game model with rational payoffs can be scaled up to one with integer payoffs. Logical reasoning and computation with real payoffs is more involved, as it crucially depends on the representation of the real values and the commensurability of the payoffs. For the sake of generality, we define all basic components of our framework in terms of an abstract numerical domain of payoffs D (possibly closed under some basic arithmetical operations). More generally, that domain can be assumed to be any ordered divisible Abelian group. However, for the purposes of the present work it will suffice to assume that D is the set of integers ℤ, possibly extended later, for technical purposes, by adding 'infinity'. Thus, hereafter we only work with integer payoffs and hence integer accumulated utilities.
Arithmetic constraints. We define a language of arithmetic constraints to express conditions about the payoffs and accumulated utilities of players in a given play. More precisely, the use of arithmetic constraints in our framework will be two-fold: for specifying players' objectives (e.g., reaching or maintaining a desired level of accumulated utility) and for defining action guards, where an action guard for a given player is a mapping from states and values of the accumulated utility of that player to sets of actions that the guard declares available for the player at the current configuration. Formally, the arithmetic constraints are built on a fixed set of constant symbols X, used to name special values in the domain D, and a set V = {v_a | a ∈ Ag} of special variables used to refer to the accumulated utilities of the players at the current state. For any A ⊆ Ag, we denote by V_A the restriction of V to A.
Definition 1 (Arithmetic Constraints) Given sets X ⊆ D and A ⊆ Ag, we define the following:
- The elements of T_0(X, A) = X ∪ V_A are called basic terms over X and A. Terms over X and A are built from basic terms by applying addition (+). The set of these terms is denoted by T(X, A).
- An arithmetic constraint over X and A is any expression of the form t_1 * t_2 where * ∈ {<, ≤, =, ≥, >} ∪ {≡_n | n ∈ ℕ} and t_1, t_2 ∈ T(X, A). The set of these arithmetic constraints is denoted by AC(X, A). The arithmetic constraints in AC(X, A) which only involve relations from {<, ≤, =, ≥, >} are called simple arithmetic constraints over X and A. The set of simple arithmetic constraints is denoted by AC_s(X, A). The constraints in AC_s(X, A) that only involve basic terms (without addition) will be called basic arithmetic constraints over X and A. The set of basic arithmetic constraints is denoted by AC_b(X, A).
- An arithmetic constraint formula (over X and A) is any Boolean combination of arithmetic constraints from AC(X, A), i.e., defined by the following grammar: ξ ::= c | ¬ξ | ξ ∧ ξ, where c ∈ AC(X, A). The set of these arithmetic constraint formulae is denoted by ACF(X, A). We also assume to have constant arithmetic constraints ⊤ and ⊥, either as primitives or defined as ⊤ = (c = c) and ⊥ = ¬⊤, where c ∈ X is arbitrarily fixed. Boolean combinations of simple (respectively, basic) arithmetic constraints are called simple (respectively, basic) arithmetic constraint formulae, denoted by ACF_s(X, A) (respectively, ACF_b(X, A)). Note that, in the case when A = {a}, every formula ξ in ACF_s(X, {a}) can be transformed to an equivalent basic formula ξ′ in ACF_b(X′, {a}) by simplifying the occurring terms and possibly slightly extending the set of constant parameters X to X′ (depending both on X and ξ). Note that a finite set of constraint formulae in ACF_s(X, {a}) would only require a finite extension of X to X′ and, in the case when D = ℤ, after suitable re-scaling of X, we can assume that X′ consists of integers, too.
The inclusion of the set of constant parameters X in the definitions above is only needed when basic terms and constraints over them are considered, but we keep it in the general case for the sake of uniformity. However, when X contains a name for every value in D, or X and A are clear from context or arbitrary, we simply write AC, ACF, etc. Arithmetic constraint formulae have a standard interpretation in (D, ≤) once the elements of X ∪ V_A are evaluated there. Note that the full language ACF(X, A) is expressively equivalent on the domains of natural numbers or integers to Presburger arithmetic PrA (after quantifier elimination). Without essential loss of generality, for the purpose of this paper we restrict our framework to the case of simple arithmetic constraints, as defined above, which is equivalent to the strictly weaker quantifier-free fragment of PrA, not involving congruences but only = and < between terms.
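The standard interpretation of arithmetic constraint formulae over the integers can be illustrated by a small evaluator. This is a minimal sketch under stated assumptions: the class names `Constraint`, `Not`, `And`, the helper `holds`, and the variable names `"v_1"`, `"v_2"` are hypothetical, constants are written directly as integers rather than as named symbols, and the congruence modulus is assumed to be positive.

```python
from dataclasses import dataclass
import operator
from typing import Dict, Tuple, Union

# A term is a sum of basic terms; a basic term is an integer constant
# or the name of a utility variable such as "v_1".
Term = Tuple[Union[int, str], ...]

RELATIONS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
             ">=": operator.ge, ">": operator.gt}


@dataclass(frozen=True)
class Constraint:
    left: Term
    rel: str          # a key of RELATIONS, or "mod" for congruence
    right: Term
    modulus: int = 0  # only used when rel == "mod"; assumed positive there


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


Formula = Union[Constraint, Not, And]


def term_value(term: Term, valuation: Dict[str, int]) -> int:
    """Sum the basic terms, looking variables up in the valuation."""
    return sum(valuation[b] if isinstance(b, str) else b for b in term)


def holds(formula: Formula, valuation: Dict[str, int]) -> bool:
    """Evaluate a constraint formula under an integer valuation of the variables."""
    if isinstance(formula, Not):
        return not holds(formula.arg, valuation)
    if isinstance(formula, And):
        return holds(formula.left, valuation) and holds(formula.right, valuation)
    lhs = term_value(formula.left, valuation)
    rhs = term_value(formula.right, valuation)
    if formula.rel == "mod":
        return (lhs - rhs) % formula.modulus == 0
    return RELATIONS[formula.rel](lhs, rhs)


# A basic constraint formula saying 2 <= v_1 <= 4, written with negation.
window = And(Constraint(("v_1",), ">=", (2,)),
             Not(Constraint(("v_1",), ">", (4,))))
```

For instance, `holds(window, {"v_1": 3})` is true, while the same formula fails for `v_1 = 5`. Simple constraints are those that avoid the `"mod"` relation, and basic ones additionally use single-element terms only.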

Concurrent game models with payoffs and guards
The extended concurrent game models that we are going to introduce here can be viewed both as multi-agent transition systems and as multi-stage games, where at every stage the result of the simultaneous collective action of all players is two-fold: first, players receive individual payoffs, just like in the normal form games and repeated games traditionally studied in Game theory, and second, a transition is effected to (possibly) another state, where (possibly) another such game is played, etc., infinitely.The combination of these two fundamental features makes the analysis of such games and the identification of optimal strategies quite challenging.An important class of games closely related to those studied here is stochastic games [47,54], where the players' strategies and the environment deciding the transitions are stochastic.In particular, a long-standing open question there is classifying the optimal strategies in games (introduced by Gillette, 1957) of the type of 'Big Match' [10], cf. the recent work [39] and references therein.

Definition and examples
We now extend concurrent game models with utility payoffs for every action profile applied at every state.Thus, every action profile applied at a given state has now two effects: (i) it assigns a payoff to each player, and (ii) determines a transition to a new state, where the game associated with it is played at the next round of the play.
Besides, we also add individual guards that determine which actions are available to a player at a given configuration consisting of a state and the vector of current accumulated utilities for each player, i.e. the current sum of all payoffs the player has received in the course of the current play of the game.

Definition 3 (Guarded CGM with payoffs)
A guarded CGM with payoffs (GCGMP) is a tuple M = (S, payoff, g) consisting of:
- a CGM S = (Ag, St, {Act_a}_{a∈Ag}, {act_a}_{a∈Ag}, out, Prop, L),
- a payoff function payoff : Ag × St × Act_Ag → D assigning for every agent a, state s, and action profile α applied at s a payoff to that agent. We will also write payoff_a(s, α) for payoff(a, s, α).
- a guard function g : Ag × St × Act → ACF(X, Ag), where Act is the union of the sets Act_a, such that for each a ∈ Ag, state s ∈ St and action α ∈ Act_a, the guard g(a, s, α) is an arithmetic constraint formula in ACF(X, {a}) that determines whether α is enabled for a at the state s given the value of a's current utility in the play. To keep g a total function, we can assume that g(a, s, α) = (v_a ≠ v_a) (i.e., a falsum) whenever α ∉ Act_a. We will also write g_a(s, α) for g(a, s, α) and define the guard for a to be the restriction g_a : St × Act_a → ACF(X, {a}).

Every guard g_a must satisfy consistency conditions that enable at least one action for a at every state s. Formally, for each s ∈ St, the arithmetic constraint formula ⋁_{α∈Act_a} g_a(s, α) must be valid.

Some comments:
- The guards g_a refine the functions act_a from the definition of a CGM, which can be regarded as state-based guard functions. To avoid duplicating the role of g_a and act_a we hereafter assume that act_a(s) = Act_a.
- In our definition, the guards assigned by g_a only depend on the current state and the current accumulated payoff of a. The idea is that when the payoffs are interpreted as costs or, more generally, consumption of resources, the possible actions of a player would depend on her current availability of utility/resources. In a more general framework the guards may also take into account other players' current payoffs, e.g., when these players are supposed to act as a team (coalition). We leave this more general case to future work, as it changes the operational model substantially and raises further questions about how the communication and cooperation between agents are regulated, which need more detailed treatment.
- Note that, for completeness, the transition function is defined for all action profiles, not only for those which are enabled at the given state by the respective guards applied to the current payoffs.
In what follows, for a vector u indexed by the agents, we will use the notation u_a to refer to the component corresponding to agent a.

Example 1
The normal form games associated with the states are, respectively, versions of the Prisoners Dilemma at state s_1, Battle of the Sexes at state s_2, and Coordination Game at state s_3.
The guards for each player a ∈ {I, II} are defined at each state as follows, where u is a's current accumulated utility. Player a can apply any action if u > 0; may only apply action C if u = 0; and must play an action maximizing her minimal possible payoff in the current game if u < 0. Formally, for each a ∈ {I, II}:

Example 2
The GCGMP shown in Fig. 2 describes the following scenario. A team of 3 robots is on a mission. The team must accomplish a certain task, formalized as 'reaching the state goal'. The robots work on batteries which need to be recharged in order to provide the robots with sufficient energy to be able to function. For simplicity, we measure the energy level of robots with non-negative integers. Every action of a robot consumes some of its energy. Collective actions of all robots may, additionally, increase or decrease the energy level of each of them. Thus, every collective action is associated with an 'energy consumption/payoff table' which represents the net change (increase or decrease) of the energy level after that collective action is performed at the given state. The system is so designed that the energy level of a robot may never go below 0 (which can be verified). Here is the detailed description of the components of the model.
States The model contains 2 states: the base station state, 'base', and the target state, 'goal'.
Actions The possible actions are: R: 'recharge'; N: 'do nothing'; G: 'reach for goal'; and B: 'return to base'.All robots have the same functionalities and abilities to perform actions, and their actions have the same effect.
Actions availability Each robot has the following actions possibly executable at the different states (for all other actions, the guards are set to false at the respective states): {R, N, G} at state base and {N, B} at state goal.
Transitions The transition function is specified in the tables in Fig. 3. Note that, since the robots' abilities are assumed symmetric, it suffices to specify the action profiles as multisets, not as tuples.
Payoff tables Respectively, the payoffs are given in Fig. 3 as vectors with components that correspond to the order of the actions in the triple, not to the order of the agents which have performed them.
Here are some motivating explanations of the so defined transitions and payoffs:
- The team has one recharging device which can recharge at most 2 batteries at a time and produces a total of 2 energy units in one recharge step. So if 1 or 2 robots recharge at the same time they receive a pro rata energy increase, but if all 3 robots try to recharge at the same time, the device blocks and does not charge any of them.
- The transition from one state to the other consumes a total of 3 energy units. If all 3 robots take the action which is needed for that transition (G for transition from base to goal, and B for transition from goal to base), then the energy cost of the transition is distributed equally amongst them. If only 2 of them take that action, then each consumes 2 units and the extra unit is transferred to the 3rd robot (e.g., to enable providing help, when needed).
- An attempt by a single robot to reach the other state fails and costs that robot 1 energy unit.

Guards
The guards are the same for each robot, specified in Table 4, where v is the variable representing the current accumulated utility of the respective robot. Some explanations:
- As noted earlier, action B is disabled at state base and actions R and G are disabled at state goal.
- The 'do nothing' action N does not have requirements to be enabled.
- A recharge can only be attempted if the current energy level of the robot is at most 2.
- For a robot to attempt a transition to the other state, that robot must have energy level at least 2.
Note that any set of two robots can ensure transition from one state to the other, but no single robot can do that.
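The guard conditions described above for the robots can be written down directly as a predicate on the state, the action, and the robot's current energy level v. The sketch below is based only on the verbal description (the guard table itself is not reproduced here), and the function names `guard` and `enabled` are assumptions made for illustration.

```python
ACTIONS = ("R", "N", "G", "B")


def guard(state: str, action: str, v: int) -> bool:
    """True iff `action` is enabled at `state` for a robot with energy level v."""
    if action == "N":
        return True                          # 'do nothing' has no requirement
    if action == "R":
        return state == "base" and v <= 2    # recharge: only at base, energy at most 2
    if action == "G":
        return state == "base" and v >= 2    # reach for goal: only at base, energy at least 2
    if action == "B":
        return state == "goal" and v >= 2    # return to base: only at goal, energy at least 2
    raise ValueError(f"unknown action: {action}")


def enabled(state: str, v: int) -> set:
    """Set of actions a robot may choose at `state` with energy level v."""
    return {a for a in ACTIONS if guard(state, a, v)}
```

Because N is always enabled, `enabled(state, v)` is never empty, which is exactly the consistency condition required of guards. With energy level 2 at base a robot may choose any of R, N and G, whereas with energy level 1 at goal only N remains.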

Configurations, plays, and histories
Let M be a fixed GCGMP. A configuration (in M) is a pair (s, u) consisting of a state s and a vector u = (u_1, …, u_k) of currently accumulated utilities, one for each agent, at that state. We define the set of possible configurations as Con(M) = St × D^k. An initialized GCGMP is a pair (M, c_0), where M is a GCGMP and c_0 = (s_0, u^0) is an initial configuration, with s_0 ∈ St an initial state and u^0 = (u^0_1, …, u^0_k) the vector of initial utilities of all players. The (partial) configuration transition function is defined as ôut : Con(M) × Act_Ag ⇀ Con(M), such that ôut((s, u), α) = (s′, u′) iff:
(i) out(s, α) = s′ (the state s′ is the successor of s when α is executed);
(ii) for each a ∈ Ag, the current utility u_a satisfies the guard g_a(s, α_a) for the action α_a at the current state s;
(iii) for each a ∈ Ag, u′_a = u_a + payoff_a(s, α).
An initialized GCGMP with a designated initial configuration (s_0, u^0) gives rise to a configuration graph on M consisting of all configurations in M reachable from (s_0, u^0) by ôut.
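The three conditions defining the partial configuration transition function translate into a short procedure: check all guards, compute the successor state, and add the payoffs. The following sketch assumes that the model is supplied as three plain functions; the name `step`, the toy functions, and the 0-based agent indices are illustrative choices, not notation from the paper.

```python
from typing import Callable, Optional, Tuple

State = str
Profile = Tuple[str, ...]
Utilities = Tuple[int, ...]
Config = Tuple[State, Utilities]


def step(config: Config, profile: Profile,
         out: Callable[[State, Profile], State],
         payoff: Callable[[int, State, Profile], int],
         guard: Callable[[int, State, str, int], bool]) -> Optional[Config]:
    """One application of the configuration transition; None where it is undefined."""
    s, u = config
    agents = range(len(u))
    # (ii) each agent's action must be enabled by its guard at its current utility
    if not all(guard(a, s, profile[a], u[a]) for a in agents):
        return None
    # (i) successor state and (iii) updated accumulated utilities
    s_next = out(s, profile)
    u_next = tuple(u[a] + payoff(a, s, profile) for a in agents)
    return (s_next, u_next)


# Hypothetical one-state model with two agents (indexed 0 and 1 here):
# "work" earns 1 unit and is always enabled, "spend" costs 1 unit and
# requires at least 1 unit of accumulated utility.
def toy_out(s, profile):
    return "s"


def toy_payoff(a, s, profile):
    return 1 if profile[a] == "work" else -1


def toy_guard(a, s, action, utility):
    return action == "work" or utility >= 1
```

In the toy model, the profile ("work", "spend") is undefined at the configuration ("s", (0, 0)) because the second agent cannot afford to spend, while from ("s", (0, 1)) it leads to ("s", (1, 0)). Iterating `step` from an initial configuration over all profiles would enumerate the configuration graph, which in general is infinite.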
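A play in a GCGMP $\mathcal{M}$ is an infinite sequence $\pi = c_0 \vec{\alpha}_0, c_1 \vec{\alpha}_1, \ldots \in (\mathsf{Con}(\mathcal{M}) \times \mathsf{Act}_{\mathsf{Ag}})^{\omega}$ such that $c_{n} = \widehat{\mathsf{out}}(c_{n-1}, \vec{\alpha}_{n-1})$ for all $n > 0$. The set of all plays in $\mathcal{M}$ is denoted by $\mathsf{Plays}_{\mathcal{M}}$. For each $i \geq 0$ we write $\pi[i]$ to refer to the $i$th pair $(c_i, \vec{\alpha}_i)$ on $\pi$ and $\pi[i, \infty] = c_i \vec{\alpha}_i, c_{i+1} \vec{\alpha}_{i+1}, \ldots$ to denote the sub-play of $\pi$ starting from position $i$. A history is any finite initial sequence $h = c_0 \vec{\alpha}_0, c_1 \vec{\alpha}_1, \ldots, c_n \in (\mathsf{Con}(\mathcal{M}) \times \mathsf{Act}_{\mathsf{Ag}})^{*}\, \mathsf{Con}(\mathcal{M})$ of a play in $\mathcal{M}$. Note that a history ends in a configuration, but sometimes, for technical reasons, we also assume that a dummy action profile $\vec{\alpha}_n$ is added as a placeholder. The set of all histories is denoted by $\mathsf{Hist}_{\mathcal{M}}$. Like for plays, we use the notation $h[i]$ and $h[i, j]$. We also allow $j = \infty$ and note that $h[\mathit{last}]$ refers to the last pair $(c_n, \vec{\alpha}_n)$.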
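For a given set $Z$, let $Z^{\leq\omega} := Z^{\omega} \cup Z^{*}$ denote the set of finite or infinite sequences of elements of $Z$, and consider the set $\mathsf{Plays}_{\mathcal{M}} \cup \mathsf{Hist}_{\mathcal{M}}$ of all plays and histories in $\mathcal{M}$. Finally, we introduce functions $\cdot^{c}$, $\cdot^{a}$, $\cdot^{u}$ and $\cdot^{s}$ on this set. Formally, these denote the projections of a given play or history respectively to the sequences of its configurations, action profiles, current utility vectors, and states. For illustration, consider the play $\pi = c_0 \vec{\alpha}_0, c_1 \vec{\alpha}_1, \ldots$, where $c_i = (s_i, \vec{u}_i)$. Then: $\pi^{c} = c_0, c_1, \ldots$; $\pi^{a} = \vec{\alpha}_0, \vec{\alpha}_1, \ldots$; $\pi^{u} = \vec{u}_0, \vec{u}_1, \ldots$; and $\pi^{s} = s_0, s_1, \ldots$. Next, for each player $\mathsf{a}$ and configuration $c = (s, \vec{u})$, we define its local projection, or local (view of the) configuration, for $\mathsf{a}$ to be $c|_{\mathsf{a}} := (s, u_{\mathsf{a}})$. We then define local histories and local plays for $\mathsf{a}$ to be the projections of histories and plays to the respective sequences of local configurations for $\mathsf{a}$, and denote the respective sets by $\mathsf{Hist}^{\mathsf{a}}_{\mathcal{M}}$ and $\mathsf{Plays}^{\mathsf{a}}_{\mathcal{M}}$.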
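As an illustration, the four projections and the local view can be sketched in Python on a finite history represented as a list of (configuration, action profile) pairs. The representation, the function names and the sample history are assumptions made for this sketch only; the final pair carries `None` as its placeholder action profile.

```python
def configs(seq):
    """Projection to the sequence of configurations."""
    return [c for c, _ in seq]


def actions(seq):
    """Projection to the sequence of action profiles."""
    return [alpha for _, alpha in seq]


def utilities(seq):
    """Projection to the sequence of utility vectors."""
    return [u for (_, u), _ in seq]


def states(seq):
    """Projection to the sequence of states."""
    return [s for (s, _), _ in seq]


def local_view(config, a):
    """Local configuration of player a: the state plus a's own utility."""
    s, u = config
    return (s, u[a])


def local_history(seq, a):
    """Sequence of local configurations of player a."""
    return [local_view(c, a) for c, _ in seq]


# A hypothetical three-step history of a two-player game.
h = [(("s1", (0, 0)), ("C", "C")),
     (("s1", (1, 1)), ("D", "C")),
     (("s3", (3, -1)), None)]
```

Here `states(h)` forgets everything except the visited states, which is exactly the information a state-based strategy may use, while `local_history(h, 1)` is what the second player sees under the local-view assumption.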
Example 3 Some possible plays in Example 1, starting from the initial configuration $(s_1, (0, 0))$, are given below. Note that, according to the guards, the first action of any agent from this configuration must be C.
(1) Players cooperate forever.
(2) After the first round both players defect and the play moves to $s_2$, where player I chooses to defect whereas II cooperates. Then I must cooperate while II must defect, but at the next rounds II can choose any action.
(3) After the first round player I defects while II cooperates and the play moves to $s_3$, where they can get stuck indefinitely, until (if ever) they happen to coordinate.
Note, however, that once player I reaches accumulated utility 0 he may only play C at that round, so if player II has enough memory or can observe the current accumulated utility of I, then she can use the opportunity to coordinate with I at that round by playing C, thus escaping the trap at $s_3$ and making a sure transition to $s_2$. This illustrates the use of memory-based strategies and also brings up the issue of the possible effects of the observation abilities assumed for the agents.

Strategies
Intuitively, a strategy of a player is a complete conditional plan which prescribes what action the player should take in every possible "situation". Strategies of players depend on their observation and memory abilities and can only be based on what players can observe, record and recall. Here are the main, not mutually exclusive, cases that arise with respect to these; in each of them players can use bounded or unbounded memory:
- All players have complete information about the model, and in particular they know the underlying concurrent game model and the payoff tables associated at all states in the GCGMP. This is the case we also assume here, but it need not always be the case. Conceivably, the players may have various types of incomplete information: about the state space, or the possible actions, guards, transitions, and the payoffs of the other players. Each of these cases requires detailed further analysis which we cannot reasonably cover within this paper.
- Players can observe only their own local view of the state and their own payoff. This is the case of imperfect information, which we will not discuss here either, but will defer to a future study.
- Players can observe the entire current state and only their own payoff, but not the other players' actions or payoffs.
- Players can observe the current state and every player's actions. Using some bounded memory, such players can then compute and keep record of the other players' current utilities throughout the game, so we can just as well assume that such players can observe the other players' current utilities, too, and can take them into account in their strategies.
- In the most general case, players' strategies are based on the entire history of the play.
We make these options precise below.

Definition 4 (Strategies)
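A strategy of a player $\mathsf{a}$ in a GCGMP $\mathcal{M}$ is a mapping $\sigma_{\mathsf{a}} : \mathsf{Hist}_{\mathcal{M}} \to \mathsf{Act}_{\mathsf{a}}$ which is consistent with the guards for $\mathsf{a}$, i.e., such that if $\sigma_{\mathsf{a}}(h) = \alpha$ and $h$ ends in the configuration $(s, \vec{u})$, then $u_{\mathsf{a}}$ satisfies $g_{\mathsf{a}}(s, \alpha)$. That is, actions prescribed by the strategy must be enabled by the guard.
A strategy $\sigma_{\mathsf{a}} : \mathsf{Hist}_{\mathcal{M}} \to \mathsf{Act}_{\mathsf{a}}$ is:
- state-based, if it only depends on the state histories, i.e. those in $\mathsf{Hist}^{s}_{\mathcal{M}}$. In the case of state-based strategies we will assume that the guards are also state-based. Formally, a state-based strategy for player $\mathsf{a}$ is defined as a mapping $\sigma_{\mathsf{a}} : \mathsf{Hist}^{s}_{\mathcal{M}} \to \mathsf{Act}_{\mathsf{a}}$, consistent with the guards for $\mathsf{a}$.
- configuration-based, if it only depends on the configuration histories, i.e. prescribes the same action to any two histories which have the same configuration projections. Formally, a configuration-based strategy is defined as a mapping $\sigma_{\mathsf{a}} : \mathsf{Hist}^{c}_{\mathcal{M}} \to \mathsf{Act}_{\mathsf{a}}$, consistent with the guards for $\mathsf{a}$.
- memoryless (or positional), if it only depends on the current configuration. Formally, a memoryless strategy is a mapping $\sigma_{\mathsf{a}} : \mathsf{Con}(\mathcal{M}) \to \mathsf{Act}_{\mathsf{a}}$, consistent with the guards for $\mathsf{a}$.
- local-view, if it only depends on the player's local configuration histories, i.e. prescribes the same action to any two histories with the same local projections for the player. Formally, a local-view strategy is a mapping $\sigma_{\mathsf{a}} : \mathsf{Hist}^{\mathsf{a}}_{\mathcal{M}} \to \mathsf{Act}_{\mathsf{a}}$, consistent with the guards for $\mathsf{a}$.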
The combinations, such as memoryless configuration-based, memoryless local-view, etc., are defined likewise. The class of all (resp. state-based, configuration-based, local-view, and memoryless) strategies is denoted by $\Sigma$ (resp. $\Sigma^{s}$, $\Sigma^{c}$, $\Sigma^{l}$ and $\Sigma^{m}$). Again, combinations are denoted analogously, e.g. $\Sigma^{sm}$ refers to state-based memoryless strategies.
The general definition of strategy above extends the notion of strategy from [6], where it is defined only on histories of states; that setting corresponds to state-based strategies combined with state-based guards. The more general notion also includes strategies that are typically considered e.g. in the study of repeated games, where the action prescribed to the player may depend not only on the state history but also on the previous action, or the history of actions, of the other player(s). Such are, for instance, the strategies Tit-for-tat or Grim-trigger in the repeated Prisoner's Dilemma, as well as strategies for various card games. Also, the strategy of a gambling player could naturally depend not only on his currently available money, but also on the history of his previous gains and losses, etc. The classification of strategies and the comparison of their power is worth a separate study and will not be pursued further here. We only note that the choice of type of strategies essentially affects the computational cost of solving the associated model-checking problems; see e.g. [57].

Computationally effective strategies
This is an auxiliary subsection, where we introduce a general classification of strategies in terms of the computational resources they require from the agents.Not all notions defined in this subsection will be used further in the paper, but it is intended to serve as a common terminological and notational reference for further follow-up work.
There are at least two ways in which memory resources play a role in strategies: the memory needed to store the input of the strategy and the memory needed to compute the value of the strategy function. Note that, because the configuration graphs of GCGMPs are usually infinite, strategies are generally infinitary objects. For the sake of obtaining effective procedures we focus on "finitary" strategies. Intuitively, a finitary strategy is a program of the type: "If $C_1$ apply action $a_1$; ...; if $C_k$ apply action $a_k$; otherwise, apply action $a_{k+1}$", where $C_1, \ldots, C_k$ are pairwise exclusive conditions on the configurations or histories which the strategy takes into account. We will make this notion precise in Definition 6.
Note that finitary strategies can be memoryless (when the conditions only refer to the current configuration), finite memory, or perfect recall strategies.Moreover, even finitary strategies can still be non-computable; however, this issue will not be considered further here.
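Hereafter we fix a GCGMP $\mathcal{M}$ with a state space $\mathsf{St}$, domain of payoffs $\mathbb{D} = \mathbb{Z}$ and a global set of actions $\mathsf{Act}$, which we assume to be the union of all players' sets of actions. In order to precisely define "finitary" and "effective" strategies in $\mathcal{M}$ we introduce a formal language. We call a set $\Phi = \{\varphi_{\alpha}\}_{\alpha \in \mathsf{Act}_{\mathsf{a}}}$ of formulae from $\mathsf{ACF}(X, \mathsf{Ag})$ strategy-defining for player $\mathsf{a}$ at a state $s \in \mathsf{St}$ of a GCGMP $\mathcal{M}$ iff each of the following holds:
(1) $\varphi_{\alpha} \wedge \varphi_{\beta}$ is unsatisfiable for all distinct $\alpha, \beta \in \mathsf{Act}_{\mathsf{a}}$;
(2) $\varphi_{\alpha} \rightarrow g_{\mathsf{a}}(s, \alpha)$ is valid for every $\alpha \in \mathsf{Act}_{\mathsf{a}}$;
(3) $\bigvee_{\alpha \in \mathsf{Act}_{\mathsf{a}}} \varphi_{\alpha}$ is valid.
Intuitively, $\varphi_{\alpha}$ is the condition on current configurations which prescribes to the player $\mathsf{a}$ to apply action $\alpha$. Then the clauses above state that: (1) the conditions prescribing different actions are mutually exclusive; (2) the condition prescribing an action ensures that the guard for that action at state $s$ is satisfied; and (3) there always exists an enabled action.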
An important case of strategy-defining sets of formulae is when each is a boolean constant ( ⊤ or ⊥ ).These define state-based strategies.
If the set $\Phi$ above consists of formulae from $\mathsf{ACF}(X, \{\mathsf{a}\})$ we call it local strategy-defining for player $\mathsf{a}$ at a state $s \in \mathsf{St}$. Note that, without loss of generality, we can assume that every formula in a strategy-defining set partitions the domain of payoffs into a finite set of disjoint intervals.
Now we introduce memory transducers, i.e., finite automata with output, which are usually used as computational means to define finite memory strategies. Formally, given a GCGMP $\mathcal{M}$ and a finite set (of memory cells) $M$, a global memory transducer for $\mathcal{M}$ with a memory space $M$ (and memory size $|M|$) is a tuple $T = (M, m_0, \lambda, \delta)$, where $m_0 \in M$ is the initial memory cell, $\lambda : M \times \mathsf{Con}(\mathcal{M}) \to \mathsf{Act}$ and $\delta : M \times \mathsf{Con}(\mathcal{M}) \to M$. A local memory transducer with a memory space $M$ for $\mathcal{M}$ and a player $\mathsf{a}$ is a tuple $T = (M, m_0, \lambda, \delta)$ whose two functions take local configurations of $\mathsf{a}$, rather than full configurations, as their second argument. Intuitively, a global (resp., local) transducer reads a current configuration (resp., the local view of the configuration), prescribes an action based on it and on its current internal state (memory cell), and then updates its internal state.

Definition 5 (Effective memory transducers) A global memory transducer $T = (M, m_0, \lambda, \delta)$ is effective for player $\mathsf{a}$ iff there is a family of sets of formulae $(\Phi^{s,j})_{s \in \mathsf{St},\, j \in M}$ from $\mathsf{ACF}(X, \mathsf{Ag})$ for a finite set of payoff constants $X$, where each $\Phi^{s,j} = \{\varphi^{s,j}_{\alpha} \mid \alpha \in \mathsf{Act}_{\mathsf{a}}\}$ is strategy-defining for player $\mathsf{a}$ at state $s$, such that $\lambda(j, (s, \vec{u})) = \alpha$ iff $\vec{u}$ satisfies $\varphi^{s,j}_{\alpha}$. Likewise, a local memory transducer is effective if each $\Phi^{s,j}$ is local strategy-defining. Finally, an effective transducer as above is $n$-bounded if $\max\{|c| : c \in X\} = n$ (where $|c|$ is the absolute value of $c$).
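Definition 6 (Effective strategies) A global memory transducer $T = (M, m_0, \lambda, \delta)$ determines a configuration-based strategy $\sigma_T : \mathsf{Hist}^{c}_{\mathcal{M}} \to \mathsf{Act}$ for player $\mathsf{a}$, defined for every history $h = c_0 \vec{\alpha}_0 c_1 \vec{\alpha}_1 \ldots c_n \in \mathsf{Hist}_{\mathcal{M}}$ as follows:

$$\sigma_T(h^{c}) = \lambda(m_n, c_n), \quad \text{where } m_{i+1} = \delta(m_i, c_i) \text{ for } 0 \leq i < n.$$

Such a configuration-based strategy $\sigma_{\mathsf{a}} : \mathsf{Hist}^{c}_{\mathcal{M}} \to \mathsf{Act}$ for player $\mathsf{a}$ is $(m, n)$-effective if it is defined by an $n$-bounded global transducer $T$ with memory size $m$; it is effective if it is $(m, n)$-effective for some $m, n \in \mathbb{N}$.
Likewise, an $n$-bounded local memory transducer $T$ with memory size $m$ defines an $(m, n)$-effective local-view configuration-based strategy by using the local projection of configurations for the player.
In particular, a (local-view) memoryless strategy is effective if it is $(1, n)$-effective for some $n \in \mathbb{N}$.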
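The way a memory transducer determines a strategy can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class name, the names `output` and `update` for the two transducer functions, and the single-robot example are all hypothetical, and consistency with the guards is not checked here.

```python
class MemoryTransducer:
    """Finite-memory strategy: an output function plus a memory-update function."""

    def __init__(self, m0, output, update):
        self.m0 = m0            # initial memory cell
        self.output = output    # (memory cell, configuration) -> action
        self.update = update    # (memory cell, configuration) -> memory cell

    def strategy(self, config_history):
        """Action prescribed after the configuration history c_0, ..., c_n."""
        m = self.m0
        for c in config_history[:-1]:
            m = self.update(m, c)          # memory after reading c_0 .. c_{n-1}
        return self.output(m, config_history[-1])


# Hypothetical two-cell example for a single robot: recharge until the energy
# reaches 2, attempt the move once, and do nothing afterwards.
def output(m, c):
    _, u = c
    if m == 1:
        return "N"
    return "G" if u[0] >= 2 else "R"


def update(m, c):
    _, u = c
    return 1 if (m == 1 or u[0] >= 2) else 0


T = MemoryTransducer(0, output, update)
```

Both functions only compare the utility with the constant 2 and use two memory cells, so in the terminology above this sketch corresponds to a transducer with memory size 2 whose conditions mention only the payoff constant 2.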
Let $S$ be some class of strategies from Definition 4, e.g. $S = \Sigma^{s}$. Then, we write $S^{e}$ and $S^{e(m,n)}$ to refer to all strategies from $S$ that are effective and $(m, n)$-effective, respectively. Combinations with earlier defined classes of strategies are defined and denoted as expected, e.g. $\Sigma^{lme}$ denotes the class of local-view, configuration-based, memoryless, effective strategies.

Syntax and semantics
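We now extend the logic $\mathsf{ATL}^{*}$ with atomic quantitative objectives, which are arithmetic constraints over the players' currently accumulated utilities. The language of $\mathsf{QATL}^{*}$ consists of state formulae $\varphi$, which constitute the logic, and path formulae $\gamma$, generated as follows, where $A \subseteq \mathsf{Ag}$, $\mathsf{ac} \in \mathsf{AC}$, and $p \in \mathsf{Prop}$:

$$\varphi ::= p \mid \mathsf{ac} \mid \neg\varphi \mid (\varphi \wedge \varphi) \mid \langle\langle A \rangle\rangle \gamma, \qquad \gamma ::= \varphi \mid \neg\gamma \mid (\gamma \wedge \gamma) \mid \mathsf{X}\gamma \mid \mathsf{G}\gamma \mid (\gamma\, \mathsf{U}\, \gamma).$$

Outermost parentheses will usually be omitted. The "sometime" operator is defined, as usual, as $\mathsf{F}\gamma \equiv \top\, \mathsf{U}\, \gamma$.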
We say that a formula is purely qualitative (resp. purely quantitative) if it contains no arithmetic constraints (resp. no propositional symbols). The semantics of $\mathsf{QATL}^{*}$ naturally extends the semantics of $\mathsf{ATL}^{*}$ over GCGMPs. In order to make that semantics more realistic from a game-theoretic perspective, we assume that all players have their individual or collective objectives and act strategically in their pursuit; in particular, both proponents and opponents of a given objective follow strategies from given classes. Formally, we consider the semantics of every formula of the type $\langle\langle A \rangle\rangle\gamma$ parameterised with the two classes of strategies $S^{p}$ for the proponents and $S^{o}$ for the opponents, used when evaluating the truth of that formula. Thus, the proponent coalition $A$ selects an $S^{p}$-strategy $s_A$ while the opponent coalition $\mathsf{Ag} \setminus A$ selects an $S^{o}$-strategy $s_{\mathsf{Ag} \setminus A}$. Then $\mathsf{outcome\_play}_{\mathcal{M}}(c, (s_A, s_{\mathsf{Ag} \setminus A}))$ in a given GCGMP $\mathcal{M}$ refers to the outcome play emerging from the execution of the strategy profile $(s_A, s_{\mathsf{Ag} \setminus A})$ from configuration $c$ in $\mathcal{M}$ onward.
Definition 8 (Semantics) Let $\mathcal{M}$ be a GCGMP and let $S^{p}$ and $S^{o}$ be two fixed classes of strategies. The truth of a $\mathsf{QATL}^{*}$ formula at a configuration $c$, respectively, on a path $\pi$ in $\mathcal{M}$, is defined by mutual recursion on state and path formulae as follows:

Now we define some important fragments of
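$\mathcal{M}, c \models_{(S^{p}, S^{o})} \langle\langle A \rangle\rangle\gamma$ iff there is a collective $S^{p}$-strategy $\sigma_A$ for $A$ such that $\mathcal{M}, \mathsf{outcome\_play}_{\mathcal{M}}(c, (\sigma_A, \sigma_{\mathsf{Ag} \setminus A})) \models_{(S^{p}, S^{o})} \gamma$ for all collective $S^{o}$-strategies $\sigma_{\mathsf{Ag} \setminus A}$ for $\mathsf{Ag} \setminus A$. We will write $\models$ for $\models_{(\Sigma, \Sigma)}$.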

Expressing some properties
Besides capturing all purely qualitative, $\mathsf{ATL}^{*}$-definable properties, the logic $\mathsf{QATL}^{*}$ can also express purely quantitative properties, such as $\langle\langle \{\mathsf{a}\} \rangle\rangle\, \mathsf{G}\,(v_{\mathsf{a}} > 0)$, meaning "Player $\mathsf{a}$ has a strategy to maintain his accumulated utility to be always positive". Moreover, $\mathsf{QATL}^{*}$ can naturally express combined qualitative and quantitative properties, e.g.
saying "Player has a strategy to stay happy until becomes a millionaire", or saying "Players and have a joint strategy to keep their joint accumulated utility greater than the one of until becomes always happy thereafter".
More such examples can be extracted from Examples 3 and 4.

Example 5
The following $\mathsf{QATL}^{*}$ state formulae are true at state $s_1$ of the GCGMP in Example 1, where $p_i$ is an atomic proposition true only at state $s_i$, for each $i = 1, 2, 3$. For partial argumentation of these, see Example 3.
-⟨⟨I, II⟩⟩ (p Example 6 Suppose the objective of the team of robots in Example 2 is that, starting from state base where each robot has energy level 0, the state goal must eventually be reached and then the team must return to the base station. The following * state formulae are true at the initial configuration (base, 0, 0, 0) in the GCGMP in Example 2, where is an atomic proposition true only at state base and is an atomic proposition true only at state goal.For partial argumentation, see Example 4. -

Reductions of qualitative to quantitative objectives
Here we show that, given a finite GCGMP M with a state space one can technically eliminate the qualitative component at the cost of adding a fictitious extra player.
Proposition 1 Let M be a finite GCGMP.Then there is an effective translation # from * to a variation of * obtained by removing the propositional symbols and adding an additional agent , and an effective transformation of M to a GCGMP M * expand- ing M with the additional agent , such that for every state formula of * and a state q ∈ M: where is the state space of M. Proof Here is an informal but precise description of the construction of the expansion of M to M * and the translation of to # .We leave the formal details to the interested reader.
1. Re-label all states of $\mathcal{M}$ by integers, i.e., assume $\mathsf{St} = \{0, \ldots, n-1\}$.
2. Introduce an extra player $\mathsf{e}$ with payoff function defined so that the current utility of $\mathsf{e}$ always equals the number # of the current state. That is done by assigning only one unguarded action to $\mathsf{e}$ at every state and defining its payoffs to be the difference: #(successor state) - #(current state).
3. For every $p \in \mathsf{Prop}$ define the quantitative formula $\chi(p) := \bigvee_{i \,:\, p \in \mathsf{L}(i)} (v_{\mathsf{e}} = i)$. Note that in any play, $\chi(p)$ is true at a configuration $(i, \vec{u})$ iff $p \in \mathsf{L}(i)$, for each $i \in \mathsf{St}$.
4. Translate any $\mathsf{QATL}^{*}$-formula $\varphi$ into a purely quantitative one $\varphi^{\#}$ by replacing every occurrence of each $p \in \mathsf{Prop}$ by the respective $\chi(p)$.

◻ Remark 1
The reduction above only works if negative payoffs are allowed, but it can also be realised in a GCGMP with only non-negative payoffs, by using congruences. The idea is to maintain the current accumulated utility of the extra player to be always congruent to the number of the current state modulo the number $n$ of all states, and the quantitative formula associated with every $p \in \mathsf{Prop}$ is defined likewise, by replacing $v = i$ with $v \equiv_{n} i$. Moreover, this translation also works in the case of infinitely many states, if each proposition can only occur in the labels of finitely many states.
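The bookkeeping behind the reduction, including the congruence variant of this remark, can be illustrated with a short Python sketch. The labelling, the sample path and all function names are hypothetical; the sketch assumes that the extra player's initial utility equals the number of the initial state.

```python
def extra_payoff(current: int, successor: int, n=None) -> int:
    """Payoff of the extra player on a transition between numbered states.

    With n given, the payoff is reduced modulo n and is therefore non-negative.
    """
    diff = successor - current
    return diff if n is None else diff % n


def chi(p, labels):
    """State numbers i with p in labels[i]; chi(p) is the disjunction of v = i over them."""
    return {i for i, props in labels.items() if p in props}


def chi_holds(p, labels, v, n=None):
    """Evaluate chi(p) on the extra player's utility v (modulo n in the congruence variant)."""
    return (v if n is None else v % n) in chi(p, labels)


def track(path, n=None):
    """Accumulated utility of the extra player along a path of state numbers."""
    v = path[0]                 # assumption: initial utility = number of the initial state
    values = [v]
    for cur, nxt in zip(path, path[1:]):
        v += extra_payoff(cur, nxt, n)
        values.append(v)
    return values


labels = {0: {"p"}, 1: set(), 2: {"p", "q"}}   # hypothetical labelling of 3 states
path = [0, 2, 1, 2]                            # hypothetical path through them
```

Along `path`, the plain variant keeps the utility equal to the state number, while the variant with `n=3` only uses non-negative payoffs and keeps the utility congruent to the state number modulo 3.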

Undecidability results
The GCGMP models are too rich and the language of $\mathsf{QATL}^{*}$ is too expressive to expect computational efficiency, or even decidability, of either model checking or satisfiability.
In the following we show that model checking of $\mathsf{QATL}^{*}$, and even of $\mathsf{QATL}$, in a GCGMP is undecidable under rather weak assumptions, e.g. if the proponents or the opponents can use any effective strategies. These undecidability results are not surprising, as GCGMPs are technically closely related to Petri nets and vector addition systems with states (VASS), and it is known that logic-based model checking over them is generally undecidable. For example, in [34] this is shown for fragments of $\mathsf{CTL}$ and (state-based) $\mathsf{LTL}$ over Petri nets. Essentially, the reason is that these logics allow encoding a "test for zero" over such models; for Petri nets this means checking whether a place contains a token or not. In our setting undecidability follows for the same reason, and we will sketch some results here. We outline the constructions and arguments in order to illustrate the expressiveness of the present framework, but do not provide full technical details, as they can essentially be retrieved from the references on similar results.
We show that model checking is undecidable even if the proponents are only permitted to use state-based, local-view, effective strategies (formally: $S^{p} = \Sigma^{sle}$). In our construction there will be no opponents, so it does not matter which class of strategies we fix for them. The reduction can be done by applying ideas from e.g. [34], or from [19] (which are used here), to simulate a two-counter machine (TCM), also known as a two-counter automaton or 2-register Minsky machine [45]. Intuitively, a TCM (see e.g. [40]) can be considered as a transition system equipped with two integer counters that enable/disable transitions. Each step of the machine depends on the current state, the symbol on the tape, and whether each of the counters is zero or not. After each step the counters can be incremented ($+1$) or decremented ($-1$), the latter only if the respective counter is not zero. An alternative view of a TCM is essentially as a nondeterministic push-down automaton with two stacks and exactly two stack symbols (one of them is the initial stack symbol). It has the same computational power as a Turing machine, cf. [40].
A configuration in $A$ is a triple $(s, c_1, c_2)$, where $s \in S$ is the current state and $c_1, c_2$ are the (non-negative) current values of the two counters. The initial configuration is $(s_{init}, 0, 0)$. The transition relation $\Delta$ acts non-deterministically on configurations, as follows: given a current configuration $(s, c_1, c_2)$, $\Delta$ takes as input $(s, w, E_1, E_2)$, where $w \in \Gamma \cup \{\epsilon\}$ is the currently read input symbol or the empty word, and for each $i = 1, 2$, $E_i = 1$ if the counter $i$ is non-empty (i.e., $c_i > 0$), respectively, $E_i = 0$ if $c_i = 0$. Then $\Delta$ produces as output a set of triples $(s', C_1, C_2)$ where, for each $i = 1, 2$, $C_i = 1$ (resp. $C_i = -1$ and $C_i = 0$) denotes that counter $i$ is incremented by 1 (resp. decremented by 1 and left unchanged). The case $C_i = -1$ is allowed only when $c_i > 0$, i.e. $E_i = 1$. Every such triple determines a successor configuration $(s', c_1 + C_1, c_2 + C_2)$. Note that each of the new counter values is non-negative.
The TCM $A$ reads an input word $w \in \Gamma^{*}$ just like a finite automaton, one symbol at a time, starting from the initial configuration, and makes non-deterministically a sequence of subsequent transitions according to the respective symbols from $w$ and the transition relation. A computation in $A$ generated by an input word $w \in \Gamma^{*}$ is a sequence of subsequent configurations effected by transitions according to the input and the transition relation.
The word $w \in \Gamma^{*}$ is accepted by $A$ if there is a computation in $A$ generated by $w$ and ending in a configuration $(s, c_1, c_2)$ where $s \in S_f$.
For our present purposes it will suffice to consider computations from the empty input $\epsilon$, where the input alphabet can be ignored.
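For the empty-input case, the step relation of a two-counter machine can be sketched in Python as follows. The dictionary encoding of the transition relation, the function names and the small example machine are assumptions made for illustration. Since reachability of an accepting state is undecidable in general, the search below is explicitly bounded and a negative answer only means that no accepting state was found within the bound.

```python
def successors(delta, config):
    """Successor configurations of (s, c1, c2) on the empty input.

    delta maps (s, E1, E2), with Ei = 1 iff counter i is non-zero, to a set of
    triples (s', C1, C2) with each Ci in {-1, 0, 1}.
    """
    s, c1, c2 = config
    e1, e2 = int(c1 > 0), int(c2 > 0)
    result = set()
    for s_next, d1, d2 in delta.get((s, e1, e2), ()):
        if (d1 == -1 and c1 == 0) or (d2 == -1 and c2 == 0):
            continue                      # an empty counter cannot be decremented
        result.add((s_next, c1 + d1, c2 + d2))
    return result


def reaches_final_within(delta, s_init, finals, max_steps):
    """Bounded breadth-first search for a configuration with an accepting state."""
    frontier = {(s_init, 0, 0)}
    seen = set(frontier)
    for _ in range(max_steps + 1):
        if any(s in finals for s, _, _ in frontier):
            return True
        frontier = {n for c in frontier for n in successors(delta, c)} - seen
        seen |= frontier
    return False


# Hypothetical machine: increment counter 1, decrement it again, then accept.
delta = {
    ("q0", 0, 0): {("q0", 1, 0)},
    ("q0", 1, 0): {("q1", -1, 0)},
    ("q1", 0, 0): {("qf", 0, 0)},
}
```

The flags `e1` and `e2` are exactly the "test for zero" that makes the model so expressive; in the reduction to GCGMPs this test is taken over by the guards on the players' accumulated utilities.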
Lemma 1 (Reduction) For any two-counter machine A we can construct a finite, turn based GCGMP M A with two players and proposition such that the following holds: A accepts the empty word iff M A contains a play with c = (s 0 , )) … such that there exists j ∈ ℕ with ∈ (s j ).
Proof Let TCM A = (S, , s init , S f , ) be given.We first outline the construction of the model M A , and then we provide the full technical details.For the simulation, we associate each counter with a player.The player's current utility encodes the counter value; actions model the increment/decrement/no change of the counters; guards ensure that the actions respect the state of the counters.The accepting states are labelled by a special proposition .
As mentioned earlier, since we only need to simulate the runs of A on the empty input, the input alphabet can be ignored and the transition relation can be simplified to States of player 1 are given by where S 1 1 = S and S 2 1 = {s x 1 x 2 | x 1 , x 2 ∈ {0, 1}, s ∈ S} .(Intuitively, player 1 chooses the initial part of a transition (s, x 1 , ⋅) (⋅, ⋅, ⋅) in states from S, and from a state s x 1 x 2 player 1 decides how the counter value of counter 1 will change).The states of player 2 are , from states s x 1 player 2 decides which transition (s, x 1 , x 2 ) (⋅, ⋅, ⋅) to choose.From a state s x 1 x 2 x 3 player 2 decides how the counter value of counter 2 will change.) Actions model the possible transitions of the automaton.An action has the general form (s, x), where s ∈ S 1 ∪ S 2 indicates the successor state and x ∈ {−1, 0, 1} specifies how the payoff of the executing player changes.For example, an action (s in state s E 1 and simulates the change of counter 1 according to C 1 .Thus, every transition ((s, E 1 , E 2 ), (s � , C 1 , C 2 )) ∈ in A is simulated by a 5-state sequence of transitions in M A , illustrated in Fig. 5.
Crucial in the encoding are the guards.For example, to a state s we assign the guard 1 (s, (s, 0)) = (v 1 = 0) indicating that action (s, 0) is only enabled if counter 1 is indeed zero (i.e.v 1 = 0 ).Similarly, 1 (s, (s, 1)) = (v 1 ≥ 1) is used to ensure that action (s, 1) is only enabled if counter 1 is non-zero (i.e.v 1 ≥ 1 ).Analogously, for the other states and the other player.
Lastly, we define s 0 = s init and label all states s ∈ S f with the proposition .Here are the technical details of the construction.Given the TCM A = (S, , s init , S f , ) we construct the turn based GCGMP M = (S, , ) , where S = ( , , { } ∈ , { } ∈ , , , ) , as follows (cf.also Fig. 5): , where: - Intuitively, Player 1 chooses the initial part of a transition (s, From a state s x 1 x 2 Player 1 decides how the counter value of counter 1 will change.
Intuitively, from states s x 1 Player 2 decides which transition relation From a state s x 1 x 2 x 3 Player 2 decides how the counter value of counter 2 will change.

⊤, else
-We define s init as initial state and label all states s ∈ S f with proposition .
Now, one can show by induction that the automaton accepts the empty word iff $\mathcal{M}_A$ contains a path from the initial configuration that reaches a state labelled with the special proposition marking the accepting states. ◻ The next theorem states two cases for which the model-checking problem is undecidable. By Lemma 1 it suffices to specify a formula which is true if, and only if, the halting state is reached.

Theorem 1 Model checking of $\mathsf{QATL}_{1}$ is undecidable in the 2-agent case where $S^{p} = \Sigma^{sle}$ and $S^{o}$ is fixed arbitrarily. This holds even in each of the following cases:
Proof (a) Let $\mathsf{halt}$ denote the proposition labelling the accepting states. By Lemma 1 and the undecidability of the halting problem of TCMs on an empty input ([40, 45]) it is sufficient to show the following: $\mathcal{M}_A$ contains a play $\pi = (s_0, (0, 0)) \vec{\alpha}_0 (s_1, \vec{u}_1) \vec{\alpha}_1 \ldots$ such that there exists $j \in \mathbb{N}$ with $\mathsf{halt} \in \mathsf{L}(s_j)$ if, and only if, $\mathcal{M}_A, (s_{init}, (0, 0)) \models_{(\Sigma^{sle}, S^{o})} \langle\langle 1, 2 \rangle\rangle \mathsf{F}\, \mathsf{halt}$. The right-to-left direction is clear. For the left-to-right direction, we define the strategy profile $s = (\sigma_1, \sigma_2)$ as follows: for each $i = 1, 2$, the strategy $\sigma_i$ assigns action $\alpha_k$ to the sequence of states $s_0 \ldots s_k$ if $s_k$ is a state of player $i$, for $k = 0, \ldots, j$, and an arbitrary action otherwise. Once the final state $s_j$ is reached, the action that guarantees transition to that same state is performed. Clearly, such a strategy needs only finite memory, is state-based, hence local-view, and is effective.
(b) Let M A ′ be defined like M A but all guards map to ⊤ (i.e. they are state-based).Moreover, we label all states s x 1 x 2 ∈ S 2  1 with a proposition and, additionally, with i iff x i = 0 , for i = 1, 2 .In these states the consistency of the choice of the transitions is veri- fied.Then, we have that M A contains a path = (s 0 , We illustrate the right-to-left direction by considering the case where the state s k is of the form s 0x 2 .Then action (s 0 k−2 , 0) must have been performed in state s k−2 and thus v k−2 1 = 0 .Thus, the guards in M A are correctly simulated.The reasoning for the other combinations of x 1 x 2 is similar.The remainder of the proof is analogous to (a).◻

Corollary 1 Model checking 2-agent
* is undecidable, where S p = Σ sle and S o is fixed arbitrarily.This holds even in the following cases: (a) -formulae, not involving arithmetic constraints; )) ) (note that ¬ can be expressed by means of ).Now, the proofs follows the same lines as in the proof of Theorem 1. ◻

Remark 2
The undecidability results above essentially use the possibility of negative payoffs, to decrement counters. As we will show later, decidability can possibly be restored if only non-negative payoffs are allowed in the model (cf. Theorem 2). However, if in the language of arithmetical constraints we allow addition and comparison of payoffs of different players, then undecidability can be re-established, even if only non-negative payoffs are allowed. This can be done by introducing a fictitious new player and using the differences (which can be positive or negative) between his current utility and the current utilities of the other players to play the role of the players' current utilities used in the undecidability proofs above.
More undecidability results can be obtained likewise, using the formula ⟨⟨1, 2⟩⟩ , for the 2-agent cases with negative payoffs and no guards and any effective strategies, or with configuration based-guards and configuration based strategies.We leave out the technical details.

Decidability results
Despite the wide-ranging undecidability results, there are some natural semantic and syntactic restrictions of $\mathsf{QATL}^{*}$ where decidability of the model checking problem may be restored, by making the configuration space and the strategy search space finite. Such restrictions include: allowing only memoryless strategies, imposing non-negative payoffs, constraints on the transition graph of the model, restrictions of the arithmetical constraints and guards ensuring that the players' accumulated utilities are bounded, etc. Here we outline one such non-trivial case and briefly discuss some others, but a more comprehensive study of decidable cases is left to further work.
Cover unfoldings. As already noted, the GCGMP models are technically closely related to vector addition systems with states (VASS). In [44], Karp and Miller introduced a compact symbolic representation of an over-approximation of the set of reachable configurations in a given vector addition system $W$, by means of a finite labelled tree which is often called the cover graph of $W$. They used it to solve, inter alia, the coverability problem for $W$, i.e., deciding membership in the so-called coverability set of $W$, which consists of all configurations in $W$ that can be 'exceeded', in terms of the componentwise ordering over the vectors of counter values, by reachable configurations. We will formally define and explain here a version of the cover graph for the class of GCGMP models.
First, let ℤ be the set of integers extended with an 'infinity number' which is strictly greater than all integers.For each x ∈ ℤ we put + x = + = .Now, the arithmetic constraint formulae are readily extended and interpreted over ℤ .Let ∈ ℕ .Given two vectors , � ∈ ℤ k , for some k ∈ ℕ , we define ≤ ′ iff i = � i or  < i < ′ i , for each i = 1, … , k .Then, we define <  ′ iff ≤ ′ and ≠ ′ .Now, we define ⊕ ′ as the vector ̂ ∈ ℤ k such that ̂ i = i if i = � i , and ̂ i = otherwise, for i = 1, … , k.Now, given a GCGMP M , a configuration c ∈ M , and a natural number ∈ ℕ , the -cover unfolding of M from c, denoted M c , is a CGM essentially obtained by unfolding the initialized configuration graph of M from the designated initial configu- ration c, but each time we encounter a configuration c � = (s, � ) after we have already encountered a configuration c * = (s, ) with ≤ ′ , we add not c ′ but (s, ⊕ � ) to M c .Furthermore, given c ′ , we take a ≤ -largest such preceding configuration c * .The intuition is as follows: whenever a configuration in M can be reached which is greater in terms of ≤ than a previously reached configuration with the same state, then this 'increasing step' can be repeated infinitely, resulting either in a cyclic subsequence of configurations, or in a strictly increasing one on at least one coordinate component.Then ′′ ⊕ ′ places in each coordinate of strict increase, meaning that unboundedly large values can be reached on that coordinate.To make that formal, we define simultaneously the state space c and the outcome function c of M c as follows.We define the set c of generalised configurations reachable from c as the smallest set containing c and closed under c , which is defined as follows.First, we extend ̂ to act on gen- eralised configurations just like it does on standard configurations, but by taking into account the extended interpretation of + and the arithmetic constraint formulae in ℤ .
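The ordering on extended utility vectors and the operation ⊕ described above can be sketched in Python. This is only an illustration of the two definitions as stated in the text: the 'infinity number' is modelled by `math.inf`, the natural-number parameter of the cover is called `bound` here, and all names are hypothetical.

```python
import math

OMEGA = math.inf    # stands for the 'infinity number', greater than every integer


def leq(u, v, bound):
    """u <= v in the cover ordering: per coordinate, equal or bound < u_i < v_i."""
    return all(x == y or bound < x < y for x, y in zip(u, v))


def accelerate(u, v):
    """The vector u (+) v: keep coordinates where u and v agree, put OMEGA elsewhere."""
    return tuple(x if x == y else OMEGA for x, y in zip(u, v))


def cover_successor(earlier, later, bound):
    """Generalised configuration to add when `later` is reached after `earlier`."""
    (s, u), (s2, v) = earlier, later
    if s == s2 and leq(u, v, bound):
        return (s, accelerate(u, v))    # coordinates of strict increase become OMEGA
    return later
```

A coordinate is only abstracted to `OMEGA` once its value already exceeds `bound`; values at or below the bound are kept exact. Arithmetic with `OMEGA` behaves as required, since adding any integer to `math.inf` yields `math.inf` again.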
Proposition 2 Let M be a finite GCGMP with non-negative payoffs and c ∈ M .Then, M c is finite for any ∈ ℕ.

Proof
The proof is similar to the corresponding proof for Karp-Miller graphs [44]; cf. also a similar proof for covers of resource bounded models in [18].
Suppose M_c^κ is infinite (i.e., it has infinitely many states). Note that every generalised configuration (s, u) of M_c^κ has only finitely many successors. Then, by König's lemma, there is an infinite play π in M_c^κ whose configurations c_i = (s_i, u^i), for i = 0, 1, …, are distinct states of M_c^κ (recall that these are generalised configurations in M). Since the set of states in M is finite, there is some state s of M and an infinite subsequence of distinct configurations π′ = c_{i_1} c_{i_2} … of π with c_{i_j} = (s, u^{i_j}) and i_j < i_{j+1} for all j = 1, 2, …. Due to the construction of the κ-cover, it cannot be the case that u^{i_j} ≤_κ u^{i_{j′}} for any 1 ≤ j < j′; otherwise, according to the definition of the outcome function of M_c^κ, a configuration (s, u^{i_j} ⊕ u^{i_{j′}}) would have been introduced in π′, forcing each subsequent generalised configuration to be equal from that point on, which contradicts the infinite number of configurations in π′.

So, for each j = 1, 2, … there must be an agent a_j such that u^j_{a_j} ≠ u^{j+1}_{a_j} and not κ < u^j_{a_j} < u^{j+1}_{a_j}. Since the set of agents is finite, at least one of them, say a, will appear infinitely often in a_1, a_2, …, so we can assume that π′ has been chosen so that each a_j is a. Thus, u^j_a ≠ u^{j+1}_a and not κ < u^j_a < u^{j+1}_a for each j = 1, 2, …. Because the payoff vectors are non-negative, this implies that either κ = u^j_a < u^{j+1}_a, or u^j_a < κ ≤ u^{j+1}_a, or u^j_a < u^{j+1}_a < κ. But, clearly, each of these options can only occur finitely many times, a contradiction with the choice of π′. Therefore, M_c^κ must be finite. ◻

The idea of using κ-covers is to reduce model checking in a given GCGMP to model checking in its κ-covers. In order to formally extend the semantics of QATL* to κ-covers defined as above, note that every κ-cover can also be seen as a GCGMP with state-based guards and an arbitrary payoff function, e.g., one always assigning payoff 0 to all players. Thus, the set of configurations of M_c^κ, regarded as a GCGMP, can be identified with the set of its states. Therefore, we can define the satisfaction relation ⊧_(S_p,S_o) over κ-cover models analogously to its state-based version ⊧_((S_p)^s,(S_o)^s), where configurations are drawn from that set. Thus, M_c^κ can be used to give meaningful semantics to QATL*-formulae, with the truth definitions of all ATL*-formulae (which only depend on the state-history) defined as usual (cf. Definition 8), whereas the truth of all atomic arithmetic constraints is determined by the utility vector u, included in the state, with respect to the ordering in ℤ_ω as defined above. Formally, we consider arithmetic constraints as atomic propositional formulas and define their truth directly in the model. However, note that comparisons between two players' utilities may not be possible to evaluate on configurations where both values are ω, so we have to restrict the language to the fragment QATL*_1 that does not permit comparisons between players' utility values. The formal definition is given below.
Definition 10 (Extended cover model) Let φ be any QATL*_1-formula, let C_φ be the set of arithmetic constraints occurring in φ, and let M_c^κ be the κ-cover of a GCGMP M. The φ-extended κ-cover M_c^{κ,φ} of M is the same as M_c^κ, but the set of atomic propositions is extended by C_φ and the labelling function of M_c^{κ,φ} is extended on C_φ as follows: for all α ∈ C_φ and generalised configurations (s, u), the constraint α belongs to the label of (s, u) iff u ⊧ α, where α is a basic arithmetic constraint.
Let max(M, φ) be the maximum of all constants occurring in any guard of M and in any arithmetic constraint occurring in φ. (If there are none, take any positive integer.) The next result shows that one can reduce truth of formulae of QATL*_1 in a GCGMP with non-negative payoffs to truth in its φ-extended κ-cover for any κ > max(M, φ). We first introduce some auxiliary notation and prove a lemma.

Given two elements x, x′ ∈ ℤ_ω and κ ∈ ℕ we write x ≡_κ x′ iff x = x′ or κ < x, x′. We extend ≡_κ to vectors u, u′ ∈ ℤ_ω^n and to infinite sequences x, x′ over ℤ_ω as follows: u ≡_κ u′ iff u_i ≡_κ u′_i for all i = 1, …, n and, respectively, x ≡_κ x′ iff x_i ≡_κ x′_i for all i = 1, 2, …. Then, we extend ≡_κ to generalised configurations: (s, u) ≡_κ (s′, u′) iff s = s′ and u ≡_κ u′. Finally, for two plays π and π′ we write π ≡_κ π′ iff (π)^s = (π′)^s and (π)^u ≡_κ (π′)^u, i.e. the sequences of states are identical and the utility values are either pairwise equal or both strictly greater than κ.

The following lemma shows that for a fixed set of basic constraints it is sufficient to consider integers up to a specific size. We note that the result can be extended to simple arithmetic constraints by evaluating addition.

Lemma 2 Let κ be the maximum of all constants occurring in a basic arithmetic constraint α over the constants in X and a single variable v. Then, for any x, x′ ∈ ℤ_ω with x ≡_κ x′ we have that x ⊧ α iff x′ ⊧ α.
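The statement of Lemma 2 can be sanity-checked by brute force on a small range of values. The sketch below is only an illustration under stated assumptions: constraints are restricted to the form "v op constant" with non-negative constants, the infinity number is encoded as `math.inf`, and the names `equiv`, `satisfies` and `lemma2_counterexamples` are hypothetical.

```python
import math
import operator
from itertools import product

OMEGA = math.inf  # assumed encoding of the 'infinity number'
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def equiv(x, y, bound):
    """x and y are equivalent: equal, or both strictly greater than `bound`."""
    return x == y or (bound < x and bound < y)


def satisfies(x, op, const):
    """Truth of the basic constraint 'v op const' when v takes the value x."""
    return OPS[op](x, const)


def lemma2_counterexamples(bound, max_const):
    """List (x, y, op, c) where equivalent values disagree on 'v op c'."""
    values = list(range(0, bound + 4)) + [OMEGA]
    bad = []
    for x, y in product(values, repeat=2):
        if not equiv(x, y, bound):
            continue
        for op, c in product(OPS, range(0, max_const + 1)):
            if satisfies(x, op, c) != satisfies(y, op, c):
                bad.append((x, y, op, c))
    return bad
```

With constants not exceeding the bound, for instance `lemma2_counterexamples(2, 2)`, no disagreement is found; allowing a constant above the bound, as in `lemma2_counterexamples(2, 4)`, produces disagreements such as the values 3 and 4 on the constraint v < 4, which is why the bound must dominate all constants.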
Proof A basic arithmetic constraint α is of the form v ∼ c or c ∼ d for c, d ∈ X and ∼ ∈ {<, ≤, =, ≥, >}. The truth of c ∼ d is independent of x and x′. It remains to consider v ∼ c. The claim clearly holds if x = x′. Finally, suppose that κ < x, x′. As any constant occurring in α is at most κ, it follows that x ⊧ v ∼ c iff x′ ⊧ v ∼ c. ◻

Lemma 3 Let S_p, S_o ∈ {Σ, Σ^e} and M be a GCGMP with non-negative payoffs, c be a configuration in M and M_c = (M, c) be the respective initialised GCGMP. Then:

Proof We fix arbitrarily κ > max(M, φ). Now, both claims are proved by mutual structural induction on state and path formulae, simultaneously on all configurations c′ in M_c and c″ in M_c^{κ,φ} such that c′ ≡_κ c″, and all paths π in M_c and π′ in M_c^{κ,φ} such that π ≡_κ π′.
The case for atomic formulae follows directly from the semantics of the arithmetic constraints in M and in M_c^{κ,φ} and from Lemma 2. The cases of boolean connectives are routine. The cases of temporal connectives and path formulae follow easily from the respective cases of the inductive hypothesis for state subformulae, applied to all respective pairs of states on the two paths.

Now, consider the case where φ = ⟨⟨A⟩⟩γ. By the inductive hypothesis for γ, for any two plays π over M_c and π′ over M_c^{κ,φ} with π ≡_κ π′ we have that M_c, π ⊧ γ iff M_c^{κ,φ}, π′ ⊧ γ. Next, note that every history h or play π in M_c generates a respective history h′ or play π′ in M_c^{κ,φ}, obtained by applying step-by-step the outcome function of the cover, instead of that of M, to the previous configuration and the same action profile to produce the next generalised configuration in M_c^{κ,φ}. Moreover, π ≡_κ π′ by the definition of the outcome function of the cover. Conversely, every history (respectively, path) in M_c^{κ,φ} is generated in such a way from some history (respectively, path) in M_c.

Now, let S_p = Σ and suppose that M_c, c′ ⊧_(S_p,S_o) φ. Then, there is a joint S_p-strategy σ_A such that for all joint S_o-strategies of the remaining agents and all plays π from c′ resulting from these strategies it holds that M_c, π ⊧_(S_p,S_o) γ. That strategy induces a joint S_p-strategy σ′_A in M_c^{κ,φ}, defined on every history h′ starting from c″ to prescribe the same joint action for A as the one prescribed by σ_A on any respective history h starting from c′ in M_c that arises when the coalition A follows its joint strategy σ_A and that generates h′. By construction, any play π′ from c″ in M_c^{κ,φ} consistent with σ′_A is generated by some play π from c′ in M_c consistent with σ_A, and hence π ≡_κ π′. Then, by the inductive hypothesis, applied to γ, π and π′, we obtain that M_c^{κ,φ}, π′ ⊧_(S_p,S_o) γ. Therefore, M_c^{κ,φ}, c″ ⊧_(S_p,S_o) ⟨⟨A⟩⟩γ. The converse implication follows from the fact that every play π′ from c″ in M_c^{κ,φ} consistent with σ′_A is generated by some such play π from c′ in M_c. The case when S_p = Σ^e is analogous, as the construction inducing strategies described above preserves effectiveness. This completes the induction. ◻

The theorem below states the main result on decidability of the model checking problem, included here.
Theorem 2 Let M be a finite GCGMP with non-negative payoffs, c a configuration of M, and φ be a QATL*_1 … [6]. ◻

There are various other ways to possibly achieve decidability, e.g. by restricting the class of agents' strategies to effective strategies with fixed parameters. For instance, it is easy to see that for fixed m, n ∈ ℕ there are only finitely many (m, n)-effective strategies. With this observation we conjecture that model checking of QATL*_1 over GCGMP (without restriction to only non-negative payoffs) is also decidable. The reason is that the configuration graph that can result from effective strategies has some regularity which is sufficient to decide the model checking problem.

The semantics presented here is amenable to various further refinements or restrictions, e.g. following approaches from [19] and [11], aiming at obtaining decidable model checking and better complexity results. Further decidability results for cases of model checking of fragments of QATL* over special classes of GCGMP models can also be obtained by adaptation from decidability results for reachability and safety problems and games in Petri nets, VASS, counter machines, and other similar models of computation from [2,7,9,11,12,17,32,33,38,41], etc. We leave these to follow-up work.

Concluding remarks
In this paper we have introduced a uniform framework for modelling and formal reasoning about strategic abilities of players and coalitions to achieve qualitative and quantitative objectives in concurrent multi-stage games. We have discussed some modelling and computational issues and have briefly illustrated the use of the proposed framework with two hypothetical examples.
We see our work as not only theoretical but also as providing a technical framework for various potential applications to AI, game theory and multi-agent systems. Detailed modelling and analysis of more concrete scenarios in these areas would be an important direction for further developments. More generic such applications include multi-agent resource-based reasoning, as already indicated in the paper, as well as modelling and verification of multi-agent reinforcement learning (MARL) mechanisms (cf. [22] for a general overview of MARL and [55] for a game-theoretic perspective), where in each transition round the agents in the team perform actions in pursuit of their assigned task and each agent receives a positive or negative reward from the environment or the teaching supervisor. If the agents follow suitably designed efficient (bounded-memory) configuration-based strategies that take into account the recent rewards, they gradually learn by maximising their (possibly discounted) accumulated rewards, while at the same time satisfying specified qualitative objectives, e.g. to keep the system within a safe region.
Furthermore, various natural extensions of the presented framework are possible. We briefly outline just a couple here, leaving their exploration to future research:
- Probabilistic extensions, where the guards or the transitions are defined according to respective probability distributions, rather than deterministically. Such an extension can be used, e.g., for an alternative, and more direct, modelling of MARL systems.
- Adding quantitative reasoning about entire plays, by introducing as atomic formulae arithmetic path constraints interpreted over mean payoffs. This is a natural and important extension that would enable combined quantitative and fully qualitative reasoning over infinite plays. Another natural approach to handling uniformly accumulated payoffs over finite and infinite plays is based on discounted accumulated utilities, by applying discounting factors that depreciate these accumulated utilities over time and enable asymptotic quantitative reasoning.
Finally, the systematic exploration of the purely mathematical and the game-theoretic aspects of games modelled with GCGMP is another important general direction for further research.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Here is the precise technical definition. A concurrent game model (CGM) is a tuple comprising:

Example 1
Consider the GCGMP shown in Fig. 1 with 2 players, I and II, and 3 states, where in every state each player has 2 possible actions, C ('cooperate') and D ('defect'). The transition function is depicted in the figure.

Fig. 3 The transition function for the team of robots example

Fig. 4 The guard functions for the team of robots example

(b) extended with the release operator 3 and only state-based guards.

Proof In the proof of Theorem 1 we replace the formulae used in (a) and (b) respectively by ¬⟨⟨∅⟩⟩ ¬ and ¬⟨⟨∅⟩⟩¬

1. For any state QATL*_1-formula φ, κ > max(M, φ), and configurations c′ in M_c and c″ in M_c^{κ,φ} such that c′ ≡_κ c″, it holds that: M_c, c′ ⊧_(S_p,S_o) φ if and only if M_c^{κ,φ}, c″ ⊧_(S_p,S_o) φ.
2. For any path QATL*_1-formula γ, and paths π in M_c and π′ in M_c^{κ,γ} such that π ≡_κ π′, it holds that: M_c, π ⊧_(S_p,S_o) γ if and only if M_c^{κ,γ}, π′ ⊧_(S_p,S_o) γ.
… QATL*_1-formula. It is decidable whether M, c ⊧_(S_p,S_o) φ for S_p, S_o ∈ {Σ, Σ^e}.

Proof Let κ > max(M, φ). By Lemma 3 we have that M, c ⊧_(S_p,S_o) φ if, and only if, M_c^{κ,φ}, c ⊧_(S_p,S_o) φ. As M_c^{κ,φ} is finite (Proposition 2), we can replace each arithmetic constraint formula α occurring in φ by a new proposition p_α and label all states in M_c^{κ,φ} in which α holds with p_α. We denote the resulting formula by φ̂ and the resulting model by M̂_c^{κ,φ}. The formula φ̂ is purely qualitative. Note that in the purely qualitative case, ⊧ is simply the classical satisfaction relation of ATL*, where all versions of ⊧_(S_p,S_o) for S_p, S_o ∈ {Σ, Σ^s, Σ^se} are equivalent, so we can decide the problem of model checking M, c ⊧_(S_p,S_o) φ by reducing it to ATL*-model checking of φ̂ in M̂_c^{κ,φ}
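The relabelling step used in this proof, where each arithmetic constraint is replaced by a fresh proposition and the states of the finite cover are labelled accordingly, can be sketched as follows. This is a minimal illustration, not the authors' implementation: constraints are assumed to have the form "utility of one player, comparison, constant", the infinity number is encoded as `math.inf`, and the names `relabel` and `p_rich` are hypothetical.

```python
import math
import operator

OMEGA = math.inf  # assumed encoding of the 'infinity number'
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def relabel(cover_states, constraints):
    """Label each generalised configuration with the fresh propositions true in it.

    cover_states: iterable of (state, utility_vector) pairs.
    constraints:  dict mapping a fresh proposition name to
                  (player_index, comparison, constant).
    """
    labels = {}
    for state, utilities in cover_states:
        labels[(state, utilities)] = {
            name
            for name, (player, op, const) in constraints.items()
            if OPS[op](utilities[player], const)
        }
    return labels


# Three generalised configurations of a toy cover and one constraint:
# "the utility of player 0 is at least 3", replaced by the proposition p_rich.
cover_states = [("s", (0, 0)), ("s", (3, 0)), ("s", (OMEGA, 0))]
labels = relabel(cover_states, {"p_rich": (0, ">=", 3)})
```

After this step the constraints no longer need to be evaluated arithmetically, so a standard ATL* model checker could in principle be run on the labelled finite structure.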