On the complexity of rational verification

Rational verification refers to the problem of checking which temporal logic properties hold of a concurrent/multiagent system, under the assumption that agents in the system choose strategies that form a game theoretic equilibrium. Rational verification can be understood as a counterpart to model checking for multiagent systems, but while classical model checking can be done in polynomial time for some temporal logic specification languages such as CTL, and polynomial space with LTL specifications, rational verification is much harder: the key decision problems for rational verification are 2EXPTIME-complete with LTL specifications, even when using explicit-state system representations. Against this background, our contributions in this paper are threefold. First, we show that the complexity of rational verification can be greatly reduced by restricting specifications to GR(1), a fragment of LTL that can represent a broad and practically useful class of response properties of reactive systems. In particular, we show that for a number of relevant settings, rational verification can be done in polynomial space and even in polynomial time. Second, we provide improved complexity results for rational verification when considering players’ goals given by mean-payoff utility functions—arguably the most widely used approach for quantitative objectives in concurrent and multiagent systems. Finally, we consider the problem of computing outcomes that satisfy social welfare constraints. To this end, we consider both utilitarian and egalitarian social welfare and show that computing such outcomes is either PSPACE-complete or NP-complete.


Introduction
The formal verification of computer systems has been a major research area in computer science for the past 60 years. Verification is the problem of checking program correctness: the key decision problem relating to verification is that of establishing whether or not a given system P satisfies a given specification. The most successful contemporary approach to formal verification is model checking, in which an abstract, finite-state model of the system of interest P is represented as a Kripke structure K_P (a labelled transition system), and the specification is represented as a temporal logic formula ϕ, the models of which are intended to correspond to "correct" behaviours of the system [11]. The verification process then reduces to establishing whether the specification formula ϕ is satisfied in the Kripke structure K_P (notation: K_P |= ϕ), a process that can be efficiently automated in many settings of interest [7]. For example, model checking Linear Temporal Logic (LTL) specifications can be done in polynomial space, and for specifications in Computation Tree Logic (CTL) it can be done in polynomial time [8].
In the context of multiagent systems, rational verification forms a natural counterpart of model checking [16,33,17]. This is the problem of checking whether a given property ϕ, expressed as a temporal logic formula, is satisfied in a computation of a system that might be generated if agents within the system choose strategies for selecting actions that form a game-theoretic equilibrium. This game-theoretic aspect of rational verification adds a new ingredient to the verification problem, as it becomes necessary to take into account the preferences of players with respect to the possible runs of the system. Typically, in rational verification, such preferences are given by associating an LTL goal γ_i with each player i in the game: player i prefers all those runs of the system that satisfy γ_i over those that do not, is indifferent between all those runs that satisfy γ_i, and is similarly indifferent between those runs that do not satisfy γ_i. In this setting, rational verification with respect to a specification ϕ is 2EXPTIME-complete, regardless of whether the representation of the system is given succinctly [17,16] or explicitly, simply as a finite-state labelled transition graph [15]. This high computational complexity represents a key barrier to the wider take-up of rational verification.
Our aim in this work is to improve this state of affairs: we present a range of settings for which we are able to give complexity results that greatly improve on the 2EXPTIME-completeness of the general LTL case. We first consider games where the goals of players are represented as GR(1) formulae. GR(1) is an important fragment of LTL that can express a wide range of practically useful response properties of concurrent and reactive systems [4]. We then consider mean-payoff utility functions: one of the most studied reward and quality measures used in games for automated formal verification. In each case, we study the rational verification problem for system specifications ϕ given as GR(1) formulae and as LTL formulae, with respect to system models formally represented as concurrent game structures [1].
Our main results, summarised in Table 1, show that in the cases mentioned above, the 2EXPTIME result can be dramatically improved, to settings where rational verification can be solved in polynomial space, NP, or even polynomial time, if the number of players in the game is assumed to be fixed. In addition to characterising the complexity of the core rational verification problems for these settings, we also consider the problem of computing strategy profiles for players that maximise social welfare. Measures of social welfare are measures of how well society as a whole fares with a particular game outcome; thus, social welfare measures are aggregate measures of utility. We look at two well-known measures of social welfare: utilitarian social welfare (in which we aim to maximise the sum of individual agent utilities) and egalitarian social welfare (in which we try to maximise the utility of the worst-off player). We show that, for mean-payoff games, computing outcomes for these measures with LTL specifications is PSPACE-complete.

Related Work
The rational verification problem has been studied in a number of different settings, including iterated Boolean games, reactive modules games, and concurrent game structures [16,17,15,18]. In each of these settings, the main rational verification problems are 2EXPTIME-complete, and hence highly intractable. Rational verification is closely related to rational synthesis, which is also 2EXPTIME-complete, both in the Boolean case [13] and with rational environments [24]. One might mitigate the problem of intractability by considering lower-level languages, such as omega-regular specifications [31,10], or turn-based settings [9]. All of the above cases only consider perfect information. In settings with imperfect information, the problem has been shown to be undecidable for games with both succinct and explicit model representations [22,12].
Our work also relates to LTL and mean-payoff (mp) games in general. While the former are 2EXPTIME-complete even for two-player games (and in fact already 2EXPTIME-hard for many LTL fragments [2]), the latter are NP-complete for multi-player games [32] and in NP ∩ coNP for two-player games [34], and in fact solvable in quasipolynomial time, since they can be reduced to two-player perfect-information parity games [6]. Even though we provide several complexity results that improve on the complexity of the general case, our solutions are unlikely to run in polynomial time, as CTL model checking does, since rational verification subsumes problems that are typically not known to be solvable in polynomial time, such as model checking or automated synthesis with temporal logic specifications.

Preliminaries
Linear Temporal Logic. LTL extends propositional logic with two operators, X ("next") and U ("until"), for expressing properties of paths [27,11]. The syntax of LTL is defined with respect to a set AP of atomic propositions as follows:

φ ::= ⊤ | p | ¬φ | φ ∨ φ | Xφ | φ U φ

where p ∈ AP. As usual, we define Fφ ≡ ⊤ U φ and Gφ ≡ ¬F¬φ. We interpret LTL formulae with respect to pairs (α, t), where α ∈ (2^AP)^ω is an infinite sequence of sets of atomic propositions that indicates which propositional variables are true at every time point, and t ∈ N is a temporal index into α. As usual, by α_t ∈ 2^AP we denote the t-th element of the infinite sequence α. Formally, the semantics of LTL is given by the following rules:

(α, t) |= ⊤
(α, t) |= p iff p ∈ α_t
(α, t) |= ¬φ iff (α, t) ̸|= φ
(α, t) |= φ ∨ ψ iff (α, t) |= φ or (α, t) |= ψ
(α, t) |= Xφ iff (α, t + 1) |= φ
(α, t) |= φ U ψ iff for some t′ ≥ t: (α, t′) |= ψ and (α, t′′) |= φ for all t ≤ t′′ < t′

If (α, 0) |= φ, we write α |= φ and say that α satisfies φ.
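As an illustration (our own sketch, not from the paper), the semantics above can be evaluated mechanically on an ultimately periodic word α = prefix·cycle^ω. On such lasso-shaped words the truth of every subformula is periodic, so it suffices to work over the finite position set 0..len(prefix)+len(cycle)−1, with the successor of the last position looping back to the start of the cycle; U is then a least fixpoint over this finite set.

```python
# Sketch: LTL evaluation on a lasso word prefix . cycle^omega.
def ltl_holds(formula, prefix, cycle):
    """formula: nested tuples, e.g. ('U', ('tt',), ('ap', 'q'));
    prefix, cycle: lists of sets of atomic propositions."""
    word = prefix + cycle
    n, loop = len(word), len(prefix)

    def succ(t):                      # successor position on the lasso
        return t + 1 if t + 1 < n else loop

    def sat(f):                       # set of positions at which f holds
        op = f[0]
        if op == 'tt':
            return set(range(n))
        if op == 'ap':
            return {t for t in range(n) if f[1] in word[t]}
        if op == 'not':
            return set(range(n)) - sat(f[1])
        if op == 'or':
            return sat(f[1]) | sat(f[2])
        if op == 'X':
            s = sat(f[1])
            return {t for t in range(n) if succ(t) in s}
        if op == 'U':                 # least fixpoint of  psi or (phi and X Z)
            sphi, spsi = sat(f[1]), sat(f[2])
            z = set()
            while True:
                z2 = spsi | {t for t in sphi if succ(t) in z}
                if z2 == z:
                    return z
                z = z2
        raise ValueError('unknown operator: %r' % (op,))

    return 0 in sat(formula)
```

Derived operators follow the abbreviations in the text: Fφ is ('U', ('tt',), φ), and Gφ is ('not', F¬φ).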
General Reactivity of rank 1. The language of General Reactivity of rank 1 (GR(1)) is the fragment of LTL containing formulae that are written in the following form [4]:

(GFψ_1 ∧ · · · ∧ GFψ_m) → (GFθ_1 ∧ · · · ∧ GFθ_n)

where the subformulae ψ_l and θ_r are Boolean combinations of atomic propositions.
Mean-Payoff value.For an infinite sequence β ∈ R ω of real numbers, let mp(β) be denote mean-payoff value of β, that is, where, for n ∈ N, we define Arenas.An arena is a tuple We sometimes refer to an action profile a = (a 1 , . . ., a n ) ∈ Ac as a decision, and denote by a i the action taken by player i.We also consider partial decisions.For a set of players C ⊆ N and action profile a, we let a C and a −C be two tuples of actions, respectively, one for all players in C and one for all players in N \ C. We also write a i for a {i} and a −i for a N\{i} .For two decisions a and a ′ , we write ( a C , a ′ −C ) to denote the decision where the actions for players in C are taken from a and the actions for players in N \ C are taken from a ′ .
A path π = (s_0, a_0), (s_1, a_1), . . . is an infinite sequence in (St × Ac)^ω such that tr(s_k, a_k) = s_{k+1} for all k. Paths are generated in the arena by each player i selecting a strategy σ_i that defines how to make choices over time. We model strategies as finite state machines with output. Formally, for an arena A, a strategy σ_i = ⟨Q_i, q^0_i, δ_i, τ_i⟩ for player i is a finite state machine with output (a transducer), where Q_i is a finite and non-empty set of internal states, q^0_i is the initial state, δ_i : Q_i × Ac → Q_i is a deterministic internal transition function, and τ_i : Q_i → Ac_i is an action function. Let Str_i be the set of strategies for player i. A strategy profile σ = (σ_1, . . ., σ_n) is a vector of strategies, one for each player. As with actions, σ_i denotes the strategy assigned to player i in profile σ. Moreover, by (σ_B, σ′_C) we denote the combination of profiles where players in disjoint sets B and C are assigned their corresponding strategies in σ and σ′, respectively. Once a state s and a strategy profile σ are fixed, the game has an outcome, a path in A, which we denote by π(σ, s). Because strategies are deterministic, π(σ, s) is the unique path induced by σ, that is, the sequence (s_0, a_0), (s_1, a_1), . . . such that s_{k+1} = tr(s_k, a_k) and a_k = (τ_1(q^k_1), . . ., τ_n(q^k_n)) for all k ≥ 0, where q^0_i, q^1_i, . . . is the unique sequence of internal states of strategy σ_i in σ, obtained by feeding back the result of the previous computation at each step, that is, q^{k+1}_i = δ_i(q^k_i, a_k).
Arenas define the dynamic structure of games (the actions that agents can perform and their consequences), but lack the feature of games that gives them their strategic nature: players' preferences. A multi-player game is obtained from an arena A by associating each player with a goal. As indicated above, previous work has considered players with goals expressed as LTL formulae, with the idea being that an agent will act as best they can to ensure their LTL goal is satisfied (taking into account the fact that other players will act likewise). In the present article, we consider both goals that are expressed as GR(1) formulae and mean-payoff (mp) goals:
- A multi-player GR(1) game is a tuple G_{GR(1)} = ⟨A, (γ_i)_{i∈N}⟩, where A is an arena and γ_i is the GR(1) goal of player i.
- A multi-player mp game is a tuple G_{mp} = ⟨A, (w_i)_{i∈N}⟩, where A is an arena and w_i : St → Z is a function mapping every state of the arena to an integer.
When it is clear from the context, we refer to a multi-player GR(1) or mp game simply as a game and denote it by G. In any game with arena A, a path π in A induces a sequence λ(π) = λ(s_0)λ(s_1)··· of sets of atomic propositions; if, in addition, A is the arena of an mp game, then, for each player i, the sequence w_i(π) = w_i(s_0)w_i(s_1)··· of weights is also induced.
For a GR(1) game and a path π in it, the payoff of a player i is pay_i(π) = 1 if λ(π) |= γ_i and pay_i(π) = 0 otherwise. In an mp game, the payoff of player i is pay_i(π) = mp(w_i(π)). Moreover, for a GR(1) game and a path π, by Win(π) = {i ∈ N : λ(π) |= γ_i} and Lose(π) = {j ∈ N : λ(π) ̸|= γ_j} we denote the set of winners and the set of losers over π, that is, the sets of players that get their goal satisfied and not satisfied, respectively, over π. With an abuse of notation, we sometimes denote by Win(σ, s) = Win(π(σ, s)) and Lose(σ, s) = Lose(π(σ, s)), respectively, the sets of winners and losers over the path generated by strategy profile σ when starting the game from s. Furthermore, we simply write π(σ) for π(σ, s_0).

Nash equilibrium. Using payoff functions, we can define the concept of Nash equilibrium [25]. For a game G, a strategy profile σ is a Nash equilibrium of G if, for every player i and strategy σ′_i ∈ Str_i, we have

pay_i(π((σ_{−i}, σ′_i))) ≤ pay_i(π(σ)).

Let NE(G) be the set of Nash equilibria of G.
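The Nash condition itself is a simple universally quantified check over unilateral deviations. A minimal sketch (our own illustration, over explicitly enumerated finite strategy sets rather than transducers):

```python
# Sketch: a profile is a Nash equilibrium iff no player can raise their own
# payoff by a unilateral deviation, all other players' strategies held fixed.
def is_nash(profile, strategies, payoff):
    """profile: tuple of chosen strategies, one per player;
    strategies[i]: the finite strategy set Str_i of player i;
    payoff(i, profile): player i's payoff under the given profile."""
    for i in range(len(profile)):
        for dev in strategies[i]:
            deviated = profile[:i] + (dev,) + profile[i + 1:]
            if payoff(i, deviated) > payoff(i, profile):
                return False      # found a beneficial unilateral deviation
    return True
```

On a two-player coordination game, the matching profiles pass the check and the mismatched ones fail it.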
E-Nash and rational verification. In rational verification, a key problem is E-Nash, which is concerned with the existence of a Nash equilibrium that fulfils a given temporal specification ϕ. Formally, E-Nash is defined as follows:

Definition 1 (E-Nash) Given a game G and a formula ϕ, decide whether there exists σ ∈ NE(G) such that π(σ) |= ϕ.

Previous work [16,17,15,18] has demonstrated that, if we assume player goals are expressed as LTL formulae, the E-Nash problem is 2EXPTIME-complete, and hence highly intractable. Motivated by this, in this article, we study E-Nash for a number of relevant instantiations of the problem, which we show to have better (lower) computational complexity. In particular, we study cases where:
- Specifications ϕ are LTL and players' goals are GR(1);
- Specifications ϕ are LTL and players have mp goals;
- Both the specification ϕ and the goals are GR(1);
- Specifications ϕ are GR(1) and players have mp goals.
Automata. Some of the algorithms we present for the E-Nash problem use techniques from automata theory. Specifically, we use deterministic automata on infinite words with Streett acceptance conditions. Formally, a deterministic Streett automaton on infinite words (DSW) is a tuple S = ⟨Σ, Q, q_0, δ, Ω⟩, where Σ is an input alphabet, Q is a finite set of states with initial state q_0 ∈ Q, δ : Q × Σ → Q is a transition function, and Ω = {(C_1, E_1), . . ., (C_k, E_k)} is a set of Streett pairs with C_l, E_l ⊆ Q; a run is accepting if, for every pair (C_l, E_l), whenever it visits C_l infinitely often it also visits E_l infinitely often.

Games of General Reactivity of Rank 1
We consider two variations of GR(1) games: in the first, the specification formula is expressed in LTL, while the goals are in GR(1); in the second, both the specification formula and the goals belong to GR(1). We begin by providing a general result characterizing Nash equilibria in GR(1) games, which is given in terms of punishments. We first require some notation.
For a GR(1) game G, player j ∈ N, and state s ∈ St, a strategy profile σ_{−j} is punishing for player j in s if π((σ_{−j}, σ′_j), s) ̸|= γ_j for every possible strategy σ′_j of player j. We say that a state s is punishing for j if there exists a punishing strategy profile for j on s. Moreover, we denote by Pun_j(G) the set of punishing states in G. A pair (s, a) ∈ St × Ac is punishing-secure for player j if tr(s, (a_{−j}, a′_j)) ∈ Pun_j(G) for every action a′_j.
Theorem 1 In a given GR(1) game G, there exists a Nash equilibrium if and only if there exists an ultimately periodic path π such that, for every k ∈ N, the pair (s_k, a_k) at the k-th position of π is punishing-secure for every j ∈ Lose(π).
Proof (Proof sketch) The proof proceeds by double implication.
From left to right, let σ ∈ NE(G) and let π be the ultimately periodic path generated by σ. Assume, by contradiction, that π is not punishing-secure for some j ∈ Lose(π), that is, there are k ∈ N and an action a′_j such that tr(s_k, ((a_k)_{−j}, a′_j)) ∉ Pun_j(G). Thus, j can deviate at s_k and satisfy γ_j, which contradicts σ being a Nash equilibrium.
From right to left, recall that π can be generated by a finite transducer, say T_π = ⟨T, t_0, δ_π, τ_π⟩, with δ_π : T × Ac → T being the internal function and τ_π : T → Ac being the action function that generates π. Moreover, observe that such a transducer can be decomposed into strategies σ^π_i = ⟨T, t_0, δ_π, τ^π_i⟩, where τ^π_i(t) = τ_π(t)_i. Moreover, for every losing player j ∈ Lose(π), there is a memoryless punishing strategy profile σ^pun_{−j} : St → Ac_{−j} for j in every s ∈ Pun_j(G). Such a strategy can also be decomposed and distributed to the agents different from j as σ^{pun,i}_{−j}(s) = σ^pun_{−j}(s)_i for every i ∈ N \ {j}. Now, for every agent i, consider the strategy σ_i = ⟨Q_i, q^0_i, δ_i, τ_i⟩ defined as follows:
- Q_i = T × St × ({⊤} ∪ Lose(π));
- q^0_i = (t_0, s_0, ⊤);
- δ_i is defined as follows:
  δ_i((t, s, ⊤), a) = (δ_π(t, a), tr(s, a), ⊤), if a = τ_π(t);
  δ_i((t, s, ⊤), a) = (δ_π(t, a), tr(s, a), j), if a_{−j} = (τ_π(t))_{−j} and a_j ≠ (τ_π(t))_j for j ∈ Lose(π);
  δ_i((t, s, j), a) = (δ_π(t, a), tr(s, a), j);
- τ_i((t, s, ⊤)) = τ^π_i(t) and τ_i((t, s, j)) = σ^{pun,i}_{−j}(s) for j ≠ i.
Intuitively, the strategy σ_i mimics the transducer T_π to produce the play π. In addition, it keeps track of the actions taken by the losing agents, checking whether they adhere to the transducer or deviate unilaterally from it. In case of a deviation by agent j, the strategy σ_i flags the deviating agent and switches from mimicking T_π to adopting the punishment strategy σ^pun_{−j}. We need to show that the strategy profile σ is a Nash equilibrium.
Clearly, as π(σ) = π, none of the agents that are winning over π has a beneficial deviation. For a losing agent j, observe that a unilateral deviation σ′_j triggers the strategy profile σ_{−j} to implement a punishment against j. Moreover, observe that GR(1) objectives are prefix-independent, which implies that the punishment takes effect no matter at which instant of the computation it starts being adopted. Therefore, no deviation σ′_j can be beneficial for agent j, and hence σ is a Nash equilibrium.

⊓ ⊔
With this result in place, the following procedure can be seen to solve E-Nash:
1. Guess a set W ⊆ N of winners;
2. For each player j ∈ L = N \ W, a loser in the game, compute its punishment region Pun_j(G);
3. Remove from G the states that are not punishing for players j ∈ L and the edges (s, s′) that are labelled with an action profile a such that (s, a) is not punishing-secure for some j ∈ L, thus obtaining a game G_{−L};
4. Check whether there exists an ultimately periodic path π in G_{−L} such that π |= ϕ ∧ ∧_{i∈W} γ_i holds.
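The guess-and-check structure of this procedure can be sketched as follows (our own illustration, made deterministic by enumerating all candidate winner sets; `punishing_secure` and `exists_witness_lasso` are hypothetical oracles standing in for the punishment-region and path-finding steps, which the text realises via Streett games and model checking):

```python
# Sketch: deterministic enumeration of the nondeterministic E-Nash procedure.
from itertools import chain, combinations

def e_nash(players, states, profiles, punishing_secure, exists_witness_lasso):
    """True iff some winner set W admits a witness lasso that satisfies the
    specification and the goals of W, using only punishing-secure moves."""
    subsets = chain.from_iterable(
        combinations(players, r) for r in range(len(players) + 1))
    for W in map(set, subsets):                    # step 1: guess winners
        losers = set(players) - W
        # steps 2-3: keep only pairs punishing-secure for every loser
        allowed = {(s, a) for s in states for a in profiles
                   if all(punishing_secure(j, s, a) for j in losers)}
        if exists_witness_lasso(W, allowed):       # step 4: find a lasso
            return True
    return False
```

The enumeration over 2^|N| winner sets is exactly where the exponential dependence on the number of players enters.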
Expressed more formally, the above procedure yields Algorithm 1.
Algorithm 1: Input: a game G_{GR(1)} and a specification formula ϕ. (Listing not reproduced.)
While line 6 requires solving the model checking problem for an LTL formula, which can be done in polynomial space, line 5 can be done in polynomial time. Line 4, on the other hand, makes the procedure run in exponential time in the number of players, but still in polynomial space. We then only need to consider line 3: this step can be done in polynomial time, as we now show.
Theorem 2 For a given GR(1) game G over the arena A = ⟨N, Ac, St, s_0, tr, λ⟩ and a player j ∈ N, computing the punishment region Pun_j(G) of player j can be done in polynomial time with respect to the size of both G and γ_j.
Proof We reduce the problem to computing the winning region of a suitably defined Streett game with a single pair as the winning condition, the complexity of which is known to be O(m n^{k+1} k k!) for games with m edges, n states, and k pairs [26]. Given that in our case we have k = 1, we obtain a polynomial-time algorithm.
Recall that the goal of player j is of the form

γ_j = (GFψ^j_1 ∧ · · · ∧ GFψ^j_{m_j}) → (GFθ^j_1 ∧ · · · ∧ GFθ^j_{n_j})

where the ψ^j_l's and θ^j_r's are Boolean combinations of atomic propositions. Then, consider the arena A′ = ⟨N, Ac, St′, s′_0, tr′⟩, where St′ = St × {0, . . ., m_j} × {0, . . ., n_j}, s′_0 = (s_0, 0, 0), and

tr′((s, ι_1, ι_2), a) = (tr(s, a), ι′_1, ι′_2)

with

ι′_1 = ι_1 ⊕_{m_j+1} 1, if ι_1 = 0 or s |= ψ^j_{ι_1}, and ι′_1 = ι_1 otherwise;
ι′_2 = ι_2 ⊕_{n_j+1} 1, if ι_2 = 0 or s |= θ^j_{ι_2}, and ι′_2 = ι_2 otherwise.

And by ⊕_k we denote addition modulo k.
Intuitively, arena A′ mimics the behaviour of A and carries two indices, ι_1 and ι_2. Index ι_1 is increased by one every time the path visits a state that satisfies ψ^j_{ι_1}, and resets to 0 every time the path visits a state that satisfies ψ^j_{m_j}. Clearly, ι_1 is reset infinitely many times if and only if the path satisfies every ψ^j_l infinitely many times, and so if and only if it satisfies the temporal specification ∧_{l=1}^{m_j} GFψ^j_l. The same argument applies to index ι_2, but with respect to the Boolean combinations θ^j_r. Now, consider the sets C_j = St × {0} × {0, . . ., n_j} and E_j = St × {0, . . ., m_j} × {0}. Clearly, the Streett pair (C_j, E_j) is satisfied by all and only the paths in A′ that satisfy γ_j. Therefore, the winning region of γ_j can be computed as the winning set of the Streett game with (C_j, E_j) as the only Streett pair. Observe that the winning region is computable, as Streett games are determined. Moreover, with the number of pairs fixed, the computation can be done in polynomial time, which proves our statement.
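The index bookkeeping can be made concrete with a small sketch (our own illustration, under the convention that index value 0 marks "just reset" and is always stepped over; ψ and θ are given as 1-indexed lists of state predicates):

```python
# Sketch: one step of the counter component of tr' in the product arena A'.
def step_counters(s, i1, i2, psi, theta):
    """Advance the two GR(1) indices on visiting state s.
    psi, theta: 1-indexed predicate lists (index 0 holds a None placeholder).
    Counters live in {0..m} and {0..n} and wrap to 0, signalling a reset."""
    m, n = len(psi) - 1, len(theta) - 1
    if i1 == 0 or psi[i1](s):
        i1 = (i1 + 1) % (m + 1)      # saw the awaited psi: advance (or reset)
    if i2 == 0 or theta[i2](s):
        i2 = (i2 + 1) % (n + 1)      # same bookkeeping for the theta index
    return i1, i2
```

A path drives ι_1 back to 0 infinitely often exactly when each ψ^j_l is visited infinitely often, which is what the Streett pair (C_j, E_j) observes.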

⊓ ⊔
Based on Theorem 2, we have the following result.
Corollary 1 The E-Nash problem for GR(1) games with an LTL specification is PSPACE-complete.
Proof The upper bound follows from the procedure described above. Regarding the lower bound, note that model checking an LTL formula ϕ against a Kripke structure K can be easily encoded as an instance of E-Nash where the game G is played over the Kripke structure K, taken to be its arena, with players' goals being tautologies and the specification being ¬ϕ. In such a case, we have that K |= ϕ if and only if E-Nash for the pair (G, ¬ϕ) has a negative answer.

⊓ ⊔
Corollary 1 sharply contrasts with the complexity of E-Nash when goals are expressed as LTL formulae: in this more general case, E-Nash is 2EXPTIME-complete.
The special case of GR(1) specifications. One of the hardest parts of Algorithm 1 is line 6, where an LTL model checking problem must be solved, thereby making the running time of the overall procedure exponential in the size of the specification and the goals of the players. As we show in the remainder of this section, one way to drastically reduce the complexity of our decision procedure is to require that the specification is also expressed in GR(1). In such a case, the LTL model checking procedure in line 6 of Algorithm 1 can be avoided, leading to a much simpler construction, which runs in polynomial time for every fixed number of players. In this section, we provide precisely such a simpler construction.
Recall that every GR(1) specification ϕ can be regarded as a Streett condition with a single pair over an arena A′ suitably constructed from the original arena A [3]. Thus, denoting by (C_ϕ, E_ϕ) and (C_i, E_i) the Streett pairs corresponding to the GR(1) conditions ϕ and γ_i, respectively, the problem of finding a path in A′ satisfying the formula ϕ ∧ ∧_{i∈W} γ_i amounts to deciding the emptiness of the Streett automaton A = ⟨Ac, St′, s′_0, tr′, Ω⟩, where Ω = {(C_ϕ, E_ϕ)} ∪ {(C_i, E_i) : i ∈ W}. Note that the size of A′ is polynomial in the size of the GR(1) formulae involved, polynomial in the number of states and actions in the original arena A, and exponential in the number of players. More specifically, we have |St′| = |St| · |γ|^{|N|}, and so the number of edges is at most |St′|^2. Moreover, the emptiness problem for a deterministic Streett word automaton can be solved in time polynomial in the automaton's index and its number of states and transitions [29,23]. The complexity of the E-Nash problem is thus 2^{|N|} times a procedure for computing at most |N| punishing regions (which is polynomial in the size of both G and ϕ, γ_1, . . ., γ_N), plus the complexity of the emptiness problem for a Streett automaton whose size is polynomial in G, ϕ, γ_1, . . ., γ_N and exponential in the number of players.
Based on the constructions described above, we have the following (fixed-parameter tractable) complexity result.
Theorem 3 For a given GR(1) game G and a GR(1) formula ϕ, the E-Nash problem can be solved in time polynomial in |St|, |Ac|, |ϕ|, |γ_1|, . . ., |γ_N| and exponential in the number of players |N|. Therefore, the problem is fixed-parameter tractable, parameterized by the number of players.

Mean-Payoff Games
We now focus on multi-player mean-payoff (mp) games. As in the previous case, we first characterise the Nash equilibria of a game in terms of punishments and then reduce E-Nash to a suitable path-finding problem in the underlying arena. To do this, we first need to recall the notion of secure values for mean-payoff games [32].
For a player i and a state s ∈ St, by pun_i(s) we denote the punishment value of i over s, that is, the maximum payoff that i can achieve from s when all other players behave adversarially. Such a value can be computed by considering the corresponding two-player zero-sum mean-payoff game [34]. Thus, computing it can be done in NP ∩ coNP, and note that both player i and the coalition N \ {i} can achieve the optimal value of the game using memoryless strategies.
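A rough sketch of how such a value can be approximated (our own illustration, in the spirit of value iteration for two-player zero-sum mean-payoff games; a true solver would choose the number of iterations as a function of the game size and maximum weight and round to the nearest achievable value):

```python
# Sketch: k-step value iteration on a mean-payoff game graph.  succ[v] lists
# successors of vertex v, weight[v] is the payoff collected at v, and
# maximizer[v] tells whether the punished player (maximizing) moves at v.
def mp_game_value(v0, succ, weight, maximizer, k=1000):
    verts = list(succ)
    f = {v: 0.0 for v in verts}          # f[v]: optimal total reward in k steps
    for _ in range(k):
        g = {}
        for v in verts:
            vals = [f[u] for u in succ[v]]
            g[v] = weight[v] + (max(vals) if maximizer[v] else min(vals))
        f = g
    return f[v0] / k                     # per-step average approximates pun_i
```

On a two-vertex loop alternating weights 1 and 3, the per-step average converges to 2.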
For a player i and a value z ∈ R, a pair (s, a) is z-secure for i if pun_i(tr(s, (a_{−i}, a′_i))) ≤ z for every action a′_i ∈ Ac_i.

Theorem 4 For every mp game G and ultimately periodic path π = (s_0, a_0), (s_1, a_1), . . ., the following are equivalent:
1. There is σ ∈ NE(G) such that π = π(σ, s_0);
2. There exists z ∈ R^N, where z_i ∈ {pun_i(s) : s ∈ St}, such that, for every i ∈ N:
(a) for all k ∈ N, the pair (s_k, a_k) is z_i-secure for i, and
(b) z_i ≤ pay_i(π).
Proof The proof proceeds by double implication.
For the case (1) ⇒ (2), assume that σ ∈ NE(G) is such that π(σ) = π. Thus, define z_i = max{pun_i(tr(s_k, ((a_k)_{−i}, a′))) : k ∈ N, a′ ∈ Ac_i}, that is, the maximum value agent i can achieve by unilaterally deviating from any point in π and being immediately punished. By definition, we obtain that (s_k, a_k) is z_i-secure for i at every k ∈ N. Moreover, assume by contradiction that pay_i(π) < z_i for some agent i. Then, let k ∈ N and a′_i ∈ Ac_i be such that z_i = pun_i(tr(s_k, ((a_k)_{−i}, a′_i))). Thus, there exists a strategy σ′_i that follows σ_i for k steps and then deviates using a′_i, ensuring a payoff of z_i for agent i. Such a strategy is a beneficial deviation of agent i from σ, in contradiction with the fact that σ is a Nash equilibrium.
For the case (2) ⇒ (1), we define a strategy profile σ and then prove it is a Nash equilibrium. First observe that, π being ultimately periodic, there exists a finite transducer T_π = ⟨T, t_0, δ_π, τ_π⟩, with δ_π : T × Ac → T being the internal function and τ_π : T → Ac being the action function that generates π. Moreover, observe that such a transducer can be decomposed into strategies σ^π_i = ⟨T, t_0, δ_π, τ^π_i⟩, where τ^π_i(t) = τ_π(t)_i. In addition, for every agent j, consider the memoryless strategy profile σ^pun_{−j} : St → Ac_{−j} that minimizes the payoff of agent j in every state s ∈ St. Such a strategy can also be decomposed and distributed to the agents different from j as σ^{pun,i}_{−j}(s) = σ^pun_{−j}(s)_i for every i ∈ N \ {j}. Now, for every agent i, consider the strategy σ_i = ⟨Q_i, q^0_i, δ_i, τ_i⟩ defined as follows:
- Q_i = T × St × ({⊤} ∪ N);
- q^0_i = (t_0, s_0, ⊤);
- δ_i is defined as follows:
  δ_i((t, s, ⊤), a) = (δ_π(t, a), tr(s, a), ⊤), if a = τ_π(t);
  δ_i((t, s, ⊤), a) = (δ_π(t, a), tr(s, a), j), if a_{−j} = (τ_π(t))_{−j} and a_j ≠ (τ_π(t))_j;
  δ_i((t, s, j), a) = (δ_π(t, a), tr(s, a), j);
- τ_i((t, s, ⊤)) = τ^π_i(t) and τ_i((t, s, j)) = σ^{pun,i}_{−j}(s) for j ≠ i.
Intuitively, the strategy σ_i mimics the transducer T_π to produce the play π. In addition, it keeps track of the actions taken by the other agents, checking whether they adhere to the transducer or deviate unilaterally from it. In case of a deviation by agent j, the strategy σ_i flags the deviating agent and switches from mimicking T_π to adopting the punishment strategy σ^pun_{−j}. Clearly, the strategy profile σ = (σ_1, . . ., σ_n) is such that π(σ) = π. It remains to show that it is a Nash equilibrium. Note that every strategy σ_i adopts the punishment against agent j upon any unilateral deviation of j. Note also that, mp being a prefix-independent condition, the payoff of agent j is punished no matter at which instant the punishment strategy starts being adopted. At this point, every pair (s_k, a_k) in π being z_j-secure for agent j, no deviation of agent j can ensure a payoff greater than z_j, that is, pay_j(σ_{−j}, σ′_j) ≤ z_j. On the other hand, from condition (b) of item 2 in the statement, we have that z_j ≤ pay_j(σ). Putting these two conditions together, we obtain pay_j(σ_{−j}, σ′_j) ≤ z_j ≤ pay_j(σ). This proves that no deviation of agent j from σ is beneficial, and so that σ is a Nash equilibrium.

⊓ ⊔
The characterization of Nash equilibria provided in Theorem 4 allows us to turn the E-Nash problem for mp games into a path-finding problem over G. Similarly to the case of GR(1) games, we have the following procedure.
1. For every i ∈ N and s ∈ St, compute the value pun_i(s);
2. Guess a vector z ∈ R^N of values, each of them being a punishment value for a player i;
3. Compute the game G[z] by removing the states s such that pun_i(s) > z_i for some player i and the transitions (s, a) that are not z_i-secure for some player i;
4. Find an ultimately periodic path π in the game G[z] such that π |= ϕ and z_i ≤ pay_i(π) for every player i ∈ N.
Step 1 can be done in NP for every pair (i, s); step 2 can be done in exponential time and polynomial space in the number of z-secure values; and step 3 can be done in polynomial time, similarly to the case of GR(1) games. Regarding the last step, its complexity depends on the specification language. For the case of ϕ being an LTL formula, consider the formula ϕ_{E-Nash} = ϕ ∧ ∧_{i∈N} (mp_i ≥ z_i), written in the language LTL^Lim, an extension of LTL in which statements about mean-payoff values over a given weighted arena can be made [5]. Observe that the formula ϕ_{E-Nash} corresponds exactly to requirement 2(b) in Theorem 4. Moreover, since every path in G[z] satisfies condition 2(a) by construction, every path that satisfies ϕ_{E-Nash} is a solution of the E-Nash problem and vice versa. We can solve the latter problem by model checking the formula against the arena underlying G[z]. Since this can be done in PSPACE [5], we have the following result.

Corollary 2 The E-Nash problem for mp games with an LTL specification formula ϕ is PSPACE-complete.
As for the case of GR(1) games, we can summarize the procedure in the following algorithm (Algorithm 2).
The special case of GR(1) specifications. As in the case of GR(1) games, here we show that restricting the specification language to GR(1) also lowers the complexity for mp games. The reason is that the path-finding problem for GR(1) specifications can be solved while avoiding model checking an LTL^Lim formula. In order to do this, we follow a different approach. Using an mp game G and a GR(1) specification φ = (GFψ_1 ∧ · · · ∧ GFψ_m) → (GFθ_1 ∧ · · · ∧ GFθ_n), we define linear programs such that one of them has a solution if and only if the pair (G, φ) is an instance of E-Nash. For a Boolean combination ψ of atomic propositions, let V(ψ) be the set of states of G[z] whose labelling satisfies ψ; let E be the set of edges of G[z], with src(e) the source of edge e and out(v) and in(v) the edges leaving and entering vertex v; let w′_i(s) = w_i(s) − z_i denote the shifted weights of G[z]; and associate a variable x_e with every edge e, intended to count how many times e occurs in the cycle sought. For each 1 ≤ l ≤ m, the linear program LP(ψ_l) consists of the following inequalities and equations:
Eq1: x_e ≥ 0 for each edge e, a basic consistency criterion;
Eq2: Σ_{e∈E} x_e ≥ 1 ensures that at least one edge is chosen;
Eq3: for each i ∈ N, Σ_{e∈E} w′_i(src(e)) x_e ≥ 0 ensures that the total (shifted) weight of any solution is non-negative;
Eq4: Σ_{e : src(e) ∈ V(ψ_l)} x_e = 0 ensures that no state in V(ψ_l) is in the cycle associated with the solution;
Eq5: for each v ∈ V, Σ_{e∈out(v)} x_e = Σ_{e∈in(v)} x_e says that the number of times one enters a vertex equals the number of times one leaves that vertex.
By construction, it follows that LP(ψ_l) admits a solution if and only if there exists a path π in G such that z_i ≤ pay_i(π) for every player i and π visits V(ψ_l) only finitely many times. Note that the condition z_i ≤ pay_i(π) is ensured by Eq3. Indeed, the value of a path π in G[z] that is represented in a solution to LP(ψ_l), and thus satisfies Eq3, is such that 0 ≤ pay^{G[z]}_i(π), with pay^{G[z]}_i denoting the payoff function for agent i in the game G[z]. Now observe that, as the weights in G[z] are all shifted down by the value z_i for every agent i, it holds that pay_i(π) = pay^{G[z]}_i(π) + z_i, which in turn implies that z_i ≤ pay_i(π). Now, consider also the linear program LP(θ_1, . . ., θ_n), defined with the following inequalities and equations:
Eq1: x_e ≥ 0 for each edge e, a basic consistency criterion;
Eq2: Σ_{e∈E} x_e ≥ 1 ensures that at least one edge is chosen;
Eq3: for each i ∈ N, Σ_{e∈E} w′_i(src(e)) x_e ≥ 0 ensures that the total (shifted) weight of any solution is non-negative;
Eq4: for all 1 ≤ r ≤ n, Σ_{e : src(e) ∈ V(θ_r)} x_e ≥ 1 ensures that for every V(θ_r) at least one state is in the cycle;
Eq5: for each v ∈ V, Σ_{e∈out(v)} x_e = Σ_{e∈in(v)} x_e says that the number of times one enters a vertex equals the number of times one leaves that vertex.
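A small sketch (our own illustration): the constraint systems above can be checked directly against a candidate edge-count vector x on a toy arena, without invoking an LP solver. Edges, weights, and the candidate are all hypothetical names for the purposes of the example.

```python
# Sketch: verify Eq1-Eq5 of LP(theta_1, ..., theta_n) for a given candidate x.
def lp_constraints_hold(edges, x, w, targets):
    """edges: list of (src, dst); x: dict edge -> value; w: dict player ->
    (dict state -> shifted weight w'_i); targets: list of state sets V(theta_r)."""
    if any(x[e] < 0 for e in edges):                         # Eq1
        return False
    if sum(x[e] for e in edges) < 1:                         # Eq2
        return False
    for wi in w.values():                                    # Eq3
        if sum(wi[src] * x[(src, dst)] for (src, dst) in edges) < 0:
            return False
    for V in targets:                                        # Eq4
        if sum(x[(s, d)] for (s, d) in edges if s in V) < 1:
            return False
    states = {s for e in edges for s in e}
    for v in states:                                         # Eq5: flow balance
        if (sum(x[e] for e in edges if e[0] == v)
                != sum(x[e] for e in edges if e[1] == v)):
            return False
    return True
```

Any non-negative solution balanced at every vertex describes a multi-cycle, which is where the correspondence between LP solutions and ultimately periodic paths comes from.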
In this case, LP(θ 1 , . . ., θ n ) admits a solution if and only if there exists a path π such that z i ≤ pay i (π) for every player i and visits every V (θ r ) infinitely many times.
Since the constructions above are polynomial in the size of both G and φ, and there is a path π satisfying φ such that z_i ≤ pay_i(π) for every player i if and only if one of the two linear programs defined above has a solution, the statement can be checked in NP. For the lower bound, we use [32] and observe that when φ is a tautology, the problem is equivalent to checking whether the mp game has a Nash equilibrium.
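As an illustration, the constraints Eq1-Eq5 of LP(θ_1, ..., θ_n) can be checked directly against a candidate assignment of the edge variables x_e. The following Python sketch is ours, not the paper's construction: it simplifies src(e) ∩ V(θ_r) ≠ ∅ to plain membership src(e) ∈ V(θ_r), and the container shapes (edge tuples, per-player weight dicts) are assumptions made for the example.

```python
from collections import defaultdict

def feasible_for_lp(x, vertices, edges, shifted_weights, target_sets, eps=1e-9):
    """Check whether an assignment x (edge -> value) satisfies the
    constraints Eq1-Eq5 of LP(theta_1, ..., theta_n).

    edges: list of (src, dst) pairs
    shifted_weights: one dict per player, mapping vertex -> w'_i(v)
    target_sets: list of vertex sets, one V(theta_r) per player goal
    """
    # Eq1: every variable is non-negative
    if any(x[e] < -eps for e in edges):
        return False
    # Eq2: at least one edge is chosen
    if sum(x[e] for e in edges) < 1 - eps:
        return False
    # Eq3: for each player i, the total shifted weight is non-negative
    for w in shifted_weights:
        if sum(w[src] * x[(src, dst)] for (src, dst) in edges) < -eps:
            return False
    # Eq4: each target set V(theta_r) contributes at least one chosen edge
    for targets in target_sets:
        if sum(x[(src, dst)] for (src, dst) in edges if src in targets) < 1 - eps:
            return False
    # Eq5: flow conservation -- enter each vertex as often as you leave it
    inflow, outflow = defaultdict(float), defaultdict(float)
    for (src, dst) in edges:
        outflow[src] += x[(src, dst)]
        inflow[dst] += x[(src, dst)]
    return all(abs(inflow[v] - outflow[v]) <= eps for v in vertices)
```

For instance, on the two-vertex cycle with edges (0, 1) and (1, 0), the assignment x_e = 1 on both edges passes all five constraints when the shifted weights sum to a non-negative value along the cycle.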

Social welfare verification
Until this point, the problems considered have primarily concerned the satisfaction of a temporal logic property φ over the game G. However, one might be interested in achieving an outcome that is also, in some sense, best for the society of agents. To capture this setting, we introduce social welfare measures. Social welfare measures are aggregate measures of utility: a social welfare measure takes as input a profile of utilities, one for each player in the game, and aggregates these into an overall measure indicating how good the outcome is for society as a whole. Note that, since social welfare is inherently a quantitative measure, in this section we restrict our attention to mp games. Formally, for a game G with a set N of agents, a social welfare function sw takes the form sw : R^N → R. Thus, a social welfare function maps an N-tuple of real numbers to a real number representing the aggregated payoff. More specifically, for a strategy profile σ, the social welfare of σ is given by sw(pay_1(σ), ..., pay_N(σ)). With an abuse of notation, we denote by sw(σ) the social welfare of σ. Many different social welfare functions have been proposed in the economic theory literature. Here, we confine our attention to the two best known: utilitarian and egalitarian social welfare. These functions are defined as follows:
- The utilitarian social welfare function is given by usw(σ) = Σ_{i∈N} pay_i(σ).
- The egalitarian social welfare function is given by esw(σ) = min_{i∈N} pay_i(σ).
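A minimal sketch of the two functions, assuming the payoffs have already been computed as real numbers (the egalitarian function is taken as the minimum payoff, consistent with the conjunction ⋀_{i∈N} mp(i) ≥ t used later in Theorem 7):

```python
def usw(payoffs):
    """Utilitarian social welfare: the sum of all players' payoffs."""
    return sum(payoffs)

def esw(payoffs):
    """Egalitarian social welfare: the payoff of the worst-off player."""
    return min(payoffs)

# For a profile with payoffs (2, 5, -1): usw = 6 and esw = -1.
```

The two functions embody different notions of collective good: utilitarian welfare can be high even when one player is very badly off, whereas egalitarian welfare tracks exactly that worst-off player.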
For simplicity, for a given game G and a formula φ, we denote by E-Nash_G(φ) = {σ ∈ NE : π(σ) |= φ} the set of Nash equilibria that satisfy φ, that is, that are a solution to the E-Nash problem for (G, φ). For a fixed social welfare function sw on a game G, by MaxNE_sw(G, φ) = max{sw(σ) : σ ∈ E-Nash_G(φ)} and MinNE_sw(G, φ) = min{sw(σ) : σ ∈ E-Nash_G(φ)} we denote the maximal and minimal social welfare, respectively, achieved over a Nash equilibrium profile satisfying a given specification φ.
The values of MaxNE and MinNE determine how good or bad the E-Nash solutions are from the perspective of the agents in the game collectively. Here, we consider both the decision and the function problem.
Definition 2 (Threshold social welfare) For a given mp game G_mp, a social welfare function sw, and a threshold value t, decide whether there exists a strategy profile σ in E-Nash_G(φ) such that t ≤ sw(σ). In case of a positive answer to this decision question, the pair (G, φ) is called t-increase.
Analogously, decide whether there exists a strategy profile σ in E-Nash_G(φ) such that t ≥ sw(σ). In case of a positive answer to this decision question, the pair (G, φ) is called t-decrease. The two definitions above can be instantiated with many different social welfare functions. In the following two subsections, we consider them in the context of the utilitarian and egalitarian welfare measures defined above.

Social welfare computation with LTL specifications
We first show how to check whether a given mp game G_mp and an LTL specification meet a given threshold t. As the utilitarian and egalitarian functions require different proofs, we address them separately. For the utilitarian function, we have the following.
Theorem 6 For a given mp game G_mp = ⟨A, (w_i)_{i∈N}⟩, an LTL specification φ, and a threshold value t, deciding whether there exists a strategy profile σ ∈ E-Nash_G(φ) such that t ≤ usw(σ) is PSPACE-complete. Analogously, deciding whether there exists a strategy profile σ ∈ E-Nash_G(φ) such that t ≥ usw(σ) is PSPACE-complete.
Proof It is enough to show the case t ≤ usw(σ), as the other one is similar. The solution is a slight modification of the E-Nash problem for mp games with LTL specifications. Consider the arena A' = ⟨N ∪ {n + 1}, Ac, St, s_0, tr', λ⟩ with tr' defined as tr'(a_1, ..., a_n, a_{n+1}) = tr(a_1, ..., a_n) for every (a_1, ..., a_n, a_{n+1}) ∈ Ac^{|N|+1}, and the mp game G'_mp = ⟨A', ((w_i)_{i∈N}, w_{n+1})⟩ with w_{n+1}(s) = Σ_{i∈N} w_i(s) for every s ∈ St.
Intuitively, we have included an extra agent in the game, having no effect on the executions, in such a way that it carries information about the social welfare of the original game. Indeed, observe that, for every strategy profile σ in G'_mp, it holds that pay_{n+1}(σ) = usw(σ). We can thus employ the same construction used for solving the E-Nash problem for mp games with LTL specifications to solve the threshold problem: it suffices to replace the LTL^Lim formula φ_E-Nash with φ^{usw,t}_E-Nash := φ_E-Nash ∧ mp(n + 1) ≥ t. The computational complexity of the procedure is PSPACE, as for E-Nash. The lower bound easily follows from the model checking of LTL.
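The weight function of the extra agent can be sketched as follows (the dict-based representation of per-player weight functions and the function name are our assumptions, made for the illustration):

```python
def welfare_player_weights(weights):
    """Build the weight function of the extra player n+1.

    At every state, player n+1's weight is the sum of all original
    players' weights, so along any play its mean payoff equals the
    utilitarian social welfare of that play.

    weights: list of dicts, one per player, mapping state -> w_i(s)
    """
    states = weights[0].keys()
    return {s: sum(w[s] for w in weights) for s in states}
```

For example, with two players weighted {a: 1, b: 2} and {a: 3, b: -1}, the extra player's weights are {a: 4, b: 1}.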
⊓⊔

For the case of egalitarian social welfare, we have the following.
Theorem 7 For a given mp game G_mp = ⟨A, (w_i)_{i∈N}⟩, an LTL specification φ, and a threshold value t, deciding whether there exists a strategy profile σ ∈ E-Nash_G(φ) such that t ≤ esw(σ) is PSPACE-complete. Analogously, deciding whether there exists a strategy profile σ ∈ E-Nash_G(φ) such that t ≥ esw(σ) is PSPACE-complete.
Proof It is enough to show the case t ≤ esw(σ), as the other one is similar. As for the case of the utilitarian social welfare function, the solution is a slight modification of the E-Nash problem for mp games with LTL specifications. Indeed, observe that we can specify that the payoff of agent i is at least the threshold t by the LTL^Lim formula mp(i) ≥ t. Therefore, specifying that the egalitarian social welfare is at least t can be done with the conjunction ⋀_{i∈N} mp(i) ≥ t. Thus, it suffices to replace the LTL^Lim formula φ_E-Nash for the E-Nash problem with φ^{esw,t}_E-Nash := φ_E-Nash ∧ ⋀_{i∈N} mp(i) ≥ t. Again, the computational complexity of the procedure is PSPACE, and the lower bound follows from the model checking of LTL. ⊓⊔

Social welfare computation with GR(1) specifications

In this section, we address social welfare threshold problems with GR(1) specifications. The techniques are similar to the ones used in the case of LTL specifications. Firstly, we consider the utilitarian social welfare function. For a given mp game G_mp = ⟨A, (w_i)_{i∈N}⟩, we build the arena A' and the game G'_mp analogously to the way it is done in the proof of Theorem 6. Now, to solve the case t ≤ usw(σ), we adapt the procedure for solving E-Nash for mp games with GR(1) specifications (Theorem 5) as follows. We construct the corresponding multi-weighted graph W = ⟨V, E, (w'_i)_{i∈N∪{n+1}}⟩, where w'_{n+1}(v) = w_{n+1}(s) − t. Then, solving the E-Nash problem for such an instance corresponds exactly to the threshold social welfare problem t ≤ usw(σ). For the case t ≥ usw(σ), we simply define w'_{n+1}(v) = t − w_{n+1}(s). To obtain the lower bounds, we reduce from the E-Nash problem for mp games with GR(1) specifications: for the case t ≤ usw(σ), we set t = min{w_{n+1}(s) : s ∈ St}, and for the other case, we fix t = max{w_{n+1}(s) : s ∈ St}. Thus, we obtain the following result.
Theorem 8 For a given mp game G_mp = ⟨A, (w_i)_{i∈N}⟩, a GR(1) specification φ, and a threshold value t, deciding whether there exists a strategy profile σ ∈ E-Nash_G(φ) such that t ≤ usw(σ) is NP-complete. Analogously, deciding whether there exists a strategy profile σ ∈ E-Nash_G(φ) such that t ≥ usw(σ) is NP-complete.

Now we turn our attention to the egalitarian social welfare function. To solve the social threshold problem t ≤ esw(σ), we directly adapt the procedure for solving E-Nash for mp games with GR(1) specifications (Theorem 5). For the game G[z], we build the underlying graph ⟨V, E, (w'_i)_{i∈N}⟩, where w'_i(v) = w_i(s) − max{z_i, t}. Then we define the linear programs LP(ψ_l) and LP(θ_1, ..., θ_n) in the same way. Observe that one of the two linear programs has a solution if and only if there is a path π satisfying φ such that, for every player i, z_i ≤ pay_i(π) and t ≤ pay_i(π). To obtain the lower bound, again, we reduce from the E-Nash problem for mp games with GR(1) specifications. The reduction simply follows from the fact that, by fixing t = min{w_i(s) : i ∈ N, s ∈ St}, we can encode the E-Nash problem into the social threshold problem. The case t ≥ esw(σ) is similar. Therefore, we obtain the following result.
Theorem 9 For a given mp game G_mp = ⟨A, (w_i)_{i∈N}⟩, a GR(1) specification φ, and a threshold value t, deciding whether there exists a strategy profile σ ∈ E-Nash_G(φ) such that t ≤ esw(σ) is NP-complete. Analogously, deciding whether there exists a strategy profile σ ∈ E-Nash_G(φ) such that t ≥ esw(σ) is NP-complete.
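The weight shift w'_i(v) = w_i(s) − max{z_i, t} used in the egalitarian construction can be sketched as follows (a sketch under our own dict-based representation of weights; the function name is ours):

```python
def shift_weights_for_esw(weights, z, t):
    """Build the shifted weights w'_i(v) = w_i(v) - max(z_i, t).

    A cycle whose total shifted weight is non-negative for player i
    then certifies both pay_i >= z_i (the deviation value) and
    pay_i >= t (the egalitarian threshold).

    weights: list of dicts, one per player, mapping vertex -> w_i(v)
    z: list of values z_i, one per player
    """
    return [{v: w[v] - max(z_i, t) for v in w}
            for w, z_i in zip(weights, z)]
```

For example, a single player with weights {a: 5, b: 1}, deviation value z = 2, and threshold t = 3 gets shifted weights {a: 2, b: -2}: the larger of the two constraints dominates at every vertex.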
The threshold social welfare calculation can be used to approximate the MaxNE and MinNE values of a game, be it either utilitarian or egalitarian. Note that, for every agent i ∈ N and every strategy profile σ in the game, it holds that min(w_i) ≤ pay_i(σ) ≤ max(w_i), where min(w_i) = min{w_i(s) : s ∈ St} and max(w_i) = max{w_i(s) : s ∈ St}. This establishes bounds also on the social welfare functions, given by Σ_{i∈N} min(w_i) ≤ usw(σ) ≤ Σ_{i∈N} max(w_i) and min_{i∈N} min(w_i) ≤ esw(σ) ≤ max_{i∈N} max(w_i). Moreover, observe that, for two values t < t', if (G, φ) is t-increase but not t'-increase, then it holds that t ≤ MaxNE_sw(G, φ) < t'. Analogously, if (G, φ) is t'-decrease but not t-decrease, then it holds that t < MinNE_sw(G, φ) ≤ t'.
These observations allow us to apply a bisection-like method to approximate MaxNE and MinNE. Moreover, note that at each iteration of the method the absolute error is halved, which ensures linear convergence of the method [30]. In particular, we obtain an approximation of the values within a fixed tolerance ε > 0 in a number n of iterations bounded by n_ε = ⌈log_2((b − a)/ε)⌉, with a = Σ_{i∈N} min(w_i) and b = Σ_{i∈N} max(w_i).
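The bisection scheme described above can be sketched as follows, treating the t-increase decision procedure as a black-box oracle; the oracle, the interval endpoints, and the function name are assumptions of the sketch, not part of the paper's formal development.

```python
def approx_max_ne(is_t_increase, a, b, eps):
    """Approximate MaxNE_sw(G, phi) by bisection.

    is_t_increase(t) decides the threshold problem: is there a Nash
    equilibrium satisfying phi with social welfare >= t?  The interval
    [a, b] must contain MaxNE (e.g. a = sum of minimal weights,
    b = sum of maximal weights).  The loop halves the error each
    iteration, so it runs about ceil(log2((b - a) / eps)) times.
    """
    lo, hi = a, b
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if is_t_increase(mid):
            lo = mid   # some equilibrium reaches welfare >= mid
        else:
            hi = mid   # mid exceeds MaxNE, shrink from above
    return lo
```

MinNE is approximated symmetrically, with the t-decrease oracle and the roles of the interval endpoints swapped.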
Other Rational Verification Problems

E-Nash is, we believe, the most fundamental problem in the rational verification framework, but it is not the only one. The two other key problems are A-Nash and Non-emptiness. The former is the dual of E-Nash: given a game G and a specification φ, it asks whether φ is satisfied in all Nash equilibria of G. The latter simply asks whether the game G has at least one Nash equilibrium, and it can be thought of as the special case of E-Nash in which the specification φ is any tautology.
We can conclude from (the proofs of) the results presented so far, which are summarised in Table 1, that while A-Nash for GR(1) games is in PSPACE with LTL specifications and FPT with GR(1) specifications, for mp games the problem is in PSPACE and coNP, respectively. In addition, we can also conclude that whereas Non-emptiness for GR(1) games is FPT, for mp games it is NP-complete. These results contrast with those obtained when players' goals are general LTL formulae, where all problems are 2EXPTIME-complete, since LTL synthesis, which is 2EXPTIME-hard [28], can be encoded. They also contrast with those presented in [14], where it is shown that, for succinct model representations given by iterated Boolean games or reactive modules, all problems in the rational verification framework can be polynomially reduced to Non-emptiness, which clearly cannot be the case here, unless the whole polynomial hierarchy collapses.

Concluding Remarks
We have presented improved complexity results for rational verification problems in three different settings: the analysis of response properties of reactive systems modelled as multiagent systems; the verification of mean-payoff games; and the verification of collective properties of multiagent systems through the analysis of social welfare properties. The first scenario mostly concerns the verification of qualitative properties of reactive systems; the second, the verification of quantitative properties; and the third, the verification of "community" properties, as opposed to individual properties of agents in a system. In the remainder of this article, we discuss further the impact and relevance of our results in these three areas.
Reactive systems The logical analysis of reactive systems is typically carried out using either linear-time temporal logics, such as LTL, or branching-time temporal logics, such as CTL and CTL*. Such analysis may involve verifying that a temporal logic property holds in a given system (model checking) or automatically constructing the system from a temporal logic specification (automated synthesis). Rational verification subsumes both problems, and applies to systems modelled in a distributed way as a collection of semi-autonomous agents (a multiagent system). Despite the greater scope of rational verification with respect to both model checking and automated synthesis, previous work has shown that the overall complexity of rational verification is typically no worse than the combined complexity of the associated synthesis problem. This connection also transfers when considering goals expressed in the GR(1) fragment of LTL, where an initial solution in 2EXPTIME is reduced to complexities lying within the polynomial hierarchy. However, to achieve this, careful attention must be paid to carrying out the additional game-theoretic analysis that rational verification entails without blowing up the combined computational complexity. This is particularly important since, in rational verification, strategies for multiple agents must be synthesised, rather than a single model for a reactive system.
Mean-Payoff games In the computer science literature, mean-payoff games have been considered as a way of understanding the long-term behaviour (the average performance) of a system; the most common setting is that of a two-player game in which one player models the system and the other models the environment. From a game-theoretic point of view, these are two-player games, which in a perfect-information setting can be solved in NP ∩ coNP, and thus no polynomial-time algorithm for solving them is known. In the case of rational verification with mean-payoff objectives, the problem is definitely harder (unless P = NP, which is unlikely). We have shown that if the principal has an LTL goal, the problem matches the complexity of LTL model checking, a complexity gap that cannot be avoided since LTL model checking is a particular case. But even with GR(1) specifications, the problem is very likely to be strictly harder than solving (two-player perfect-information) mean-payoff games, since we have shown that with mean-payoff objectives the problem is NP-complete.
Social Welfare While rational verification tends to privilege the preferences of individual agents in a system, social welfare measures focus, instead, on what is considered best for a society of agents. Because of this, our results regarding social welfare outcomes nicely complement the analysis performed in rational verification as originally defined, where the performance of society as a whole was irrelevant. We have shown that, even in this scenario, better complexity results can be achieved with respect to the complexity of the problem when only individual preferences are considered, as in a Nash equilibrium. In the specific scenario considered in this paper, we have shown that the problem is PSPACE-complete, and therefore still solvable in polynomial space.
Future Work A limitation to the wider adoption of rational verification over other reasoning techniques is its combined complexity, which is closely related to the complexity of associated automated synthesis problems. Our results are important because they show that, for several significant settings, rational verification can be done with polynomial-space algorithms. These results are much more attractive than those for the general case, and hold out the hope of efficient practical tools (cf. the Equilibrium Verification Environment (EVE) [20,21], a tool for the automated analysis of temporal equilibrium properties). Further practical implementations thus seem a natural step forward towards the deployment of rational verification in more realistic scenarios.

Definition 3 (Max and Min social welfare) For a given mp game G_mp and a social welfare function sw, compute MaxNE_sw(G, φ) and MinNE_sw(G, φ).

Table 1
Summary of main complexity results.
An arena is a tuple A = ⟨N, Ac, St, s_0, tr, λ⟩, where N, Ac, and St are finite non-empty sets of players (write N = |N|), actions, and states, respectively; s_0 ∈ St is the initial state; tr : St × Ac → St is a transition function mapping each pair consisting of a state s ∈ St and an action profile a ∈ Ac = Ac^N, one for each player, to a successor state; and λ : St → 2^AP is a labelling function, which maps every state to a subset of atomic propositions, namely the atomic propositions that are true at that state.