Rational verification: game-theoretic verification of multi-agent systems

We provide a survey of the state of the art of rational verification: the problem of checking whether a given temporal logic formula φ is satisfied in some or all game-theoretic equilibria of a multi-agent system, that is, whether the system will exhibit the behaviour φ represents under the assumption that agents within the system act rationally in pursuit of their preferences. After motivating and introducing the overall framework of rational verification, we discuss key results obtained in the past few years as well as relevant related work in logic, AI, and computer science.


Introduction
The deployment of AI technologies in a wide range of application areas over the past decade has brought the problem of verifying such systems into sharp focus. Verification is one of the most important and widely studied problems in computer science [14]. Verification is the problem of checking program correctness: the key decision problem relating to verification is that of establishing whether or not a given system P satisfies a given specification φ. The most successful contemporary approach to formal verification is model checking, in which an abstract, finite state model of the system of interest is represented as a Kripke structure (a labelled transition system), and the specification is represented as a temporal logic formula, the models of which are intended to correspond to "correct" behaviours of the system [31]. The verification process then reduces to establishing whether the specification formula is satisfied in the given Kripke structure, a process that can be efficiently automated in many settings of interest [9,28].
In the present paper, we will be concerned with multi-agent systems [73,82]. Software agents were originally proposed in the late 1980s, but it is only over the past decade that the software agent paradigm has been widely adopted. At the time of writing, software agents are ubiquitous: we have software agents in our phones (e.g., Siri), processing requests online, automatically trading in global markets, controlling complex navigation systems (e.g., those in self-driving cars), and even carrying out tasks on our behalf at home (e.g., Alexa). Typically, these agents do not work in isolation: they may interact with humans or with other software agents. The field of multi-agent systems is concerned with understanding and engineering systems that have these characteristics.
Since agents are typically "owned" by different principals, there is no requirement or assumption that the preferences delegated to different agents are aligned in any way. It may be that their preferences are compatible, but it may equally be that preferences are in opposition. Game theory provides a natural and widely-adopted framework through which to understand systems with these properties, where participants pursue their preferences rationally and strategically [61], and this observation has prompted a huge body of research over the past decade, attempting to apply and adapt game theoretic techniques to the analysis of multi-agent systems [63,73].

The research question
In the present article, we are concerned with the question of how we should think about the issues of correctness and verification in multi-agent systems (at this point we should clarify that, in this work, we are only concerned with systems composed solely of software agents: in Section 5 we briefly comment on the issue of verifying human-agent systems).
We argue that in a multi-agent setting, it is appropriate to ask what behaviours the system will exhibit under the assumption that agents act rationally in pursuit of their preferences. We advance the paradigm of rational verification for multi-agent systems, as a counterpart to classical verification. Rational verification is concerned with establishing whether a given temporal logic formula φ is satisfied in some or all game-theoretic equilibria of a multi-agent system, that is, whether the system will exhibit the behaviour represented by φ under the assumption that agents within the system act rationally in pursuit of their preferences/goals.
We begin by motivating our approach, describing in detail the issue of correctness and verification, and the hugely successful model checking paradigm for verification. We then discuss the question of what correctness means in the setting of multi-agent systems, and this leads us to introduce the paradigm of rational verification and equilibrium checking. Following this, we survey a range of semantic models for rational verification, summarise the key complexity results known for these models, and then examine three key tools for rational verification. We conclude by surveying some active areas of current research.

Setting the scene
The aim of this section is to explain how the concept of rational verification has emerged from various research trends in computer science and AI, and how it differs from the conventional conception of verification.

Correctness and formal verification
The correctness problem has been one of the most widely studied problems in computer science over the past fifty years, and remains a topic of fundamental concern to the present day [14]. Broadly speaking, the correctness problem is concerned with checking that computer systems behave as their designer intends. Probably the most important problem studied within the correctness domain is that of formal verification. Formal verification is the problem of checking that a given computer program or system P is correct with respect to a given formal (i.e., mathematical) specification φ. We understand φ as a description of system behaviours that the designer judges to be acceptable: a program that guarantees to generate a behaviour as described in φ is deemed to correctly implement the specification φ.
A key insight, due to Amir Pnueli, is that temporal logic provides a suitable framework with which to express formal specifications of reactive system behaviour [66]. Pnueli proposed Linear Temporal Logic (LTL) for expressing desirable properties of computations. LTL extends classical logic with tense operators X ("in the next state..."), F ("eventually..."), G ("always..."), and U ("...until...") [31]. For example, the requirement that a system never enters a "crash" state can naturally be expressed in LTL by a formula G¬crash, where ¬crash denotes the complement (negation) of the set of "crash" states (namely states associated with a label crash). If we let $[\![P]\!]$ denote the set of all possible computations that may be produced by the program P, and let $[\![\varphi]\!]$ denote the set of state sequences that satisfy the LTL formula φ, then verification of LTL properties reduces to the problem of checking whether $[\![P]\!] \subseteq [\![\varphi]\!]$. Another key temporal formalism is Computation Tree Logic (CTL), which modifies LTL by prefixing path formulae (which depend on temporal operators) with path quantifiers A ("on all paths...") and E ("on some path...") [31]. While LTL is suited to reasoning about runs or computational histories, CTL is suited to reasoning about states of transition systems that encode possible system behaviours.

Model checking
The most successful approach to verification using temporal logic specifications is model checking [28]. Model checking starts from the idea that the behaviour of a finite state program P can be represented as a Kripke structure or transition system $K_P$. Now, Kripke structures can be interpreted as models for temporal logic. So, checking whether P satisfies an LTL property φ reduces to the problem of checking whether φ is satisfied on paths through $K_P$. Checking a CTL specification φ is even simpler: the Kripke structure $K_P$ is a CTL model, so we simply need to check whether $K_P \models \varphi$, which boils down to performing reachability analysis over the states of $K_P$ (Fig. 1). These checks can be efficiently automated for many cases of interest. In the case of CTL, for example, checking whether $K_P \models \varphi$ can be solved in time $O(|K_P| \cdot |\varphi|)$ [27,31]; for LTL, the problem is more complex (PSPACE-complete [31]), but using automata theoretic techniques it can be solved in time $O(|K_P| \cdot 2^{|\varphi|})$ [80], the latter result indicating that such an approach is feasible for small specifications. Since the model checking paradigm was first proposed in 1981, huge progress has been made on extending the range of systems amenable to verification by model checking, and on extending the range of properties that might be checked [28].
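To make the remark about reachability analysis concrete, the following is a minimal Python sketch of explicit-state checking for two CTL operators on a small Kripke structure. It is an illustration only, not the algorithm of any particular model checker: the names `check_EF` and `check_AG`, and the four-state structure, are hypothetical and chosen for this example.

```python
from typing import Dict, Hashable, Set

State = Hashable


def check_EF(transitions: Dict[State, Set[State]], target: Set[State]) -> Set[State]:
    """States satisfying EF target: those from which some path reaches target.

    Computed as a least fixpoint by repeatedly adding predecessors.
    """
    sat = set(target)
    changed = True
    while changed:
        changed = False
        for state, successors in transitions.items():
            if state not in sat and successors & sat:
                sat.add(state)
                changed = True
    return sat


def check_AG(transitions: Dict[State, Set[State]],
             states: Set[State], good: Set[State]) -> Set[State]:
    """States satisfying AG good, using the duality AG good = not EF (not good)."""
    return states - check_EF(transitions, states - good)


# Hypothetical Kripke structure: state 3 carries the label "crash".
STATES = {0, 1, 2, 3}
TRANSITIONS = {0: {1}, 1: {0, 2}, 2: {2}, 3: {3}}
CRASH = {3}

# States from which the system can never reach a crash state (AG not crash).
SAFE = check_AG(TRANSITIONS, STATES, STATES - CRASH)
```

In this toy structure the crash state is unreachable from states 0, 1 and 2, so those are exactly the states at which AG¬crash holds.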

Multi-agent systems
We now turn to the class of systems that we will be concerned with in the present paper. The field of multi-agent systems is concerned with the theory and practice of systems containing multiple interacting semi-autonomous AI software components known as agents [73,82]. Multi-agent systems are generally understood as distinct from conventional distributed or concurrent systems in several respects, but the most important distinction for our purposes is that different agents are assumed to be operating on behalf of different external principals, who delegate their preferences or goals to their agent. Because different agents are "owned" by different principals, there is no assumption that agents will have preferences that are aligned with each other.

How should we interpret correctness and formal verification in the context of multi-agent systems?
In an uninteresting sense, this question is easily answered: we can certainly think of a multi-agent system as nothing more than a collection of interacting non-deterministic computer programs, with non-determinism representing the idea that agents have choices available to them; we can express such a system using any readily available model checking framework, which would then allow us to start reasoning about the possible computational behaviours that the system might in principle exhibit.
While such an analysis is entirely legitimate, and might well yield important insights, it nevertheless misses a very big part of the story that is relevant in order to understand a multi-agent system. This is because it ignores the fact that agents are assumed to pursue their preferences rationally and strategically. Thus, certain system behaviours that might be possible in principle will never arise in practice because they could not arise from rational choices by agents within the system. To take a specific example, consider eBay, the online auction house. When users create an auction on eBay, they must specify a deadline for bidding in the auction. This deadline, coupled with the strategic concerns of bidders, leads to a behaviour known as 'sniping' [69]. Roughly, sniping is where bidders try to wait for the last possible moment to submit bids. Sniping is strategic behaviour, used by participants to try to get the best outcome possible. If we do not take into account preferences and strategic behaviour when designing a system like eBay, then we will not be able to predict or understand behaviours like sniping.
The classical formulation of correctness does not naturally match the multi-agent system setting because there can be no single specification φ against which the correctness of a multi-agent system is judged. Instead, each agent within such a system carries its own specification: an agent is judged to be correct if it acts rationally to achieve its delegated preferences or goals. So, what should replace the classical notion of correctness and verification in the context of multi-agent systems? We posit that rational verification and equilibrium checking provide a suitable framework.

Rational verification and equilibrium checking
Along with many other researchers [63,73] we believe that game theory provides an appropriate formal framework for the analysis of multi-agent systems. Originating within economics, game theory is essentially the theory of strategic interaction between self-interested entities [61]. While the mathematical framework of game theory was not developed specifically to study computational settings, it nevertheless seems that the toolkit of analytical concepts it provides can be adapted and applied to multi-agent settings. A game in the sense of game theory is usually understood as an abstract mathematical model of a situation in which self-interested players must make decisions. A game specifies the decision-makers in the game (the "players") and the choices available to these players (their strategies). For every combination of possible choices by the players, the game also specifies what outcome will result, and each player has their own preferences over possible outcomes.
A key concern in game theory is to try to understand what the outcomes of a game can or should be, under the assumption that the players within it act rationally. To this end, a number of solution concepts have been proposed, of which the Nash equilibrium is the most prominent. A Nash equilibrium is a collection of choices, one for each participant in the game, such that no player can benefit by unilaterally deviating from this combination of choices. Nash equilibria seem like reasonable candidates for the outcome of a game because no player could gain by unilaterally moving away from a Nash equilibrium, so doing so would not be rational. In general, it could be the case that a given game has no Nash equilibrium or multiple Nash equilibria. Now, it should be easy to see how this general setup maps to the multi-agent systems setting: players map to the agents within the system, and each player's preferences are as defined in their delegated goals; the choices available to each player correspond to the possible courses of action that may be taken by each agent in the system. Outcomes will correspond to the computations or runs of the system, and agents will have preferences over these runs; they act to try and bring about their most preferred runs.
With this in mind, we believe it is natural to think of the following problem as a counterpart to model checking and classical verification. We are given a multi-agent system, and a temporal logic formula φ representing a property of interest. We then ask whether φ would be satisfied in some run that would arise from a Nash equilibrium collection of choices by agents within the system. We call this equilibrium checking, and refer to the general paradigm as rational verification.
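The definition of Nash equilibrium given above can be illustrated on one-shot games with finitely many choices. The sketch below, in Python, enumerates all choice profiles and keeps those from which no player gains by a unilateral deviation. It assumes pure (deterministic) choices and numeric utilities; the function names and the two example games are hypothetical and serve only to show that a game may have several such equilibria or none.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Profile = Tuple[str, ...]
Utility = Callable[[int, Profile], float]


def is_nash(profile: Profile, choices: Sequence[Sequence[str]], utility: Utility) -> bool:
    """True if no player gains by changing only their own choice."""
    for i, options in enumerate(choices):
        for alternative in options:
            deviation = profile[:i] + (alternative,) + profile[i + 1:]
            if utility(i, deviation) > utility(i, profile):
                return False
    return True


def pure_nash_equilibria(choices: Sequence[Sequence[str]], utility: Utility) -> List[Profile]:
    """Enumerate every choice profile and keep the stable ones."""
    return [p for p in product(*choices) if is_nash(p, choices, utility)]


CHOICES = [("heads", "tails"), ("heads", "tails")]


def coordination(i: int, profile: Profile) -> float:
    """Both players are rewarded when their choices agree."""
    return 1.0 if profile[0] == profile[1] else 0.0


def matching_pennies(i: int, profile: Profile) -> float:
    """Player 0 wants the choices to match; player 1 wants them to differ."""
    match = profile[0] == profile[1]
    return 1.0 if match == (i == 0) else 0.0


COORDINATION_NE = pure_nash_equilibria(CHOICES, coordination)
PENNIES_NE = pure_nash_equilibria(CHOICES, matching_pennies)
```

The coordination game has two equilibria (both players choose the same side), while matching pennies has none in pure choices, since in every profile one of the two players would rather switch.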
Models for rational verification

An abstract model
Let us make our discussion a little more formal with some suggestive notation (we present some concrete models in later sections). Let $P_1, \ldots, P_n$ be the agents within a multi-agent system. For now, we do not impose any specific model for agents $P_i$: we will simply assume that agents are non-deterministic reactive programs. Non-determinism captures the idea that agents have choices available to them, while reactivity implies that agents are non-terminating. The framework we describe below can easily be applied to any number of computational models, for example concurrent games [5], event structures [81], interpreted systems [33], or multi-agent planning systems [15].
A strategy for an agent $P_i$ is a rule that defines how the agent makes choices over time. Each possible strategy for an agent $P_i$ defines one way that the agent can resolve its non-determinism. We can think of a strategy as a function from the history of the system to date to the choices available to the agent in the present moment. We denote the possible strategies available to agent $P_i$ by $\Sigma(P_i)$. The basic task of an agent $P_i$ is to select an element of $\Sigma(P_i)$; we will see later that agents select strategies in an attempt to bring about their preferences. When each agent $P_i$ has selected a strategy, we have a profile of strategies $\vec{\sigma} = (\sigma_1, \ldots, \sigma_n)$, one for each agent. This profile of strategies will collectively define the behaviour of the overall system. For now, we will assume that strategies are themselves deterministic, and that a collection of strategies therefore induces a unique run of the system, which we denote by $\rho(\vec{\sigma})$. The set $R(P_1, \ldots, P_n)$ of all possible runs of $P_1, \ldots, P_n$ is:
$$R(P_1, \ldots, P_n) = \{\rho(\vec{\sigma}) : \vec{\sigma} \in \Sigma(P_1) \times \cdots \times \Sigma(P_n)\}.$$
Where the strategies that lead to a run do not need to be named, we will denote elements of $R(P_1, \ldots, P_n)$ by $\rho$, $\rho'$, etc. Returning to our earlier discussion, we typically use LTL as a language for expressing properties of runs: we will write $\rho \models \varphi$ to mean that run $\rho$ satisfies temporal formula φ.
Before proceeding, we state a version of the conventional model checking problem for our setting:

MODEL CHECKING:
Given: System $P_1, \ldots, P_n$; temporal formula φ.
Question: Is it the case that $\rho(\vec{\sigma}) \models \varphi$ for some strategy profile $\vec{\sigma}$?
This decision problem amounts to asking whether $\exists \rho \in R(P_1, \ldots, P_n)$ such that $\rho \models \varphi$, that is, whether there is any possible computation of the system that satisfies φ, or in other words whether the system could in principle exhibit the behaviour φ.

Preferences
So far, we have said nothing about the idea that agents act rationally in pursuit of delegated preferences. We assume that agents have preferences over runs of the system. Thus, given two possible runs $\rho_1, \rho_2 \in R(P_1, \ldots, P_n)$, it may be that $P_i$ prefers $\rho_1$ over $\rho_2$, or that it prefers $\rho_2$ over $\rho_1$, or that it is indifferent between the two. We represent preferences by assigning to each player $P_i$ a relation $\succeq_i \subseteq R(P_1, \ldots, P_n) \times R(P_1, \ldots, P_n)$, requiring that this relation is complete, reflexive, and transitive. Thus $\rho_1 \succeq_i \rho_2$ means that $P_i$ prefers $\rho_1$ at least as much as $\rho_2$. We denote the irreflexive sub-relation of $\succeq_i$ by $\succ_i$, so $\rho_1 \succ_i \rho_2$ means that $P_i$ strictly prefers $\rho_1$ over $\rho_2$. Indifference (where we have both $\rho_1 \succeq_i \rho_2$ and $\rho_2 \succeq_i \rho_1$) is denoted by $\rho_1 \sim_i \rho_2$. We refer to a structure $M = (P_1, \ldots, P_n, \succeq_1, \ldots, \succeq_n)$ as a multi-agent system.
Alert readers will have noted that, if runs are infinite, then so are preference relations over such runs. This raises the issue of finite and succinct representations of preference relations over runs. Several approaches to this issue have been suggested. The most obvious is to assign each agent $P_i$ a temporal logic formula $\gamma_i$ representing its goal. The idea is that $P_i$ prefers all runs that satisfy $\gamma_i$ over all those that do not, is indifferent between all runs that satisfy $\gamma_i$, and is similarly indifferent between runs that do not satisfy $\gamma_i$. Formally, the preference relation $\succeq_i$ corresponding to a goal $\gamma_i$ is defined as follows:
$$\rho_1 \succeq_i \rho_2 \iff (\rho_2 \models \gamma_i \implies \rho_1 \models \gamma_i).$$
We discuss alternative (richer) preference models in Section 5.2.
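The goal-based preference relation just described is simple enough to write down directly. The Python sketch below assumes that a goal is available as a Boolean predicate on runs (how runs are represented and how satisfaction is decided is left abstract), and derives weak preference, strict preference and indifference from it. The run names and the example goal are hypothetical.

```python
from typing import Callable, Hashable

Run = Hashable
Goal = Callable[[Run], bool]


def weakly_prefers(goal: Goal, run1: Run, run2: Run) -> bool:
    """run1 is at least as good as run2: if run2 meets the goal, so does run1."""
    return goal(run1) or not goal(run2)


def strictly_prefers(goal: Goal, run1: Run, run2: Run) -> bool:
    """run1 meets the goal and run2 does not."""
    return weakly_prefers(goal, run1, run2) and not weakly_prefers(goal, run2, run1)


def indifferent(goal: Goal, run1: Run, run2: Run) -> bool:
    """Both runs meet the goal, or neither does."""
    return weakly_prefers(goal, run1, run2) and weakly_prefers(goal, run2, run1)


# Hypothetical example: runs are named, and the goal holds on exactly two of them.
SATISFYING = {"good_a", "good_b"}


def example_goal(run: Run) -> bool:
    return run in SATISFYING
```

Under this reading there are only two indifference classes per agent (runs that satisfy the goal and runs that do not), which is what makes the representation finite even though the runs themselves are infinite.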

Nash equilibrium
With this definition, we can now define the standard game theoretic concept of Nash equilibrium for our setting. Let $M = (P_1, \ldots, P_n, \succeq_1, \ldots, \succeq_n)$ be a multi-agent system, and let $\vec{\sigma} = (\sigma_1, \ldots, \sigma_i, \ldots, \sigma_n)$ be a strategy profile. Then we say $\vec{\sigma}$ is a Nash equilibrium of $M$ if for all players $P_i$ and for all strategies $\sigma_i' \in \Sigma(P_i)$, we have:
$$\rho(\vec{\sigma}) \succeq_i \rho(\sigma_1, \ldots, \sigma_i', \ldots, \sigma_n).$$
Let $NE(M)$ denote the set of all Nash equilibria of $M$. Of course many other solution concepts have been proposed in the game theory literature [61]; to keep things simple, in this paper we will restrict our attention to Nash equilibrium.

Equilibrium checking
We are now in a position to introduce equilibrium checking, and the associated key decision problems. The basic idea of equilibrium checking is that, instead of asking whether a given temporal formula φ is satisfied on some possible run of the system, we instead ask whether it is satisfied on some run induced by a Nash equilibrium strategy profile of the system. Informally, we can understand this as asking whether φ could be made true as the result of rational choices by agents within the system. This idea is captured in the following decision problem (see Fig. 2):

E-NASH:
Given: Multi-agent system $M$; temporal formula φ.
Question: Is it the case that $\rho(\vec{\sigma}) \models \varphi$ for some $\vec{\sigma} \in NE(M)$?
The obvious counterpart of this decision problem is A-NASH, which asks whether a temporal formula φ is satisfied on all Nash equilibrium outcomes.
A higher-level question is simply whether a system has any Nash equilibria:

NON-EMPTINESS:
Given: Multi-agent system $M$.
Question: Is it the case that $NE(M) \neq \emptyset$?
Fig. 2 The key difference to model checking is that we also take as input the preferences of each of the system components, and the key question asked is whether or not the temporal property φ holds on some/all equilibria of the system
A system without any Nash equilibria is inherently unstable: whatever collection of choices we might consider for the agents within it, some player would have preferred to make an alternative choice. Notice that an efficient algorithm for solving E-NASH would imply an efficient algorithm for NON-EMPTINESS, since NON-EMPTINESS is the special case of E-NASH in which φ is the formula that is always true.
Finally, we might consider the question of verifying whether a given strategy profile represents a Nash equilibrium:

IS-NE:
Given: Multi-agent system $M$, strategy profile $\vec{\sigma}$.
Question: Is it the case that $\vec{\sigma} \in NE(M)$?
Recall that mathematically strategies are functions that take as input the history of the system to date, and give as output a choice for the agent in question.Since the computations generated by multi-agent systems will be infinitary objects, to study this decision problem we will need a finite representation for strategies.A common approach is to use finite state machines with outputs (e.g., Moore machines).
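The four decision problems of this section can be spelled out by brute force when each agent has only finitely many strategies. The Python sketch below makes exactly that simplifying assumption (it is not how the problems are solved in the general, infinite setting), with goal-based preferences as described earlier. The class `FiniteSystem` and the two toy systems are hypothetical; a "run" is summarised here by the pair of bits the two agents hold forever.

```python
from itertools import product
from typing import Callable, Hashable, List, Sequence, Tuple

Profile = Tuple[Hashable, ...]
Run = Hashable
Goal = Callable[[Run], bool]


class FiniteSystem:
    """Toy multi-agent system with finitely many strategies per agent."""

    def __init__(self, strategies: Sequence[Sequence[Hashable]],
                 run_of: Callable[[Profile], Run], goals: Sequence[Goal]):
        self.strategies = strategies  # strategies[i]: options of agent i
        self.run_of = run_of          # profile -> the unique run it induces
        self.goals = goals            # goals[i](run): does the run satisfy agent i's goal?

    def is_ne(self, profile: Profile) -> bool:
        """IS-NE: no agent whose goal fails can satisfy it by deviating alone."""
        for i, options in enumerate(self.strategies):
            if self.goals[i](self.run_of(profile)):
                continue  # agent i already achieves its goal
            for alternative in options:
                deviation = profile[:i] + (alternative,) + profile[i + 1:]
                if self.goals[i](self.run_of(deviation)):
                    return False
        return True

    def equilibria(self) -> List[Profile]:
        return [p for p in product(*self.strategies) if self.is_ne(p)]

    def e_nash(self, phi: Goal) -> bool:
        """E-NASH: phi holds on the run of some equilibrium."""
        return any(phi(self.run_of(p)) for p in self.equilibria())

    def a_nash(self, phi: Goal) -> bool:
        """A-NASH: phi holds on the run of every equilibrium (vacuously if none)."""
        return all(phi(self.run_of(p)) for p in self.equilibria())

    def non_empty(self) -> bool:
        """NON-EMPTINESS, expressed as E-NASH with the trivially true property."""
        return self.e_nash(lambda run: True)


BITS = [(0, 1), (0, 1)]
# Both agents want their bits to agree.
AGREE = FiniteSystem(BITS, lambda p: p, [lambda r: r[0] == r[1]] * 2)
# Agent 0 wants agreement, agent 1 wants disagreement.
PENNIES = FiniteSystem(BITS, lambda p: p, [lambda r: r[0] == r[1], lambda r: r[0] != r[1]])


def first_bit_set(run: Run) -> bool:
    """Hypothetical property phi: the first agent's bit is 1."""
    return run[0] == 1
```

In the agreement system the property holds on one equilibrium but not the other, so E-NASH answers yes and A-NASH answers no; the second system has no equilibria at all, so A-NASH holds only vacuously.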

Iterated boolean games
A simple and elegant concrete computational model that we have found useful to explore questions surrounding rational verification is the framework of iterated Boolean Games (iBGs) [39]. In an iBG, each agent $P_i$ is defined by associating it with a finite, non-empty set of Boolean variables $\Phi_i$, and preferences for $P_i$ are specified with an LTL formula $\gamma_i$. It is assumed that each propositional variable is associated with a single agent. The choices available to $P_i$ at any given point in the game then represent the set of all possible assignments of truth or falsity to the variables under the control of $P_i$. An iBG is "played" over an infinite sequence of rounds; in each round every player independently selects a valuation for their variables, and the infinite run traced out in this way thus defines an LTL model, which will either satisfy or fail to satisfy each player's goal. In iBGs, strategies are represented as finite state machines with output (Moore machines). This may seem like a limitation, but in fact it is not: in the setting of iBGs, finite state machine strategies are all that is required.
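As a rough illustration of how an iBG is played, the Python sketch below composes Moore-machine strategies and unrolls the play until the joint machine configuration repeats, at which point the infinite run is known to be a finite prefix followed by a loop repeated forever. The encoding (a valuation as the set of variables set to true, the `MooreMachine` class, the two example players) is an assumption made for this example, and only two simple goal shapes are evaluated rather than full LTL.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Valuation = FrozenSet[str]  # the variables set to true in one round


@dataclass
class MooreMachine:
    initial: str
    output: Dict[str, Valuation]           # machine state -> own variables set to true
    step: Callable[[str, Valuation], str]  # (machine state, joint valuation) -> next state


def play(machines: Sequence[MooreMachine]) -> Tuple[List[Valuation], List[Valuation]]:
    """Return (prefix, loop): the run is the prefix followed by the loop forever."""
    config = tuple(m.initial for m in machines)
    first_seen: Dict[Tuple[str, ...], int] = {}
    trace: List[Valuation] = []
    while config not in first_seen:
        first_seen[config] = len(trace)
        joint = frozenset().union(*(m.output[q] for m, q in zip(machines, config)))
        trace.append(joint)
        config = tuple(m.step(q, joint) for m, q in zip(machines, config))
    start = first_seen[config]
    return trace[:start], trace[start:]


def infinitely_often(var: str, prefix: List[Valuation], loop: List[Valuation]) -> bool:
    """Goal of the shape G F var: var must be true somewhere on the loop."""
    return any(var in valuation for valuation in loop)


def always(var: str, prefix: List[Valuation], loop: List[Valuation]) -> bool:
    """Goal of the shape G var: var must be true in every round."""
    return all(var in valuation for valuation in prefix + loop)


# Player 1 controls p and toggles it every round, starting with p true.
TOGGLER = MooreMachine(
    initial="on",
    output={"on": frozenset({"p"}), "off": frozenset()},
    step=lambda q, joint: "off" if q == "on" else "on",
)
# Player 2 controls q and sets it exactly when p was true in the previous round.
COPIER = MooreMachine(
    initial="lo",
    output={"hi": frozenset({"q"}), "lo": frozenset()},
    step=lambda q, joint: "hi" if "p" in joint else "lo",
)

PREFIX, LOOP = play([TOGGLER, COPIER])
```

Here the play alternates forever between a round where only p is true and a round where only q is true, so a goal of the shape G F p or G F q is satisfied while G p is not.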
Let us now turn to the decision problems that we identified above, and consider their complexity in the iBG case. Before we state the complexity of these problems, it is worth recalling a special case of iBGs, which was first studied in the 1980s by Pnueli and Rosner [67]. An LTL synthesis problem is a setting defined by two players, denoted E and A, two disjoint sets of propositional variables, $\Phi_E$ and $\Phi_A$, and an LTL formula $\varphi_E$ defined over the variables $\Phi_E \cup \Phi_A$. The setting is interpreted as a game in the following way: the play continues for an infinite sequence of rounds, where in each round the players simultaneously choose a valuation for their respective variable set. In this way, the play traces out a word in $(\Phi_E \cup \Phi_A)^\omega$, and this word can be understood as an LTL valuation. Player E wins if this valuation satisfies $\varphi_E$, and loses otherwise. The LTL synthesis problem is then as follows:

LTL SYNTHESIS:
Given: Variables $\Phi_E$ and $\Phi_A$, and LTL formula $\varphi_E$.
Question: Can E force a win in the game induced by $\Phi_E$, $\Phi_A$, $\varphi_E$? That is, does there exist a strategy $\sigma_E$ for E such that for all strategies $\sigma_A$ for A, we have $\rho(\sigma_E, \sigma_A) \models \varphi_E$?
The LTL synthesis problem was introduced to study settings in which we want to know whether a particular software component (represented by E in this case) can ensure that an overall system objective ($\varphi_E$) is satisfied in the presence of arbitrary, or adversarial, input from the software environment (A). In game-theoretic terms, LTL synthesis is a two-player, strictly competitive win-lose game, and it can be seen as a special case of iBGs: we can model LTL synthesis in an iBG by assigning player E the goal $\varphi_E$ and A the goal $\neg\varphi_E$. Now, the central result proved by Pnueli and Rosner was this:
Theorem 1 [67] The LTL synthesis problem is 2EXPTIME-complete.
Observe that this is an extremely negative result, considerably worse than (for example) the PSPACE-complete LTL model checking problem [74]. The high complexity derives from the fact that the LTL synthesis problem requires quantifying over strategies for satisfying LTL formulae: checking Nash equilibrium properties of iBGs requires similar quantification, and it should therefore come as no surprise that iBGs inherit the high complexity of LTL synthesis.
Theorem 2 [39] For iBGs, IS-NE is PSPACE-complete (and hence no easier or harder than model checking or satisfiability for LTL).In contrast, NON-EMPTINESS, E-NASH, and A-NASH are all 2EXPTIME-complete.
It is not hard to see the close relationship between these problems and LTL synthesis. For example, we can immediately see that A-NASH is 2EXPTIME-hard from the following reduction: given an instance $(\Phi_E, \Phi_A, \varphi_E)$ of LTL synthesis, construct an iBG with players {E, A}, and propositional control sets as in the LTL synthesis instance, with goals for the players being $\varphi_E$ and $\neg\varphi_E$ respectively. Then ask whether $\varphi_E$ is satisfied on all Nash equilibrium runs of the game. It is straightforward to see that E has a winning strategy for $\varphi_E$ if and only if $\varphi_E$ is satisfied on all Nash equilibrium computations.
Although it may seem rather abstract, the iBG framework is quite general, and more widely applicable than it might at first appear. For example, frameworks in which agent programs $P_i$ can be axiomatised in LTL can be expressed in iBGs; see [37] for details.
One fascinating aspect of the development of the theory for iBGs is that, when understanding the equilibrium properties of iBGs, we can make use of the Nash folk theorems: classic results in game theory which relate to the equilibrium properties that can be sustained in iterated games [61]. It is remarkable that a proof technique developed in the 1950s to study an abstract class of games turns out to be directly applicable to the verification of AI systems 70 years later: see [39] for details.

Concurrent game structures
Concurrent Game Structures are a widely-used model for concurrent and multi-agent systems [5]. In this model, say $M$, typically presented in its deterministic form, there is a set $N$ of players who, at each state $s$, each make an independent choice $a_i$, with $i \in N$; these choices jointly define an action profile $\vec{a} = (a_1, \ldots, a_{|N|})$ that uniquely determines the next state $s'$, that is, a unique transition $(s, \vec{a}, s')$ in $M$. Formally, a Concurrent Game Structure is given by a tuple:
$$M = (N, (A_i)_{i \in N}, S, s_0, \delta)$$
where $N$ and $S$ are finite, non-empty sets of agents and system states, respectively, where $s_0 \in S$ is an initial state; $A_i$ is a set of actions available to agent $i$, for each $i$; and $\delta : S \times A_1 \times \cdots \times A_{|N|} \to S$ is a transition function.
Concurrent games are played as follows. The game begins in state $s_0$, and each player $i \in N$ simultaneously picks an action $a_i^0 \in A_i$. The game then transitions to a new state, $s_1 = \delta(s_0, a_1^0, \ldots, a_{|N|}^0)$, and this process repeats. Thus, the $n$th state transitioned to is $s_n = \delta(s_{n-1}, a_1^{n-1}, \ldots, a_{|N|}^{n-1})$. Since the transition function is deterministic, a play of a game will be an infinite sequence of states, denoted by $\pi$. Such a sequence of states is called a run.
To play a game, agents use strategies, which are formally defined as functions from sequences of states to actions. Because Concurrent Game Structures are deterministic, a profile of strategies for all agents $\vec{f} = (f_1, \ldots, f_{|N|})$ determines a unique run in $M$, denoted by $\pi(\vec{f})$. Assuming that agents have a preference relation $\succeq_i$, with $i \in N$, over the set of runs in $M$, one can immediately define further game-theoretic concepts, such as the stable outcomes, runs, or profiles of a game. For instance, in the case of Nash equilibrium, we say that a strategy profile $\vec{f} = (f_1, \ldots, f_{|N|})$ is a Nash equilibrium if, for each agent $i$ and every strategy $f_i'$ of $i$ we have:
$$\pi(\vec{f}) \succeq_i \pi(f_1, \ldots, f_i', \ldots, f_{|N|}),$$
that is, agent $i$ does not prefer the run induced by $(f_1, \ldots, f_i', \ldots, f_{|N|})$ over the run induced by $\vec{f} = (f_1, \ldots, f_i, \ldots, f_{|N|})$, which we call a Nash equilibrium run.
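The way a deterministic concurrent game unfolds can be sketched in a few lines of Python. The sketch assumes that a strategy maps the history of states seen so far to one of the agent's actions, and it computes only a finite prefix of the (infinite) run. The class `ConcurrentGame`, the two-state game and the two strategies are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = str
Action = str
History = Tuple[State, ...]
StrategyFn = Callable[[History], Action]


@dataclass
class ConcurrentGame:
    initial: State
    actions: Sequence[Sequence[Action]]                  # actions[i]: actions of agent i
    delta: Callable[[State, Tuple[Action, ...]], State]  # deterministic transition function


def run_prefix(game: ConcurrentGame, profile: Sequence[StrategyFn], length: int) -> List[State]:
    """First `length` states of the unique run induced by the strategy profile."""
    history: List[State] = [game.initial]
    while len(history) < length:
        joint = tuple(strategy(tuple(history)) for strategy in profile)
        history.append(game.delta(history[-1], joint))
    return history


# Hypothetical game: the next state records whether the two agents chose the same action.
GAME = ConcurrentGame(
    initial="diff",
    actions=[("a", "b"), ("a", "b")],
    delta=lambda state, joint: "same" if joint[0] == joint[1] else "diff",
)


def always_a(history: History) -> Action:
    """Agent 0 ignores the history and always plays a."""
    return "a"


def alternate(history: History) -> Action:
    """Agent 1 matches agent 0 after a 'diff' state and mismatches after a 'same' state."""
    return "a" if history[-1] == "diff" else "b"


PREFIX = run_prefix(GAME, [always_a, alternate], 5)
```

Because the transition function and both strategies are deterministic, the same profile always yields the same run, which is what allows preferences over runs to be lifted to preferences over strategy profiles.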

Reactive module games
While concurrent games provide a natural semantic framework for multi-agent systems, they are not directly appropriate as a modelling framework to be used by people.For this, the framework of Reactive Module Games is more suitable [41].Within this framework, concurrent games are modelled using the Simple Reactive Modules Language (SRML) [78], a simplified version of the Reactive Modules language that is widely used within the model checking community [3].
The basic idea is that each system component (agent/player) in SRML is represented as a module, which consists of an interface that defines the name of the module and lists a non-empty set of Boolean variables controlled by the module, and a set of guarded commands, which define the choices available to the module at each state.There are two kinds of guarded commands: init, used for initialising the variables, and update, used for updating variables subsequently.
A guarded command has two parts: a "condition" part (the "guard") and an "action" part. The "guard" determines whether a guarded command can be executed or not given the current state, while the "action" part defines how to update the value of (some of) the variables controlled by a corresponding module. Intuitively, $\phi \leadsto \alpha$ can be read as "if the condition $\phi$ is satisfied, then one of the choices available to the module is to execute $\alpha$". Note that the value of $\phi$ being true does not guarantee the execution of $\alpha$, but only that it is enabled for execution, and thus may be chosen. If no guarded command of a module is enabled in some state, then that module has no choice and the values of the variables controlled by it remain unchanged in the next state. More formally, a guarded command $g$ over a set of variables $\Phi$ is an expression
$$g : \quad \phi \leadsto x_1' := \psi_1; \ldots; x_k' := \psi_k$$
where the guard $\phi$ is a propositional logic formula over $\Phi$, each $x_i$ is a member of $\Phi$ and $\psi_i$ is a propositional logic formula over $\Phi$. It is required that no variable $x_i$ appears on the left hand side of more than one assignment statement in the same guarded command, hence no issue of (potentially) conflicting updates arises. Here is a concrete example of a guarded command:
$$(p \wedge q) \leadsto p' := \top; \; q' := \bot$$
The guard is the propositional logic formula $(p \wedge q)$, so this guarded command will be enabled if both $p$ and $q$ are true. If the guarded command is chosen (to be executed), then in the next time-step, variable $p$ will be assigned $\top$ and variable $q$ will be assigned $\bot$.
Formally, an SRML module $m_i$ is defined as a triple $m_i = (\Phi_i, I_i, U_i)$, where $\Phi_i \subseteq \Phi$ is the finite set of Boolean variables controlled by $m_i$, $I_i$ a finite set of init guarded commands, and $U_i$ a finite set of update guarded commands. As in iBGs, it is required that variables are controlled by exactly one agent.
Figure 3 shows a module named toggle that controls a single Boolean variable, named x. There are two init guarded commands and two update guarded commands. The init guarded commands define two choices for the initialisation of variable x: true or false. The first update guarded command says that if x has the value of true, then the corresponding choice is to assign it to false, while the second command says that if x has the value of false, then it can be assigned to true. Intuitively, the module would choose (in a non-deterministic manner) an initial value for x, and then on subsequent rounds toggles this value. In this particular example, the init commands are non-deterministic, while the update commands are deterministic. We refer to [41] for further details on the semantics of SRML. In particular, in Figure 12 of [41], we detail how to build a Kripke structure that models the behaviour of an SRML system.
Fig. 3 Example of module toggle in SRML
Module definitions allow us to represent the possible actions of individual agents, and the effects of their actions, but do not represent preferences. In RMGs, preferences are captured by associating each module with a goal, which is specified as a temporal logic formula. Given this, a reactive module game is given by a structure $G = (N, m_1, \ldots, m_n, \gamma_1, \ldots, \gamma_n)$, where $N = \{1, \ldots, n\}$ is the set of agents, $m_i$ is the module defining the choices available to agent $i$, as explained above, and $\gamma_i$ is the goal of player $i$. In [41], two possibilities were considered for the language of goals $\gamma_i$: LTL and CTL. In the case of LTL, strategies $\sigma_i$ for individual players are essentially the same as in iBGs: deterministic finite state machines with output. At each round of the game, a strategy $\sigma_i$ chooses one of the enabled guarded commands to be executed. Because all strategies are deterministic, upon execution the collective strategies of all players will trace out a unique run, which will either satisfy or not satisfy each player's goal, as in the case of iBGs. In the case of CTL, however, strategies are non-deterministic: instead of selecting a single guarded command for execution, a strategy selects a set of guarded commands. The result of executing such strategies yields a tree structure, which will then either satisfy or fail to satisfy the CTL goals of players.
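The behaviour of the toggle module described in prose above can be mimicked in Python. This is a hand-written sketch of the informal semantics given in the text, not SRML syntax and not the semantics of [41]: a guarded command is encoded as a pair of a guard predicate and a dictionary of assignments, and the names `enabled`, `successors`, `TOGGLE_INIT` and `TOGGLE_UPDATE` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Guard = Callable[[State], bool]
Assignments = Dict[str, Callable[[State], bool]]  # variable -> expression for its next value
GuardedCommand = Tuple[Guard, Assignments]


def enabled(commands: List[GuardedCommand], state: State) -> List[Assignments]:
    """Actions of the guarded commands whose guard holds in the current state."""
    return [assignments for guard, assignments in commands if guard(state)]


def successors(commands: List[GuardedCommand], state: State) -> List[State]:
    """One next state per enabled command; the state is unchanged if none is enabled."""
    choices = enabled(commands, state)
    if not choices:
        return [dict(state)]
    return [
        {**state, **{var: expr(state) for var, expr in assignments.items()}}
        for assignments in choices
    ]


# init commands: the guard is always true, and x may start as true or as false.
TOGGLE_INIT: List[GuardedCommand] = [
    (lambda s: True, {"x": lambda s: True}),
    (lambda s: True, {"x": lambda s: False}),
]
# update commands: if x is true set it to false, and if x is false set it to true.
TOGGLE_UPDATE: List[GuardedCommand] = [
    (lambda s: s["x"], {"x": lambda s: False}),
    (lambda s: not s["x"], {"x": lambda s: True}),
]

INITIAL_STATES = successors(TOGGLE_INIT, {})
```

Two init commands are enabled at the start, which gives the non-deterministic initial choice, while in every later state exactly one update command is enabled, which gives the deterministic toggling.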
When it comes to the complexity of decision problems relating to RMGs, we find the following:

Theorem 3 [41]
- For LTL RMGs, IS-NE is PSPACE-complete, while E-NASH and A-NASH are both 2EXPTIME-complete.
- For CTL RMGs, IS-NE is EXPTIME-complete, while E-NASH and A-NASH are both 2EXPTIME-hard.

The key conclusion relating to these results is that, despite the naturalness and expressive power of RMGs, computationally they are no more complex than iBGs. The high complexity of the key decision problems relating to RMGs indicates that naive algorithms to solve them will be hopelessly impractical: specialised techniques are required.
In Section 4.1, we will describe such techniques, and a system implemented based upon them.

Markov games
Markov Games, also known as Concurrent Stochastic Games (sometimes simply Stochastic Games), are a popular representation of (simultaneous) multi-agent decision-making scenarios with stochastic dynamics. In this latter respect they differ from Concurrent Game Structures, as discussed above, in which environments are assumed to be deterministic. They naturally generalise both Markov Decision Processes (a Markov Game with one player) and iterated Normal-Form Games (a Markov Game with one state). Such games proceed at each time-step, from a state s, by each agent P_i using their strategy σ_i to select an action a_i, leading to a joint action a = (a_1, ..., a_n). The next state s' is then drawn from the conditional probability distribution given by a Markovian transition function T(s' | s, a). The strategy profile σ and the transition dynamics thus define a Markov Chain over the states S of the game, leading to a distribution Pr_σ(ρ) over runs ρ = s_0 s_1 s_2 ... through the state space.
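The step-by-step dynamics just described can be sketched in Python as follows. The two-state, two-player game, the function names transition and sample_run, and the restriction to memoryless strategies (functions from the current state to an action) are all assumptions made purely for illustration.

```python
import random


def transition(s, a):
    """Hypothetical T(. | s, a) for two players with actions "a" and "b".

    State "s1" is absorbing; from "s0" each player who plays "b"
    adds 0.5 to the probability of moving to "s1".
    """
    if s == "s1":
        return {"s1": 1.0}
    p = 0.5 * a.count("b")
    return {"s0": 1.0 - p, "s1": p}


def sample_run(strategies, s0, steps, rng):
    """Sample a finite prefix s_0 s_1 ... of a run under a strategy profile."""
    run = [s0]
    s = s0
    for _ in range(steps):
        # Every agent picks an action simultaneously, giving a joint action.
        a = tuple(sigma(s) for sigma in strategies)
        dist = transition(s, a)
        states = list(dist)
        # Draw the next state s' from T(. | s, a).
        s = rng.choices(states, weights=[dist[t] for t in states])[0]
        run.append(s)
    return run
```

For instance, `sample_run([lambda s: "a", lambda s: "b"], "s0", 10, random.Random(0))` draws one run prefix in which the game leaves s0 with probability one half at each step; repeating the call with different seeds samples from the induced distribution over runs.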
On top of this underlying game structure one may then define different forms of objective for each of the agents. Common examples include the expected cumulative discounted reward:

$$\mathbb{E}_{\sigma}\left[ \sum_{t=0}^{\infty} \beta^{t} r^{i}_{t+1} \,\middle|\, s_0 \sim I \right]$$

and the expected mean-payoff reward:

$$\lim_{T \to \infty} \mathbb{E}_{\sigma}\left[ \frac{1}{T} \sum_{t=0}^{T-1} r^{i}_{t+1} \,\middle|\, s_0 \sim I \right]$$

Here, β ∈ [0, 1) is a discount factor, r^i_{t+1} ∈ ℝ is the reward given to agent i at time t + 1, and I(s) is an initial state distribution. Alternatively, for any set of runs R ⊆ R(P_1, ..., P_n) we may define an indicator random variable X_R such that X_R(ρ) = 1 if ρ ∈ R and X_R(ρ) = 0 otherwise. A player's reward can then be defined as the expected value E_σ[X_R] of this variable. For example, we could consider the probability of satisfying a temporal logic formula γ_i by defining R as containing all and only those runs ρ such that ρ |= γ_i.
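To make these objectives concrete, the following sketch evaluates them on a single finite reward sequence or run prefix. Truncating to a finite prefix is an assumption made here for illustration (the definitions above range over infinite runs), and the function names are hypothetical; averaging the returned values over many sampled runs would give a Monte Carlo estimate of the corresponding expectation.

```python
def discounted_return(rewards, beta):
    """Sum of beta**t * r_{t+1} over a finite prefix of rewards."""
    return sum(beta ** t * r for t, r in enumerate(rewards))


def mean_payoff(rewards):
    """Average reward over a finite, non-empty prefix."""
    return sum(rewards) / len(rewards)


def indicator(run, in_r):
    """X_R(run): 1 if the run belongs to the set R, else 0.

    `in_r` is a caller-supplied membership test for R.
    """
    return 1 if in_r(run) else 0


def estimate(samples):
    """Empirical mean of per-run values, approximating an expectation."""
    return sum(samples) / len(samples)
```

As a usage example, `estimate([indicator(r, lambda run: "s1" in run) for r in runs])` approximates the probability that a run in the list `runs` visits a state named "s1", which corresponds to a simple reachability property.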
The introduction of stochastic dynamics also introduces different 'ways of winning' when we have Boolean objectives that are either satisfied or not by a particular path [29]. For example, a player may win by satisfying their goal γ_i surely (with certainty), almost surely (with probability one), limit surely (with probability greater than 1−ε for every ε > 0), boundedly (with probability bounded away from zero), positively (with positive probability), or existentially (possibly). Aside from these qualitative conditions, players may be interested in simply maximising the probability that their goal γ_i is achieved. Such a perspective can also be carried over to the problem of rational verification, in which we may be interested in the sure, almost sure, or limit sure satisfaction of a property φ, or simply in the probability that φ is satisfied.

Tools
While synthesis problems (such as the LTL synthesis problem, introduced by Pnueli and Rosner and discussed above) have been increasingly studied within the verification community, rational verification has come to prominence only in the past few years. As such, relatively few software tools exist for this problem. Below, we briefly survey some of the most widely used.

EVE: the equilibrium verification environment
As we noted above, the high complexity of rational verification for RMGs indicates that naive algorithms for this purpose will be doomed to failure, even for systems of moderate size. It follows that any practical system will require sophisticated algorithmic techniques. The Equilibrium Verification Environment (EVE) is a system based on such techniques [45,47].
The basic approach embodied by EVE involves reducing rational verification to a collection of parity games [32], which are widely used for synthesis and verification problems. A parity game is a two-player zero-sum turn-based game given by a labelled finite graph (V_0, V_1, E, α), where V = V_0 ∪ V_1 is a set of states partitioned into Player 0 (V_0) and Player 1 (V_1) states, respectively, E ⊆ V × V is a set of edges/transitions, and α : V → ℕ is a labelling priority function. Player 0 wins if the smallest priority that occurs infinitely often in the infinite play is even. Otherwise, Player 1 wins. It is known that solving a parity game (checking which player has a winning strategy) is in NP ∩ coNP [51], and can be done in quasi-polynomial time [17].

The algorithm underpinning EVE uses parity games in the following way. It takes as input an RMG M and builds a parity game H whose sets of states and transitions are doubly exponential in the size of the input but with priority function only exponential in the size of the input game. Using a deterministic Streett automaton on infinite words (DSW) [52], we then solve the parity game, leading to a decision procedure that is, overall, in 2EXPTIME, and, therefore, given the hardness results we mentioned above, essentially optimal. The EVE system can: (i) solve the E-NASH and A-NASH problems for the given RMG; and (ii) synthesise individual player strategies in the game.
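To illustrate what solving a parity game involves, the sketch below implements Zielonka's classical recursive algorithm for the min-even winning condition defined above. This is a swapped-in textbook technique chosen for brevity: it is exponential in the worst case, and it is neither the quasi-polynomial algorithm of [17] nor a description of how EVE is implemented. The dictionary-based encoding and the function names are assumptions, and every state is assumed to have at least one successor.

```python
def attractor(nodes, edges, owner, target, player):
    """States in `nodes` from which `player` can force a visit to `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in nodes - attr:
            succs = [w for w in edges[v] if w in nodes]
            if owner[v] == player:
                pulls_in = any(w in attr for w in succs)
            else:
                pulls_in = all(w in attr for w in succs)
            if pulls_in:
                attr.add(v)
                changed = True
    return attr


def solve(nodes, edges, owner, priority):
    """Return (W0, W1), the winning regions of Player 0 and Player 1."""
    if not nodes:
        return set(), set()
    p = min(priority[v] for v in nodes)
    player = p % 2               # the player favoured by the least priority
    opponent = 1 - player
    least = {v for v in nodes if priority[v] == p}
    a = attractor(nodes, edges, owner, least, player)
    sub = solve(nodes - a, edges, owner, priority)
    result = [set(), set()]
    if not sub[opponent]:
        # The opponent wins nowhere in the subgame, so `player` wins everywhere.
        result[player] = set(nodes)
        return result[0], result[1]
    b = attractor(nodes, edges, owner, sub[opponent], opponent)
    rest = solve(nodes - b, edges, owner, priority)
    result[player] = rest[player]
    result[opponent] = rest[opponent] | b
    return result[0], result[1]


# Hypothetical three-state game used only as an example.
EDGES = {"v0": ["v1", "v2"], "v1": ["v1"], "v2": ["v2"]}
OWNER = {"v0": 0, "v1": 1, "v2": 0}
PRIORITY = {"v0": 2, "v1": 1, "v2": 0}
```

In the example game, Player 0 owns v0 and can move to v2, where priority 0 repeats forever, whereas any play reaching v1 repeats the odd priority 1; calling `solve(set(EDGES), EDGES, OWNER, PRIORITY)` computes the two winning regions accordingly.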
Experimental results show that EVE performs favourably compared with other existing tools that support rational verification.

PRISM-games
A separate though closely related thread of research into the verification of multi-agent systems has emerged from the probabilistic model-checking community. The most prominent example of this in recent years is the expansion of PRISM [54], a popular tool for probabilistic model-checking, to handle first Turn-Based [11] and now Concurrent Stochastic Games (Markov Games) [55,56]. Earlier work was limited to non-cooperative turn-based or zero-sum concurrent settings. Later efforts considering cooperative, concurrent games were initially restricted to those with only two coalitions, but this restriction has been partially lifted in the most recent instantiation of the work, which supports model-checking of arbitrary numbers of coalitions in the special case of stopping games: those in which eventually, with probability one, the outcome of each player's objective becomes fixed [56]. We note further that the current version of the tool also supports the use of Probabilistic Timed Automata in verifying Turn-Based Markov Games with real-valued clocks [57].
In PRISM-games, specifications are expressed in rPATL, probabilistic ATL (a generalisation of CTL that uses an extra quantifier ⟨⟨A⟩⟩φ for reasoning about properties φ that can be ensured by some subset A of the agents [5]) with rewards [25]. The logic is then further extended in order to be able to reason about equilibria in the game (in particular, subgame-perfect social-welfare optimal Nash equilibria). For example, this allows one to answer not only queries such as ⟨⟨P_1⟩⟩_{max≥0.5}(Pr[ψ]) (is it the case that P_1 can ensure that ψ holds with probability at least a half?) but also queries such as ⟨⟨P_1 : P_2⟩⟩_{max≥2}(Pr[ψ] + Pr[χ]) (is it the case that P_1 and P_2 can coordinate to ensure that both of their respective goals, ψ and χ, hold with probability one?), where ψ and χ are LTL formulae, and similarly for expected rewards. More information can be found in [56]. An alternative specification formalism that can express equilibria concepts is Probabilistic Strategy Logic [8], but it has no associated implementation.
From a technical standpoint, PRISM-games also makes use of the Reactive Modules language, with individual players represented by a set of modules which may then choose an enabled command at each time-step. On top of this, users can include reward structures that produce quantitative rewards given a state and joint action as input, and define temporal logic properties expressed in the (extended version of) rPATL. For zero-sum properties, PRISM-games relies on using value iteration to approximate values for all states of the game, and then solves a linear program for each state in order to compute a minimax strategy. For equilibria-based properties, a combination of backwards induction and value iteration is used, which is exact for finite-horizon and approximate for infinite-horizon properties, together with a sub-procedure for computing optimal Nash equilibria in n-player Normal-Form Games that makes use of SMT and non-linear optimisation engines.

MCMAS
MCMAS [58] adopts interpreted systems [33] as the formal language to represent systems comprised of multiple entities. In MCMAS, interpreted systems are extended to incorporate game-theoretic notions such as those provided by ATL modalities [59]. The formalisation used to model systems in MCMAS can be thought of as a "bottom-up" approach, where the global state is defined as a tuple of the local states of the agents and the environment. MCMAS uses a dedicated programming language called Interpreted Systems Programming Language (ISPL) to describe the specification of Interpreted Systems.
There are different extensions of MCMAS that handle different specification logics. However, one particular extension that supports a specification language expressive enough to reason about Nash equilibrium is MCMAS-SLK [19]. The tool's specification language is Strategy Logic with Knowledge (SLK) [18], an extension of Strategy Logic (SL) [24,62]. Due to the undecidability of the model-checking problem of multi-agent systems under perfect recall and incomplete information [4], the tool adopts imperfect recall semantics.
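The NON-EMPTINESS problem can be solved using MCMAS by specifying the existence of Nash equilibrium with SLK. Let N = {1, ..., n} be the set of players in a game, Var be the set of strategy variables, and Γ be the set of goals of players in the game. Using SLK, we can express the existence of Nash equilibrium with the formula ϕ_NE:

$$\varphi_{NE} := \langle\langle x_1 \rangle\rangle (1, x_1) \ldots \langle\langle x_n \rangle\rangle (n, x_n) \bigwedge_{i \in N} \big( \neg\gamma_i \rightarrow [[y_i]] (i, y_i)\, \neg\gamma_i \big)$$

where i ∈ N, x_i, y_i ∈ Var, and γ_i ∈ Γ.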
Intuitively, formula ϕ_NE can be explained as follows: for each player i with its chosen strategy x_i in the game, if the goal of player i cannot be achieved using strategy x_i then for every "alternative" strategy y_i, the goal of player i cannot be achieved. This means that players who do not get their goals achieved cannot benefit from unilaterally changing their strategies. Thus, if ϕ_NE is true, then there exists a Nash equilibrium in the given game. The other problems of rational verification, namely E-NASH and A-NASH, can be reduced to NON-EMPTINESS [37].
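The informal reading of the Nash equilibrium condition given above can be checked by brute force in a very small setting. The sketch below assumes a one-shot game with finitely many strategies per player and Boolean goals given as Python predicates over strategy profiles; this is far simpler than the SLK semantics over interpreted systems used by MCMAS, and all names are hypothetical.

```python
from itertools import product


def is_nash(profile, strategies, goals):
    """True if no player with an unsatisfied goal can fix it by deviating alone."""
    for i, goal in enumerate(goals):
        if goal(profile):
            continue  # a player whose goal holds has no reason to deviate
        for y in strategies[i]:
            deviation = profile[:i] + (y,) + profile[i + 1:]
            if goal(deviation):
                return False  # player i benefits from switching to y
    return True


def nash_equilibria(strategies, goals):
    """Enumerate all strategy profiles that satisfy the condition."""
    return [p for p in product(*strategies) if is_nash(p, strategies, goals)]


# Two players each choose "h" or "t" (hypothetical example).
STRATEGIES = [("h", "t"), ("h", "t")]


def match(profile):
    return profile[0] == profile[1]


def mismatch(profile):
    return profile[0] != profile[1]
```

With goals `[match, match]` both coordinated profiles are equilibria, whereas with goals `[match, mismatch]` the list returned by `nash_equilibria` is empty, which is the situation NON-EMPTINESS is designed to detect.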

Challenges
In this section, we provide a brief discussion of some current and future research challenges for rational verification.

Tackling complexity
Perhaps the most obvious challenge in making rational verification an industrial-strength reality is that of the high computational complexity of the basic decision problems. Whilst LTL formulae are expressive and natural [79], and moreover, widely used in industry [21,26,70,71], the 2EXPTIME-completeness results leave our problems grossly intractable. As such, it is important for us to consider other languages which strike a balance between complexity and expressiveness: how can we capture the richness of multi-agent systems, whilst still being able to reason about them effectively?
Perhaps the most obvious thing to try is to consider fragments of LTL. Various restrictions of LTL are very well-studied [7,75] and the decision problems relating to them are much more computationally amenable. In [39], the authors consider games where all the players have propositional safety goals, that is, LTL goals of the form Gϕ, where ϕ is some propositional formula. In this setting, the E-NASH problem is PSPACE-complete. Additionally, in [46], the authors consider GR(1) [12] goals and specifications. Here, the E-NASH problem is PSPACE-complete with GR(1) goals and LTL specifications, and lies in FPT (fixed parameter tractable) [30] when both the goals and the specifications are in GR(1).
In addition to considering restricted languages for goals and temporal queries, a number of other directions suggest themselves as possible ways in which to reduce complexity, although we have no concrete results with these directions at this time. The first possibility is to consider ways in which games can be decomposed into smaller games, while preserving the relevant game-theoretic properties. Similar techniques have been studied within the model checking community (see, e.g., [6]). Another possibility, also inspired by work within model checking, is to consider abstracting games to their smallest bisimulation-equivalent form. Care must be taken in this case, however, because we need to ensure that the precise form of bisimulation used preserves Nash equilibria across bisimulation-equivalent models, and naive attempts to define bisimulation, which preserve temporal logic properties under model checking, do not necessarily preserve Nash equilibria. We refer the interested reader to [40] for details.
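One way to see why such fragments are easier to handle is to evaluate them on an ultimately periodic ("lasso") run, consisting of a finite prefix followed by a loop repeated forever. The sketch below makes two assumptions not stated in the text: that runs are given in this lasso form, and that a GR(1) formula has the usual shape (GF a_1 ∧ ... ∧ GF a_m) → (GF b_1 ∧ ... ∧ GF b_n) over propositional formulae. States are modelled as sets of true propositions, and all names are hypothetical.

```python
def always(prefix, loop, phi):
    """G phi on the run prefix . loop^omega: phi holds in every state visited."""
    return all(phi(s) for s in list(prefix) + list(loop))


def infinitely_often(loop, phi):
    """GF phi on a lasso run: phi holds somewhere on the repeated loop."""
    return any(phi(s) for s in loop)


def gr1(loop, assumptions, guarantees):
    """(GF a_1 & ... & GF a_m) -> (GF b_1 & ... & GF b_n) on the loop."""
    if not all(infinitely_often(loop, a) for a in assumptions):
        return True  # an assumption fails, so the implication holds vacuously
    return all(infinitely_often(loop, b) for b in guarantees)


def has(prop):
    """Build the propositional test 'prop is true in state s'."""
    return lambda s: prop in s


# Hypothetical run: one prefix state, then a two-state loop.
PREFIX = [{"safe"}]
LOOP = [{"safe", "req"}, {"safe", "grant"}]
```

On this run, `always(PREFIX, LOOP, has("safe"))` holds and `gr1(LOOP, [has("req")], [has("grant")])` holds, while replacing the guarantee by a proposition that never occurs on the loop makes the GR(1) check fail.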

Alternative preference models
What if we were to set aside temporal logics and consider different preference relations altogether? Staying in the qualitative mindset, in [13], the authors consider games where the players have ω-regular objectives and look at the NON-EMPTINESS problem, obtaining complexity results ranging from P-completeness all the way up to EXPTIME membership. Alternatively, one can adopt a quantitative approach and consider mean-payoff objectives: one can ask if there exists some Nash equilibrium where each player's payoff lies within a certain interval. As established in [76], this problem is NP-complete.
In order to be able to reason about games in a richer fashion, we can use quantitative and qualitative constructs in the same breath. If we look at games where the players' preferences are given by mean-payoff objectives, and we ask if there exists a Nash equilibrium which models an LTL specification, this problem is PSPACE-complete. Moreover, if we restrict our attention to GR(1) specifications, then we retain the NP-completeness result of the original mean-payoff NON-EMPTINESS problem. However, balancing qualitative and quantitative goals and specifications is not always as straightforward as this. For instance, in two-player, zero-sum, mean-payoff parity games [23], where the first player gets their mean-payoff if some parity condition is satisfied, and −∞ otherwise, this same player may require infinite memory to act optimally. Thus, given the standard translation from non-deterministic Büchi automata to deterministic parity automata [65], this does not bode well for games with combined mean-payoff and LTL objectives, since many of the techniques in rational verification depend on the existence of memoryless or finite-memory strategies in the corresponding two-player, zero-sum version of the game. Despite this, [43,44] look at games with lexicographic preferences, where the first component is either a Büchi condition or an LTL formula, and the second component is some mean-payoff objective. Rather than considering the standard NON-EMPTINESS problem, they study a closely related analogue: the problem of whether or not there exists some finite-state, strict ε-Nash Equilibrium. These additional restrictions are brought about precisely due to the necessity of infinite memory in mean-payoff parity games, as mentioned above. When the first component is a Büchi condition, the given decision problem is NP-complete, and in the LTL setting, it is 2EXPTIME-complete. Thus, despite the relaxation of the solution concept, we sadly do not see any gains in computational tractability.
Finally, some work has been done to introduce non-dichotomous, qualitative preferences to rational verification. In [53], the authors introduce Objective LTL (OLTL) as a goal and specification format. An OLTL formula is simply a tuple of LTL formulae, along with a function which maps binary tuples of the same length to integers. In a given execution of a game, some LTL formulae will be satisfied and others will not. Marking the ones that are satisfied with 1, and the ones which are not with 0, we can pass the resulting tuple into the given function and get an integer; each agent in the game wants to maximise this integer. With this preference model, we can look at games where there is a set of agents, plus a system player, and ask if there exists some strategy for the system player, along with a Nash equilibrium for the remaining players, such that the system player's payoff is above a certain threshold. This problem is no harder than the original rational synthesis problem for LTL [36], being 2EXPTIME-complete. Building on this, in [2], the authors study rational verification with LTL[F] [1] goals and specifications. In short, LTL[F] generalises LTL by replacing the classical Boolean operators with arbitrary functions which map binary tuples into the interval [0, 1]. Again, the associated decision problem remains 2EXPTIME-complete.
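The OLTL payoff computation described above is simple enough to sketch directly. The code below assumes that the truth values of the component LTL formulae on a given execution have already been determined by some other means; the function names and the particular payoff function are hypothetical.

```python
def oltl_payoff(satisfied, f):
    """Map the satisfaction vector of the LTL components to an integer payoff.

    `satisfied` holds one Boolean per LTL formula in the tuple, and `f` maps
    a tuple of 0/1 values of the same length to an integer.
    """
    bits = tuple(1 if s else 0 for s in satisfied)
    return f(bits)


def weighted(bits):
    """Hypothetical payoff: the first formula counts twice as much as the second."""
    return 2 * bits[0] + bits[1]
```

For example, an execution satisfying only the first of two formulae gives `oltl_payoff((True, False), weighted)`, and an agent would prefer any execution for which this value is larger.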

Uncertain environments
Thus far, the investigation into rational verification has focused largely on settings that are deterministic, discrete, fully observable, and fully known. Indeed this is sufficient for modelling a great many scenarios of interest, such as software processes or high-level representations of multi-agent control. Most of the real world, however, cannot be captured quite as neatly. This motivates the study of rational verification in uncertain environments, where this uncertainty might arise from stochastic dynamics, continuous or hybrid state and action spaces, or a structure that is only partially observable or partially known. Each of these features (and, moreover, their combination) represents an exciting direction for future work, the challenges of which we briefly outline here.
Perhaps the most natural and well-studied form of uncertainty in formal verification is that of systems with stochastic dynamics. As noted above in Section 4.2, probabilistic model-checking techniques have recently been extended to the multi-agent setting by way of tools such as PRISM-games [57]. Recent work on rational verification in Markov Games with goals defined by the almost sure or positive satisfaction of LTL properties has shown that the complexity classes of the main problems in both non-cooperative and cooperative rational verification remain essentially the same as in the non-stochastic setting: 2EXPTIME-complete [38].
Further results for other qualitative modes of winning (as well as for the quantitative case) are still to be obtained, however, and there remain many other interesting open problems relating to ω-regular objectives in Markov Games [22].
In some situations, especially when considering cyber-physical systems, it is more appropriate to model the state space (and possibly the action space) as continuous or as hybrid, with some discrete and some continuous elements. Whilst not in itself necessarily introducing uncertainty, such representations bring challenges related to the concise encoding of system dynamics and agents' strategies over uncountable sets, and the careful definition of temporal logic formulae over paths through the state space. As well as modelling state or action spaces as continuous, one may also choose to represent time as being continuous, requiring new logics in which to encode specifications, such as Continuous-Time Stochastic Logic (CSL) [10] or Signal Temporal Logic (STL) [60].
When making a real-world decision in order to achieve a goal, it is rare to be able to observe all of the information relevant to that decision and goal. This intuition can be captured by models in which the state space is only partially observable by the agents therein; in game-theoretic terms the agents have imperfect information. For example, Reactive Module Games in which each player may only observe a subset of the environmental variables are undecidable with three or more players, although the two-player case is solvable in 2EXPTIME [48].
Related work has explored the problem of rational synthesis in turn-based games under imperfect information (which is undecidable with three or more players and EXPTIME-complete for two players) [34], though the effects of partial observability on the rational verification problem remain under-explored.
Finally, there are scenarios in which larger portions of an environment are unknown, such as the transition dynamics, not only to the agents but also to those who wish to verify it. Here, traditional model-checking approaches do not apply and some form of learning must be introduced. As a result, different forms of guarantees about such systems are obtained, relying on assumptions about the structure of the environment and the theoretical characteristics of the learning algorithms used. Verification methods that employ learning have recently been developed in both the model-checking community [16] and the control and learning community [50], though few have considered the multi-agent setting with more than two players, and those that do restrict their attention to purely cooperative games [49]. A further complication is raised when agents themselves employ learning in unknown environments in order to update their strategies over time. With the continuing advance of machine learning, this is likely to become an increasingly common occurrence that requires new techniques for rational verification.

Cooperative solution concepts
Rational verification was first defined for noncooperative games [39,41,83]: players were assumed to act alone, and binding agreements between players were assumed to be impossible. As such, the solution concepts used in previous studies have been noncooperative, primarily Nash equilibrium and refinements thereof.
However, in many real-life situations, these assumptions misrepresent reality. In order to address this issue, in [42], the noncooperative setting for rational verification was extended to include cooperative solution concepts [61,64]. It was assumed that there is some (exogenous) mechanism through which agents in a system can reach binding agreements and form coalitions in order to collectively achieve goals. The possibility of binding cooperation and coalition formation eliminates some undesirable equilibria that arise in the noncooperative setting, and makes available a range of outcomes (i.e., computations of the system that can be sustained in equilibrium) which cannot be achieved without cooperation.
In this new cooperative setting, the focus was on the core, arguably one of the most relevant solution concepts in the cooperative game theory literature. The basic idea behind the core is that a game outcome is said to be core-stable if no subset of agents could benefit by collectively deviating from it; the core of a game is the set of core-stable outcomes. Now, in conventional cooperative games (characteristic function games with transferable utility [20]), this intuition can be given a simple and natural formal definition, and as a consequence the core is probably the most widely-studied solution concept for cooperative games. However, the conventional definition of the core does not easily map into the rational verification framework as originally defined, mainly because coalitions are subject to externalities: whether or not a coalition has a beneficial deviation depends not just on the makeup of that coalition, but also on the behaviour of the remaining agents in the system.
Coalition formation with externalities has been extensively studied in the cooperative game theory literature [35,77,84], where different variants of the core can be found. For instance, the α-core takes the pessimistic approach that requires that all members of a deviating coalition will benefit from the deviation regardless of the behaviour of the other coalitions that may be formed. Our main definition of the core precisely follows this approach. Even though coalition formation with externalities is common in and important for multi-agent systems [72], not much work has been done regarding the problem of stability, and its properties, in multi-agent coalition formation with externalities. Instead, in AI and multi-agent systems, most research has focused on the structure formation problem itself [68]. Through our work on rational verification, we also address this gap in the literature of verification for AI systems.
The kinds of questions that are asked in the (rational verification) cooperative setting are exactly the same as in the non-cooperative framework, only that instead of (variants of) Nash equilibrium one refers to outcomes in the core of game-theoretic representations of multi-agent systems. Such questions, e.g., E-CORE, A-CORE, etc., bearing the same meaning as their "Nash" counterparts, are all 2EXPTIME-complete [42] for games with LTL goals, but have some computationally desirable properties: the set of outcomes in the core is never empty, is bisimulation invariant [40], and has an elegant formalisation in ATL* [5], which makes the automated solution of cooperative rational verification problems possible in practice using verification tools for multi-agent systems analysis, such as MCMAS or EVE, described before.
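The pessimistic, α-core style notion of a beneficial deviation described above can be made concrete on a one-shot game with numeric payoffs. This setting is an assumption chosen for brevity (the rational verification framework works with temporal goals over infinite runs), and the function names and the prisoner's dilemma payoffs are hypothetical.

```python
from itertools import combinations, product


def joint(players, strategies):
    """All joint strategies of `players`, as dicts from player to strategy."""
    options = [strategies[i] for i in players]
    return [dict(zip(players, choice)) for choice in product(*options)]


def has_beneficial_deviation(profile, coalition, strategies, payoff):
    """True if the coalition has a joint strategy that strictly benefits every
    member whatever the remaining players do."""
    n = len(strategies)
    others = [i for i in range(n) if i not in coalition]
    base = payoff(profile)
    for dev in joint(coalition, strategies):
        outcomes = [
            payoff(tuple({**dev, **rest}[i] for i in range(n)))
            for rest in joint(others, strategies)
        ]
        if all(out[i] > base[i] for out in outcomes for i in coalition):
            return True
    return False


def core(strategies, payoff):
    """Profiles from which no non-empty coalition has a beneficial deviation."""
    n = len(strategies)
    coalitions = [list(c) for k in range(1, n + 1)
                  for c in combinations(range(n), k)]
    return [p for p in product(*strategies)
            if not any(has_beneficial_deviation(p, c, strategies, payoff)
                       for c in coalitions)]


# Hypothetical prisoner's dilemma: "C" cooperates, "D" defects.
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
PD_STRATEGIES = [("C", "D"), ("C", "D")]
```

In this example mutual defection is the unique Nash equilibrium, yet it is not core-stable because the grand coalition can jointly move to mutual cooperation; `core(PD_STRATEGIES, PD.get)` returns only the cooperative profile, illustrating how binding agreements can eliminate undesirable equilibria.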

Rational verification of human-agent systems
In the present paper, we have focused exclusively on the verification of multi-agent systems in which the agents in question are software agents. In practice, of course, many (arguably most) systems of interest include multiple software and human participants. Might techniques surveyed here be suitable for verifying such systems?
One approach might be to model human choices and preferences using one of the frameworks described above, and then directly apply the techniques we have sketched out. However, this approach presents many natural challenges. The most obvious of these is that the techniques we have described are derived from concepts in game theory and decision theory, and in particular, they make a raft of assumptions about agents in the system. The most problematic of these is that agents are assumed to be perfectly rational (utility maximisers): they will act optimally in the furtherance of their preferences. Human decision-makers do not act in this way: game and decision-theoretic models capture idealised rational actors. The field of behavioural economics seeks to understand the modes of decision-making that humans actually use, and if we are to verify human-agent systems, then we will need to accommodate behavioural decision-making models in our systems. At present we are aware of no work that seeks to do this.

Conclusions
Rational verification is a recent approach to the automated verification of multi-agent systems, in which we aim to automatically determine whether given properties of a system, expressed as temporal logic formulae, will hold in that system under the assumption that system components (agents) behave rationally, by choosing (for example) strategies that form a game-theoretic equilibrium. Rational verification can be understood as a counterpart to the conventional model checking paradigm for automated verification. Although research in this area is at an early stage, the basic computational, logical, and algorithmic territory relating to rational verification has already been explored, and is described in the present article. An overarching goal for the future will be to make tools more practically applicable, and to understand the fundamental limitations of the paradigm. We have sketched out some of the key challenges that must be overcome to make this a reality: chief among them being dealing with complexity, broader preference models, richer modelling frameworks, and a wider range of game-theoretic solution concepts.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 1 Model checking. A model checker takes as input a model, representing a finite state abstraction of a system, together with a claim about the system behaviour, expressed in temporal logic. It then determines whether or not the claim is true of the model; most practical model checkers will provide a counterexample if not

Fig. 2 Equilibrium checking. The key difference to model checking is that we also take as input the preferences of each of the system components, and the key question asked is whether or not the temporal property φ holds on some/all equilibria of the system

Acknowledgements
Wooldridge, Gutierrez, Harrenstein, and Perelli acknowledge the support of the ERC under grant 291528 ("RACE"). Wooldridge and Harrenstein further acknowledge the support of the Alan Turing Institute, London. Kwiatkowska received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant 834115, "FUN2MODEL") and the EPSRC Programme Grant on Mobile Autonomy (EP/M019918/1). Abate acknowledges the HICLASS project (113213), a partnership between the Aerospace Technology Institute (ATI), Department for Business, Energy & Industrial Strategy (BEIS) and Innovate UK. Hammond acknowledges the support of an EPSRC Doctoral Training Partnership studentship (Reference: 2218880). Harrenstein was furthermore supported by the ERC under grant 639945 ("ACCORD"). Steeples gratefully acknowledges the support of the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems EP/L015897/1 and the Ian Palmer Memorial Scholarship. Najib acknowledges the support of the ERC European Union's Horizon 2020 research and innovation programme (grant 759969).