Abstract
We provide a survey of the state of the art of rational verification: the problem of checking whether a given temporal logic formula ϕ is satisfied in some or all game-theoretic equilibria of a multiagent system – that is, whether the system will exhibit the behaviour ϕ represents under the assumption that agents within the system act rationally in pursuit of their preferences. After motivating and introducing the overall framework of rational verification, we discuss key results obtained in the past few years as well as relevant related work in logic, AI, and computer science.
Introduction
The deployment of AI technologies in a wide range of application areas over the past decade has brought the problem of verifying such systems into sharp focus. Verification is one of the most important and widely studied problems in computer science [14]: it is the problem of checking program correctness, and the key decision problem relating to verification is that of establishing whether or not a given system P satisfies a given specification ϕ. The most successful contemporary approach to formal verification is model checking, in which an abstract, finite state model of the system of interest is represented as a Kripke structure (a labelled transition system), and the specification is represented as a temporal logic formula, the models of which are intended to correspond to “correct” behaviours of the system [31]. The verification process then reduces to establishing whether the specification formula is satisfied in the given Kripke structure, a process that can be efficiently automated in many settings of interest [9, 28].
In the present paper, we will be concerned with multiagent systems [73, 82]. Software agents were originally proposed in the late 1980s, but it is only over the past decade that the software agent paradigm has been widely adopted. At the time of writing, software agents are ubiquitous: we have software agents in our phone (e.g., Siri), processing requests online, automatically trading in global markets, controlling complex navigation systems (e.g., those in self-driving cars), and even carrying out tasks on our behalf at home (e.g., Alexa). Typically, these agents do not work in isolation: they may interact with humans or with other software agents. The field of multiagent systems is concerned with understanding and engineering systems that have these characteristics.
Since agents are typically “owned” by different principals, there is no requirement or assumption that the preferences delegated to different agents are aligned in any way. It may be that their preferences are compatible, but it may equally be that preferences are in opposition. Game theory provides a natural and widely adopted framework through which to understand systems with these properties, where participants pursue their preferences rationally and strategically [61], and this observation has prompted a huge body of research over the past decade, attempting to apply and adapt game-theoretic techniques to the analysis of multiagent systems [63, 73].
The research question
In the present article, we are concerned with the question of how we should think about the issues of correctness and verification in multiagent systems (at this point we should clarify that, in this work, we are only concerned with systems composed solely of software agents: in Section 5 we briefly comment on the issue of verifying human-agent systems).
We argue that in a multiagent setting, it is appropriate to ask what behaviours the system will exhibit under the assumption that agents act rationally in pursuit of their preferences. We advance the paradigm of rational verification for multiagent systems, as a counterpart to classical verification. Rational verification is concerned with establishing whether a given temporal logic formula ϕ is satisfied in some or all game-theoretic equilibria of a multiagent system – that is, whether the system will exhibit the behaviour represented by ϕ under the assumption that agents within the system act rationally in pursuit of their preferences/goals.
We begin by motivating our approach, describing in detail the issue of correctness and verification, and the hugely successful model checking paradigm for verification. We then discuss the question of what correctness means in the setting of multiagent systems, and this leads us to introduce the paradigm of rational verification and equilibrium checking. Following this, we survey a range of semantic models for rational verification, summarise the key complexity results known for these models, and then examine three key tools for rational verification. We conclude by surveying some active areas of current research.
Setting the scene
The aim of this section is to explain how the concept of rational verification has emerged from various research trends in computer science and AI, and how it differs from the conventional conception of verification.
Correctness and formal verification
The correctness problem has been one of the most widely studied problems in computer science over the past fifty years, and remains a topic of fundamental concern to the present day [14]. Broadly speaking, the correctness problem is concerned with checking that computer systems behave as their designer intends. Probably the most important problem studied within the correctness domain is that of formal verification. Formal verification is the problem of checking that a given computer program or system P is correct with respect to a given formal (i.e., mathematical) specification ϕ. We understand ϕ as a description of system behaviours that the designer judges to be acceptable – a program that guarantees to generate a behaviour as described in ϕ is deemed to correctly implement the specification ϕ.
A key insight, due to Amir Pnueli, is that temporal logic provides a suitable framework with which to express formal specifications of reactive system behaviour [66]. Pnueli proposed Linear Temporal Logic (LTL) for expressing desirable properties of computations. LTL extends classical logic with tense operators X (“in the next state…”), F (“eventually…”), G (“always…”), and U (“… until…”) [31]. For example, the requirement that a system never enters a “crash” state can naturally be expressed in LTL by the formula G¬crash, where crash is a propositional label identifying the “crash” states. If we let ⟦P⟧ denote the set of all possible computations that may be produced by the program P, and let ⟦ϕ⟧ denote the set of state sequences that satisfy the LTL formula ϕ, then verification of LTL properties reduces to the problem of checking whether \(\llbracket {P}\rrbracket \subseteq \llbracket {\phi }\rrbracket \). Another key temporal formalism is Computation Tree Logic (CTL), which modifies LTL by prefixing path formulae (which depend on temporal operators) with path quantifiers A (“on all paths…”) and E (“on some path…”) [31]. While LTL is suited to reasoning about runs or computational histories, CTL is suited to reasoning about states of transition systems that encode possible system behaviours.
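As a small illustration, checking a safety property of the form G¬crash on a single run can be sketched in a few lines of Python. The representation of runs as "lassos" (a finite prefix followed by a loop repeated forever, as produced by finite state systems) and all names below are assumptions made for this example, not part of any standard tool.

```python
# Sketch: checking the LTL safety property G(not crash) on one run.
# Finite-state systems generate "lasso" runs: a finite prefix followed
# by a loop repeated forever, so the infinite run can be checked finitely.
from dataclasses import dataclass

@dataclass
class Lasso:
    prefix: list  # list of sets of atomic propositions
    loop: list    # repeated forever after the prefix

def satisfies_always_not(run: Lasso, prop: str) -> bool:
    """True iff G(not prop) holds: prop labels no state of the run."""
    return all(prop not in state for state in run.prefix + run.loop)

safe = Lasso(prefix=[{"init"}], loop=[{"run"}, {"idle"}])
bad = Lasso(prefix=[{"init"}], loop=[{"run"}, {"crash"}])
print(satisfies_always_not(safe, "crash"))  # True
print(satisfies_always_not(bad, "crash"))   # False
```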
Model checking
The most successful approach to verification using temporal logic specifications is model checking [28]. Model checking starts from the idea that the behaviour of a finite state program P can be represented as a Kripke structure or transition system K_{P}. Now, Kripke structures can be interpreted as models for temporal logic. So, checking whether P satisfies an LTL property ϕ reduces to the problem of checking whether ϕ is satisfied on paths through K_{P}. Checking a CTL specification ϕ is even simpler: the Kripke structure K_{P} is a CTL model, so we simply need to check whether K_{P}⊧ϕ, which boils down to performing reachability analysis over the states of K_{P} (Fig. 1). These checks can be efficiently automated for many cases of interest. In the case of CTL, for example, checking whether K_{P}⊧ϕ can be solved in time O(|K_{P}|⋅|ϕ|) [27, 31]; for LTL, the problem is more complex (PSPACE-complete [31]), but using automata theoretic techniques it can be solved in time O(|K_{P}|⋅ 2^{|ϕ|}) [80], the latter result indicating that such an approach is feasible for small specifications. Since the model checking paradigm was first proposed in 1981, huge progress has been made on extending the range of systems amenable to verification by model checking, and to extending the range of properties that might be checked [28].
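To illustrate the reachability flavour of CTL model checking, the following sketch decides a property of the form AG¬crash by a plain graph search: the property holds exactly when no reachable state carries the label. The encoding of the Kripke structure as a successor dictionary plus a labelling is an assumption for this example only.

```python
# Sketch: reachability analysis over a Kripke structure.
# K ⊨ AG(not crash) iff no crash-labelled state is reachable
# from the initial state.

def reachable(initial, transitions):
    """Set of states reachable from `initial`; transitions: state -> successors."""
    seen, frontier = set(), [initial]
    while frontier:
        s = frontier.pop()
        if s not in seen:
            seen.add(s)
            frontier.extend(transitions.get(s, []))
    return seen

transitions = {"s0": ["s1"], "s1": ["s0", "s2"], "s2": ["s2"], "s3": ["s3"]}
labels = {"s2": {"crash"}}

# AG(not crash) holds iff no reachable state is labelled "crash".
ag_no_crash = all("crash" not in labels.get(s, set())
                  for s in reachable("s0", transitions))
print(ag_no_crash)  # False: s2 is reachable and labelled crash
```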
Multiagent systems
We now turn to the class of systems that we will be concerned with in the present paper. The field of multiagent systems is concerned with the theory and practice of systems containing multiple interacting semi-autonomous AI software components known as agents [73, 82]. Multiagent systems are generally understood as distinct from conventional distributed or concurrent systems in several respects, but the most important distinction for our purposes is that different agents are assumed to be operating on behalf of different external principals, who delegate their preferences or goals to their agent. Because different agents are “owned” by different principals, there is no assumption that agents will have preferences that are aligned with each other.
Correctness in multiagent systems
Now, consider the following question:
How should we interpret correctness and formal verification in the context of multiagent systems?
In an uninteresting sense, this question is easily answered: we can certainly think of a multiagent system as nothing more than a collection of interacting nondeterministic computer programs, with nondeterminism representing the idea that agents have choices available to them; we can express such a system using any readily available model checking framework, which would then allow us to start reasoning about the possible computational behaviours that the system might in principle exhibit.
While such an analysis is entirely legitimate, and might well yield important insights, it is nevertheless missing a very big part of the story that is relevant in order to understand a multiagent system. This is because it ignores the fact that agents are assumed to pursue their preferences rationally and strategically. Thus, certain system behaviours that might be possible in principle will never arise in practice because they could not arise from rational choices by agents within the system.
To take a specific example, consider eBay, the online auction house. When users create an auction on eBay, they must specify a deadline for bidding in the auction. This deadline, coupled with the strategic concerns of bidders, leads to a behaviour known as ‘sniping’ [69]. Roughly, sniping is where bidders try to wait for the last possible moment to submit bids. Sniping is strategic behaviour, used by participants to try to get the best outcome possible. If we do not take into account preferences and strategic behaviour when designing a system like eBay, then we will not be able to predict or understand behaviours like sniping.
The classical formulation of correctness does not naturally match the multiagent system setting because there can be no single specification ϕ, against which the correctness of a multiagent system is judged. Instead, each agent within such a system carries its own specification: an agent is judged to be correct if it acts rationally to achieve its delegated preferences or goals. So, what should replace the classical notion of correctness and verification in the context of multiagent systems? We posit that rational verification and equilibrium checking provide a suitable framework.
Rational verification and equilibrium checking
Along with many other researchers [63, 73] we believe that game theory provides an appropriate formal framework for the analysis of multiagent systems. Originating within economics, game theory is essentially the theory of strategic interaction between self-interested entities [61]. While the mathematical framework of game theory was not developed specifically to study computational settings, it nevertheless seems that the toolkit of analytical concepts it provides can be adapted and applied to multiagent settings. A game in the sense of game theory is usually understood as an abstract mathematical model of a situation in which self-interested players must make decisions. A game specifies the decision-makers in the game – the “players” – and the choices available to these players (their strategies). For every combination of possible choices by the players, the game also specifies what outcome will result, and each player has their own preferences over possible outcomes.
A key concern in game theory is to try to understand what the outcomes of a game can or should be, under the assumption that the players within it act rationally. To this end, a number of solution concepts have been proposed, of which the Nash equilibrium is the most prominent. A Nash equilibrium is a collection of choices, one for each participant in the game, such that no player can benefit by unilaterally deviating from this combination of choices. Nash equilibria seem like reasonable candidates for the outcome of a game because, at a Nash equilibrium, no player can improve its outcome by unilaterally changing its choice – so no player has a rational incentive to move away from it. In general, it could be the case that a given game has no Nash equilibrium or multiple Nash equilibria. Now, it should be easy to see how this general setup maps to the multiagent systems setting: players map to the agents within the system, and each player’s preferences are as defined in their delegated goals; the choices available to each player correspond to the possible courses of action that may be taken by each agent in the system. Outcomes will correspond to the computations or runs of the system, and agents will have preferences over these runs; they act to try and bring about their most preferred runs.
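The definition of Nash equilibrium above can be sketched directly as a brute-force check over a finite normal-form game: a profile is an equilibrium exactly when no player gains by a unilateral deviation. The game below (a prisoner's dilemma) and all names are illustrative.

```python
# Sketch: enumerating the pure Nash equilibria of a finite normal-form game.
from itertools import product

def pure_nash_equilibria(strategies, payoff):
    """strategies: one list of choices per player.
    payoff(profile) -> tuple of payoffs, one per player."""
    equilibria = []
    for profile in product(*strategies):
        stable = True
        for i, options in enumerate(strategies):
            for alt in options:
                deviation = profile[:i] + (alt,) + profile[i + 1:]
                if payoff(deviation)[i] > payoff(profile)[i]:
                    stable = False  # player i benefits by deviating
        if stable:
            equilibria.append(profile)
    return equilibria

# Prisoner's dilemma: C = cooperate, D = defect.
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
print(pure_nash_equilibria([["C", "D"], ["C", "D"]], pd.__getitem__))
# [('D', 'D')]
```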
With this in mind, we believe it is natural to think of the following problem as a counterpart to model checking and classical verification. We are given a multiagent system, and a temporal logic formula ϕ representing a property of interest. We then ask whether ϕ would be satisfied in some run that would arise from a Nash equilibrium collection of choices by agents within the system. We call this equilibrium checking, and refer to the general paradigm as rational verification.
Models for rational verification
An abstract model
Let us make our discussion a little more formal with some suggestive notation (we present some concrete models in later sections). Let P_{1},…,P_{n} be the agents within a multiagent system. For now, we do not impose any specific model for agents P_{i}: we will simply assume that agents are nondeterministic reactive programs. Nondeterminism captures the idea that agents have choices available to them, while reactivity implies that agents are nonterminating. The framework we describe below can easily be applied to any number of computational models, for example concurrent games [5], event structures [81], interpreted systems [33], or multiagent planning systems [15].
A strategy for an agent P_{i} is a rule that defines how the agent makes choices over time. Each possible strategy for an agent P_{i} defines one way that the agent can resolve its nondeterminism. We can think of a strategy as a function from the history of the system to date to the choices available to the agent in the present moment. We denote the possible strategies available to agent P_{i} by Σ(P_{i}). The basic task of an agent P_{i} is to select an element of Σ(P_{i}) – we will see later that agents select strategies in an attempt to bring about their preferences. When each agent P_{i} has selected a strategy, we have a profile of strategies \(\vec {\sigma } = (\sigma _{1},\ldots ,\sigma _{n})\), one for each agent. This profile of strategies will collectively define the behaviour of the overall system. For now, we will assume that strategies are themselves deterministic, and that a collection of strategies therefore induces a unique run of the system, which we denote by \(\rho (\vec {\sigma })\). The set R(P_{1},…,P_{n}) of all possible runs of P_{1},…,P_{n} is:

\(R(P_{1},\ldots ,P_{n}) = \{\rho (\vec {\sigma }) \mid \vec {\sigma } \in {{\varSigma }}(P_{1}) \times {\cdots } \times {{\varSigma }}(P_{n})\}\)
Where the strategies that lead to a run do not need to be named, we will denote elements of R(P_{1},…,P_{n}) by \(\rho , \rho ^{\prime }\), etc. Returning to our earlier discussion, we typically use LTL as a language for expressing properties of runs: we will write ρ⊧ϕ to mean that run ρ satisfies temporal formula ϕ.
Before proceeding, we state a version of the conventional model checking problem for our setting:
Model Checking:
Given: System P_{1},…,P_{n}; temporal formula ϕ.
Question: Is it the case that \( \exists \vec {\sigma } \in {{\varSigma }}(P_{1}) \times {\cdots } \times {{\varSigma }}(P_{n}) : \rho (\vec {\sigma }) \models \phi ?\)
This decision problem amounts to asking whether ∃ρ ∈ R(P_{1},…,P_{n}) such that ρ⊧ϕ – that is, whether there is any possible computation of the system that satisfies ϕ, and hence whether the system could in principle exhibit the behaviour ϕ.
Preferences
So far, we have said nothing about the idea that agents act rationally in pursuit of delegated preferences. We assume that agents have preferences over runs of the system. Thus, given two possible runs ρ_{1},ρ_{2} ∈ R(P_{1},…,P_{n}), it may be that P_{i} prefers ρ_{1} over ρ_{2}, or that it prefers ρ_{2} over ρ_{1}, or that it is indifferent between the two. We represent preferences by assigning to each player P_{i} a relation \({}\succeq _{i}{} \subseteq R(P_{1}, \ldots , P_{n}) \times R(P_{1}, \ldots , P_{n})\), requiring that this relation is complete, reflexive, and transitive. Thus ρ_{1} ≽_{i}ρ_{2} means that P_{i} prefers ρ_{1} at least as much as ρ_{2}. We denote the irreflexive subrelation of ≽_{i} by ≻_{i}, so ρ_{1} ≻_{i}ρ_{2} means that P_{i} strictly prefers ρ_{1} over ρ_{2}. Indifference (where we have both ρ_{1} ≽_{i}ρ_{2} and ρ_{2} ≽_{i}ρ_{1}) is denoted by \(\rho _{1} \sim _{i} \rho _{2}\). We refer to a structure M = (P_{1},…,P_{n},≽_{1},…,≽_{n}) as a multiagent system.
Alert readers will have noted that, if runs are infinite, then so are preference relations over such runs. This raises the issue of finite and succinct representations of preference relations over runs. Several approaches to this issue have been suggested. The most obvious is to assign each agent P_{i} a temporal logic formula γ_{i} representing its goal. The idea is that P_{i} prefers all runs that satisfy γ_{i} over all those that do not, is indifferent between all runs that satisfy γ_{i}, and is similarly indifferent between runs that do not satisfy γ_{i}. Formally, the preference relation ≽_{i} corresponding to a goal γ_{i} is defined as follows:

\(\rho _{1} \succeq _{i} \rho _{2} \text { if and only if } \rho _{2} \models \gamma _{i} \text { implies } \rho _{1} \models \gamma _{i}\)
We discuss alternative (richer) preference models in Section 5.2.
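The dichotomous, goal-induced preference relation can be sketched as follows. Here `satisfies` is a crude stand-in for an LTL model check (it merely tests whether an atomic proposition occurs on the run), purely for illustration; the function names are not from any library.

```python
# Sketch: the preference relation induced by a goal γ_i.
# ρ1 ≽_i ρ2 holds unless ρ2 satisfies the goal while ρ1 does not.

def satisfies(run, goal):
    """Crude stand-in for run ⊨ γ: does `goal` label any state of the run?"""
    return any(goal in state for state in run)

def weakly_prefers(goal, run1, run2):
    """run1 ≽_i run2 under goal γ_i."""
    return satisfies(run1, goal) or not satisfies(run2, goal)

good, bad = [{"goal_i"}], [set()]
print(weakly_prefers("goal_i", good, bad))  # True: good is strictly preferred
print(weakly_prefers("goal_i", bad, good))  # False
print(weakly_prefers("goal_i", bad, bad))   # True: indifference
```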
Nash equilibrium
With this definition, we can now define the standard game theoretic concept of Nash equilibrium for our setting. Let M = (P_{1},…,P_{n},≽_{1},…,≽_{n}) be a multiagent system, and let \(\vec {\sigma } = (\sigma _{1}, \ldots , \sigma _{i}, \ldots , \sigma _{n})\) be a strategy profile. Then we say \(\vec {\sigma }\) is a Nash equilibrium of M if for all players P_{i} and for all strategies \(\sigma _{i}^{\prime } \in {{\varSigma }}(P_{i})\), we have:

\(\rho (\vec {\sigma }) \succeq _{i} \rho ((\sigma _{1}, \ldots , \sigma _{i}^{\prime }, \ldots , \sigma _{n}))\)
Let NE(M) denote the set of all Nash equilibria of M. Of course many other solution concepts have been proposed in the game theory literature [61] – to keep things simple, in this paper we will restrict our attention to Nash equilibrium.
Equilibrium checking
We are now in a position to introduce equilibrium checking, and the associated key decision problems. The basic idea of equilibrium checking is that, instead of asking whether a given temporal formula ϕ is satisfied on some possible run of the system, we instead ask whether it is satisfied on some run induced by a Nash equilibrium strategy profile of the system. Informally, we can understand this as asking whether ϕ could be made true as the result of rational choices by agents within the system. This idea is captured in the following decision problem (see Fig. 2):
ENash:
Given: Multiagent system M; temporal formula ϕ.
Question: Is it the case that \(\exists \vec {\sigma } \in \mathit {NE}(M) : \rho (\vec {\sigma }) \models \phi ?\)
The obvious counterpart of this decision problem is ANash, which asks whether a temporal formula ϕ is satisfied on all Nash equilibrium outcomes.
ANash:
Given: Multiagent system M; temporal formula ϕ.
Question: Is it the case that \(\forall \vec {\sigma } \in \mathit {NE}(M) : \rho (\vec {\sigma }) \models \phi ?\)
A higherlevel question is simply whether a system has any Nash equilibria:
NonEmptiness:
Given: Multiagent system M.
Question: Is it the case that NE(M)≠∅?
A system without any Nash equilibria is inherently unstable: whatever collection of choices we might consider for the agents within it, some player would have preferred to make an alternative choice. Notice that an efficient algorithm for solving ENash would imply an efficient algorithm for NonEmptiness, since NonEmptiness is just the special case of ENash in which ϕ is a tautology.
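When strategy spaces are finite and preferences are induced by goals, the ENash problem can be sketched by brute force: enumerate profiles, keep the Nash equilibria, and test whether any equilibrium run satisfies the property. Everything named here (`run`, `goals`, the toy two-agent game) is an illustrative assumption, not an algorithm from the literature; the actual decision problems are far harder than this naive search suggests.

```python
# Sketch: brute-force E-Nash over finitely many strategies per agent,
# with dichotomous goal-induced preferences: a deviation is beneficial
# iff the agent's goal is unmet now and met after deviating.
from itertools import product

def e_nash(strategy_sets, run, goals, phi):
    """Is phi satisfied on some Nash equilibrium run?
    run(profile) -> run; goals[i](run) -> bool; phi(run) -> bool."""
    def is_ne(profile):
        for i, options in enumerate(strategy_sets):
            if goals[i](run(profile)):
                continue  # goal already met: no beneficial deviation
            for alt in options:
                if goals[i](run(profile[:i] + (alt,) + profile[i + 1:])):
                    return False  # agent i can deviate and get its goal
        return True
    return any(is_ne(p) and phi(run(p)) for p in product(*strategy_sets))

# Toy game: the "run" is just the profile; both agents want to coordinate.
def induced_run(profile):
    return profile

goals = [lambda r: r[0] == r[1], lambda r: r[0] == r[1]]
print(e_nash([["a", "b"], ["a", "b"]], induced_run, goals,
             lambda r: r == ("a", "a")))  # True: ("a","a") is an equilibrium
```

Setting `phi` to a constant-true predicate turns the same check into NonEmptiness.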
Finally, we might consider the question of verifying whether a given strategy profile represents a Nash equilibrium:
IsNE:
Given: Multiagent system M; strategy profile \(\vec {\sigma }\).
Question: Is it the case that \(\vec {\sigma } \in \mathit {NE}(M)\)?
Recall that, mathematically, strategies are functions that take as input the history of the system to date and give as output a choice for the agent in question. Since the computations generated by multiagent systems are infinitary objects, to study this decision problem we need a finite representation for strategies. A common approach is to use finite state machines with output (e.g., Moore machines).
Iterated Boolean games
A simple and elegant concrete computational model that we have found useful to explore questions surrounding rational verification is the framework of iterated Boolean Games (iBGs) [39]. In an iBG, each agent P_{i} is defined by associating it with a finite, nonempty set of Boolean variables Φ_{i}, and preferences for P_{i} are specified with an LTL formula γ_{i}. It is assumed that each propositional variable is associated with a single agent. The choices available to P_{i} at any given point in the game then represent the set of all possible assignments of truth or falsity to the variables under the control of P_{i}. An iBG is “played” over an infinite sequence of rounds; in each round every player independently selects a valuation for their variables, and the infinite run traced out in this way thus defines an LTL model, which will either satisfy or fail to satisfy each player’s goal. In iBGs, strategies are represented as finite state machines with output (Moore machines). This may seem like a limitation, but in fact it is not: in the setting of iBGs, finite state machine strategies are all that is required.
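The round structure of an iBG with Moore-machine strategies can be sketched as follows. Each machine maps its internal state to a valuation of its own variables and updates its state on the joint valuation of the round; since all machines are deterministic, the induced run is unique. The machine encodings below (`toggler`, `copier`) are illustrative assumptions.

```python
# Sketch: playing an iBG round by round with Moore-machine strategies.
# A machine is (init_state, output, update): output(state) -> valuation of
# the player's own variables; update(state, joint_valuation) -> next state.

def play(machines, rounds):
    """Return the first `rounds` joint valuations of the unique induced run."""
    states = [m[0] for m in machines]
    run = []
    for _ in range(rounds):
        joint = {}
        for (_, output, _), s in zip(machines, states):
            joint.update(output(s))  # players choose simultaneously
        run.append(joint)
        states = [update(s, joint)
                  for (_, _, update), s in zip(machines, states)]
    return run

# Player 1 toggles p every round; player 2 sets q to last round's p.
toggler = ("hi", lambda s: {"p": s == "hi"},
           lambda s, j: "lo" if s == "hi" else "hi")
copier = (False, lambda s: {"q": s}, lambda s, j: j["p"])
print(play([toggler, copier], 4))
# [{'p': True, 'q': False}, {'p': False, 'q': True},
#  {'p': True, 'q': False}, {'p': False, 'q': True}]
```

The resulting run is an LTL model, so each player's goal formula is either satisfied or not on it, as described above.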
Let us now turn to the decision problems that we identified above, and consider their complexity in the iBG case. Before we state the complexity of these problems, it is worth recalling a special case of iBGs, which was first studied in the 1980s by Pnueli and Rosner [67]. An LTL synthesis problem is a setting defined by two players, denoted E and A, two disjoint sets of propositional variables, Φ_{E} and Φ_{A}, and an LTL formula ϕ_{E} defined over the variables Φ_{E} ∪Φ_{A}. The setting is interpreted as a game in the following way: the play continues for an infinite sequence of rounds, where in each round the players simultaneously choose a valuation for their respective variable set. In this way, the play traces out a word in \(({{\varPhi }}_{E} \cup {{\varPhi }}_{A})^{\omega }\), and this word can be understood as an LTL valuation. Player E wins if this valuation satisfies ϕ_{E}, and loses otherwise. The LTL synthesis problem is then as follows:
LTL Synthesis:
Given: Variables Φ_{E} and Φ_{A}, and LTL formula ϕ_{E}.
Question: Can E force a win in the game induced by Φ_{E},Φ_{A},ϕ_{E}? That is, does there exist a strategy σ_{E} for E such that for all strategies σ_{A} for A, we have ρ(σ_{E},σ_{A})⊧ϕ_{E}?
The LTL synthesis problem was introduced to study settings in which we want to know whether a particular software component (represented by E in this case) can ensure that an overall system objective (ϕ_{E}) is satisfied in the presence of arbitrary, or even adversarial, input from the software environment (A). In game-theoretic terms, LTL synthesis is a two-player, strictly competitive win-lose game, and it can be seen as a special case of iBGs: we can model LTL synthesis in an iBG by assigning player E the goal ϕ_{E} and A the goal ¬ϕ_{E}. Now, the central result proved by Pnueli and Rosner was this:
Theorem 1
[67] The LTL synthesis problem is 2EXPTIME-complete.
Observe that this is an extremely negative result, considerably worse than (for example) the PSPACE-complete LTL model checking problem [74]. The high complexity derives from the fact that the LTL synthesis problem requires quantifying over strategies for satisfying LTL formulae: checking Nash equilibrium properties of iBGs requires similar quantification, and it should therefore come as no surprise that iBGs inherit the high complexity of LTL synthesis.
Theorem 2
[39] For iBGs, IsNE is PSPACE-complete (and hence no easier or harder than model checking or satisfiability for LTL). In contrast, NonEmptiness, ENash, and ANash are all 2EXPTIME-complete.
It is not hard to see the close relationship between these problems and LTL synthesis. For example, we can immediately see that ANash is 2EXPTIME-hard from the following reduction: given an instance (Φ_{E},Φ_{A},ϕ_{E}) of LTL synthesis, construct an iBG with players {E,A}, and propositional control sets as in the LTL synthesis instance, with goals for the players being ϕ_{E} and ¬ϕ_{E} respectively. Then ask whether ϕ_{E} is satisfied on all Nash equilibrium runs of the game. It is straightforward to see that E has a winning strategy for ϕ_{E} if and only if ϕ_{E} is satisfied on all Nash equilibrium computations.
Although it may seem rather abstract, the iBG framework is quite general, and more widely applicable than it might at first appear. For example, frameworks in which agent programs P_{i} can be axiomatised in LTL can be expressed in iBGs – see [37] for details.
One fascinating aspect of the development of the theory for iBGs is that, when understanding the equilibrium properties of iBGs, we can make use of the Nash folk theorems – classic results in game theory which relate to the equilibrium properties that can be sustained in iterated games [61]. It is remarkable that a proof technique developed in the 1950s to study an abstract class of games turns out to be directly applicable to the verification of AI systems 70 years later: see [39] for details.
Concurrent game structures
Concurrent Game Structures are a widely-used model for concurrent and multiagent systems [5]. In this model, say M, typically presented in its deterministic form, there is a set N of players who, at each state s, each make an independent choice a_{i}, with i ∈ N; these choices jointly define an action profile a = (a_{1},…,a_{N}) that uniquely determines the next state \(s^{\prime }\), that is, a unique transition \((s,\textbf {a},s^{\prime })\) in M. Formally, a Concurrent Game Structure is given by a tuple:

\(M = (N, S, s^{0}, (A_{i})_{i \in N}, \delta )\)

where N and S are finite, nonempty sets of agents and system states, respectively; s^{0} ∈ S is an initial state; A_{i} is a set of actions available to agent i, for each i ∈ N; and δ : S × A_{1} ×⋯ × A_{N}→ S is a transition function.
Concurrent games are played as follows. The game begins in state s^{0}, and each player i ∈ N simultaneously picks an action \({a_{i}^{0}} \in A_{i}\). The game then transitions to a new state, \(s^{1} = \delta (s^{0}, {a_{1}^{0}}, \ldots , a_{N}^{0})\), and this process repeats. Thus, the n^{th} state transitioned to is \(s^{n} = \delta (s^{n-1}, a_{1}^{n-1}, \ldots , a_{N}^{n-1})\). Since the transition function is deterministic, a play of a game will be an infinite sequence of states, denoted by π. Such a sequence of states is called a run.
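The run induced by a strategy profile in a deterministic concurrent game structure can be sketched directly from the definitions above. The two-agent structure and the strategies below are illustrative assumptions for the example.

```python
# Sketch: generating a run prefix of a deterministic concurrent game
# structure. delta maps (state, joint action) to the unique next state;
# each strategy maps the history (tuple of states so far) to an action.

def run_prefix(s0, delta, strategies, length):
    """Return the first `length` states of the run induced by the profile."""
    history = (s0,)
    while len(history) < length:
        actions = tuple(f(history) for f in strategies)  # simultaneous choices
        history += (delta[(history[-1], actions)],)
    return history

# Two agents, two states; the game leaves "s0" only if both play "go".
delta = {("s0", a): ("s1" if a == ("go", "go") else "s0")
         for a in [("go", "go"), ("go", "wait"), ("wait", "go"), ("wait", "wait")]}
delta.update({("s1", a): "s1"
              for a in [("go", "go"), ("go", "wait"), ("wait", "go"), ("wait", "wait")]})

always_go = lambda history: "go"
go_after_two = lambda history: "go" if len(history) >= 2 else "wait"
print(run_prefix("s0", delta, [always_go, go_after_two], 4))
# ('s0', 's0', 's1', 's1')
```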
Thus, to play a game, agents use strategies, which are formally defined as functions from finite sequences of states (histories) to actions. Because Concurrent Game Structures are deterministic, a profile of strategies for all agents f = (f_{1},…,f_{N}) determines a unique run in M, denoted by π(f). Assuming that agents have a preference relation ≽_{i}, with i ∈ N, over the set of runs in M, one can immediately define further game-theoretic concepts, such as the stable outcomes, runs, or profiles of a game. For instance, in the case of Nash equilibrium, we say that a strategy profile f = (f_{1},…,f_{N}) is a Nash equilibrium if, for each agent i and every strategy \(f^{\prime }_{i}\) of i we have:

\(\pi ((f_{1}, \ldots , f^{\prime }_{i}, \ldots , f_{N})) \not \succ _{i} \pi (\vec {f})\)
that is, agent i does not prefer the run induced by \((f_{1},\ldots ,f^{\prime }_{i},\ldots ,f_{N})\) over the run induced by \(\vec {f} = (f_{1},\ldots ,f_{i},\ldots ,f_{N})\), which we call a Nash equilibrium run.
Reactive module games
While concurrent games provide a natural semantic framework for multiagent systems, they are not directly appropriate as a modelling framework to be used by people. For this, the framework of Reactive Module Games (RMGs) is more suitable [41]. Within this framework, concurrent games are modelled using the Simple Reactive Modules Language (SRML) [78], a simplified version of the Reactive Modules language that is widely used within the model checking community [3].
The basic idea is that each system component (agent/player) in SRML is represented as a module, which consists of an interface that defines the name of the module and lists a nonempty set of Boolean variables controlled by the module, and a set of guarded commands, which define the choices available to the module at each state. There are two kinds of guarded commands: init, used for initialising the variables, and update, used for updating variables subsequently.
A guarded command has two parts: a “condition” part (the “guard”) and an “action” part. The “guard” determines whether a guarded command can be executed or not given the current state, while the “action” part defines how to update the value of (some of) the variables controlled by a corresponding module. Intuitively, φ ⇝ α can be read as “if the condition φ is satisfied, then one of the choices available to the module is to execute α”. Note that the value of φ being true does not guarantee the execution of α, but only that it is enabled for execution, and thus may be chosen. If no guarded command of a module is enabled in some state, then that module has no choice and the values of the variables controlled by it remain unchanged in the next state. More formally, a guarded command g over a set of variables Φ is an expression

\(\varphi \leadsto x_{1} := \psi _{1}; {\ldots } ; x_{k} := \psi _{k}\)
where the guard φ is a propositional logic formula over Φ, each x_{i} is a member of Φ, and each ψ_{i} is a propositional logic formula over Φ. It is required that no variable x_{i} appears on the left hand side of more than one assignment statement in the same guarded command, so no issue of (potentially) conflicting updates arises.
Here is a concrete example of a guarded command:

\((p \wedge q) \leadsto p := \top ; q := \bot \)
The guard is the propositional logic formula (p ∧ q), so this guarded command will be enabled if both p and q are true. If the guarded command is chosen (to be executed), then in the next timestep, variable p will be assigned ⊤ and variable q will be assigned ⊥.
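The enabledness test and simultaneous-assignment semantics of a guarded command can be sketched as follows; the encoding of guards and right-hand sides as Python predicates over the current valuation is an illustrative assumption, not SRML syntax.

```python
# Sketch: representing and executing one SRML-style guarded command.

def enabled(guard, valuation):
    """Is the command available for execution in the current state?"""
    return guard(valuation)

def execute(assignments, valuation):
    """Evaluate every right-hand side against the *current* valuation,
    then apply all assignments at once; other variables are unchanged."""
    nxt = dict(valuation)
    nxt.update({x: rhs(valuation) for x, rhs in assignments.items()})
    return nxt

# The command with guard (p ∧ q) that sets p to true and q to false:
guard = lambda v: v["p"] and v["q"]
assignments = {"p": lambda v: True, "q": lambda v: False}

v = {"p": True, "q": True}
if enabled(guard, v):
    v = execute(assignments, v)
print(v)  # {'p': True, 'q': False}
```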
Formally, an SRML module m_{i} is defined as a triple m_{i} = (Φ_{i},I_{i},U_{i}), where \({{\varPhi }}_{i} \subseteq {{\varPhi }}\) is the finite set of Boolean variables controlled by m_{i}, I_{i} a finite set of init guarded commands, and U_{i} a finite set of update guarded commands. As in iBGs, it is required that variables are controlled by exactly one agent.
Figure 3 shows a module named toggle that controls a single Boolean variable, named x. There are two init guarded commands and two update guarded commands. The init guarded commands define two choices for the initialisation of variable x: true or false. The first update guarded command says that if x has the value of true, then the corresponding choice is to assign it to false, while the second command says that if x has the value of false, then it can be assigned to true. Intuitively, the module would choose (in a nondeterministic manner) an initial value for x, and then on subsequent rounds toggles this value. In this particular example, the init commands are nondeterministic, while the update commands are deterministic. We refer to [41] for further details on the semantics of SRML. In particular, in Figure 12 of [41], we detail how to build a Kripke structure that models the behaviour of an SRML system.
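The behaviour of the toggle module just described can be sketched as a simple trace generator: one nondeterministic init choice fixes the initial value of x, after which exactly one (deterministic) update command is enabled in every state and flips the value. The function name is illustrative.

```python
# Sketch: tracing the `toggle` module. The init commands offer a
# nondeterministic choice of initial value; thereafter exactly one
# update command is enabled per state, so the run is determined.

def toggle_run(initial_choice: bool, rounds: int):
    """Value of x over `rounds` steps, from a chosen init value."""
    x, trace = initial_choice, []
    for _ in range(rounds):
        trace.append(x)
        x = not x  # the single enabled update command flips x
    return trace

print(toggle_run(True, 4))   # [True, False, True, False]
print(toggle_run(False, 4))  # [False, True, False, True]
```

The two possible init choices thus give rise to exactly two runs of the module, mirroring the nondeterminism in Fig. 3.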
Module definitions allow us to represent the possible actions of individual agents, and the effects of their actions, but do not represent preferences. In RMGs, preferences are captured by associating each module with a goal, which is specified as a temporal logic formula. Given this, a reactive module game is given by a structure G = (N,m_{1},…,m_{n},γ_{1},…,γ_{n}), where N = {1,…,n} is the set of agents, m_{i} is the module defining the choices available to agent i, as explained above, and γ_{i} is the goal of player i. In [41], two possibilities were considered for the language of goals γ_{i}: LTL and CTL. In the case of LTL, strategies σ_{i} for individual players are essentially the same as in iBGs: deterministic finite state machines with output. At each round of the game, a strategy σ_{i} chooses one of the enabled guarded commands to be executed. Because all strategies are deterministic, upon execution the collective strategies of all players will trace out a unique run, which will either satisfy or not satisfy each player’s goal, as in the case of iBGs. In the case of CTL, however, strategies are nondeterministic: instead of selecting a single guarded command for execution, a strategy selects a set of guarded commands. The result of executing such strategies yields a tree structure, which will then either satisfy or fail to satisfy the CTL goals of players.
When it comes to the complexity of decision problems relating to RMGs, we find the following:
Theorem 3
[41]

For LTL RMGs, IsNE is PSPACE-complete, while ENash and ANash are both 2EXPTIME-complete.

For CTL RMGs, IsNE is EXPTIME-complete, while ENash and ANash are both 2EXPTIME-hard.
The key conclusion from these results is that, despite the naturalness and expressive power of RMGs, they are computationally no more complex than iBGs. The high complexity of the key decision problems for RMGs indicates that naive algorithms to solve them will be hopelessly impractical: specialised techniques are required. In Section 4.1, we describe such techniques, and a system built upon them.
Markov games
Markov Games, also known as Concurrent Stochastic Games (or sometimes simply Stochastic Games), are a popular representation of (simultaneous) multiagent decision-making scenarios with stochastic dynamics. In this latter respect they differ from Concurrent Game Structures, as discussed above, in which environments are assumed to be deterministic. They naturally generalise both Markov Decision Processes (a Markov Game with one player) and iterated Normal-Form Games (a Markov Game with one state). Such games proceed at each timestep, from a state s, by each agent P_{i} using their strategy σ_{i} to select an action a_{i}, leading to a joint action a = (a_{1},…,a_{n}). The next state \(s^{\prime }\) is then drawn from the conditional probability distribution given by a Markovian transition function \(T(s^{\prime } ~\vert ~ s, \textbf {a})\). The strategy profile \(\vec {\sigma }\) and the transition dynamics thus define a Markov Chain over the states S of the game, leading to a distribution \(\Pr _{\vec {\sigma }}(\rho )\) over runs ρ = s_{0}s_{1}s_{2}… through the state space.
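One step of such a game, and hence a sampled run under a given strategy profile, can be sketched as follows; this is a toy illustration with memoryless strategies, and the names and signatures are our own:

```python
import random

def sample_run(s0, strategies, T, steps, seed=None):
    """Sample a finite run of a Markov game.

    `strategies` is a list of (memoryless) strategies, one per agent, each
    mapping the current state to an action; T(s, a) returns the successor
    distribution as a dict {state: probability} for joint action a.
    """
    rng = random.Random(seed)
    run, s = [s0], s0
    for _ in range(steps):
        a = tuple(sigma(s) for sigma in strategies)  # joint action
        succs, probs = zip(*T(s, a).items())
        s = rng.choices(succs, weights=probs)[0]     # draw s' ~ T(· | s, a)
        run.append(s)
    return run
```

Repeated sampling of runs in this way gives a Monte-Carlo view of the distribution \(\Pr _{\vec {\sigma }}(\rho )\) induced by the strategy profile.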
On top of this underlying game structure one may then define different forms of objective for each of the agents. Common examples include the expected cumulative discounted reward:

\(\mathbb {E}_{s_{0} \sim I,\, \vec {\sigma }} \left [ {\sum }_{t=0}^{\infty } \beta ^{t}\, r^{i}_{t+1} \right ]\)

and the expected mean-payoff reward:

\(\mathbb {E}_{s_{0} \sim I,\, \vec {\sigma }} \left [ \liminf _{T\to \infty } \frac {1}{T} {\sum }_{t=0}^{T-1} r^{i}_{t+1} \right ]\)
Here, β ∈ [0,1) is a discount factor, \(r^{i}_{t+1} \in \mathbb {R}\) is the reward given to agent i at time t + 1, and I(s) is an initial state distribution. Alternatively, for any set of runs \(R^{\prime } \subseteq R(P_{1},\ldots ,P_{n})\) we may define an indicator random variable \(X_{R^{\prime }}\) such that \(X_{R^{\prime }}(\rho ) = 1\) if \(\rho \in R^{\prime }\) and \(X_{R^{\prime }}(\rho ) = 0\) otherwise. A player’s reward can then be defined as the expected value \(\mathbb {E}_{\vec {\sigma }} [X_{R^{\prime }}]\) of this variable. For example, we could consider the probability of satisfying a temporal logic formula γ_{i} by defining \(R^{\prime }\) as containing all and only those runs ρ such that ρ⊧γ_{i}.
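For a finite prefix of rewards, the two quantities being averaged above can be computed directly; on infinite runs the mean-payoff is defined via the liminf of such prefix averages. The function names below are our own:

```python
def discounted_reward(rewards, beta):
    """Cumulative discounted reward of a finite reward prefix:
    the sum over t of beta^t * r_{t+1}, with discount factor beta in [0,1)."""
    return sum(beta ** t * r for t, r in enumerate(rewards))

def mean_payoff(rewards):
    """Average reward of a finite prefix; for an infinite run one takes the
    liminf of these averages as the prefix length grows."""
    return sum(rewards) / len(rewards)
```

Taking the expectation of either quantity over runs sampled from \(\Pr _{\vec {\sigma }}\) recovers the two objectives defined above.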
The introduction of stochastic dynamics also introduces different ‘ways of winning’ when we have Boolean objectives that are either satisfied or not by a particular path [29]. For example, a player may win by satisfying their goal γ_{i} surely (with certainty), almost surely (with probability one), limit surely (with probability greater than 1 − ε for every ε > 0), boundedly (with probability bounded away from one), positively (with positive probability), or existentially (possibly). Aside from these qualitative conditions, players may be interested in simply maximising the probability that their goal γ_{i} is achieved. Such a perspective can also be carried over to the problem of rational verification, in which we may be interested in the sure, almost sure, or limit sure satisfaction of a property ϕ, or simply in the probability that ϕ is satisfied.
Tools
While synthesis problems (such as the LTL synthesis problem, introduced by Pnueli and Rosner and discussed above) have been increasingly studied within the verification community, rational verification has come to prominence only in the past few years. As such, relatively few software tools exist for this problem. Below, we briefly survey some of the most widely used.
EVE: the equilibrium verification environment
As we noted above, the high complexity of rational verification for RMGs (see above) indicates that naive algorithms for this purpose will be doomed to failure, even for systems of moderate size. It follows that any practical system will require sophisticated algorithmic techniques. The Equilibrium Verification Environment (Eve) is a system based on such techniques [45, 47].
The basic approach embodied by Eve involves reducing rational verification to a collection of parity games [32], which are widely used for synthesis and verification problems. A parity game is a two-player zero-sum turn-based game given by a labelled finite graph H = (V_{0},V_{1},E,α) such that V = V_{0} ∪ V_{1} is a set of states partitioned into Player 0 (V_{0}) and Player 1 (V_{1}) states, respectively, \(E\subseteq V\times V\) is a set of edges/transitions, and \(\alpha : V \to \mathbb {N}\) is a priority labelling function. Player 0 wins if the smallest priority that occurs infinitely often in the infinite play is even; otherwise, Player 1 wins. It is known that solving a parity game (checking which player has a winning strategy) is in NP ∩ coNP [51], and can be done in quasi-polynomial time [17].^{Footnote 1}
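To illustrate, parity games with the min-parity condition just described can be solved by Zielonka's classical recursive algorithm. The following is a compact, unoptimised sketch of ours (exponential in the worst case, unlike the quasi-polynomial algorithm of [17]); it assumes, as is standard, that every vertex has at least one outgoing edge:

```python
def attractor(vertices, edges, owner, player, target):
    """Least superset of `target` from which `player` can force the play
    into `target`: the player's own vertices need one successor inside,
    the opponent's vertices need all successors inside."""
    succ = {v: [w for (u, w) in edges if u == v] for v in vertices}
    attr, changed = set(target), True
    while changed:
        changed = False
        for v in vertices - attr:
            outs = succ[v]
            if (owner[v] == player and any(w in attr for w in outs)) or \
               (owner[v] != player and outs and all(w in attr for w in outs)):
                attr.add(v)
                changed = True
    return attr

def zielonka(vertices, edges, owner, priority):
    """Return (W0, W1), the winning regions of the parity game: Player 0
    wins a play iff the least priority occurring infinitely often is even."""
    if not vertices:
        return set(), set()
    m = min(priority[v] for v in vertices)
    p = m % 2  # the player who wants priority m to recur
    A = attractor(vertices, edges, owner, p,
                  {v for v in vertices if priority[v] == m})
    rest = vertices - A
    W = zielonka(rest, [e for e in edges if e[0] in rest and e[1] in rest],
                 owner, priority)
    if not W[1 - p]:  # the opponent wins nowhere in the subgame
        return (vertices, set()) if p == 0 else (set(), vertices)
    B = attractor(vertices, edges, owner, 1 - p, W[1 - p])
    rest2 = vertices - B
    W2 = zielonka(rest2, [e for e in edges if e[0] in rest2 and e[1] in rest2],
                  owner, priority)
    return (W2[0], W2[1] | B) if p == 0 else (W2[0] | B, W2[1])
```

For example, if Player 0 owns a vertex of priority 2 with a self-loop and Player 1 owns a vertex of priority 1 with a self-loop, each player wins from their own vertex.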
The algorithm underpinning Eve uses parity games in the following way. It takes as input an RMG M and builds, using a deterministic Streett automaton on infinite words (DSW) [52], a parity game H whose sets of states and transitions are doubly exponential in the size of the input, but whose priority function is only exponential in the size of the input game. Solving this parity game leads to a decision procedure that is, overall, in 2EXPTIME and therefore, given the hardness results mentioned above, essentially optimal. The Eve system can: (i) solve the ENash and ANash problems for a given RMG; and (ii) synthesise individual player strategies in the game.
Experimental results show that Eve performs favourably compared with other existing tools that support rational verification.
PRISM-games
A separate though closely related thread of research into the verification of multiagent systems has emerged from the probabilistic model-checking community. The most prominent example of this in recent years is the expansion of PRISM [54], a popular tool for probabilistic model-checking, to handle first Turn-Based [11] and now Concurrent Stochastic Games (Markov Games) [55, 56]. Earlier work was limited to noncooperative turn-based or zero-sum concurrent settings. Later efforts considering cooperative, concurrent games were initially restricted to those with only two coalitions, but this restriction has been partially lifted in the most recent instantiation of the work, which supports model-checking of arbitrary numbers of coalitions in the special case of stopping games – those in which, eventually, with probability one, the outcome of each player’s objective becomes fixed [56]. We note further that the current version of the tool also supports the use of Probabilistic Timed Automata in verifying Turn-Based Markov Games with real-valued clocks [57].
In PRISM-games, specifications are expressed in rPATL, probabilistic ATL (a generalisation of CTL with an extra quantifier 〈〈A〉〉ϕ for reasoning about properties ϕ that can be ensured by some subset A of the agents [5]) with rewards [25]. The logic is then further extended in order to reason about equilibria in the game (in particular, subgame-perfect social-welfare optimal Nash equilibria). For example, this allows one to answer not only queries such as \(\langle \langle {P_{1}}\rangle \rangle _{max \geq 0.5}(\Pr [ \psi ])\) – is it the case that P_{1} can ensure that ψ holds with probability at least a half? – but also queries such as \(\langle \langle {P_{1}:P_{2}}\rangle \rangle _{max\geq 2}(\Pr [ \psi ] + \Pr [ \chi ])\) – is it the case that P_{1} and P_{2} can coordinate to ensure that both of their respective goals, ψ and χ, hold with probability one? – where ψ and χ are LTL formulae; similar queries can be posed for expected rewards. More information can be found in [56]. An alternative specification formalism that can express equilibrium concepts is Probabilistic Strategy Logic [8], but it has no associated implementation.
From a technical standpoint, PRISM-games also makes use of the Reactive Modules language, with individual players represented by sets of modules which may then choose an enabled command at each timestep. On top of this, users can include reward structures that produce quantitative rewards given a state and joint action as input, and define temporal logic properties expressed in the (extended version of) rPATL. For zero-sum properties, PRISM-games relies on value iteration to approximate values for all states of the game, and then solves a linear program for each state in order to compute a minimax strategy. For equilibria-based properties, a combination of backwards induction and value iteration is used, which is exact for finite-horizon and approximate for infinite-horizon properties, together with a subprocedure for computing optimal Nash equilibria in n-player Normal-Form Games that makes use of SMT and nonlinear optimisation engines.
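As a much-simplified illustration of the value-iteration idea, here is a sketch for a turn-based zero-sum reachability objective (rather than PRISM-games' full concurrent setting, which additionally solves a linear program per state); all names are our own:

```python
def value_iteration(states, owner, trans, target, iters=200):
    """Approximate, for each state, the value of a turn-based zero-sum game
    in which Player 0 maximises and Player 1 minimises the probability of
    reaching `target`. `trans[s]` maps each available action to a successor
    distribution {state: probability}."""
    V = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iters):
        for s in states:
            if s in target:
                continue  # target states have value 1 by definition
            vals = [sum(p * V[t] for t, p in dist.items())
                    for dist in trans[s].values()]
            V[s] = max(vals) if owner[s] == 0 else min(vals)
    return V
```

Each sweep propagates values backwards through the transition structure; for infinite-horizon properties the result is an approximation that converges as the number of sweeps grows.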
MCMAS
MCMAS [58] adopts interpreted systems [33] as the formal language for representing systems composed of multiple entities. In MCMAS, interpreted systems are extended to incorporate game-theoretic notions such as those provided by ATL modalities [59]. The formalism used to model systems in MCMAS can be thought of as a “bottom-up” approach: global states are given as tuples composed of the local states of the agents and of the environment. MCMAS uses a dedicated language, the Interpreted Systems Programming Language (ISPL), to specify interpreted systems.
There are different extensions of MCMAS that handle different specification logics. One extension whose specification language is expressive enough to reason about Nash equilibrium is MCMAS-SLK [19]. The tool’s specification language is Strategy Logic with Knowledge (SLK) [18], an extension of Strategy Logic (SL) [24, 62]. Due to the undecidability of the model-checking problem for multiagent systems under perfect recall and incomplete information [4], the tool adopts imperfect recall semantics.
The NonEmptiness problem can be solved using MCMAS by expressing the existence of a Nash equilibrium in SLK. Let \( N = \{1,\dots ,n\} \) be the set of players in a game, Var be the set of strategy variables, and Γ be the set of goals of the players in the game. Using SLK, we can express the existence of a Nash equilibrium with the formula φ_{NE}:
⟨⟨x_{1}⟩⟩…⟨⟨x_{n}⟩⟩ (1,x_{1})…(n,x_{n}) ⋀_{i∈N} (¬γ_{i} → ⟦y_{i}⟧ (i,y_{i}) ¬γ_{i})
where i ∈ N, x_{i},y_{i} ∈ Var, and γ_{i} ∈ Γ.
Intuitively, formula φ_{NE} can be read as follows: for each player i with chosen strategy x_{i}, if the goal of player i is not achieved using x_{i}, then for every “alternative” strategy y_{i} the goal of player i is still not achieved. This means that players who do not get their goals achieved cannot benefit from unilaterally changing their strategies. Thus, if φ_{NE} is true, then there exists a Nash equilibrium in the given game. The other problems of rational verification, namely ENash and ANash, can be reduced to NonEmptiness [37].
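The intuition behind φ_{NE} can be mirrored by a brute-force check over finite strategy sets with Boolean (goal-based) payoffs. This is an illustrative sketch only, with our own names; it is not how MCMAS-SLK works internally:

```python
from itertools import product

def is_nash(profile, strategies, satisfied):
    """A profile is a (pure) Nash equilibrium under Boolean goals iff every
    player whose goal is unmet has no unilateral deviation that meets it;
    satisfied(i, profile) tells whether player i's goal holds."""
    for i in range(len(profile)):
        if satisfied(i, profile):
            continue  # a satisfied player has no incentive to deviate
        for alt in strategies[i]:
            deviation = profile[:i] + (alt,) + profile[i + 1:]
            if satisfied(i, deviation):
                return False  # beneficial unilateral deviation found
    return True

def find_nash(strategies, satisfied):
    """Enumerate all profiles; return some pure Nash equilibrium, or None."""
    for profile in product(*strategies):
        if is_nash(profile, strategies, satisfied):
            return profile
    return None
```

For instance, a matching-pennies-style game (one player wants the choices to match, the other wants them to differ) has no pure equilibrium, whereas in a coordination game where both players want to match, any matching profile is an equilibrium.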
Challenges
In this section, we provide a brief discussion of some current and future research challenges for rational verification.
Tackling complexity
Perhaps the most obvious challenge in making rational verification an industrial-strength reality is the high computational complexity of the basic decision problems. Whilst LTL formulae are expressive and natural [79], and moreover widely used in industry [21, 26, 70, 71], the 2EXPTIME-completeness results leave our problems grossly intractable. As such, it is important to consider other languages that strike a balance between complexity and expressiveness: how can we capture the richness of multiagent systems, whilst still being able to reason about them effectively?
Perhaps the most obvious thing to try is to consider fragments of LTL. Various restrictions of LTL are very well-studied [7, 75] and the decision problems relating to them are much more computationally amenable. In [39], the authors consider games where all the players have propositional safety goals – that is, LTL goals of the form Gφ, where φ is some propositional formula. In this setting, the ENash problem is PSPACE-complete. Additionally, in [46], the authors consider GR(1) [12] goals and specifications. Here, the ENash problem is PSPACE-complete with GR(1) goals and LTL specifications, and lies in FPT (fixed-parameter tractable) [30] when both the goals and the specifications are in GR(1).
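Safety goals Gφ are also particularly easy to check on the ultimately periodic runs that arise from finite-state strategies. A sketch, with states encoded as dicts of propositions and φ as a predicate (both our own illustrative encoding):

```python
def satisfies_safety(phi, prefix, loop):
    """An ultimately periodic run prefix · loop^ω satisfies the safety goal
    G φ iff the propositional formula φ holds in every state of both the
    finite prefix and the infinitely repeated loop."""
    return all(phi(s) for s in prefix) and all(phi(s) for s in loop)
```

A single state violating φ anywhere in the prefix or the loop falsifies the goal on the whole infinite run.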
In addition to considering restricted languages for goals and temporal queries, a number of other directions suggest themselves as possible ways to reduce complexity, although we have no concrete results in these directions at this time. The first possibility is to consider ways in which games can be decomposed into smaller games, while preserving the relevant game-theoretic properties. Similar techniques have been studied within the model checking community (see, e.g., [6]). Another possibility, also inspired by work within model checking, is to consider abstracting games to their smallest bisimulation-equivalent form. Care must be taken in this case, however: the precise form of bisimulation used must preserve Nash equilibria across bisimulation-equivalent models, and naive attempts to define bisimulation, which preserve temporal logic properties under model checking, do not necessarily preserve Nash equilibria – we refer the interested reader to [40] for details.
Alternative preference models
What if we were to set aside temporal logics and consider different preference relations altogether? Staying in the qualitative mindset, in [13] the authors consider games where the players have ω-regular objectives and look at the NonEmptiness problem, obtaining complexity results ranging from P-completeness all the way up to EXPTIME membership. Alternatively, one can adopt a quantitative approach and consider mean-payoff objectives – one can ask whether there exists some Nash equilibrium in which each player’s payoff lies within a certain interval. As established in [76], this problem is NP-complete.
In order to reason about games in a richer fashion, we can use quantitative and qualitative constructs in the same breath. If we look at games where the players’ preferences are given by mean-payoff objectives, and ask whether there exists a Nash equilibrium which models an LTL specification, this problem is PSPACE-complete. Moreover, if we restrict our attention to GR(1) specifications, then we retain the NP-completeness result of the original mean-payoff NonEmptiness problem. However, balancing qualitative and quantitative goals and specifications is not always as straightforward as this. For instance, in two-player, zero-sum, mean-payoff parity games [23], where the first player receives their mean-payoff if some parity condition is satisfied, and −∞ otherwise, this player may require infinite memory to act optimally. Thus, given the standard translation from nondeterministic Büchi automata to deterministic parity automata [65], this does not bode well for games with combined mean-payoff and LTL objectives: many of the techniques in rational verification depend on the existence of memoryless or finite-memory strategies in the corresponding two-player, zero-sum version of the game. Despite this, [43, 44] look at games with lexicographic preferences, where the first component is either a Büchi condition or an LTL formula, and the second component is some mean-payoff objective. Rather than considering the standard NonEmptiness problem, they study a closely related analogue: the problem of whether there exists some finite-state, strict ε-Nash equilibrium. These additional restrictions are imposed precisely because of the need for infinite memory in mean-payoff parity games, as mentioned above. When the first component is a Büchi condition, the decision problem is NP-complete, and in the LTL setting, it is 2EXPTIME-complete. Thus, despite the relaxation of the solution concept, we sadly see no gains in computational tractability.
Finally, some work has been done to introduce non-dichotomous, qualitative preferences into rational verification. In [53], the authors introduce Objective LTL (OLTL) as a goal and specification format. An OLTL formula is simply a tuple of LTL formulae, along with a function which maps binary tuples of the same length to integers. In a given execution of a game, some of the LTL formulae will be satisfied and others will not. Marking the ones that are satisfied with 1, and the ones that are not with 0, we can pass the resulting tuple to the given function and obtain an integer – each agent in the game wants to maximise this integer. With this preference model, we can look at games where there is a set of agents, plus a system player, and ask whether there exists some strategy for the system player, along with a Nash equilibrium for the remaining players, such that the system player’s payoff is above a certain threshold. This problem is no harder than the original rational synthesis problem for LTL [36], being 2EXPTIME-complete. Building on this, in [2], the authors study rational verification with \(\text {LTL}[{\mathscr{F}}]\) [1] goals and specifications. In short, \(\text {LTL}[{\mathscr{F}}]\) generalises LTL by replacing the classical Boolean operators with arbitrary functions which map binary tuples into the interval [0,1]. Again, the associated decision problem remains 2EXPTIME-complete.
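The OLTL preference model is easy to state concretely: given which of the LTL formulae hold on a run, the payoff is just the given function applied to the resulting tuple of bits. The weighted-sum function below is a hypothetical example of such a function, not one taken from [53]:

```python
def oltl_payoff(f, formulas_hold):
    """Payoff of an OLTL objective (phi_1, ..., phi_k; f): mark each LTL
    formula with 1 if it is satisfied on the run and 0 otherwise, then
    apply f to the resulting tuple. Each agent seeks to maximise this."""
    return f(tuple(int(b) for b in formulas_hold))

# A hypothetical weighting over three LTL goals: the first two are
# rewarded, the third is penalised.
weighted = lambda bits: 3 * bits[0] + 2 * bits[1] - bits[2]
```

The classical dichotomous setting is recovered when f is the indicator of a single formula's satisfaction bit.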
Uncertain environments
Thus far, the investigation into rational verification has focused largely on settings that are deterministic, discrete, fully observable, and fully known. Indeed this is sufficient for modelling a great many scenarios of interest, such as software processes or highlevel representations of multiagent control. Most of the real world, however, cannot be captured quite as neatly. This motivates the study of rational verification in uncertain environments, where this uncertainty might arise from stochastic dynamics, continuous or hybrid state and action spaces, or a structure that is only partially observable or partially known. Each of these features (and, moreover, their combination) represents an exciting direction for future work, the challenges of which we briefly outline here.
Perhaps the most natural and well-studied form of uncertainty in formal verification concerns systems with stochastic dynamics. As noted above in Section 4.2, probabilistic model-checking techniques have recently been extended to the multiagent setting by way of tools such as PRISM-games [57]. Recent work on rational verification in Markov Games with goals defined by the almost-sure or positive satisfaction of LTL properties has shown that the complexity of the main problems in both noncooperative and cooperative rational verification remains essentially the same as in the non-stochastic setting: 2EXPTIME-complete [38]. Further results for other qualitative modes of winning (as well as for the quantitative case) are still to be obtained; moreover, there remain many other interesting open problems relating to ω-regular objectives in Markov Games [22].
In some situations, especially when considering cyber-physical systems, it is more appropriate to model the state space (and possibly the action space) as continuous, or as hybrid – with some discrete and some continuous elements. Whilst not in itself necessarily introducing uncertainty, such representations bring challenges related to the concise encoding of system dynamics and of agents’ strategies over uncountable sets, and to the careful definition of temporal logic formulae over paths through the state space. As well as modelling state or action spaces as continuous, one may also choose to represent time as continuous, requiring new logics in which to encode specifications, such as Continuous-Time Stochastic Logic (CSL) [10] or Signal Temporal Logic (STL) [60].
When making a real-world decision in order to achieve a goal, it is rare to be able to observe all of the information relevant to that decision and goal. This intuition can be captured by models in which the state space is only partially observable by the agents within it; in game-theoretic terms, the agents have imperfect information. For example, rational verification for Reactive Module Games in which each player may only observe a subset of the environmental variables is undecidable with three or more players, although the two-player case is solvable in 2EXPTIME [48].
Related work has explored the problem of rational synthesis in turn-based games under imperfect information (which is undecidable with three or more players and EXPTIME-complete for two players) [34], though the effects of partial observability on the rational verification problem remain underexplored.
Finally, there are scenarios in which larger portions of an environment, such as the transition dynamics, are unknown, not only to the agents but also to those who wish to verify the system. Here, traditional model-checking approaches do not apply and some form of learning must be introduced. As a result, different forms of guarantee about such systems are obtained, relying on assumptions about the structure of the environment and on the theoretical characteristics of the learning algorithms used. Verification methods that employ learning have recently been developed in both the model-checking community [16] and the control and learning communities [50], though few have considered multiagent settings with more than two players, and those that do restrict their attention to purely cooperative games [49]. A further complication arises when the agents themselves employ learning in unknown environments in order to update their strategies over time. With the continuing advance of machine learning, this is likely to become an increasingly common occurrence, requiring new techniques for rational verification.
Cooperative solution concepts
Rational verification was first defined for noncooperative games [39, 41, 83]: players were assumed to act alone, and binding agreements between players were assumed to be impossible. The solution concepts used in previous studies have therefore been noncooperative – primarily Nash equilibrium and refinements thereof.
However, in many real-life situations, these assumptions misrepresent reality. In order to address this issue, in [42], the noncooperative setting for rational verification was extended to include cooperative solution concepts [61, 64]. It was assumed that there is some (exogenous) mechanism through which agents in a system can reach binding agreements and form coalitions in order to collectively achieve goals. The possibility of binding cooperation and coalition formation eliminates some undesirable equilibria that arise in the noncooperative setting, and makes available a range of outcomes (i.e., computations of the system that can be sustained in equilibrium) which cannot be achieved without cooperation.
In this new cooperative setting, the focus was on the core, arguably one of the most relevant solution concepts in the cooperative game theory literature. The basic idea behind the core is that a game outcome is said to be core-stable if no subset of agents could benefit by collectively deviating from it; the core of a game is the set of core-stable outcomes. In conventional cooperative games (characteristic function games with transferable utility [20]), this intuition can be given a simple and natural formal definition, and as a consequence the core is probably the most widely studied solution concept for cooperative games. However, the conventional definition of the core does not map easily into the rational verification framework as originally defined, mainly because coalitions are subject to externalities: whether or not a coalition has a beneficial deviation depends not just on the makeup of that coalition, but also on the behaviour of the remaining agents in the system.
Coalition formation with externalities has been extensively studied in the cooperative game theory literature [35, 77, 84], where different variants of the core can be found. For instance, the α-core takes the pessimistic approach of requiring that all members of a deviating coalition benefit from the deviation regardless of the behaviour of the other coalitions that may be formed. Our main definition of the core follows precisely this approach. Even though coalition formation with externalities is common in, and important for, multiagent systems [72], not much work has been done on the problem of stability, and its properties, in multiagent coalition formation with externalities. Instead, in AI and multiagent systems, most research has focused on the structure formation problem itself [68]. Through our work on rational verification, we also address this gap in the literature on verification for AI systems.
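Under Boolean goals, one natural pessimistic (α-core style) reading of core stability can likewise be checked by brute force over coalitions. This sketch is only an illustration of the intuition, using our own names and encoding; it is not the formal definition used in [42]:

```python
from itertools import combinations, product

def core_stable(profile, strategies, satisfied):
    """Pessimistic core-stability check with Boolean goals: a coalition C
    can beneficially deviate from `profile` if it can jointly pick new
    strategies such that, however the remaining players respond, every
    member of C achieves its goal, where at least one member did not
    before. The profile is core-stable iff no coalition can so deviate."""
    n = len(strategies)
    for r in range(1, n + 1):
        for C in combinations(range(n), r):
            if all(satisfied(i, profile) for i in C):
                continue  # nobody in this coalition stands to gain
            others = [i for i in range(n) if i not in C]
            for joint in product(*(strategies[i] for i in C)):
                def outcome(resp):
                    p = list(profile)
                    for i, s in zip(C, joint):
                        p[i] = s
                    for i, s in zip(others, resp):
                        p[i] = s
                    return tuple(p)
                if all(satisfied(i, outcome(resp))
                       for resp in product(*(strategies[i] for i in others))
                       for i in C):
                    return False  # beneficial coalition deviation exists
    return True
```

Note how the quantification over the responses of the non-deviating players captures the externalities discussed above: the benefit of a deviation must be guaranteed against every possible behaviour of the rest of the system.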
The kinds of questions asked in the cooperative setting of rational verification are exactly the same as in the noncooperative framework, except that instead of (variants of) Nash equilibrium one refers to outcomes in the core of game-theoretic representations of multiagent systems. Such questions, e.g., ECore, ACore, etc., bearing the same meaning as their “Nash” counterparts, are all 2EXPTIME-complete [42] for games with LTL goals, but have some computationally desirable properties: the set of outcomes in the core is never empty, is bisimulation invariant [40], and has an elegant formalisation in ATL^{∗} [5]. This makes the automated solution of cooperative rational verification problems possible in practice using verification tools for multiagent systems analysis, such as MCMAS or EVE, described above.
Rational verification of human-agent systems
In the present paper, we have focused exclusively on the verification of multiagent systems in which the agents in question are software agents. In practice, of course, many (arguably most) systems of interest include multiple software and human participants. Might the techniques surveyed here be suitable for verifying such systems?
One approach might be to model human choices and preferences using one of the frameworks described above, and then directly apply the techniques we have sketched out. However, this approach presents many natural challenges. The most obvious of these is that the techniques we have described are derived from concepts in game theory and decision theory, and in particular they make a raft of assumptions about agents in the system. The most problematic of these is that agents are assumed to be perfectly rational (utility maximisers): they will act optimally in furtherance of their preferences. Human decision-makers do not act in this way: game- and decision-theoretic models capture idealised rational actors. The field of behavioural economics seeks to understand the modes of decision-making that humans actually use, and if we are to verify human-agent systems, then we will need to accommodate behavioural decision-making models in our frameworks. At present we are aware of no work that seeks to do this.
Conclusions
Rational verification is a recent approach to the automated verification of multiagent systems, in which we aim to determine automatically whether given properties of a system, expressed as temporal logic formulae, will hold in that system under the assumption that system components (agents) behave rationally, by choosing (for example) strategies that form a game-theoretic equilibrium. Rational verification can be understood as a counterpart to the conventional model checking paradigm for automated verification. Although research in this area is at an early stage, the basic computational, logical, and algorithmic territory relating to rational verification has already been explored, and is described in the present article. An overarching goal for the future will be to make tools more practically applicable, and to understand the fundamental limitations of the paradigm. We have sketched out some of the key challenges that must be overcome to make this a reality, chief among them dealing with complexity, broader preference models, richer modelling frameworks, and a wider range of game-theoretic solution concepts.
Notes
 1.
Despite more than 30 years of research, and promising practical performance for algorithms to solve them, it remains unknown whether parity games can be solved in polynomial time.
References
 1.
Almagor S, Boker U, Kupferman O (2013) Formalizing and reasoning about quality. In: Fomin FV, Freivalds R, Kwiatkowska MZ, Peleg D (eds) Automata, Languages, and Programming – 40^{th} International Colloquium, ICALP 2013, Proceedings, Part II, volume 7966 of Lecture Notes in Computer Science. Springer, Riga, pp 15–27
 2.
Almagor S, Kupferman O, Perelli G (2018) Synthesis of controllable Nash equilibria in quantitative objective games. In: Proceedings of the 27^{th} International Joint Conference on Artificial Intelligence, IJCAI’18. AAAI Press, pp 35–41
 3.
Alur R, Henzinger TA (1999) Reactive modules. Formal Methods Syst Des 15(1):7–48
 4.
Alur R, Henzinger TA, Kupferman O (1997) Alternating-time temporal logic. In: Proceedings of the 38th IEEE Symposium on Foundations of Computer Science, Florida, pp 100–109
 5.
Alur R, Henzinger TA, Kupferman O (2002) Alternating-time temporal logic. J ACM 49(5):672–713
 6.
Alur R, Henzinger TA, Kupferman O, Vardi MY (1998) Alternating refinement relations. In: Proceedings of the 9th International Conference on Concurrency Theory (CONCUR’98), volume 1466 of Lecture Notes in Computer Science. Springer, Berlin, pp 163–178
 7.
Alur R, Torre SL (2004) Deterministic generators and games for LTL fragments. ACM Trans Comput Log (TOCL) 5(1):1–25
 8.
Aminof B, Kwiatkowska M, Maubert B, Murano A, Rubin S (2019) Probabilistic strategy logic. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-19)
 9.
Baier C, Katoen JP (2008) Principles of Model Checking. The MIT Press, Cambridge
 10.
Baier C, Haverkort B, Hermanns H, Katoen JP (2000) Model checking continuous-time Markov chains by transient analysis. In: Computer Aided Verification. Springer, Berlin, pp 358–372
 11.
Basset N, Kwiatkowska M, Topcu U, Wiltsche C (2015) Strategy synthesis for stochastic games with multiple long-run objectives. In: Tools and Algorithms for the Construction and Analysis of Systems. Springer, Berlin, pp 256–271
 12.
Bloem R, Jobstmann B, Piterman N, Pnueli A, Sa'ar Y (2012) Synthesis of Reactive(1) designs. J Comput Syst Sci 78(3):911–938
 13.
Bouyer P, Brenguier R, Markey N, Ummels M (2015) Pure Nash equilibria in concurrent deterministic games. Logical Methods in Computer Science
 14.
Boyer RS, Moore JS (eds) (1981) The Correctness Problem in Computer Science. Academic Press, London
 15.
Brafman R, Domshlak C (2013) On the complexity of planning for agent teams and its implications for single agent planning. Artif Intell 198:52–71
 16.
Brázdil T, Chatterjee K, Chmelík M, Forejt V, Křetínský J, Kwiatkowska M, Parker D, Ujma M (2014) Verification of Markov decision processes using learning algorithms. In: Automated Technology for Verification and Analysis. Springer International Publishing, pp 98–114
 17.
Calude CS, Jain S, Khoussainov B, Li W, Stephan F (2017) Deciding parity games in quasi-polynomial time. In: STOC. ACM, pp 252–263
 18.
Čermák P, Lomuscio A, Mogavero F, Murano A (2014) MCMAS-SLK: A model checker for the verification of strategy logic specifications. In: Biere A., Bloem R. (eds) Computer Aided Verification. Springer International Publishing, Cham, pp 525–532
 19.
Čermák P, Lomuscio A, Mogavero F, Murano A (2018) Practical verification of multiagent systems against SLK specifications. Inf Comput 261(Part):588–614
 20.
Chalkiadakis G, Elkind E, Wooldridge M (2011) Computational aspects of cooperative game theory. Morgan & Claypool
 21.
Chan TS, Gorton I (1996) Formal validation of a high performance error control protocol using SPIN. Softw Practice Exper 26(1):105–124
 22.
Chatterjee K, Henzinger TA (2012) A survey of stochastic ω-regular games. J Comput Syst Sci 78(2):394–413
 23.
Chatterjee K, Henzinger TA, Jurdzinski M (2005) Mean-payoff parity games. In: 20th Annual IEEE Symposium on Logic in Computer Science (LICS'05). IEEE, pp 178–187
 24.
Chatterjee K, Henzinger TA, Piterman N (2010) Strategy logic. Inf Comput 208(6):677–693. https://doi.org/10.1016/j.ic.2009.07.004
 25.
Chen T, Forejt V, Kwiatkowska M, Parker D, Simaitis A (2013) Automatic verification of competitive stochastic systems. Formal Methods Syst Des 43(1):61–92
 26.
Choi Y (2007) From NuSMV to SPIN: Experiences with model checking flight guidance systems. Formal Methods Syst Des 30(3):199–216
 27.
Clarke EM, Emerson EA (1981) Design and synthesis of synchronization skeletons using branching time temporal logic. In: Logics of Programs — Proceedings 1981 (LNCS Volume 131). Springer, Berlin, pp 52–71
 28.
Clarke EM, Grumberg O, Peled DA (2000) Model Checking. MIT press, Cambridge
 29.
de Alfaro L, Henzinger TA (2000) Concurrent omega-regular games. In: Proceedings of the 15th Annual IEEE Symposium on Logic in Computer Science, LICS '00. IEEE Computer Society, USA, p 141
 30.
Downey RG, Fellows MR (1999) Parameterized complexity. Springer, New York
 31.
Emerson EA (1990) Temporal and modal logic. In: Handbook of Theoretical Computer Science Volume B: Formal Models and Semantics. Elsevier Science Publishers B.V., Amsterdam, pp 996–1072
 32.
Emerson EA, Jutla CS (1991) Tree automata, mu-calculus and determinacy. In: FOCS. IEEE, pp 368–377
 33.
Fagin R, Halpern JY, Moses Y, Vardi MY (1995) Reasoning about Knowledge. The MIT press, Cambridge
 34.
Filiot E, Gentilini R, Raskin JF (2018) Rational synthesis under imperfect information. In: Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science. ACM
 35.
Finus M, Rundshagen B (2003) A non-cooperative foundation of core-stability in positive externality NTU-coalition games. Nota Di Lavoro 31.2003 Economics Energy Environment
 36.
Fisman D, Kupferman O, Lustig Y (2010) Rational synthesis. In: TACAS, volume 6015 of LNCS. Springer, pp 190–204
 37.
Gao T, Gutierrez J, Wooldridge M (2017) Iterated Boolean games for rational verification. In: Larson K, Winikoff M, Das S, Durfee EH (eds) Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, AAMAS. ACM, São Paulo, pp 705–713
 38.
Gutierrez J, Hammond L, Lin A, Najib M, Wooldridge M (2021) Rational verification for probabilistic systems. In: Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning (KR21), Virtual. Forthcoming
 39.
Gutierrez J, Harrenstein P, Wooldridge M (2015) Iterated Boolean games. Inf Comput 242:53–79
 40.
Gutierrez J, Harrenstein P, Perelli G, Wooldridge M (2019) Nash equilibrium and bisimulation invariance. Log. Methods Comput. Sci. 15(3)
 41.
Gutierrez J, Harrenstein P, Wooldridge M (2017) From model checking to equilibrium checking: Reactive modules for rational verification. Artif Intell 248:123–157
 42.
Gutierrez J, Kraus S, Wooldridge M (2019) Cooperative concurrent games. In: Elkind E, Veloso M, Agmon N, Taylor ME (eds) Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’19. International Foundation for Autonomous Agents and Multiagent Systems, Montreal, pp 1198–1206
 43.
Gutierrez J, Murano A, Perelli G, Rubin S, Steeples T, Wooldridge M (2020) Equilibria for games with combined qualitative and quantitative objectives. Acta Informatica. Springer, pp 1–26
 44.
Gutierrez J, Murano A, Perelli G, Rubin S, Wooldridge M (2017) Nash equilibria in concurrent games with lexicographic preferences. Association for the Advancement of Artificial Intelligence
 45.
Gutierrez J, Najib M, Perelli G, Wooldridge M (2018) Eve: A tool for temporal equilibrium analysis. In: ATVA, Vol 11138 of LNCS. Springer, Cham, pp 551–557
 46.
Gutierrez J, Najib M, Perelli G, Wooldridge M (2019) On computational tractability for rational verification. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp 329–335. https://doi.org/10.24963/ijcai.2019/47
 47.
Gutierrez J, Najib M, Perelli G, Wooldridge M (2020) Automated temporal equilibrium analysis: Verification and synthesis of multiplayer games. Artif Intell 287:103353
 48.
Gutierrez J, Perelli G, Wooldridge M (2018) Imperfect information in reactive modules games. Inf Comput 261:650–675
 49.
Hammond L, Abate A, Gutierrez J, Wooldridge M (2021) Multiagent reinforcement learning with temporal logic specifications. In: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’21. International Foundation for Autonomous Agents and Multiagent Systems. Forthcoming
 50.
Hasanbeig M, Abate A, Kroening D (2018) Logically-constrained reinforcement learning. arXiv:1801.08099
 51.
Jurdzinski M (1998) Deciding the winner in parity games is in UP ∩ co-UP. Inf Process Lett 68(3):119–124
 52.
Kupferman O (2018) Automata theory and model checking. In: Handbook of Model Checking. Springer International Publishing, pp 107–151
 53.
Kupferman O, Perelli G, Vardi MY (2016) Synthesis with rational environments. Ann. Math. Artif Intell. 78(1):3–20
 54.
Kwiatkowska M, Norman G, Parker D (2011) PRISM 4.0: Verification of probabilistic realtime systems. In: Gopalakrishnan G, Qadeer S (eds) Proceedings of 23rd International Conference on Computer Aided Verification (CAV’11), volume 6806 of LNCS. Springer, pp 585–591
 55.
Kwiatkowska M, Norman G, Parker D, Santos G (2018) Automated verification of concurrent stochastic games. In: Quantitative Evaluation of Systems. Springer International Publishing, pp 223–239
 56.
Kwiatkowska M, Norman G, Parker D, Santos G (2020) Automatic verification of concurrent stochastic systems. Formal Methods in System Design. To appear
 57.
Kwiatkowska M, Norman G, Parker D, Santos G (2020) PRISMgames 3.0: Stochastic game verification with concurrency, equilibria and time. In: Computer Aided Verification. Springer International Publishing, pp 475–487
 58.
Lomuscio A, Raimondi F (2006) MCMAS: A tool for verifying multiagent systems. In: Proceedings of The Twelfth International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS-2006). Springer, Berlin
 59.
Lomuscio A, Qu H, Raimondi F (2017) MCMAS: An opensource model checker for the verification of multiagent systems. Int J Softw Tools Technol Transfer 19(1):9–30
 60.
Maler O, Nickovic D (2004) Monitoring temporal properties of continuous signals. In: Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems. Springer, Berlin, pp 152–166
 61.
Maschler M, Solan E, Zamir S (2013) Game Theory. Cambridge University Press, Cambridge
 62.
Mogavero F, Murano A, Perelli G, Vardi MY (2014) Reasoning about strategies: on the modelchecking problem. ACM Trans Comput Logic 15(4):47. https://doi.org/10.1145/2631917
 63.
Nisan N, Roughgarden T, Tardos E, Vazirani VV (eds) (2007) Algorithmic Game Theory. Cambridge University Press, Cambridge
 64.
Osborne MJ, Rubinstein A (1994) A course in game theory. The MIT press, Cambridge
 65.
Piterman N (2007) From nondeterministic Büchi and Streett automata to deterministic parity automata. Log Methods Comput Sci. 3(3)
 66.
Pnueli A (1977) The temporal logic of programs. In: Proceedings of the Eighteenth IEEE Symposium on the Foundations of Computer Science, pp 46–57
 67.
Pnueli A, Rosner R (1989) On the synthesis of an asynchronous reactive module. In: Proceedings of the Sixteenth International Colloquium on Automata, Languages, and Programming
 68.
Rahwan T, Michalak T, Wooldridge M, Jennings NR (2012) Anytime coalition structure generation in multiagent systems with positive or negative externalities. Artif Intell 186:95–122
 69.
Roth A, Ockenfels A (2002) Last-minute bidding and the rules for ending second-price auctions: Evidence from eBay and Amazon auctions on the internet. Am Econ Rev 92(4):1093–1103
 70.
Ruane LM (1990) Process synchronization in the UTS kernel. Comput Syst 3(3):387–421
 71.
Ruys TC, Langerak R (1997) Validation of Bosch' mobile communication network architecture with SPIN. In: Proceedings of SPIN'97, the Third International Workshop on SPIN. University of Twente
 72.
Shehory O, Kraus S (1998) Methods for task allocation via agent coalition formation. Artif Intell 101(1):165–200
 73.
Shoham Y, LeytonBrown K (2008) Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, Cambridge
 74.
Sistla AP, Clarke EM (1985) The complexity of propositional linear temporal logics. J ACM 32(3):733–749
 75.
Strejcek J (2004) Linear temporal logic: Expressiveness and model checking. PhD thesis, Faculty of Informatics, Masaryk University in Brno
 76.
Ummels M, Wojtczak D (2011) The Complexity of Nash Equilibria in LimitAverage games. CoRR, arXiv:1109.6220
 77.
Uyanık M (2015) On the non-emptiness of the α-core of discontinuous games: Transferable and non-transferable utilities. J Econ Theory 158:213–231
 78.
van der Hoek W, Lomuscio A, Wooldridge M (2005) On the complexity of practical ATL model checking. In: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2006), Hakodate
 79.
Vardi MY (2001) Branching vs. linear time: Final showdown. In: Margaria T, Yi W (eds) Proceedings of the 2001 Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2001 (LNCS Volume 2031). Springer, Berlin, pp 1–22
 80.
Vardi MY, Wolper P (1986) An automata-theoretic approach to automatic program verification. In: First Symposium on Logic in Computer Science (LICS)
 81.
Winskel G (1986) Event structures. In: Advances in Petri Nets
 82.
Wooldridge M (2009) An Introduction to Multiagent Systems, 2nd edn. Wiley
 83.
Wooldridge M, Gutierrez J, Harrenstein P, Marchioni E, Perelli G, Toumi A (2016) Rational verification: From model checking to equilibrium checking. In: Schuurmans D, Wellman MP (eds) Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, Phoenix, pp 4184–4191
 84.
Yi SS (1997) Stable coalition structures with externalities. Games Econ Behav 20(2):201–237
Acknowledgements
Wooldridge, Gutierrez, Harrenstein, and Perelli acknowledge the support of the ERC under grant 291528 ("RACE"). Wooldridge and Harrenstein further acknowledge the support of the Alan Turing Institute, London. Kwiatkowska received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant 834115, "FUN2MODEL") and the EPSRC Programme Grant on Mobile Autonomy (EP/M019918/1). Abate acknowledges the HICLASS project (113213), a partnership between the Aerospace Technology Institute (ATI), Department for Business, Energy & Industrial Strategy (BEIS) and Innovate UK. Hammond acknowledges the support of an EPSRC Doctoral Training Partnership studentship (Reference: 2218880). Harrenstein was furthermore supported by the ERC under grant 639945 ("ACCORD"). Steeples gratefully acknowledges the support of the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems EP/L015897/1 and the Ian Palmer Memorial Scholarship. Najib acknowledges the support of the ERC European Union's Horizon 2020 research and innovation programme (grant 759969).
This article belongs to the Topical Collection: 30th Anniversary Special Issue
Cite this article
Abate, A., Gutierrez, J., Hammond, L. et al. Rational verification: gametheoretic verification of multiagent systems. Appl Intell 51, 6569–6584 (2021). https://doi.org/10.1007/s1048902102658y
Keywords
 Automated verification
 Game theory
 Multiagent systems
 Model checking
 Automated synthesis