Formal Modelling and Verification of Probabilistic Resource Bounded Agents

Many problems in Multi-Agent Systems (MASs) research are formulated in terms of the abilities of a coalition of agents. Existing approaches to reasoning about coalitional ability are usually focused on games or transition systems, which are described in terms of states and actions. Such approaches however often neglect a key feature of multi-agent systems, namely that the actions of the agents require resources. In this paper, we describe a logic for reasoning about coalitional ability under resource constraints in the probabilistic setting. We extend Resource-bounded Alternating-time Temporal Logic (RB-ATL) with probabilistic reasoning and provide a standard algorithm for the model-checking problem of the resulting logic Probabilistic resource-bounded ATL (pRB-ATL). We implement model-checking algorithms and present experimental results using simple multi-agent model-checking problems of increasing complexity.


Introduction
An increasingly important field of AI is autonomous agents and multi-agent systems, where agents are entities that can interact with their environment or other agents in pursuit of their goals. In general, multi-agent systems research refers to software agents. However, the agents in a Multi-Agent System (MAS) could also be, for example, humans or robots. A key feature of a MAS is that agents in the system are concurrent. The primary aims of such systems are modularity, scalability, flexibility, robustness, and distributed computing (Jennings & Wooldridge, 1998). In multi-agent systems, there are many problems which also require coalition formation among the agents, and such problems can only be usefully analysed in terms of the combined abilities of groups of agents. For example, it may be that no single agent, say a potential home buyer with a limited financial deposit, has a strategy to reach a particular state (buying a home) on its own, but two agents, perhaps husband and wife, cooperating with each other are capable of achieving this outcome. Similarly, in the prisoner's dilemma (Rapoport, 1989), a single prisoner cannot ensure the optimal outcome, while a coalition of two prisoners can. The Alternating-time Temporal Logic (ATL) (Alur et al., 2002) was introduced as a logical formalism for analysing the strategic abilities of coalitions with temporal winning conditions. The semantics of ATL is usually given by a transition system specification based on concurrent game structures. In ATL, various interesting properties of coalitions and strategies, such as reachability and controllability, can be formulated. One can then encode a system and verify its desired properties expressed in ATL using standard ATL model-checking tools, such as jMocha (Alur et al., 2001) and MCMAS (Lomuscio et al., 2017). Coalition Logic (CL) (Pauly, 2002) is another logical formalism similar to ATL, intended to describe the ability of groups of agents to achieve a goal in a strategic game. It can specify what a group of agents can (not) achieve through choices of their actions. In the literature, several variants of ATL and CL have been proposed (see e.g., (Goranko, 2001; Ågotnes et al., 2009; Herzig et al., 2013)). These logics allow us to express many interesting properties of coalitions and strategies, such as "a coalition of agents $A$ ($\subseteq N$, the set of all agents) has a strategy to reach a state satisfying $\varphi$ no matter what the other agents and/or environment ($N \setminus A$) in the system do", where $\varphi$ characterises, e.g., lifting a heavy weight by a group of robots $A$, saving a building from fire by a group of fire extinguisher robots $A$, or simply a solution to a problem. In fact, these logics can be used to state various qualitative properties of real-world concurrent systems. However, analysing quantitative properties of systems, such as reliability and uncertainty, which cannot be expressed trivially in the logics described above, is equally or even more important. Reliability is the probability that a system will perform its specified function over a given period of time under defined environmental conditions. For example, a reliability property could be "if a fire is detected in a building, then the probability of it being put out and the building being saved by a coalition of robotic agents $A$ within $k$ time steps is at least $p$".
Many real-world systems, such as the Internet of Things (IoT) and Cyber-Physical Systems (CPS), are deeply rooted in the activities of our daily living (Calvaresi et al., 2017). The multi-agent paradigm offers an excellent framework which can be used to model and implement such systems (Leitao et al., 2016). Such systems usually operate in unpredictable and/or uncertain environments (Faza et al., 2009; Zhang et al., 2016). Their applications encompass many safety-critical domains, and many such applications run in resource-constrained devices and environments (Abbas et al., 2015; Laszka et al., 2015). These systems therefore often require rigorous analysis and verification to ensure their designs are correct (Kwiatkowska, 2016). Thus, working together across theory and practice is fundamental to address real-world challenges and develop novel formal frameworks in tandem with the theory and tools required to ensure the desired reliability and correctness of systems. In conventional verification via model-checking, given a model of a system and a specification, a model checker determines whether the system satisfies the specification by returning a yes or no answer. However, when considering stochasticity in the environment, agents in the system should be formalised so that they exhibit probabilistic behaviour. PRISM (Kwiatkowska et al., 2002) is a tool for formal modelling and analysis of systems that exhibit random or probabilistic behaviour. A system model in PRISM can be developed using its own modelling language, which is similar to reactive modules (Alur et al., 2001), and the properties can be written in an appropriate property specification language, including PCTL (Bianco & de Alfaro, 1995), CSL (Baier et al., 1999), PLTL (Baier, 1998), PCTL* (Baier, 1998), and rPATL (Chen et al., 2013). These are fundamentally probabilistic temporal logics. The logic rPATL allows us to reason quantitatively about a system's use of resources and emphasises measures related to expected rewards. In rPATL, we can express that a coalition of agents has a strategy which can ensure that either the probability of an event's occurrence or an expected reward measure meets some threshold. However, probabilistic resource-bounded properties such as:
• "can coalition $A$ have a strategy so that the probability to reach a state satisfying $\varphi$ under the resource bound $b$ is at least $p$?";
• "a coalition of agents $A$ has a strategy to achieve a property $\varphi$ with probability $p$ provided they have resources $b$, but they cannot enforce $\varphi$ under a tighter resource bound $b'$";
• "a coalition of agents $A$ can maintain $\varphi$ until $\psi$ becomes true with probability $p$ provided they have resources $b$"; and
• "if a property $\varphi$ holds, then a coalition of agents $A$ has a strategy to achieve a property $\psi$ within $n$ time steps with probability $p$ provided they have resources $b$"
can be expressed in a straightforward way neither in rPATL nor in any of the other probabilistic temporal logics mentioned above.
In this paper, we propose a logic pRB-ATL for reasoning about coalitional ability under resource constraints in the probabilistic setting, which allows us to express such properties. The significance and novelty of the proposed logical framework, based on probabilistic reasoning and decision-theoretic principles, is that it allows us to analyse the implications of uncertainty and of limited computational, communication, or other resources on the design of autonomous agents in a more realistic and simple manner. This article is a revised and extended version of Nguyen and Rakib (2019). The main differences from Nguyen and Rakib (2019) are a complete literature review, the addition of complete proofs of the lemmas and theorems, the development of the model-checking toolkit, and the modelling of a more complex example system with comprehensive experimental analysis and verification results.
The rest of this paper is organised as follows. In Sect. 2, we review related work and discuss how our proposed logic pRB-ATL differs from other logics suggested in the literature. In Sect. 3, we discuss the basic notions of probability distribution, and the underlying probabilistic formalisms of our logic such as Discrete-time Markov chains and Markov Decision Processes. In Sect. 4, we present the syntax and semantics of pRB-ATL. In Sect. 5, we give a model-checking algorithm for pRB-ATL. In Sect. 6, we outline an implementation of our model-checking prototyping tool. In Sect. 7, we model, analyse, and present experimental results applying our techniques and tool. Finally, in Sect. 8, we conclude the paper and outline directions for future work.

Related Work
In this section, we outline recent important developments on ATL and its extensions considering conventional, resource-bounded, and probabilistic reasoning by discussing important features, such as (un)decidability results, expressiveness, and model-checking problems.
A large number of existing studies in the multi-agent coalition literature have formulated reasoning about the abilities of coalitions of agents in terms of games (Pauly, 2002; Goranko, 2001; Wooldridge & Dunne, 2004; Ågotnes et al., 2009). Coalition Logic generalises the notion of a strategic game, and its semantics is given in terms of state transition systems where each state has an associated strategic game. The logic ATL (Alur et al., 2002) was originally developed to reason about distributed processes in adversarial environments, and CL (Pauly, 2002) can be regarded as the one-step fragment of ATL (Goranko & Drimmelen, 2006; Goranko, 2001). That is, in CL the outcome of a strategic game is realised in the next state, whereas in ATL properties can be expressed that hold in arbitrary future states. These logics allow us to express many interesting properties of coalitions and strategies, as mentioned previously, such as $\langle\langle A \rangle\rangle \varphi$, which states that coalition $A$ has a strategy to reach a state satisfying $\varphi$. The exact semantics of the coalition modalities varies depending on whether or not the knowledge that each agent has about the current state of the game is complete (referred to in modal logic as complete/incomplete information), and on whether or not agents can use knowledge of past game states when deciding on their next move (referred to as perfect/imperfect recall). It is shown in Alur et al. (2002) that the model-checking problem for complete information is decidable in polynomial time, whereas it is undecidable for the incomplete information and perfect recall case (Dima and Tiplea, 2011).
Recently, there has been growing interest in formal models of resource-bounded agents (Alechina et al., 2009; Alechina et al., 2010; Bulling & Farwer, 2010; Della Monica et al., 2011; Alechina et al., 2010; Nguyen et al., 2015). In research on resource-bounded reasoning agents, the emphasis is on the behaviour of agents constrained by fixed resource bounds. For example, the authors of Alechina et al. (2009) introduced Coalition Logic for Resource Games (CLRG), an extension of Coalition Logic that allows explicit reasoning about the resource endowments of coalitions of agents and the resource bounds on strategies. Similarly, the Resource-bounded Alternating-time Temporal Logic (RB-ATL) (Alechina et al., 2010) was developed for reasoning about coalitional ability under resource bounds. The logic RB-ATL allows us to express various resource-bounded properties, such as $\langle\langle A^b \rangle\rangle \varphi$, which expresses that coalition $A$ has a strategy to reach a state satisfying $\varphi$ under the resource bound $b$. The model-checking problem for RB-ATL is decidable and, if resource bounds are encoded in unary, the model-checking algorithm for RB-ATL runs in time polynomial in the size of the formula and the structure, and exponential in the number of resources.
There also exist other works on extensions of temporal logics and logics of coalitional ability that are capable of expressing resource bounds (Bulling & Farwer, 2010; Della Monica et al., 2011). In Bulling and Farwer (2010), the Resource-bounded Tree Logics RTL and RTL* were introduced. The logic RTL*, which extends CTL* with quantifiers representing the cost of paths, can only be used to analyse single-agent systems. RTL is a fragment of RTL* in which each temporal operator is immediately preceded by a path quantifier. Fundamentally, in their proposed language the existential path quantifier $E\varphi$ of CTL has been replaced by $\langle\rho\rangle\varphi$, where $\rho$ represents a set of available resources. Intuitively, the formula $\langle\rho\rangle\varphi$ states that there exists a computation feasible with the given resources $\rho$ that satisfies $\varphi$. It has been shown that the model-checking problem for RTL and some sub-classes of RTL* is decidable.
Della Monica et al. (2011) proposed the Priced Resource-bounded ATL (PRB-ATL) logic and studied its model-checking problem. Its syntax and semantics consider the resource endowment of the whole system when evaluating a formula pertaining to a coalition of agents. In their model the resources are convertible to money, and the amount of money is bounded. Example properties that can be expressed in PRB-ATL include $\langle\langle A^{\$} \rangle\rangle \varphi$, which states that the coalition $A$ has a strategy such that, no matter what the opponent agents do, $\varphi$ can be achieved under the expenses $\$$. As in RB-ATL, $\$$ can be $\infty$ in the general case. That is, the meaning of $\langle\langle A^{\$} \rangle\rangle \varphi$ when $\$ = \infty$ is the same as that of its counterpart in ATL. The model-checking problem for PRB-ATL is decidable and its complexity is similar to that of RB-ATL.
In Alechina et al. (2010), the authors proposed a sound and complete logic RBCL that allows us to express the costs of strategies under resource bounds. They have demonstrated how to verify properties expressed in RBCL and provided a decision procedure for the satisfiability problem of RBCL as well as a model-checking algorithm. However, RBCL has some limitations. For example, properties like "coalition $C$ has a strategy to maintain the property $\varphi$ with resources $b$", or "coalition $C$ can maintain $\varphi$ until $\psi$ becomes true provided $C$ has resources $b$" cannot be expressed in RBCL. We can express and verify such properties using RB-ATL (Alechina et al., 2010; Nguyen et al., 2015).
In a more recent work (Belardinelli & Demri, 2021), the authors studied the complexity of the model-checking problem for RB±ATL$^{+}$, a variant of RB±ATL (Alechina et al., 2018). RB±ATL$^{+}$ allows Boolean combinations of path formulas starting with single temporal operators, but is only able to analyse a single resource, providing an interesting trade-off between temporal expressivity and resource analysis. Its model-checking problem is $\Delta^{P}_{2}$-complete when taking into account just one agent and one resource, which is the same as for the standard CTL$^{+}$ logic. Additionally, they have demonstrated that the model-checking problem for RB±ATL$^{+}$ can be solved in EXPTIME with an arbitrary number of agents and a fixed number of resources by using a sophisticated Turing reduction to the parity game problem for alternating vector addition systems with states. Overall, the paper provides a thorough and rigorous treatment of the complexity of model-checking for strategic reasoning about resource-bounded agents, considering both the production and consumption of resources.
A large number of multi-agent application domains, such as IoT and CPS in general and disaster rescue and military operations in particular, require not only reasoning about the team behaviour of agents but also accounting for agents and/or an environment that may behave randomly or unreliably. In such domains, the behaviour of an agent has to be described in terms of a probability distribution over a set of possibilities. There has recently been increasing interest in developing logics with a probabilistic component and in linking logical and probabilistic reasoning (see e.g., (Chen and Lu, 2007; Bulling & Jamroga, 2009; Forejt et al., 2011; Huang et al., 2012; Chen et al., 2013; Song et al., 2019; Fu et al., 2018; Wan et al., 2013)). These logics are essentially extensions of CTL or ATL which allow for probabilistic quantification of described properties. In general, probabilistic systems exhibit a combination of probabilistic and nondeterministic behaviour, and the semantics of the system models are defined in terms of probabilistic transition systems. For example, the semantics of the probabilistic ATL logics are defined over a probabilistic extension of concurrent game structures (Alur et al., 2002), for which a commonly used underlying formalism is Markov Decision Processes (MDPs). Probabilistic model-checking is also a well-established technique, and the well-known tool PRISM is based on Markov chain (MC) and MDP probabilistic models (Kwiatkowska et al., 2002). In Chen and Lu (2007), the PATL/PATL* logics have been developed by extending ATL and interpreting it over probabilistic concurrent game structures. Interesting properties that can be expressed in PATL include $\langle\langle A \rangle\rangle [\varphi_p]_{\bowtie v}$; it can be read as: a coalition $A$ has a strategy such that, for all strategies of agents not in $A$, the probability that the path formula $\varphi$ is satisfied is $\bowtie v$ ($\bowtie \in \{\leq, <, >, \geq\}$). It was then further extended to develop the logics rPATL/rPATL* (Chen et al., 2013) for expressing quantitative properties of stochastic multi-player games. The logics rPATL/rPATL* extend PATL/PATL* with operators that can enforce an expected reward $\bowtie v$. The logic rPATL* can express cumulative rewards given by the transition system. It is known that the model-checking problem for rPATL* is 2EXPTIME-complete.
Similar to rPATL, strategies of the agents in our proposed pRB-ATL logic are randomised. An agent uses a randomised strategy by selecting a probability distribution over moves; the move to be played is then chosen at random, according to the distribution. However, the reasoning problem considered in our work differs from rPATL in two important ways. First, and most importantly, properties in rPATL related to rewards are of a statistical nature. They are expressed and computed as constraints on expected values for rewards. In contrast, resource-bounded properties in our pRB-ATL logic lie within the realm of crisp values and constraints; actions and strategies are allowed if and only if they satisfy the resource-bounded constraints. That is, using pRB-ATL, it is possible to ask whether a strategy (a sequence of actions) exists to achieve some goal with probability 0.99 if the agents start with, e.g., 100 units of energy. Second, the semantics of rPATL is based on turn-based systems while ours is based on concurrent systems. However, the recently developed PRISM-games 3.0 (Kwiatkowska et al., 2020) supports modelling concurrent games. We are aware that properties of resource-bounded systems can be verified by expressing them using rPATL. However, to encode a system using rPATL, we have to expand the model to incorporate the resource information into the states of the model. For different formulas with different resource bounds, we then have to induce a new model on which to run the model-checking algorithms. Our proposed approach allows model-checking algorithms to work directly with the original model. This opens up the possibility of future research in which agents not only consume but also produce resources. In the resource production case, verifying resource-bounded properties by encoding the resource-bounded system using rPATL is no longer feasible. Furthermore, we do not address optimal strategies in our framework. Our aim is to check the existence of a strategy (which may not be optimal), where the resources each agent is prepared to commit to a goal are bounded.
The approach proposed by Huang et al. (2012) in developing a probabilistic ATL logic relies on interpreted system semantics. The resulting logic PATL* essentially generalises interpreted systems (Fagin et al., 1995) by adding a probabilistic modality and explicit local actions taken by the agents. An example property that can be expressed in PATL* is $\langle\langle A \rangle\rangle^{\bowtie v} \varphi$, which expresses that coalition $A$ has a strategy to enforce $\varphi$ with a probability $\bowtie v$. However, since the semantics is based on incomplete information and synchronous perfect recall, the model-checking problem for PATL* is undecidable even for a single-agent system.
In Bulling and Jamroga (2009), an alternative semantics for a probabilistic logic PATL has been proposed using the notion of a prediction denotation operator. In PATL, reasoning about probabilistic success is studied over complete information games. The success of the strategy of a coalition is measured according to a probability measure describing the potential actions of the rest of the agents in the system. An example property that can be expressed in PATL is $\langle\langle A \rangle\rangle^{p}_{\omega} \varphi$, which expresses that coalition $A$ has a strategy to enforce $\varphi$ with probability $p$ when agents not in $A$ behave according to $\omega$. The complexity of the model-checking problem for PATL with mixed strategies lies between probabilistic polynomial time and PSPACE.
In Guan and Yu (2022), the authors presented a probabilistic continuous-time linear logic (CLL), to reason about the probability distribution execution of continuous-time Markov chains (CTMCs). In CLL, multiphase timed until formulas are allowed and the semantics of the formulas focuses on relative time intervals, meaning that time can be reset just like in timed automata. The model-checking problem is reduced to a reachability problem of absolute time intervals.
In Wang et al. (2021), the authors proposed a new concept of probabilistic conformance for Cyber-Physical Systems (CPS). This idea is based on approximately equal satisfaction probability for a given (infinite) set of signal temporal logic (STL) formulas. They have presented a verification algorithm for the probabilistic conformance of grey-box CPS, described by probabilistic uncertain systems. Their proposed statistical verification method is based on a statistical test that can determine whether two probability distributions are equal at any chosen level of confidence. It is shown that statistically confirming conformance is possible when the STL formula is monotonically parameterised, meaning that the satisfaction probability of the formula changes monotonically with the parameters.
Since ATL (Alur et al., 2002) was introduced, a remarkably rich literature has developed. Here we have given only an overview of the works that are closely related to the topic of the paper. However, in all these approaches, at least in the probabilistic setting, the basic idea of agents acting in an environment according to a set of rules in the pursuit of goals does not take resources into account. In real life, many actions that an agent may perform to achieve a goal can only be accomplished when certain resources are available. Certain actions are not possible without sufficient resources, which leads to plan failure. To the best of our knowledge, there are no existing works in the literature that explicitly address probabilistic variants of ATL for modelling and verifying resource-bounded agents.

Background and Preliminaries
In this section, we discuss the basic notions that are used in the technical part of the proposed logic. Let $Q$ be a finite set and $\mu : Q \to [0, 1]$ be a probability distribution function over $Q$ such that $\sum_{q \in Q} \mu(q) = 1$. We denote by $D(Q)$ the set of all such distributions over $Q$. For a given $\mu \in D(Q)$, $\mathrm{supp}(\mu) = \{q \in Q \mid \mu(q) > 0\}$ is called the support of $\mu$. A probability space is a measure space with total measure 1. The standard notation of a probability space is a triple $(\Omega, \mathcal{F}, \Pr)$, where $\Omega$ is a sample space which represents all possible outcomes, $\mathcal{F} \subseteq \mathcal{P}(\Omega)$ is a $\sigma$-algebra over $\Omega$, i.e., it includes the empty set and is closed under countable unions and complement, and $\Pr : \mathcal{F} \to [0, 1]$ is a probability measure over $(\Omega, \mathcal{F})$. The interested reader is referred to Billingsley (1986) for a complete description of probability distributions and measures. We also denote the sets of all finite, non-empty finite, and infinite sequences of elements of $Q$ by $Q^{*}$, $Q^{+}$ and $Q^{\omega}$, respectively.

DTMC and MDP
Discrete-time Markov chains (DTMCs) are the simplest probabilistic models in which the systems evolve through discrete time steps.
Definition 1 A DTMC is a tuple $M_c = (Q, q_0, \Pi, \pi, \delta)$, where $Q$ is a set of states, $q_0 \in Q$ is the initial state, $\Pi$ is a finite set of propositional variables, $\pi : Q \to \wp(\Pi)$ is a labelling function, and $\delta : Q \times Q \to [0, 1]$ is a probability transition matrix such that $\sum_{q' \in Q} \delta(q, q') = 1$ for all $q \in Q$.
Here, $\delta(q, q')$ denotes the probability that the chain, whenever in state $q$, moves into the next state $q'$, and is referred to as a one-step transition probability. The square matrix $P = (\delta(q, q'))_{q, q' \in Q}$ is called the one-step transition matrix. Since the chain, when leaving state $q$, must move to one of the possible next states $q' \in Q$, each row sums to one.
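As an illustration of the one-step transition matrix, the following minimal Python sketch stores $\delta$ as a nested dictionary, checks that every row sums to one, pushes a distribution one step forward, and multiplies one-step probabilities along a finite sequence of states. The three-state chain, its probabilities, and the function names are hypothetical and chosen only for illustration; they are not taken from the paper or its tool.

```python
from typing import Dict, List

Row = Dict[str, float]

# Hypothetical three-state DTMC; delta[q][q2] is the one-step probability q -> q2.
delta: Dict[str, Row] = {
    "q0": {"q0": 0.5, "q1": 0.5},
    "q1": {"q1": 0.25, "q2": 0.75},
    "q2": {"q2": 1.0},
}


def is_stochastic(delta: Dict[str, Row], tol: float = 1e-9) -> bool:
    """Check that each row of the transition matrix sums to one."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in delta.values())


def step(dist: Row, delta: Dict[str, Row]) -> Row:
    """Push a distribution over states one step forward."""
    nxt: Row = {}
    for q, p in dist.items():
        for q2, pr in delta[q].items():
            nxt[q2] = nxt.get(q2, 0.0) + p * pr
    return nxt


def path_probability(path: List[str], delta: Dict[str, Row]) -> float:
    """Product of the one-step probabilities along a finite sequence of states."""
    prob = 1.0
    for q, q2 in zip(path, path[1:]):
        prob *= delta[q].get(q2, 0.0)  # missing entries mean probability 0
    return prob


if __name__ == "__main__":
    print(is_stochastic(delta))
    print(step({"q0": 1.0}, delta))
    print(path_probability(["q0", "q1", "q2"], delta))
```

A sparse dictionary is used instead of a dense matrix so that absent transitions are simply treated as having probability zero.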
Definition 2 A path $\lambda$ in a DTMC $M_c$ is a sequence of states $q_0, q_1, q_2, \ldots$ such that $\delta(q_i, q_{i+1}) > 0$ for all $i \geq 0$. The $i$-th state in a path $\lambda$ is denoted by $\lambda(i)$. The set of all finite paths starting from $q \in Q$ in the model $M_c$ is denoted by $\Omega^{+}_{M_c,q}$, and the set of all infinite paths starting from $q$ is denoted by $\Omega^{\omega}_{M_c,q}$. The prefix of the path $\lambda$ of length $n$ is $q_0, q_1, q_2, \ldots, q_n$.
Definition 3 A cylinder set $C_\lambda$ is the set of infinite paths that have a common finite prefix $\lambda$ of length $n$. Let $\mathcal{F}_{M_c,q}$ be the smallest $\sigma$-algebra generated by $\{C_\lambda \mid \lambda \in \Omega^{+}_{M_c,q}\}$. Then, we can define $\mu$ on the measurable space $(\Omega^{\omega}_{M_c,q}, \mathcal{F}_{M_c,q})$ as the unique probability measure such that:
$$\mu(C_\lambda) = \prod_{i=0}^{n-1} \delta(q_i, q_{i+1}), \quad \text{where } \lambda = q_0, q_1, \ldots, q_n.$$
Markov decision processes (MDPs), an extension of ordinary DTMCs, are widely used formalisms for modelling systems that exhibit both probabilistic and nondeterministic behaviour (Forejt et al., 2011).
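Definition 4 An MDP is a tuple $M_d = (Q, q_0, A, \Pi, \pi, \delta)$, where $A$ is a finite set of actions, $\delta : Q \times A \to D(Q)$ is a partial probabilistic transition function, and all the other components are the same as their counterparts in a DTMC.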
The set of available actions at a state $q$ is defined by $A(q) = \{\alpha \in A \mid \exists q' \cdot \delta(q, \alpha)(q') > 0\}$. Unlike DTMCs, in MDPs the transitions between states occur in two steps. Firstly, an action $\alpha$ is selected from the set of actions $A(q)$ available at a given state $q$. Secondly, a successor state $q'$ is chosen randomly, according to the probability distribution $\delta(q, \alpha)$. For a given state $q$ and $\alpha \in A(q)$, $\delta(q, \alpha)$ is a distribution over $Q$, so $\sum_{q' \in Q} \delta(q, \alpha)(q') = 1$.
Definition 5 A path $\lambda$ in an MDP $M_d$ is an infinite alternating sequence of states and actions $\lambda = q_0 \xrightarrow{\alpha_0} q_1 \xrightarrow{\alpha_1} q_2 \ldots$ such that $\alpha_i \in A(q_i)$ and $\delta(q_i, \alpha_i)(q_{i+1}) > 0$ for all $i \geq 0$. A finite path is defined as usual as a prefix of an infinite path ending at a state $q_n$. The set of finite paths starting from $q$ is denoted by $\Omega^{+}_{M_d,q}$.
We use a strategy to resolve the nondeterministic choices in an MDP. An MDP $M_d$'s behaviour is entirely probabilistic under a specific strategy, resulting in a DTMC $M_c$. For a more detailed discussion we refer the interested reader to Baier and Katoen (2008, pp. 842-843, Definition 10.91, Scheduler).
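The way a strategy turns an MDP into a DTMC can be sketched in a few lines of Python. This is a minimal sketch under a simplifying assumption: the strategy is memoryless and deterministic (a map from states to actions), whereas strategies in general may depend on the history and may be randomised. The two-state MDP, the action names, and the function names are hypothetical.

```python
from typing import Dict, Set

Dist = Dict[str, float]

# Hypothetical MDP: mdp[q][alpha] is the distribution delta(q, alpha) over successors.
mdp: Dict[str, Dict[str, Dist]] = {
    "q0": {"wait": {"q0": 1.0}, "try": {"q1": 0.5, "q0": 0.5}},
    "q1": {"wait": {"q1": 1.0}},
}


def available_actions(mdp: Dict[str, Dict[str, Dist]], q: str) -> Set[str]:
    """A(q): actions with at least one successor of positive probability."""
    return {a for a, dist in mdp[q].items() if any(p > 0 for p in dist.values())}


def induce_dtmc(mdp: Dict[str, Dict[str, Dist]],
                strategy: Dict[str, str]) -> Dict[str, Dist]:
    """Fix one action per state (memoryless deterministic strategy, an
    assumption of this sketch) and return the resulting DTMC rows."""
    dtmc: Dict[str, Dist] = {}
    for q in mdp:
        alpha = strategy[q]
        if alpha not in available_actions(mdp, q):
            raise ValueError(f"action {alpha!r} is not available at {q!r}")
        dtmc[q] = dict(mdp[q][alpha])
    return dtmc


if __name__ == "__main__":
    print(induce_dtmc(mdp, {"q0": "try", "q1": "wait"}))
```

Once every nondeterministic choice is fixed, each state has exactly one outgoing distribution, which is what makes the induced model a DTMC.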

Syntax and Semantics of pRB-ATL
In this section, we provide the syntax and semantics of pRB-ATL. Let us consider a multi-agent system consisting of a set $N = \{1, 2, \ldots, n\}$ of $n$ ($\geq 1$) concurrently executing agents. In order to reason about resources, we assume that the actions performed by the agents have costs. Let $R = \{res_1, res_2, \ldots, res_r\}$ be a finite set of $r \geq 1$ resources, such as money, energy, or anything else which may be required by an agent for performing an action. Without loss of generality, we assume that the cost of an action, for each of the resources, is a natural number. The set of resource bounds $B$ over $R$ is defined as $B = (\mathbb{N} \cup \{\infty\})^{r}$, where $\mathbb{N}$ is the set of natural numbers and $r = |R|$. We denote by $0$ the smallest resource bound $(0, \ldots, 0)$ and by $\infty$ the greatest resource bound $(\infty, \ldots, \infty)$.
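Resource bounds can be represented as tuples with one component per resource. The sketch below assumes that bounds are compared componentwise, which is the usual reading of "smallest" and "greatest" bound, and uses `math.inf` for an unbounded component; the function names are hypothetical.

```python
import math
from typing import Tuple

# A bound has one component per resource: a natural number or math.inf.
Bound = Tuple[float, ...]


def zero(r: int) -> Bound:
    """The smallest bound (0, ..., 0) for r resources."""
    return (0,) * r


def infinity(r: int) -> Bound:
    """The greatest bound (inf, ..., inf) for r resources."""
    return (math.inf,) * r


def leq(b1: Bound, b2: Bound) -> bool:
    """Componentwise comparison (assumed ordering on bounds)."""
    return len(b1) == len(b2) and all(x <= y for x, y in zip(b1, b2))


def add(b1: Bound, b2: Bound) -> Bound:
    """Componentwise sum, e.g. for accumulating the costs of moves."""
    return tuple(x + y for x, y in zip(b1, b2))


if __name__ == "__main__":
    print(leq(zero(2), (3, math.inf)))   # zero is below every bound
    print(leq((2, 1), (1, 5)))           # first component exceeds the bound
    print(add((1, 0), (1, 1)))
```

The ordering is only partial: two bounds such as (2, 1) and (1, 5) are incomparable, which is one reason the number of resources matters for model-checking cost.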

Syntax of pRB-ATL
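Let $\Pi$ be a finite set of atomic propositions and $N$ be the set of agents. The syntax of pRB-ATL is defined as follows:
$$\varphi ::= \top \mid p \mid \neg\varphi \mid \varphi \wedge \varphi \mid \langle\langle A^{b} \rangle\rangle P_{\bowtie v}[\psi]$$
$$\psi ::= \bigcirc\varphi \mid \varphi \, U^{\leq k} \varphi$$
where $p \in \Pi$, $A \subseteq N$, $b \in B$, $\bowtie \in \{<, \leq, \geq, >\}$, $v \in [0, 1]$, and $k \in \mathbb{N} \cup \{\infty\}$. The two temporal operators have the standard meaning: $\bigcirc$ for "next" and $U^{\leq k}$ for "bounded until" if $k < \infty$ or "until" otherwise. When $k = \infty$, we shall simply write $U$ instead of $U^{\leq\infty}$. Here, $\langle\langle A^{b} \rangle\rangle P_{\bowtie v}[\bigcirc\varphi]$ means that a coalition $A$ has a strategy to make sure that the next state satisfies $\varphi$ under resource bound $b$ with a probability in relation $\bowtie$ with constant $v$, regardless of the strategies of other players. The formula $\langle\langle A^{b} \rangle\rangle P_{\bowtie v}[\varphi_1 \, U^{\leq k} \varphi_2]$ means that $A$ has a strategy to enforce $\varphi_2$ while maintaining the truth of $\varphi_1$, and the cost of this strategy is at most $b$, with a probability in relation $\bowtie$ with constant $v$, regardless of the strategies of other players. Other temporal operators are defined as abbreviations in a standard way. In particular, "eventually" is defined as $\Diamond\varphi \equiv \top \, U \varphi$, and "always" as $\Box\varphi \equiv \neg\Diamond\neg\varphi$. Notice that these operators, when $b = \infty$, mean the same as their counterparts in ATL, i.e., the ATL operator $\langle\langle A \rangle\rangle$ corresponds to $\langle\langle A^{\infty} \rangle\rangle$. Similarly, the operator $\langle\langle A^{\infty} \rangle\rangle P_{\bowtie v}$ is the same as $\langle\langle A \rangle\rangle P_{\bowtie v}$ in PATL. Other classical abbreviations for $\bot$, $\vee$, $\to$ and $\leftrightarrow$ are defined as usual.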
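One way to make the shape of such formulas concrete is an abstract syntax tree. The Python sketch below assumes the grammar consists of Boolean connectives plus a coalition operator that carries a resource bound, a probability comparison, and a path formula built from "next" or "bounded until". All class and field names are hypothetical and do not reflect the authors' tool.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Top:
    """The constant true."""


@dataclass(frozen=True)
class Prop:
    name: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Next:
    sub: "Formula"


@dataclass(frozen=True)
class Until:
    left: "Formula"
    right: "Formula"
    k: float = math.inf  # step bound; math.inf gives the unbounded until


@dataclass(frozen=True)
class Coalition:
    """Coalition `agents` with bound `bound` enforces `path` with probability `rel` v."""
    agents: FrozenSet[int]
    bound: Tuple[float, ...]
    rel: str  # one of "<", "<=", ">=", ">"
    v: float
    path: Union[Next, Until]


Formula = Union[Top, Prop, Not, And, Coalition]


def eventually(phi: "Formula", k: float = math.inf) -> Until:
    """'Eventually phi' as the abbreviation 'true until phi'."""
    return Until(Top(), phi, k)


if __name__ == "__main__":
    # Hypothetical property: agents {1, 2} with bound (4, 3) reach "low-burnt"
    # with probability at least 0.9.
    print(Coalition(frozenset({1, 2}), (4, 3), ">=", 0.9, eventually(Prop("low-burnt"))))
```

Keeping path formulas (`Next`, `Until`) out of the `Formula` union mirrors the restriction that a temporal operator only occurs directly under a coalition operator.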

Semantics of pRB-ATL
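To interpret this language, we extend the definition of resource-bounded Concurrent Game Structures (RB-CGS) (Alechina et al., 2010) with probabilistic behaviours of agents. For consistency with Alur et al. (2002), in what follows the terms 'agents' and 'players' and the terms 'actions' and 'moves' are used interchangeably.
Definition 6 A probabilistic resource-bounded concurrent game structure (pRCGS) is a tuple $S = (n, r, Q, \Pi, \pi, d, c, \delta)$ where:
• $n \geq 1$ is the number of players (agents);
• $r \geq 1$ is the number of resources;
• $Q$ is a non-empty finite set of states;
• $\Pi$ is a finite set of propositional variables;
• $\pi : Q \to \wp(\Pi)$ is a labelling function;
• $d : Q \times N \to \mathbb{N}^{+}$ is a function which gives the number of moves available at a state for each agent, where $N = \{1, \ldots, n\}$;
• $c : Q \times N \times \mathbb{N}^{+} \to \mathbb{N}^{r}$ is a partial function which maps a state $q$, an agent $a$ and a move $\alpha \leq d(q, a)$ to a vector of integers where the integer in position $i$ indicates consumption of resource $res_i$ by the move $\alpha$. We stipulate that $c(q, a, 1) = 0$ for any $q \in Q$ and $a \in N$;
• $\delta : Q \times (\mathbb{N}^{+})^{n} \to D(Q)$ is a partial probabilistic transition function that for every $q \in Q$ and joint move $m$ gives a probability distribution over the states that may result from executing $m$ in $q$.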
Given a pRCGS $S = (n, r, Q, \Pi, \pi, d, c, \delta)$, we identify the available moves at a state $q \in Q$ of an agent $i \in N$ by $1, \ldots, d(q, i)$; then $D_i(q) = \{1, \ldots, d(q, i)\}$ denotes the set of available moves. Move 1 specifies idling, which is always available with cost 0 by definition. Similar to ATL and RB-ATL, the zero-cost move 1 is required to avoid deadlock and, therefore, maintain totality.
A pRCGS is closely related to an MDP (Definition 4, Sect. 3.1), where the abilities of individual agents and coalitions of agents are constrained by available resources in a non-trivial way. Given $A \subseteq N$, a joint move $m$ of $A$ is a function $m : A \to \mathbb{N}^{+}$. Given $q \in Q$, the set of available joint moves of $A$ at $q$ is denoted by $D_A(q) = \{m : A \to \mathbb{N}^{+} \mid \forall a \in A : m(a) \in D_a(q)\}$. When $A = N$, we simply write $D(q)$ instead of $D_N(q)$ to denote the set of all joint moves for $N$ at $q$. Given $q, q' \in Q$ and $m \in D(q)$, $\delta(q, m)(q')$ is the conditional probability of a transition from $q$ to $q'$ if every agent $i \in N$ performs $m(i)$. Then, $q'$ is called a successor of $q$ if there exists $m \in D(q)$ such that $q' \in \mathrm{supp}(\delta(q, m))$. In this respect, a pRCGS differs from an RB-CGS in the definition of the transition function $\delta$. While the $\delta$ of an RB-CGS (Alechina et al., 2010) is deterministic, that of a pRCGS maps to a distribution over states, so the outcome of a joint move is no longer uniquely determined.
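The set of available joint moves of a coalition can be enumerated as a Cartesian product of the individual move sets. The Python sketch below does this for a hypothetical state with two agents and two moves each. It also sums the individual costs to obtain the cost of a joint move; this summation follows the usual RB-ATL convention and is an assumption here, since the text above does not spell it out. The tables `d` and `c` and the function names are illustrative only.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

# Hypothetical tables for one state "q": d[(q, a)] is the number of moves of agent a,
# c[(q, a, alpha)] is the cost vector of move alpha (move 1 is idling, cost zero).
d: Dict[Tuple[str, int], int] = {("q", 1): 2, ("q", 2): 2}
c: Dict[Tuple[str, int, int], Tuple[int, ...]] = {
    ("q", 1, 1): (0, 0), ("q", 1, 2): (1, 0),
    ("q", 2, 1): (0, 0), ("q", 2, 2): (1, 0),
}


def joint_moves(d: Dict[Tuple[str, int], int], q: str,
                coalition: Iterable[int]) -> List[Dict[int, int]]:
    """All joint moves of the coalition at q, each as a map agent -> move."""
    agents = sorted(coalition)
    ranges = [range(1, d[(q, a)] + 1) for a in agents]
    return [dict(zip(agents, combo)) for combo in product(*ranges)]


def joint_cost(c: Dict[Tuple[str, int, int], Tuple[int, ...]], q: str,
               m: Dict[int, int], r: int) -> Tuple[int, ...]:
    """Cost of a joint move as the componentwise sum of individual costs
    (assumed convention, as in RB-ATL)."""
    total = [0] * r
    for a, alpha in m.items():
        for i, x in enumerate(c[(q, a, alpha)]):
            total[i] += x
    return tuple(total)


if __name__ == "__main__":
    for m in joint_moves(d, "q", {1, 2}):
        print(m, joint_cost(c, "q", m, 2))
```

The number of joint moves grows as the product of the individual move counts, so enumerating them explicitly is only practical for small coalitions.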
Example 1 Let us consider the design of an autonomous firefighting system consisting of two firefighter agents $N = \{1, 2\}$ in a building. Each agent is equipped with two resources: electricity and water. Agents can perform three possible actions, namely, sense, pump water and idle. They can sense to detect if there is a fire in the building and pump water to stop the fire. Sensing the fire requires one unit of electricity, pumping water requires one unit of electricity and one unit of water, and idling costs nothing. This scenario is formalised by a pRCGS $S_{ff}$ as depicted in Fig. 1. Here, $n = 2$, $r = 2$, $Q = \{q_0, q_1, q_2, q_3, q_4, q_5, q_6, q_7, q_8\}$, and $\Pi = \{$low-burnt, medium-burnt, high-burnt, destroyed$\}$. For convenience, the transition function $\delta$ is written in terms of labels on transitions. Each transition from a state $q_i$ to a state $q_j$ is annotated with one or more labels of the form $xy/z$, where $xy$ denotes the joint move performed at state $q_i$ ($x$ is agent 1's move and $y$ is agent 2's move), and $z$ denotes the probability of arriving at the next state $q_j$.
At the initial state $q_0$, each agent can either stay idle (1) or perform the sense (2) action. Therefore, the possible joint moves at $q_0$ are 11, 12, 21, and 22. The states $q_1$ and $q_2$ represent circumstances in which the agents detect a fire either individually or as a coalition, respectively. The severity level of the fire is believed to be low in these two states, that is, the building has just caught fire. At $q_0$, if both agents stay idle and never sense to detect the fire, the system will enter state $q_8$, where the building can be burnt out completely. At $q_1$ (only one agent detected the fire) and $q_2$ (both agents detected the fire as a coalition), each agent can either stay idle (1) or pump water (2). Thus, the possible joint moves at each of these two states are 11, 21, 12, and 22. The system may then enter either $q_3$ or $q_4$ from both $q_1$ and $q_2$, depending on the actions performed by the agents. The "green" state $q_3$ reflects the low-burnt condition of the building being saved shortly after the fire, and it is labelled with the proposition "low-burnt". However, the state $q_4$ implies an increased fire intensity level from low to medium severity.

At $q_4$, agents can stay idle (1) or pump water (2). The system can then enter either $q_5$ or $q_6$ from $q_4$, depending on the actions performed by the agents. The "yellow" state $q_5$ reflects the medium-burnt condition of the building being saved some time after the fire, and it is labelled with the proposition "medium-burnt". Reaching $q_6$ does, however, mean a further rise in fire intensity from medium to high severity. In the same way, at $q_6$ agents can stay idle (1) or pump water (2), and the system can then enter either $q_7$ or $q_8$, depending on the actions performed by the agents. The "orange" state $q_7$ reflects the high-burnt condition of the building being saved long after the fire ignited, and it is labelled with the proposition "high-burnt". However, reaching $q_8$ means that the building is completely destroyed, and this state is labelled with the proposition "destroyed".

Fig. 1 pRCGS $S_{ff}$ of the two firefighters. The proposition "low-burnt" is labelled on the "green" state $q_3$, "medium-burnt" on the "yellow" state $q_5$, "high-burnt" on the "orange" state $q_7$, and "destroyed" on the "red" state $q_8$. (Color figure online)
If a fire does occur in the building, each agent autonomously decides to detect it or stay idle.When only one of the agents detects the fire, the chance of stopping it with "low-burnt" condition is 25% if only one of them acts.However, the chance of stopping it increases to 49% if both of them act.The effectiveness of stopping the fire with "low-burnt" condition could be improved if both the agents detect the fire jointly.The chance of stopping it with "low-burnt" condition then would be 74% if only one of them acts, while it would be 99% if both of them act.If both the agents stay idle (i.e., neither of them detects the fire nor pumps the water to extinguish it), the building will be destroyed.Note that although the aim is to save the building with the "low-burnt" condition, it is not always possible.Thus, there are possibilities that the building fire intensity can be increased from low to medium, and eventually to high severity.
Given a joint move $m \in D_A(q)$, the cost of $m$ is defined as:

$$cost(q, m) = \sum_{a \in A} c(q, a, m(a))$$

That is, $cost(q, m)$ is the total cost of the actions performed by the agents in the coalition $A$.
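To make the cost of a joint move concrete, the following minimal sketch sums per-action resource vectors for the firefighting example. It is an illustration only: the action names, the `ACTION_COST` table and the helper names are assumptions introduced here, and costs are taken to be independent of the state, which the general definition does not require.

```python
from typing import Dict, Tuple

Cost = Tuple[int, ...]

# Hypothetical per-action costs as (electricity, water), following Example 1.
ACTION_COST: Dict[str, Cost] = {
    "idle": (0, 0),
    "sense": (1, 0),
    "pump": (1, 1),
}


def joint_cost(joint_move: Dict[int, str]) -> Cost:
    """Component-wise sum of the costs of the actions chosen by a coalition."""
    total = [0, 0]
    for action in joint_move.values():
        for i, units in enumerate(ACTION_COST[action]):
            total[i] += units
    return tuple(total)


def within_bound(cost: Cost, bound: Cost) -> bool:
    """True if the cost does not exceed the bound in any resource."""
    return all(c <= b for c, b in zip(cost, bound))
```

For instance, `joint_cost({1: "sense", 2: "sense"})` adds the two sensing costs, and `within_bound` compares the result with a resource bound component by component.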
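Given a pRCGS $S$, we adopt Definition 5 to define runs (computations). An infinite run is an infinite sequence $\lambda = q_0 \xrightarrow{m_0} q_1 \xrightarrow{m_1} q_2 \cdots$ where $m_i \in D(q_i)$ and $q_{i+1}$ is a successor of $q_i$ by $m_i$, i.e., $q_{i+1} \in \mathrm{supp}(\delta(q_i, m_i))$ for all $i \geq 0$. We denote the set of all infinite computations by $\Omega^{\omega}_S$. A finite computation is a finite prefix of an infinite computation that ends in a state. We denote the set of all finite computations by $\Omega^{+}_S$ and the set of all finite and infinite computations by $\Omega_S$, i.e., $\Omega_S = \Omega^{+}_S \cup \Omega^{\omega}_S$. The length of a computation $\lambda \in \Omega_S$, denoted by $|\lambda|$, is defined as the number of transitions in $\lambda$. For a finite computation $\lambda = q_0 \xrightarrow{m_0} q_1 \cdots q_{|\lambda|} \in \Omega^{+}_S$, $\lambda(i) = q_i$ for all $i \in \{0, \ldots, |\lambda|\}$; $\lambda(i, j) = q_i \ldots q_j$ for all $i, j \in \{0, \ldots, |\lambda|\}$ with $i \leq j$; $m_\lambda = m_0 m_1 \ldots$ is the projection of moves in $\lambda$ and $m_\lambda(i) = m_i$ for $i \in \{0, \ldots, |\lambda| - 1\}$. Note that $\lambda(|\lambda|)$ is the last state in $\lambda$. Finally, $\Omega^{+}_{S,q} = \{\lambda \in \Omega^{+}_S \mid \lambda(0) = q\}$ denotes the set of finite computations starting from $q \in Q$. Given a finite computation $\lambda \in \Omega^{+}_S$ and a coalition $A$, the cost of the joint actions by $A$ is defined as $cost_A(\lambda) = \sum_{i=0}^{|\lambda|-1} cost(\lambda(i), m_\lambda(i))$. We adopt Definition 6 (Sect. 3.1) to define strategies as follows.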
Definition 8 Given a pRCGS $S$, a strategy of a player $a \in N$ is a mapping $f_a : \Omega^{+}_S \to \mathcal{D}(\mathbb{N}^+)$ which associates each finite computation $\lambda \in \Omega^{+}_S$ with a distribution $\mu_a \in \mathcal{D}(D_a(\lambda(|\lambda|)))$.
Definition 9 A strategy is called memoryless (or Markovian) if its choice of moves depends only on the current state, i.e., $f_a(\lambda) = f_a(\lambda(|\lambda|))$ for all $\lambda \in \Omega^{+}_S$. It is called deterministic if it always selects a move with probability 1, i.e., $f_a(\lambda)$ is a Dirac distribution.

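Definition 10 Given a pRCGS $S$, a coalition strategy $F_A : A \to (\Omega^{+}_S \to \mathcal{D}(\mathbb{N}^+))$ is a function which associates each player $a$ in $A$ with a strategy.
Given a coalition strategy $F_A$, we show that each finite computation $\lambda \in \Omega^{+}_S$ gives rise to a distribution $\mu^{F_A}_\lambda \in \mathcal{D}(D_A(\lambda(|\lambda|)))$, where $\mu^{F_A}_\lambda(m) = \prod_{a \in A} f_a(\lambda)(m(a))$ and $f_a = F_A(a)$ for all $a \in A$.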
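The way individual mixed choices combine into a distribution over joint moves can be sketched as follows. This is a minimal illustration that assumes the joint distribution is the product of the agents' independent choices; the function name and the dictionary encoding are hypothetical and not taken from the tool.

```python
from itertools import product
from typing import Dict, Tuple


def joint_distribution(
    choices: Dict[int, Dict[int, float]]
) -> Dict[Tuple[int, ...], float]:
    """Product distribution over joint moves.

    `choices[a]` maps each move of agent `a` to the probability with which
    the agent's strategy selects it after the current finite computation.
    Joint moves are tuples ordered by agent identifier.
    """
    agents = sorted(choices)
    joint: Dict[Tuple[int, ...], float] = {}
    for picks in product(*(choices[a].items() for a in agents)):
        joint_move = tuple(move for move, _ in picks)
        probability = 1.0
        for _, p in picks:
            probability *= p
        joint[joint_move] = probability
    return joint


# Two agents, each mixing between move 1 and move 2.
mu = joint_distribution({1: {1: 0.5, 2: 0.5}, 2: {1: 0.25, 2: 0.75}})
```

Summing `mu.values()` gives 1 (up to floating-point rounding), which is the property the lemma below establishes in general.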

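Lemma 1 Given a finite computation $\lambda \in \Omega^{+}_S$ and a coalition strategy $F_A$, $\mu^{F_A}_\lambda$ is a distribution over $D_A(\lambda(|\lambda|))$.

Proof It is done by induction on the cardinality of $A$. When $|A| = 1$, it is trivial. Assume that $|A| > 1$, let $b$ be some agent in $A$ and let $D_X$ denote $D_X(q)$ with $q = \lambda(|\lambda|)$; we have:

$$\sum_{m \in D_A} \mu^{F_A}_\lambda(m) = \sum_{m' \in D_{A \setminus \{b\}}} \mu^{F_{A \setminus \{b\}}}_\lambda(m') \cdot \sum_{\alpha \in D_b} f_b(\lambda)(\alpha) = 1 \cdot 1 = 1$$

Given two coalition strategies $F_A$ and $F_B$ of two disjoint coalitions $A$ and $B$, i.e., $A \cap B = \emptyset$, their union is also a coalition strategy, denoted by $F_A \cup F_B$.

Definition 11 Given a bound $b \in B$ and a strategy $F_A$, $F_A$ is $b$-bounded iff for all $\lambda \in \Omega^{+}_S$ such that $cost_A(\lambda) \leq b$, it holds that $\mathrm{supp}(\mu^{F_A}_\lambda) \subseteq \{m \in D_A(\lambda(|\lambda|)) \mid cost_A(\lambda) + cost(\lambda(|\lambda|), m) \leq b\}$.

In other words, all executions of a $b$-bounded strategy cost at most $b$ resources. In order to reason about the probabilistic behaviour of $S$, we need to determine the probability that certain computations are taken. To do this, we construct, for each state $q \in Q$, a probability space over the set of infinite computations $\Omega^{\omega}_{S,q}$ starting from $q$. The basis of the construction is the probability of individual finite computations induced by the transition probability function $\delta$. Given a state $q_0 \in Q$, we can determine the probability $\Pr^{F_A}_{S,q_0}(\lambda)$ of every finite computation $\lambda = q_0 \xrightarrow{m_0} q_1 \cdots q_{|\lambda|} \in \Omega^{+}_{S,q_0}$ consistent with $F_A$ as the product, over the transitions of $\lambda$, of the probability with which the strategy selects the move $m_i$ after the prefix $\lambda(0, i)$ and the transition probability $\delta(q_i, m_i)(q_{i+1})$. If $|\lambda| = 0$, $\Pr^{F_A}_{S,q_0}(\lambda) = 1$ as the above product is empty.

For each finite computation $\lambda \in \Omega^{+}_S$, we can then define a cylinder set $C_\lambda$ that consists of all infinite computations prefixed by $\lambda$. Given an initial state $q \in Q$, it is then standard (Kwiatkowska et al., 2007; Billingsley, 1986) to define a measurable space over $\Omega^{\omega}_{S,q}$, the infinite runs of $S$ from $q$, as $(\Omega^{\omega}_{S,q}, \mathcal{F}_{S,q})$ where $\mathcal{F}_{S,q} \subseteq \wp(\Omega^{\omega}_{S,q})$ is the least $\sigma$-algebra on $\Omega^{\omega}_{S,q}$ generated by the family of all cylinder sets $C_\lambda$ with $\lambda \in \Omega^{+}_{S,q}$. Given a strategy $F_N$, i.e., a strategy for all players in the game, the behaviour of $S$ is fully probabilistic. It then gives rise to a probability space $(\Omega^{\omega}_{S,q}, \mathcal{F}_{S,q}, \Pr^{F_N}_{S,q})$ where $\Pr^{F_N}_{S,q} : \mathcal{F}_{S,q} \to [0, 1]$ uniquely extends $\Pr^{F_N}_{S,q} : \Omega^{+}_{S,q} \to [0, 1]$ such that $\Pr^{F_N}_{S,q}(C_\lambda) = \Pr^{F_N}_{S,q}(\lambda)$ for all finite computations $\lambda \in \Omega^{+}_{S,q}$.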

Truth Definition for pRB-ATL
Given a pRCGS $S = (n, r, Q, \Pi, \pi, d, c, \delta)$, the truth definition for pRB-ATL is given inductively on the structure of formulae. This definition is a combination of those of pATL and RB-ATL. In particular, the case of $\langle\langle A^b \rangle\rangle P_{\bowtie v}[\psi]$ requires the existence of a strategy $F_A$ which must be $b$-bounded, while there is no restriction on the strategies of the remaining players $\bar{A} = N \setminus A$.
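As a reading aid, the coalition clause described above might be written as follows. This is a sketch under the usual pATL reading, not a transcription of the original definition: the symbols $\Omega^{\omega}_{S,q}$, $F_A \cup F_{\bar{A}}$ and $\bowtie \in \{\geq, >, \leq, <\}$ are assumed notation.

```latex
S, q \models \langle\langle A^b \rangle\rangle P_{\bowtie v}[\psi]
\quad\text{iff}\quad
\exists\, b\text{-bounded } F_A \;\; \forall F_{\bar{A}} :\;
\Pr\nolimits^{F_A \cup F_{\bar{A}}}_{S,q}
\bigl(\{\lambda \in \Omega^{\omega}_{S,q} \mid S, \lambda \models \psi\}\bigr) \bowtie v
```

Only the coalition's strategy is restricted by the bound $b$; the opponents' strategy $F_{\bar{A}}$ ranges over all strategies.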
From the truth definition, the following result directly inherits the complement rule for probability, where $\geq^{-1} \equiv \leq$, $>^{-1} \equiv <$, $\leq^{-1} \equiv \geq$ and $<^{-1} \equiv >$:

Lemma 2 $S, q \models \langle\langle A^b \rangle\rangle P_{\bowtie v}[\neg\psi]$ iff $S, q \models \langle\langle A^b \rangle\rangle P_{\bowtie^{-1}\, 1-v}[\psi]$.

Example 2 Let us continue with the running Example 1. Consider a question (property of the system): "Can agent 1 from $q_0$, equipped with 2 units of electricity and 1 unit of water, ensure with probability at least 49% that the building is saved in the low-burnt condition?". This means checking whether $\phi_{\{1\}} = \langle\langle \{1\}^{(2,1)} \rangle\rangle P_{\geq 0.49}[\Diamond\,\text{low-burnt}]$ is true at $q_0$. Unfortunately, there is no such strategy for agent 1, i.e., $S_{ff}, q_0 \not\models \phi_{\{1\}}$. Consider another question (property of the system): "Can agents 1 and 2 jointly from $q_0$, equipped with 4 units of electricity and 2 units of water, ensure with probability at least 74% that the building is saved in the low-burnt condition?". Similar to the previous question, we need to check whether $\phi_{\{1,2\}} = \langle\langle \{1, 2\}^{(4,2)} \rangle\rangle P_{\geq 0.74}[\Diamond\,\text{low-burnt}]$ is true at $q_0$. This is true, for example, when employing a strategy where both agents perform sensing at $q_0$ and at least one of them pumps water at $q_2$. In fact, this strategy can guarantee the low-burnt safety of the building with probability up to 99%. Hence, $S_{ff}, q_0 \models \phi_{\{1,2\}}$.

Model Checking
In probabilistic model-checking, the most elementary class of properties for probabilistic models is reachability. The probabilistic reachability problem computes the probability of reaching some state in a specified target set of states in the model. That is, the basic reachability question is: "can we reach a given target state from a given initial state with some given probability $v$?". More formally, given a state $q \in Q$ and a set of target states $T \subseteq Q$, the reachability probability is the measure of the paths starting in $q$ and containing a state from $T$, i.e., $\Pr(\{\lambda \in \Omega_{M,q} \mid \lambda(i) \in T \text{ for some } i \in \mathbb{N}\})$. Because the model is non-deterministic, the property of probabilistic reachability actually refers to the minimum or maximum such probability. In practice, many model-checking problems can be reduced to the reachability problem; therefore, it is considered one of the most fundamental properties in probabilistic model-checking. For an in-depth discussion of this topic, we refer the interested reader to Forejt et al. (2011).
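The maximum reachability probability mentioned above can be approximated by value iteration. The sketch below is a generic illustration on a plain MDP, not the paper's algorithm: the `MDP` encoding, the function name and the toy model are assumptions made here.

```python
from typing import Dict, List, Set

# An MDP as: state -> list of actions, each action a distribution
# {successor: probability}. Every successor must itself be a key.
MDP = Dict[str, List[Dict[str, float]]]


def max_reachability(mdp: MDP, targets: Set[str], eps: float = 1e-10) -> Dict[str, float]:
    """Approximate the maximum probability of reaching `targets` from each state."""
    value = {s: (1.0 if s in targets else 0.0) for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            if s in targets or not mdp[s]:
                continue
            # Best action: the one with the highest expected value.
            best = max(sum(p * value[t] for t, p in dist.items()) for dist in mdp[s])
            delta = max(delta, abs(best - value[s]))
            value[s] = best
        if delta < eps:
            return value


# Hypothetical toy model: from s0, either gamble directly or move to s1 first.
toy_mdp: MDP = {
    "s0": [{"goal": 0.5, "fail": 0.5}, {"s1": 1.0}],
    "s1": [{"goal": 0.9, "fail": 0.1}],
    "goal": [{"goal": 1.0}],
    "fail": [{"fail": 1.0}],
}
reach = max_reachability(toy_mdp, {"goal"})
```

In the toy model, going through `s1` is the better choice from `s0`, so `reach["s0"]` converges to the value of `s1`. Replacing `max` by `min` in the update yields the minimum reachability probability.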
Here, we present an algorithm for the model-checking problem of pRB-ATL. In particular, given a pRCGS $S = (n, r, Q, \Pi, \pi, d, c, \delta)$ and a pRB-ATL formula $\phi$, the algorithm produces the set of states $Sat(\phi)$ of $S$ that satisfy $\phi$, i.e., $Sat(\phi) = \{q \in Q \mid S, q \models \phi\}$. Similar to ATL and its descendants, the algorithm processes $\phi$ recursively by computing the sets of states satisfying the sub-formulae of $\phi$ before combining them to produce $Sat(\phi)$. For the propositional cases, the algorithm follows the truth definition directly: atomic propositions are looked up in the labelling $\pi$, and Boolean connectives are handled by the corresponding set operations on the $Sat$ sets of the sub-formulae.

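Let us focus on the last cases $\phi = \langle\langle A^b \rangle\rangle P_{\bowtie v}[\psi]$, where it suffices to consider $\psi \in \{\bigcirc\phi_1,\ \phi_1\, U^{\leq k} \phi_2,\ \phi_1\, U \phi_2\}$ due to Lemma 2. Instead of following the semantics definition, i.e., determining the existence of a $b$-strategy for $A$ to achieve a certain probability $v$ from a state $q$, we compute the min and max values over all possible $b$-strategies for $A$, denoted by $\Pr^{\min}_{S,q}(A^b, \psi)$ and $\Pr^{\max}_{S,q}(A^b, \psi)$. Satisfaction of $\phi$ at $q$ is then decided by comparing $\Pr^{\max}_{S,q}(A^b, \psi)$ (for $\bowtie \in \{\geq, >\}$) or $\Pr^{\min}_{S,q}(A^b, \psi)$ (for $\bowtie \in \{\leq, <\}$) against $v$.

Case (a): $\psi = \bigcirc\phi_1$. Assume that $Sat(\phi_1)$ is computed. $\Pr^{\max}_{S,q}(A^b, \bigcirc\phi_1)$ is obtained by players in $A$ selecting an allowed move to maximise it while those outside select one to minimise it; $\Pr^{\min}_{S,q}(A^b, \bigcirc\phi_1)$ is obtained by players in $A$ selecting an allowed move to minimise it while those outside select one to maximise it. Therefore, we have:

$$\Pr\nolimits^{\max}_{S,q}(A^b, \bigcirc\phi_1) = \max_{\substack{m_A \in D_A(q) \\ cost(q, m_A) \leq b}}\ \min_{m_{\bar{A}} \in D_{\bar{A}}(q)}\ \sum_{q' \in Sat(\phi_1)} \delta(q, (m_A, m_{\bar{A}}))(q')$$

$$\Pr\nolimits^{\min}_{S,q}(A^b, \bigcirc\phi_1) = \min_{\substack{m_A \in D_A(q) \\ cost(q, m_A) \leq b}}\ \max_{m_{\bar{A}} \in D_{\bar{A}}(q)}\ \sum_{q' \in Sat(\phi_1)} \delta(q, (m_A, m_{\bar{A}}))(q')$$

Case (b): $\psi = \phi_1\, U^{\leq k} \phi_2$. Assume that $Sat(\phi_1)$ and $Sat(\phi_2)$ are computed. For convenience, we denote $\Pr^{\max}_{S,q}(A^b, \phi_1\, U^{\leq k} \phi_2)$ and $\Pr^{\min}_{S,q}(A^b, \phi_1\, U^{\leq k} \phi_2)$ by $X^b_{q,k}$ and $Y^b_{q,k}$, respectively. Then, there are three trivial sub-cases:
• $q \in Sat(\phi_2)$ and for any $k$: any computation from $q$ satisfies $\psi$, hence $X^b_{q,k} = Y^b_{q,k} = 1$.
• $q \notin Sat(\phi_1)$, $q \notin Sat(\phi_2)$ and for any $k$: any computation from $q$ does not satisfy $\psi$, hence $X^b_{q,k} = Y^b_{q,k} = 0$.
• $q \notin Sat(\phi_2)$ and $k = 0$: any computation from $q$ does not satisfy $\psi$ within 0 transitions, hence $X^b_{q,k} = Y^b_{q,k} = 0$.
Otherwise, players in $A$ try to choose an allowed move $m$ from $q$ with cost at most $b$ that maximises the probability of arriving at a state that can satisfy $\psi$ with the remaining resource $b' = b - cost(q, m)$ and within $k' = k - 1$ transitions. Formally, this can be defined as follows:

$$X^b_{q,k} = \max_{\substack{m_A \in D_A(q) \\ cost(q, m_A) \leq b}}\ \min_{m_{\bar{A}} \in D_{\bar{A}}(q)}\ \sum_{q' \in Q} \delta(q, (m_A, m_{\bar{A}}))(q') \cdot X^{b - cost(q, m_A)}_{q', k-1}$$

$$Y^b_{q,k} = \min_{\substack{m_A \in D_A(q) \\ cost(q, m_A) \leq b}}\ \max_{m_{\bar{A}} \in D_{\bar{A}}(q)}\ \sum_{q' \in Q} \delta(q, (m_A, m_{\bar{A}}))(q') \cdot Y^{b - cost(q, m_A)}_{q', k-1}$$

Overall, one can form two linear equation systems with variables $X^b_{q,k}$ and $Y^b_{q,k}$, respectively, for each $k$. They can be solved by direct methods such as Gaussian elimination or by iterative methods such as Jacobi and Gauss-Seidel (Kwiatkowska et al., 2007). In general, iterative methods suit the two linear equation systems best. The iteration does not need more than $k + 1$ rounds, as $X^b_{q,0}$ and $Y^b_{q,0}$ are fixed to either 0 or 1 regardless of $b$ and $q$ by definition.

Case (c): $\psi = \phi_1\, U \phi_2$. Assume that $Sat(\phi_1)$ and $Sat(\phi_2)$ are computed. Again for convenience, we denote $\Pr^{\max}_{S,q}(A^b, \phi_1\, U \phi_2)$ and $\Pr^{\min}_{S,q}(A^b, \phi_1\, U \phi_2)$ by $X^b_q$ and $Y^b_q$, respectively. Similar to the approach in Chen et al. (2013), the variables $X^b_q$ can be computed by iterating the computation of the variables $X^b_{q,k}$ as defined in case (b) for $k \to \infty$. In practice, this computation can be terminated at a large enough $k$ such that $\max_{b,q} |X^b_{q,k} - X^b_{q,k+1}|$ is less than some $\varepsilon$, a pre-specified convergence threshold. This is based on the fact that $X^b_{q,k}$ is a non-decreasing sequence in $k$ that converges to $X^b_q$ (Raghavan & Filar, 1991).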
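The recursion behind case (b) can be sketched as a memoised function over a state, a remaining resource vector and a step bound. This is a minimal illustration under stated assumptions, not the tool's implementation: the data layout (`moves_a`, `moves_rest`, `delta`), the function names and the one-resource toy model are all hypothetical.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Set, Tuple

Bound = Tuple[int, ...]


def make_max_bounded_until(
    sat1: Set[str],
    sat2: Set[str],
    moves_a: Dict[str, List[Tuple[int, Bound]]],   # state -> [(coalition move, cost)]
    moves_rest: Dict[str, List[int]],              # state -> opponents' moves
    delta: Dict[Tuple[str, int, int], Dict[str, float]],
) -> Callable[[str, Bound, int], float]:
    """Return x(q, b, k): max probability for the coalition of phi1 U<=k phi2."""

    @lru_cache(maxsize=None)
    def x(q: str, b: Bound, k: int) -> float:
        if q in sat2:                  # trivial sub-case: already satisfied
            return 1.0
        if q not in sat1 or k == 0:    # trivial sub-cases: cannot be satisfied
            return 0.0
        best = 0.0
        for move, cost in moves_a[q]:
            if any(c > avail for c, avail in zip(cost, b)):
                continue               # move exceeds the remaining resources
            rest = tuple(avail - c for c, avail in zip(cost, b))
            # Opponents pick the reply that minimises the coalition's chance.
            worst = min(
                sum(p * x(q2, rest, k - 1) for q2, p in delta[(q, move, other)].items())
                for other in moves_rest[q]
            )
            best = max(best, worst)
        return best

    return x


# Hypothetical toy model with one resource: move 1 is free, move 2 costs one unit.
x_max = make_max_bounded_until(
    sat1={"s"},
    sat2={"g"},
    moves_a={"s": [(1, (0,)), (2, (1,))]},
    moves_rest={"s": [1, 2]},
    delta={
        ("s", 1, 1): {"s": 1.0},
        ("s", 1, 2): {"d": 1.0},
        ("s", 2, 1): {"g": 0.5, "s": 0.5},
        ("s", 2, 2): {"g": 0.25, "d": 0.75},
    },
)
```

With no resources the coalition can only idle, so `x_max("s", (0,), 3)` is zero, whereas one unit of the resource lets it reach the goal with the probability left after the opponent's worst reply. The minimising variant swaps `max` and `min`; the unbounded case (c) would iterate `k` upwards until successive values differ by less than a threshold.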
In the following, we show (i) the termination and (ii) the correctness of Algorithm 1, i.e., that $q \in Sat(\phi)$ iff $S, q \models \phi$.
Proof Intuitively, termination is straightforward due to the fact that recursive calls within $Sat(\phi)$ are always applied to strict sub-formulae of $\phi$. Let us prove (i) and (ii) by induction on the structure of $\phi$.
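Base case: If $\phi = \top$, $Sat(\phi) = Q$. That means (i) holds immediately and (ii) follows directly from the truth definition.
If $\phi = p$, (i) holds immediately and for (ii): $q \in Sat(p)$ iff $p \in \pi(q)$ by the $Sat$ definition iff $S, q \models p$ by the truth definition.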
- If $\psi = \bigcirc\phi_1$, by Case (a), the calculation of $\Pr^{mm}_{S,q}(A^b, \psi)$ terminates due to the fact that $Sat(\phi_1)$ terminates by the induction hypothesis, and $D_A(q)$, $D_{\bar{A}}(q)$, $Sat(\phi_1)$ are all finite.
- If $\psi = \phi_1\, U^{\leq k} \phi_2$ or $\psi = \phi_1\, U \phi_2$, by Cases (b) and (c), the calculation of $\Pr^{mm}_{S,q}(A^b, \psi)$ terminates due to the fact that $Sat(\phi_1)$ and $Sat(\phi_2)$ terminate by the induction hypothesis, $D_A(q)$, $D_{\bar{A}}(q)$, $Sat(\phi_1)$, $Sat(\phi_2)$ are all finite, and the solution of the corresponding linear equation systems also terminates.
For (ii), when $mm = \max$: $q \in Sat(\phi)$ iff $\Pr^{\max}_{S,q}(A^b, \psi) \bowtie v$ by the $Sat$ definition, iff (since the model is finite, sup and inf are turned into max and min, respectively, in Cases (a), (b) and (c)) $S, q \models \phi$ by the truth definition.
When mm = min, the proof is symmetric to the one above, hence, omitted here.
Assuming that the natural numbers occurring in a pRB-ATL formula ϕ are encoded in unary, we have the following result.

Theorem 2 The upper bound of the time complexity for Sat
Proof $\phi$ has at most $O(|\phi|)$ sub-formulae. Case (c) is the most computationally complex. In this case, $b$ is bounded by $O(|\phi|^r)$. Therefore, the number of variables for each iteration, and also the number of equations, in each corresponding linear equation system is bounded by $O(|\phi|^r \cdot |S|)$. It is well known that the time complexity of solving such a linear equation system is at most $O(n^3)$ (Golub & Van Loan, 1996), where $n$ is the number of equations. Therefore, the upper bound on the complexity of computing $Sat(\phi)$ follows from multiplying the number of sub-formulae by the cost of solving one such system. Furthermore, the lower bound is given by that of ATL, i.e., linear in the size of the input model and the input formula.

Tool Implementation
We have developed a prototype probabilistic model-checking tool for resource-bounded stochastic multiplayer games based on the techniques proposed in this paper. The tool is implemented in Python. It takes two inputs, a pRCGS model and a pRB-ATL formula. The model input is interpreted by a parser into an instance of the class Model. Similarly, the formula is interpreted by another parser into an instance of the class Formula. This formula instance may recursively include further formula instances for the sub-formulae of the input formula. Finally, the implementation of the model-checking procedure $Sat(\phi)$, introduced in Sect. 5, is executed to compute the set of states of the input model satisfying the input formula. The whole process is depicted in Fig. 2.

Fig. 2 The implementation of the pRB-ATL model-checking process

The class Formula has five sub-classes corresponding to the five cases of state formulae $\phi$ defined in Sect. 4.1. For the last case $\langle\langle A^b \rangle\rangle P_{\bowtie v}[\psi]$, an auxiliary class, named PathFormula, is introduced to represent path formulae (next, until and negation). Similar to the class Formula, PathFormula has three sub-classes corresponding to the three cases of the path formulae $\psi$.
The two parsers have been implemented based on Antlr4 (ANTLR, 2020). They interpret pRCGS models and pRB-ATL formulae, respectively, written in a specification language defined to be as close to the syntax of pRCGS and pRB-ATL as possible. The syntax of the specification language for pRCGS models is as follows:

'Structure' NAME '=' '{' agents ',' resources ',' gstates ',' propositions ',' labellings ',' availables ',' costings ',' transitions '}'

where agents is a positive number indicating the number of agents in the model and resources is a number indicating the number of resources. The remaining components describe the sets and functions of the model. The set of states, gstates, follows the syntax diagram for sets in Fig. 3; the syntax diagram for propositions is similar. Functions such as availables are defined as sets of mappings. Each mapping follows the syntax diagram in Fig. 4, from a pair consisting of a state and a number (identifying an agent) to the number of available actions.
The syntax is similar for labellings, costings and transitions, where elements of labellings map to a set, those of costings to a cost (a tuple of numbers) and those of transitions to a state distribution. A state distribution is simply a set of pairs, each consisting of a state and a natural number. The probability of a state in a distribution is its corresponding number divided by the sum of all the numbers in the distribution. The syntax diagram for such a pair is depicted in Fig. 5.
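The weight-based encoding of state distributions described above amounts to a simple normalisation. The sketch below illustrates it with exact fractions; the function name and the example weights are hypothetical and not taken from the tool.

```python
from fractions import Fraction
from typing import Dict


def to_distribution(weights: Dict[str, int]) -> Dict[str, Fraction]:
    """Turn {state: natural number} pairs into exact probabilities."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {state: Fraction(w, total) for state, w in weights.items()}


# Weights 25 and 75 encode probabilities 1/4 and 3/4.
dist = to_distribution({"q3": 25, "q4": 75})
```

Using natural-number weights in the input avoids rounding issues in the model file, since the probabilities always sum to exactly one after normalisation.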
The underlying implementation technique of the model-checking procedure $Sat(\phi)$ is explicit-state model checking. The procedure $Sat(\phi)$ is implemented by a member method, named sat, of the class Formula. This method is overridden in each of the five sub-classes corresponding to the five cases of $Sat(\phi)$ described in Sect. 5. The method sat takes only one parameter, an instance of the input model, and recursively calls the same method on the instances representing the sub-formulae of $\phi$.
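The class design described above could look roughly as follows. This is a hedged sketch, not the tool's source: only the names Formula, Model and sat come from the text, while the sub-class names, the fields and the choice of Boolean connectives are assumptions, and the coalition case is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Model:
    states: Set[str]
    labelling: Dict[str, Set[str]]  # state -> propositions true in it


class Formula:
    def sat(self, model: Model) -> Set[str]:
        """Return the set of states of `model` satisfying this formula."""
        raise NotImplementedError


@dataclass
class Top(Formula):
    def sat(self, model: Model) -> Set[str]:
        return set(model.states)


@dataclass
class Prop(Formula):
    name: str

    def sat(self, model: Model) -> Set[str]:
        return {q for q in model.states if self.name in model.labelling.get(q, set())}


@dataclass
class Not(Formula):
    sub: Formula

    def sat(self, model: Model) -> Set[str]:
        # Recursive call on the sub-formula, then set complement.
        return set(model.states) - self.sub.sat(model)


@dataclass
class Or(Formula):
    left: Formula
    right: Formula

    def sat(self, model: Model) -> Set[str]:
        return self.left.sat(model) | self.right.sat(model)


# Hypothetical three-state fragment of the firefighting model.
model = Model(
    states={"q3", "q5", "q8"},
    labelling={"q3": {"low-burnt"}, "q5": {"medium-burnt"}, "q8": {"destroyed"}},
)
result = Or(Prop("low-burnt"), Not(Prop("destroyed"))).sat(model)
```

Each sub-class overrides `sat` and delegates to its sub-formulae, so adding the coalition case would only require one more sub-class that computes the min or max probabilities per state.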

Experimental Results
Let us illustrate the use of pRB-ATL and quantitatively verify the system presented in Example 1 with our model-checking tool. We generalise the properties described in Example 2 as follows: "Can a coalition $A$ from $q_0$, equipped with $e$ units of electricity and $w$ units of water, ensure with probability at least $v$ that the building is saved in the low-burnt condition?". This is formalised in pRB-ATL by $\phi_A = \langle\langle A^{(e,w)} \rangle\rangle P_{\geq v}[\Diamond\,\text{low-burnt}]$. To this end, we need to check whether $S_{ff}, q_0 \models \phi_A$. As mentioned in the previous section, this can be reduced to determining the maximal probability $\Pr^{\max}_{S_{ff},q_0}(A^{(e,w)}, \Diamond\,\text{low-burnt}) = X^{(e,w)}_{q_0}$.

Table 1 $\Pr^{\max}_{S_{ff},q_0}(\{1\}^{(e,w)}, \Diamond\phi)$

Property type | w / e | e = 0 | e = 1 | e = 2 | Time (s)
$\Pr^{\max}_{S_{ff},q_0}(\{1\}^{(e,w)}, \Diamond\,\text{low-burnt})$ | 0 | 0.0 | 0.0 | 0.0 | 0.080
 | 1 | 0.0 | 0.0 | 0.25 |
$\Pr^{\max}_{S_{ff},q_0}(\{1\}^{(e,w)}, \Diamond\,\text{medium-burnt})$ | 0 | 0.0 | 0.0 | 0.0 | 0.080
 | 1 | 0.0 | 0.0 | 0.0572 |
$\Pr^{\max}_{S_{ff},q_0}(\{1\}^{(e,w)}, \Diamond\,\text{high-burnt})$ | … | … | … | … | 0.09

Intuitively, the best strategy for one agent to help save the building in the low-burnt condition is to sense and then to pump water. In total, this costs two units of electricity and one unit of water. Similarly, while cooperating, the best strategy for both agents is to follow their best strategies concurrently, which costs four units of electricity and two units of water in total. Therefore, for $A = \{1\}$, we consider the resources bounded by $(e, w) \leq (2, 1)$ and for $A = \{1, 2\}$ those bounded by $(e, w) \leq (4, 2)$. For each case of $A$ and $(e, w)$, the model-checking results are summarised in Table 1 for $A = \{1\}$ and in Table 2 for $A = \{1, 2\}$. In particular, Table 1 shows that any resource bound less than $(2, 1)$ is not helpful for agent 1, as it has no strategy to make sure that the building is saved in the low-burnt condition. In the best case, with resource bound $(2, 1)$, the only viable strategy is to sense the fire and then pump water. In this case, since agent 2 is not required to cooperate, the worst case is to end up in $q_1$ from $q_0$, where the chance of arriving at $q_3$, the low-burnt safe state, is at least 25%. This corresponds to choosing the following actions: 21 in $q_0$, 21 in $q_1$, and 11 in $q_3$. Note that any resource bound greater than $(2, 1)$ will not increase the probability of saving the building in the low-burnt condition either.

Similarly, Table 2 shows that any resource bound less than $(2, 1)$ will not be enough for both agents while cooperating. However, the chance of saving the building in the low-burnt condition increases to 74% as more resources are given. This is because from $q_0$ both agents can force the arrival at $q_2$ instead of $q_1$. Eventually, the maximal chance of saving the building in the low-burnt condition reaches 99%, as both agents have enough resources to follow the same best strategy. This corresponds to choosing the following actions: 22 in $q_0$, 22 in $q_2$, and 11 in $q_3$.

As mentioned earlier, if a fire occurs in the building, the fire intensity may increase from low (at $q_1$ or $q_2$) to medium (at $q_4$), and eventually to high (at $q_6$) severity. If the fire intensity rises from low to medium, then for a single agent with resource bound $(2, 1)$ the maximum probability with which the building can be saved in the medium-burnt condition is 5.7%, choosing the following actions: 22 in $q_0$, 12 in $q_2$, 21 in $q_4$, and 11 in $q_5$. Since agent 2 is not required to cooperate, it will try to minimise the probability. Basically, there are two paths from $q_0$ to $q_5$ where agent 1 performs action 2 at $q_0$, namely $q_0 \xrightarrow{22} q_2 \xrightarrow{12} q_4 \xrightarrow{21} q_5$ with probability 0.057 and $q_0 \xrightarrow{21} q_1 \xrightarrow{12} q_4 \xrightarrow{21} q_5$ with probability 0.165, and the first path will be chosen because it yields the lower probability. An interesting point to note here is that increasing the resource bound from $(2, 1)$ to $(3, 2)$ for agent 1 will not increase the probability of saving the building in the medium-burnt condition. For example, with resource bound $(3, 2)$, if agent 1 senses the fire and then pumps water twice, then agent 2, being uncooperative and having the opposite objective, will sense the fire and pump water once, which ultimately leads through a path with probability 0.002. This corresponds to choosing the following actions: 22 in $q_0$, 22 in $q_2$, 21 in $q_4$, and 11 in $q_5$. Tables 1 and 2 also report the results for the coalition, as well as for other situations such as high-burnt.

Modelling and Analysis of the Firefighting System
To demonstrate the usability and applicability of our proposed techniques and tool, and to evaluate their performance, we present results from a more generalised version of the example system discussed above.We modelled the system as a pRCGS with n ∈ {2, 3, . . ., 13} players, and we check the same pRB-ATL properties as above considering varying A ⊆ N to be coalitions of different sizes, A ∈ {{1}, {1, 2}, . . ., N }.
We increase or decrease the problem size by parameterising the number of agents, and use a script to generate the encoding of a complex system. The script takes the number of agents and resources as input, and creates a system model in which the agents generate their alternative actions. The number of agents also parameterises the probability distribution of transitions. Note that, for a simple representation, Fig. 1 shows some transitions from state $q_i$ to state $q_j$ by collapsing several possible combinations of actions. The number of these combinations increases as the number of agents in the system increases. For experimental consistency, we parameterise how the probabilities are assigned to the transitions.

In the case of two agents, we have two possible states in which a low fire can be detected, namely $q_1$ and $q_2$. The possible moves from each of these two states are 11, 12, 21 and 22. Moves 12 and 21, where only one agent acts, have the same probability. Thus, if we leave the move 11 with probability 1, which takes the system from $q_1$ to $q_4$ (or $q_2$ to $q_4$), then the remaining moves 12, 21 and 22 require an appropriate probability distribution over next states. In this case, $q_3$, in which the building is saved in the low-burnt condition, can be reached from $q_1$ and $q_2$ via four different transitions: (1) $q_1 \xrightarrow{12/21} q_3$, (2) $q_1 \xrightarrow{22} q_3$, (3) $q_2 \xrightarrow{12/21} q_3$, and (4) $q_2 \xrightarrow{22} q_3$. We assign probabilities to these transitions in increasing order. The first transition has the lowest chance of saving the building in the low-burnt condition, so we assign the lowest probability to it, while we assign the highest probability to the last transition, which has the highest chance of saving the building. We assume that the maximum chance that a building can be saved in the low-burnt condition is 99%. So, we divide 0.99 by the number of these distinct moves, i.e., we assign $p = 0.99/4$ ($= 0.2475 \approx 0.25$) to the move (1) and $1 - p$ to the corresponding transition from $q_1$ to $q_4$. We then increase the assigned value by $p$ each time the probability for the next transition is assigned, so that the final value, which is 0.99, is assigned to the move (4) (and the complementary probability to the corresponding transition from $q_2$ to $q_4$).

In the case of three agents, we have three possible states in which a low fire can be detected, based on the following actions performed at $q_0$: (211/1, 121/1, 112/1), (221/1, 122/1, 212/1), and (222/1). From each of these states, there are three possible moves to which we need to assign appropriate probabilities. Thus, the lowest probability is $p = 0.99/9$, and we increase the assigned value by $p$ each time the probability for the next transition is assigned. Table 3 shows the transitions for three agents. In general, the probability distribution for the low fire states is assigned using $p = 0.99/(n \cdot n)$, where $n$ is the number of agents. A similar type of probability distribution is used for the medium and high fire states. Also, $q_{n+1}$, $q_{n+3}$, $q_{n+5}$, and $q_{n+6}$ represent the "green", "yellow", "orange", and "red" states of the $n$-agent transition diagram, respectively.
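The probability assignment scheme above can be reproduced with a few lines of arithmetic. The sketch below follows the stated rule $p = 0.99/(n \cdot n)$ with increments of $p$; the function name and the default parameter are assumptions made for illustration.

```python
from typing import List


def low_fire_probabilities(n_agents: int, max_chance: float = 0.99) -> List[float]:
    """Probabilities assigned, in increasing order, to the low-fire transitions."""
    steps = n_agents * n_agents
    p = max_chance / steps
    # The i-th transition (1-based) receives i * p, ending at max_chance.
    return [p * i for i in range(1, steps + 1)]


two_agents = low_fire_probabilities(2)
three_agents = low_fire_probabilities(3)
```

For two agents this gives four values in steps of 0.2475, ending at 0.99; rounded down to whole percentages, they match the chances of 25%, 49%, 74% and 99% quoted in Example 1. For three agents it gives nine values starting at 0.11.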
We conducted an extensive set of experiments; Table 4 presents the most significant results. Our purpose here is not to provide a detailed analysis of the time and space required to model-check different classes of pRB-ATL formulae, but simply to give an indication of the scalability and effectiveness of our algorithms and their implementation. Table 4 shows experiments run on an Intel(R) Core(TM) i5-6500 @ 3.20 GHz with 8 GB RAM. It includes model statistics (number of agents, states and transitions) and the times to construct a pRCGS $S_{ff}$ model and to verify a property of the form $\Pr^{\max}_{S_{ff},q_0}(A^{(e,w)}, \Diamond\phi)$. The results only include the two extreme coalitions, namely the single-agent coalition $\{1\}$ and the coalition $N$ of all the agents in the system. However, our experiments suggest that the verification time increases with the number of players in a coalition.
Figures 6 and 7 depict an experimental performance comparison of the two extreme coalitions. This illustrates how the size of the coalition and the number of agents in the system affect the performance of our model-checking algorithms. Note that our results are not directly comparable to existing probabilistic model-checking results. The use of resource bounds makes our models much more complex and increases the non-determinism that exists within the transition relations of systems.

Conclusions and Future Work
In this paper, we have designed and developed a framework for the automatic verification of systems with both resource limitations and probabilistic behaviour. We proposed a novel temporal logic, pRB-ATL, for reasoning about the coalitional abilities of systems under resource constraints that exhibit both probabilistic and non-deterministic behaviour. The novelty of our approach lies in complex logical combinations that tackle the problem of more comprehensive resource-bounded probabilistic multi-agent system specification and verification. To model multi-agent systems where the actions of agents consume resources, we have modified probabilistic strategy logics in two ways. Firstly, we added resource annotations to the actions in the transition system.

There are a number of interesting directions in this area for future research. First of all, we would like to investigate extensions of our logical framework and techniques to incorporate agents' behaviour with production of resources, and analyse a wider class of properties for the resulting logic. Secondly, we would like to study alternative semantics of the logics, including Interpreted Systems and Strategy Logic, implement them and compare the expressivity and performance of the alternative approaches. In this paper, we have used the example system only to explain the definitions and concepts introduced in Sect. 4 and in the rest of the paper. However, constructing a model of a more realistic scenario is non-trivial work. In future work, we plan to investigate the use of pRB-ATL for the analysis and verification of collaborative systems by means of several use-cases, for example, in the domain of smart production systems or similar domains where a group of robots work collaboratively to achieve some goals.

Fig. 3 The syntax diagram for sets in a pRCGS model

Fig. 6 Model construction and verification time

Table 2 $\Pr^{\max}_{S_{ff},q_0}$

Table 3 Transition table for three agents