1 Introduction

A well-studied and popular architecture for developing rational agents is the belief–desire–intention (BDI) paradigm. BDI agents build on a sound theoretical foundation to model an agent where (B)eliefs represent what the agent knows, (D)esires what the agent wants to bring about, and (I)ntentions the desires the agent is currently acting upon. BDI agents have inspired many agent-oriented programming languages including AgentSpeak [1], Can [2], CanPlan [3], 3APL [4], and 2APL [5], along with a collection of mature software toolkits and platforms including JACK [6], Jason [7], and Jadex [8]. BDI agents have been recognised for their efficiency and scalability in areas such as business [9] and healthcare [10].

In BDI languages, desires and intentions are often represented using a plan library. Each plan describes a course of action which an agent can perform to address an adopted event (often representing a task from the external environment) given that some beliefs hold, while the set of intentions are the plans currently being executed. Typically, BDI languages: (1) assume that action outcomes (i.e. the effects on the external environment) are deterministic, (2) remain agnostic internally to the choice of an applicable plan to address an adopted event, (3) remain agnostic internally to the choice of a pending event to adopt from the external environment, and (4) remain agnostic internally to the order in which intentions are progressed. These assumptions facilitate the formal verification of agent behaviour through a non-deterministic underlying transition system (depicted in Fig. 3) in work such as [11, 12], where plan, event, and intention selection denote branching choices and actions have a single outcome. As such, most verification approaches are limited to analysing qualitative properties, querying whether an intention completes or not.

Though qualitative assurance is useful, it often does not adequately represent agent behaviour in realistic settings such as cyber-physical robotics systems [13]. For example, the outcome of an action may be probabilistic due to imprecise actuation, e.g. the robot tries to open a door, but might fail. Plans, events, and intentions are not created equal and likely have different (domain-specific) characteristics such as preference and urgency, which may require different selection strategies (e.g. ordered, fixed schedules, or sampled from a probability distribution). As a result, there is a growing need for formal techniques that can provide support for automated analysis of quantitative properties such as “what is the probability of eventually completing an intention?” and “what is the worst-case probability of eventually completing an intention over all possible selection strategies?”.

To illustrate the problem, we use a robot packaging task in a smart manufacturing scenario as an example (a detailed quantitative analysis is given in Sect. 4). The overall goal is to pack products automatically for shipping. The robot insulates products with suitable wrapping bags, to prevent temperature rise and consequent spoilage, and then transfers the wrapped products to a storage location. There are two types of wrapping bags: premium and standard. The standard wrapping is preferred as the cheaper option; however, it may not be effective if the product temperature is already too high, and/or the packaging can occasionally break, resulting in a damaged product (i.e. a negative action outcome with some probability). Before wrapping the products, the robot also has to decide which product to handle first (as there may be multiple products waiting), so handling a product before it spoils requires a notion of urgency. While it is important to prioritise the more urgent products, it is also sensible to progress less urgent ones from time to time, before they too become urgent and spoil. We therefore need to model and quantify agent behaviour when there is a range of choices, inherent uncertainty, and characteristics such as preference and urgency. For example, we may wish to know the probability that the robot completes packaging under different schedules, negative outcomes, and decisions.

In the BDI community, probabilistic action outcomes are usually implicit—requiring the agent to sense failures and revise its beliefs (i.e. to enable new plans)—and are often disregarded when modelling. Although most agent language semantics specify non-deterministic plan selection, e.g. in [2], it is typical in practice for plans to be ordered—either statically [7] or at run-time [14]—to enforce deterministic branching. While it is desirable to exploit the highest-priority plan, it may be worthwhile exploring other plans every now and then to avoid being stuck in a local maximum. Similarly, event and intention selection are not implemented in a non-deterministic fashion either, but follow a fixed schedule such as Round-Robin (executing a step of each intention in turn) or First-In-First-Out. Interestingly, customised selection implementations change the semantics of agent languages implicitly, and are often a point where implementations and semantics diverge.

We argue that always selecting the highest-ranked option (i.e. a local maximum) or following fixed schedules (e.g. Round-Robin) is not always the best approach to plan, event, and intention selection, and we suggest that agents should support probabilistic selection strategies together with the means to evaluate the undesired outcomes of actions. We present a formal approach (in contrast with the informal customisation of implementations mentioned above) to specify, model, and quantitatively analyse BDI agents with probabilistic action outcomes and with plan, event, and intention selections drawn from a probability distribution. Quantitative verification, e.g. asking the probability that some intention completes, aids the design of agents by enabling plan, event, and intention selection strategies to be explored and compared, and mitigates the risk of negative action outcomes by providing much-needed quantitative assurance.

Fig. 1 Analysis of intention success for probabilistic CAN programs with different selection strategies

1.1 Approach

We have chosen to work with Can [2] as it captures the essence of BDI concepts without describing implementation details, such as data structures. As a superset of AgentSpeak [1], Can includes declarative goals, concurrency, and failure recovery. Here, we extend the operational semantics to a probabilistic setting. Although we focus on Can, its features are similar to other BDI languages and our approach would apply equally well to them.

Our approach is depicted in Fig. 1. On the left, we have the inputs: (above) CAN semantics and selection strategies and (below) the agent program. In the middle, we have the abstract machine: (above) the CAN semantics are encoded by probabilistic bigraph reaction rules and (below) the agent program is encoded by bigraph entities. On the right we have the execution engines BigraphER [15] and PRISM [16]. We use BigraphER to generate a transition system (a DTMC—Discrete Time Markov Chain) of all possible agent behaviours, for each given combination of selection strategies and initial states. We express successful (or failed) completion of intentions as a Probabilistic Computation Tree Logic (PCTL) [17] formula (e.g. eventually the intention(s) complete successfully). The transition system and formula are the inputs to the PRISM model checker, which returns a likelihood. Put more simply, the user “runs” their PCTL formula and agent model with different plan/intention/event selection strategies, as required.
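For example, assuming a state label success that marks states in which an intention has completed (the label name is illustrative), the probability of eventual completion is queried with the PCTL formula

$$\begin{aligned} \qquad \qquad \mathrm {P}_{=?}\,[\, \Diamond \, \textit{success} \,] \end{aligned}$$

where \(\Diamond \) is the “eventually” operator.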

We employ probabilistic bigraphs [18] as the intermediate language, building on our previous work on (non-probabilistic) bigraphs as an executable semantics for (non-probabilistic) Can [19]. We choose bigraphs, over other formalisms, for several reasons. First, the entity and type system allows a natural encoding of beliefs, desires, intentions, and plans as parallel regions. Second, the matching and rewriting nature of bigraphs closely mirrors the Can operational semantics, giving us the flexibility to trial different underlying semantics by changing a few bigraph rules. Third, the priority and conditional rule features provided in BigraphER support straightforward expression of selection strategies (e.g. ordered and fixed schedules). Fourth, there is an intuitive diagrammatic representation. The overall result is a user-friendly and direct translation that supports both probabilistic modelling and predicate-labelled transition systems that can be exported to model checkers like PRISM.

Parts of this study and preliminary results were presented in [20]. We make the following additional research contributions:

  • a probabilistic extension of the full structural operational semantics of Can;

  • an extended executable semantics of Can based on probabilistic bigraphs;

  • a presentation of how different selection strategies are encoded in bigraphs;

  • an extended evaluation and analysis use case, comparing various plan, event, and intention selection strategies (e.g. ordered and Round-Robin) under probabilistic action outcomes;

  • a reflection on insights gained from creating a probabilistic extension of Can, and the practical value of probabilistic agents for, e.g., agent designers.

The paper is organised as follows. In Sect. 2 we provide a brief overview of BDI agents and bigraphs. In Sect. 3 we propose the probabilistic extension of Can semantics. In Sect. 4 we evaluate our approach on a smart manufacturing example. In Sect. 5, we reflect on the generality and limits of our approach. We discuss related work in Sect. 6, future work in Sect. 7, and conclude in Sect. 8.

2 Background

2.1 BDI agents

A BDI agent has an explicit representation of beliefs, desires, and intentions. The beliefs correspond to what the agent believes about the environment, while the desires are a set of external events that the agent can respond to. To respond to those events, the agent selects a plan (given its beliefs) from the pre-defined plan library and commits to the selected plan by turning it into a new intention.

2.1.1 BDI syntax

The Can language formalises a classical BDI agent consisting of a belief base \( \mathcal {B} \) and a plan library \( \Pi \). The belief base \( \mathcal {B} \) is a set of formulas encoding the current beliefs and has belief operators for entailment (i.e. \( \mathcal {B} \models \varphi \)), and belief atom addition (resp. deletion) \( \mathcal {B} \cup \{b\}\) (resp. \(\mathcal {B} {\setminus } \{b\} \)). In general, any logic is allowed provided entailment is supported for a belief base. A propositional logic with natural number comparisons is used in this work. A plan library \( \Pi \) is a collection of plans of the form \( e: \varphi \leftarrow P \) with e the triggering event, \( \varphi \) the context condition, and \( P \) the plan-body. The triggering event e specifies why the plan is triggered, while the context condition \( \varphi \) determines when the plan-body P is able to handle the event. Events can be either external (i.e. from the environment in which the agent is operating) or internal (i.e. sub-goals that the agent itself tries to accomplish). A (partially executed) plan-body P for a selected plan \(e: \varphi \leftarrow P \) is the intention that is addressing e. The language used in the plan-body is defined by the following grammar:

$$\begin{aligned} \qquad P&::= nil \mid +b \mid -b \mid act \mid ?\varphi \mid e \mid P _{1}; P _{2} \mid P _{1} \triangleright P _{2} \mid \\&\qquad \, P _{1}\parallel P _{2} \mid e: (\mid \varphi _{1} : P _{1}, \cdots , \varphi _{n}: P _{n}\mid ) \mid \\&\qquad \, goal (\varphi _{s}, P , \varphi _{f}) \end{aligned}$$

where nil is an empty program, \(+b\) and \(-b\) belief addition and deletion, act a primitive action, \(?\varphi \) a test for \(\varphi \) in the belief base, and e a sub-event (i.e. internal event). Actions act take the form \(act = \varphi \leftarrow \langle \phi ^{-}, \phi ^{+} \rangle \), where \( \varphi \) is the pre-condition, and \(\phi ^{-} \) and \( \phi ^{+} \) are the deletion and addition sets (resp.) of belief atoms, i.e. a belief base \( \mathcal {B} \) is revised to be \( (\mathcal {B} {\setminus } \phi ^{-}) \cup \phi ^{+}\) when the action executes. We also denote the set of actions in the plan library as \( \Lambda \). To execute a sub-event, a plan (corresponding to that event) is selected and the plan-body added in place of the event. In this way we allow plans to be nested (similar to sub-routine calls in other languages). In addition, there are composite programs \( P _{1}; P _{2} \) for sequence, \( P _{1} \rhd P _{2} \) that executes \( P _{2}\) in the case that \( P _{1} \) fails, and \( P _{1}\parallel P _{2} \) for interleaved concurrency. A set of relevant plans (those that respond to the same event) is denoted by \( e: (\mid \psi _{1}: P _{1}, \cdots , \psi _{n}: P _{n}\mid ) \). Finally, a declarative goal program \( goal (\varphi _{s}, P , \varphi _{f}) \) expresses that the declarative goal \( \varphi _{s} \) should be achieved through program P, failing if \( \varphi _{f} \) becomes true, and retrying as long as neither \( \varphi _{s} \) nor \( \varphi _{f} \) is true (see [3] for details).
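To make the action semantics concrete, the following Python sketch (the names and the simplification of entailment to a subset check are ours) shows how executing an action revises the belief base to \( (\mathcal {B} {\setminus } \phi ^{-}) \cup \phi ^{+}\) when the pre-condition is entailed:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    pre: frozenset      # pre-condition atoms that must hold (B |= pre, simplified)
    delete: frozenset   # phi^- : belief atoms removed on execution
    add: frozenset      # phi^+ : belief atoms added on execution

def execute(beliefs: set, act: Action) -> set:
    """Revise the belief base to (B \\ phi^-) | phi^+ when the pre-condition is entailed."""
    if not act.pre <= beliefs:
        raise ValueError("pre-condition not entailed: action is not applicable")
    return (beliefs - act.delete) | act.add

# start_car from the conference-trip example of Sect. 2.1.3
start_car = Action(pre=frozenset({"car_functional"}),
                   delete=frozenset(), add=frozenset({"engine_on"}))
print(execute({"car_functional", "own_car"}, start_car))
```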

2.1.2 BDI semantics

Can semantics is specified by two types of transitions. The first type, denoted \(\Rightarrow \), specifies agent-level evolution over \(\langle E^{e}, \mathcal {B}, \Gamma \rangle \), detailing how to execute a complete agent where \(E^{e}\) is the set of pending external events to address (the desires), \(\mathcal {B}\) the belief base, and \(\Gamma \) a set of partially executed plan-bodies (intentions). The second, denoted \(\rightarrow \), specifies intention-level evolution on configurations \( \langle \mathcal {B}, P\rangle \) where \( \mathcal {B} \) is the belief base, and P the plan-body currently being executed.

Fig. 2 Can semantics from [3]

The agent-level semantics are given in Fig. 2a. Rule \( A_{event} \) handles external events that originate from the environment, by adopting them as intentions. Rule \( A_{step} \) selects an intention from the intention base and evolves it a single step w.r.t. the intention-level transition, while \( A_{update} \) discards any intentions that cannot make progress (either because they have already succeeded, or failed).

Figure 2b gives intention-level rules for evolving any single intention. For example, the rule act handles the execution of an action, when the pre-condition \(\psi \) is met, resulting in a belief state update. Rule event replaces an event with the set of relevant plans, while rule select chooses an applicable plan from a set of relevant plans while retaining un-selected plans as backups. With these backup plans, rules for failure recovery \(\rhd _{;}\), \(\rhd _{\top }\), and \(\rhd _{\bot }\) enable new plans to be selected if the current plan fails (e.g. due to environment changes). Rules ; and \(;_{\top }\) allow executing plan-bodies in sequence, while rules \(\Vert _{1} \), \(\Vert _{2} \), and \(\Vert _{\top } \) specify how to execute (interleaved) concurrent programs. Rules \(G_s\) and \(G_f\) deal with declarative goals when either the success condition \(\varphi _s\) or the failure condition \(\varphi _f\) becomes true. Rule \( G_{init}\) initialises persistence by setting the program in the declarative goal to be \( P \rhd P\), i.e. if P fails, try P again. This ensures P runs indefinitely unless either the success condition \(\varphi _s\) or failure condition \(\varphi _f\) holds. Rule \( G_;\) takes care of performing a single step on an already initialised program. Finally, the derivation rule \(G_{\rhd }\) re-starts the original program if the current (partially-executed) program has finished or become blocked (when neither \(\varphi _s\) nor \(\varphi _f\) becomes true).

2.1.3 Agent example

For illustration, we give a classic example [19]: arranging a conference trip. The agent program is shown in Listing 1, and commentary follows.

Fig. 3 Non-deterministic (standard Can) and new probabilistic transitions highlighting plan, event, and intention selection, and action execution. Solid lines are agent-level transitions, while dashed lines are intention-level

An agent desires to arrange a conference trip, denoted by an external event e_conference_travelling (line 6). We assume there are only two ways to travel to the conference. The first is by car, given by the plan in line 2, which expresses that if the agent believes it owns a car (i.e. own_car) and the venue is within driving distance (i.e. driving_distance), it can start the car (start_car) and drive (driving) all the way to the venue. As an example of how actions are specified, the action start_car (line 8) requires that the car is functional (i.e. car_functional) and, after execution, the belief that the engine is on (i.e. engine_on) is added while nothing is deleted from the belief base.

The second way to travel is by air, given by the plans in lines 3 and 4. The plan in line 3 expresses that if the budget allows and there is a flight, the agent can book the ticket first, then post an internal sub-event to actually travel by plane, and go to the venue after landing. To address the sub-event e_get_on_board, we have the plan in line 4, which expresses that if the agent believes the flight has been booked, it can go to the airport and fly by plane.
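To aid reading without the listing, these plans and the start_car action can be written in the syntax of Sect. 2.1.1 roughly as below; identifiers not quoted in the commentary above (e.g. book_ticket, go_to_airport) are illustrative guesses rather than the exact names used in Listing 1:

$$\begin{aligned} \qquad \qquad&\textit{e\_conference\_travelling}: \textit{own\_car} \wedge \textit{driving\_distance} \leftarrow \textit{start\_car};\, \textit{driving} \\&\textit{e\_conference\_travelling}: \textit{budget\_allows} \wedge \textit{flight\_exists} \leftarrow \textit{book\_ticket};\, \textit{e\_get\_on\_board};\, \textit{go\_to\_venue} \\&\textit{e\_get\_on\_board}: \textit{flight\_booked} \leftarrow \textit{go\_to\_airport};\, \textit{fly\_by\_plane} \\&\textit{start\_car} = \textit{car\_functional} \leftarrow \langle \emptyset , \{\textit{engine\_on}\} \rangle \end{aligned}$$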

3 Probabilistic CAN semantics

The semantics of Can are specified by two types of transitions. The first is the agent-level transition \( \Rightarrow \) in Fig. 2a that specifies how to execute a complete agent. The second is the intention-level transition \(\rightarrow \) in Fig. 2b that specifies how to evolve a given single intention.

Can semantics feature non-deterministic transitions, e.g. for plan selection. To allow for probabilistic selection and action outcomes, we must extend this to support probabilistic transitions. Figure 3 provides a high-level comparison of the standard non-deterministic semantics and our new probabilistic semantics for Can.

Choices appear throughout both the agent- and intention-level semantics. An agent with multiple external events to respond to and a set of intentions to pursue is faced with three operations to choose from, namely agent-level operation selection. The agent can incorporate any pending external events as specified by semantic rule \(A_{event}\), it can select an intention and execute a step (according to the intention-level semantics) using \(A_{step}\), or it can manage the intention set by removing an unprogressable intention using \(A_{update}\). In the original Can semantics, these are chosen non-deterministically, that is, there is no way to prioritise completing existing intentions over handling new events.

Once an agent-level operation is chosen, there are further decisions to make. For example, which pending external event (there may be multiple) should be adopted? Similarly, both \(A_{step}\) and \(A_{update}\) must select one intention from a set of intentions (i.e. intention selection). These choices are also made non-deterministically, again meaning we cannot prioritise specific events/intentions.

After choosing to step an intention (\(A_{step}\)), progressing this intention may involve (visualised as dashed lines in Fig. 3) selecting an applicable plan, progressing a concurrent program, and executing an action. Again, plan selection is made non-deterministically by rule select, the order in which a concurrent program progresses is non-deterministic through \(\Vert _1\) or \(\Vert _2\), and an action has only a single outcome by rule act.

Previous work [19] formally modelled and analysed non-probabilistic Can (i.e. the left side of Fig. 3). We extend this to define a probabilistic semantics for Can, and show how this allows quantitative analysis. The right side of Fig. 3 shows our probabilistic extension of Can semantics.

To move from non-deterministic transitions to probabilistic transitions, we employ probabilistic transitions \(\mathcal {C} \rightarrow _{p} \mathcal {C}'\) (i.e. move from \(\mathcal {C}\) to \(\mathcal {C'}\) with probability p) [21]. To extend the non-deterministic transitions, the key is to assign a probability to each selection choice. In the next sections we detail why and how we extend both agent-level and intention-level transitions of Can, and how suitable distributions can be constructed to support quantitative analysis.

Notation

We use \(\mu , \eta \) to refer to probability distributions over a set A. We write \(\mu = [x \mapsto p_1, y \mapsto p_2]\) to denote the probability distribution over \(\{x,y\}\) where, for example, x is sampled with probability \(p_1\), and access the probability of an element using function notation, e.g. \(\mu (x) = p_1\). For a distribution we require \(\sum _{x \in A} \mu (x) = 1\). Only probabilities for nonzero elements are given, such that for \([x \mapsto 1, y \mapsto 0]\) we instead write \([x \mapsto 1]\). We use Dist(A) to refer to the set of discrete probability distributions over A, i.e. a set with probability distributions as elements.
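As a minimal sketch (function names are ours), such a distribution can be represented as a mapping from elements to probabilities, with elements sampled according to their probability:

```python
import random

def dist(weights: dict) -> dict:
    """Build a distribution [x -> p, ...]; probabilities must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return {x: p for x, p in weights.items() if p > 0}  # omit zero-probability elements

def sample(mu: dict):
    """Draw one element according to its probability."""
    xs, ps = zip(*mu.items())
    return random.choices(xs, weights=ps, k=1)[0]

mu = dist({"x": 0.7, "y": 0.3})
print(mu["x"], sample(mu))   # mu(x) = 0.7; sample returns "x" or "y"
```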

We denote the set of all possible belief atoms, external events, and intentions—for a specific program—as \( \overline{\mathcal {B}} \), \(\overline{E^e}\), and \(\overline{\Gamma }\), respectively. At each agent step, the belief base (resp. events, intentions) is given as \(\mathcal {B} \subseteq \overline{\mathcal {B}}\) (resp. \(E^{e} \subseteq \overline{E^e}\), \(\Gamma \subseteq \overline{\Gamma }\)).

3.1 Probabilistic agent-level semantics

The agent-level semantics of Can characterise the evolution of an agent which has multiple external events to respond to and is currently pursuing a set of intentions. While the agent-level semantics allow the agent to respond to new events even while already dealing with other events, only one agent-level operation can be performed at each step. Such non-deterministic choice of agent-level operation is implicit in the Can semantics. Here we formalise and express it as the following function, where we denote the set of agent-level rules as \(\mathcal {A} = \{A_{event}, A_{step}, A_{update}\}\):

$$\begin{aligned} \qquad \qquad \mathcal {S}_{ao}: 2^{\overline{E^e}} \times 2^{\overline{\mathcal {B}}} \times 2^{\overline{\Gamma }} \rightarrow \mathcal {A} \cup \{\bot \} \end{aligned}$$

which returns a choice of agent-level operation given any agent-level configuration \(\langle E^{e}, \mathcal {B}, \Gamma \rangle \), where \(\bot \) stands for no applicable rule being available.

In practice, the agent-level operation is commonly selected in a deterministic fashion, such as incorporating an external event (if any) before selecting an intention to execute a step (if possible). However, we may need to choose an agent-level operation from a distribution. For example, it may be better for the agent to mainly incorporate external events as intentions early in its operation and to mainly progress existing intentions at a later stage. To allow this we sample agent-level operations based on a probability distribution, i.e. with the following selection function:

$$\begin{aligned} \qquad \qquad \mathcal {S}^{p}_{ao}: 2^{\overline{E^e}} \times 2^{\overline{\mathcal {B}}} \times 2^{\overline{\Gamma }} \rightarrow Dist(\mathcal {A} \cup \{\bot \}) \end{aligned}$$

The probability of \(\bot \) in any distribution \(\mu \in Dist(\mathcal {A} \cup \{\bot \})\) is either \(\mu (\bot ) = 0\) (agent-level operation(s) available) or \(\mu (\bot ) = 1\) (no agent-level operation available). Using \(\mathcal {S}^{p}_{ao} \), we will define probabilistic rules for the actual execution of agent-level operations in the next sections.
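A minimal sketch of such a selection function, with illustrative weights that favour \(A_{event}\) early and \(A_{step}\) later (the concrete weighting used in our example appears in Sects. 3.3 and 4.2; here a single progress counter stands in for the beliefs recording how far the agent has advanced):

```python
def agent_op_dist(events: set, intentions: set, progress: int) -> dict:
    """S^p_ao: distribution over {A_event, A_step, A_update, BOT} for the current configuration."""
    weights = {}
    if events:
        weights["A_event"] = max(1, 10 - progress)   # favour adopting events early on
    if intentions:
        weights["A_step"] = 1 + progress             # favour progressing intentions later
        weights["A_update"] = 1                      # occasionally clean up unprogressable intentions
    if not weights:
        return {"BOT": 1.0}                          # no applicable rule: bottom with probability 1
    total = sum(weights.values())
    return {op: w / total for op, w in weights.items()}
```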

We will next detail how an agent decides which event or intention should be selected when a given agent-level operation is selected (according to the distribution from selection function \(\mathcal {S}^{p}_{ao}\)). The details of how these functions are implemented are given later on in Sect. 4.4.

3.1.1 Probabilistic event adoption

BDI agents operate by continuously handling external events that represent tasks originating from the external environment.

To respond to these events, an agent selects an external event (\( e \in E^{e}\)) and adopts it into the intention set (\( \Gamma \cup \{e\}\)), using rule \(A_{event}\):

There may be multiple pending external events, due to different requests from the environment, and it is not clear which event should be selected: the rule above picks any waiting event. In practice, we want more control over which event is selected as different events may be more or less urgent. Many agent implementations, going against the semantics, choose events using an event selection function \(\mathcal {S}_e\) that is customised to account for priorities and is formalised in the following form:

$$\begin{aligned} \qquad \qquad \mathcal {S}_e: 2^{\overline{\mathcal {B}}} \times 2^{\overline{E^e}} \rightarrow \overline{E^e} \cup \{\bot \} \end{aligned}$$

Given a belief base and a set of external events it returns an event or \(\bot \), i.e. no requested event is present. In other words, the agent always takes an event if one exists. We also note that the belief base is needed to provide relevant information (e.g. priority) for the agent to make more informed event selection decisions.

To allow non-strict orderings we sample events based on a probability distribution, i.e. with the probabilistic event selection function:

$$\begin{aligned} \qquad \qquad \mathcal {S}^{p}_{e}: 2^{\overline{\mathcal {B}}} \times 2^{\overline{E^e}} \rightarrow Dist \Bigg (\overline{E^e} \cup \{\bot \} \Bigg ) \end{aligned}$$

Using \(\mathcal {S}^p_e\) and \(\mathcal {S}^{p}_{ao} \) (which defines probability of selecting rule \( A_{event} \)), we can define a probabilistic \( A_{event} \) rule:

The rule \(A^{p}_{event}\) says that if the probability of performing event selection at this step is \(p_1\) and the probability of selecting a pending external event is \(p'_1\), then the probability of selecting this event at this step is \(p_1 \cdot p'_1\). When no external event is available (i.e. \(\mathcal {S}^{p}_{e}(\mathcal {B}, E^{e}) = \mu _1 \) and \( \mu _1(\bot ) = 1 \)), \(A^{p}_{event}\) is not applicable.

3.1.2 Probabilistic intention progression

Every time an agent adopts an external event a new intention is created. As agents should adopt events whenever they exist to stay reactive, we end up with a set of intentions competing for the agent’s attention. As Can agents are single-threaded, at most one intention can be executed at each agent step, in an interleaved manner. If an agent decides to work on intentions (rather than adopt new events), the agent must make a choice: out of the set of progressable intentions, which should be progressed? In standard Can this is captured by

This rule non-deterministically progresses any intention (that can be progressed) with respect to the intention level rules in Fig. 2b.

The precondition states that P must be an intention but does not control which. If we want more control, we can add an intention selection function as follows:

$$\begin{aligned} \qquad \qquad \mathcal {S}_{i}: 2^{\overline{\mathcal {B}}} \times 2^{\overline{\Gamma }} \rightarrow \overline{\Gamma } \cup \{\bot \} \end{aligned}$$

As before, this function returns a fixed intention, or \(\bot \) if no intention is present in the intention set (i.e. \(\Gamma = \emptyset \)). We note that the domain of \(\mathcal {S}_{i}\), by definition, includes all possible stages of each intention, as every step of an intention is itself a different intention. To construct such a function efficiently, however, we often treat different stages of an intention as the same intention. In fact, to construct this function we link each intention to its related external event, as intentions ultimately address external events (detailed in Sect. 4). As such, regardless of how an intention evolves, it is treated as the same intention. As with event selection, the belief base in the function domain provides domain-specific information about intentions to aid customised selection.

To allow intentions to be chosen from a distribution, we provide the following probabilistic intention selection function:

$$\begin{aligned} \qquad \qquad \mathcal {S}^{p}_{i}: 2^{\overline{\mathcal {B}}} \times 2^{\overline{\Gamma }} \rightarrow Dist\left( \overline{\Gamma } \cup \{\bot \}\right) \end{aligned}$$

The agent-level transition of \(A_{step}\) depends on the intention-level transitions and we need to account for this in the transition probabilities. To obtain a probabilistic agent-level rule \(A_{step}\), we assume, for a chosen progressable intention \(P \in \Gamma \), that \(\langle \mathcal {B}, P\rangle \rightarrow _{p'} \langle \mathcal {B}', P' \rangle \) holds, for example, if a plan selection for the given intention P is required based on \(select^P\). The detailed probabilistic intention-level semantics will be given in Sect. 3.2.

The probabilistic rule for intention selection is

where \(\Gamma '=(\Gamma \setminus \{P\})\cup \{P'\}\). Rule \(A^{p}_{step}\) says that if the probability of performing intention selection at this step is \(p_2\), the probability of selecting intention P is \(p'_2\), and the probability of progressing it to \(P'\) is \(p''_2\), then the probability of selecting and progressing intention P to \(P'\) at this step is \(p_2 \cdot p'_2 \cdot p''_2\).
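For example, with illustrative values \(p_2 = 0.5\), \(p'_2 = 0.4\), and \(p''_2 = 0.9\), the resulting transition carries probability

$$\begin{aligned} \qquad \qquad 0.5 \cdot 0.4 \cdot 0.9 = 0.18 \end{aligned}$$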

3.1.3 Probabilistic intention update

The final agent-level rule \(A_{update}\) drops any unprogressable intention from the intention set.

As with rule \(A_{step}\), rule \(A_{update}\) also requires the distribution \(\mathcal {S}^{p}_{i}\) to select an intention. Unlike \(A_{step}\), however, \(A_{update}\) depends on the absence of an intention-level transition, which has a default probability 1, namely \(\langle \mathcal {B}, P\rangle \nrightarrow _1\). To obtain a probabilistic agent-level rule \(A_{update}\), we present the following probabilistic rule for intention update.

The new rule \(A^{p}_{update}\) says that if the probability of performing an intention update at this step is \(p_3\) and the probability of selecting an unprogressable intention P is \(p'_3\), then the probability of selecting and removing it from the intention set at this step is \(p_3 \cdot p'_3 \).

Finally, when there is no agent-level operation available, we provide a default idle rule that transitions the agent to itself. This is required as every state of a DTMC must have outgoing edge probabilities that sum to 1. This allows for verification using probabilistic model checking tools (in Sect. 4.5). The self-transition rule is

3.2 Probabilistic intention-level semantics

Figure 2b gives rules for evolving any single intention, and each rule is either deterministic (e.g. rule act) or non-deterministic (e.g. rule select) in nature. Though most deterministic rules are expected and appropriate, such as progressing a sequence of programs one by one, the rule for action execution may have uncertain outcomes (i.e. effects on the external environment), which need probabilistic treatment. Similarly, we naturally extend non-deterministic rules such as plan selection (rule select) to a probabilistic setting based on agent-specific information such as preference.

3.2.1 Probabilistic action outcomes

Agents execute actions that both interact with an external environment (e.g. pick up an object) and, in turn, revise the internal belief base (e.g. the agent believes it holds the object). Recall that action execution is specified in Can as follows:

This states that an action applies only if the precondition \(\varphi \) holds, and the outcome is to update the belief base by removing and adding the belief atoms specified by \(\phi ^{-}\) and \(\phi ^{{+}}\), respectively. Therefore, the action outcome is implicitly determined in a deterministic fashion by the function

$$\begin{aligned} \qquad \qquad \mathcal {S}_{a}: \Lambda \rightarrow 2^{ \overline{\mathcal {B}}} \times 2^{\overline{\mathcal {B}}} \end{aligned}$$

Given an action, it returns the pair of deletion and addition sets of belief atoms.

In practice, we know the outcomes of an action are uncertain (e.g. due to actuator malfunctions). For example, an agent may execute an action to pick up an object but fail to do so because a robotic arm fails. In this case, updating the beliefs that an object is held can lead to misalignment between the true environment and the agent’s representation of it. To allow uncertain action outcomes, we can sample outcomes based on a probability distribution, i.e. with the following action outcome function:

$$\begin{aligned} \qquad \qquad \mathcal {S}^{p}_{a}: \Lambda \rightarrow Dist(2^{\overline{\mathcal {B}}} \times 2^{\overline{\mathcal {B}}}) \end{aligned}$$

A probabilistic action execution is defined by

Importantly, we do not expect programming language implementations based on these semantics to draw action outcomes probabilistically. Instead, the probabilistic treatment is used solely for modelling, which allows us to capture environmental effects in a semantics where they are usually overlooked or ignored.
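As a sketch (again, for modelling only), an action can be mapped to a distribution over (deletion set, addition set) pairs, one of which is sampled and then applied to the belief base; the 10% drop probability below mirrors the standard-wrapping action of Sect. 4.2, and the helper names are ours:

```python
import random

def outcome_dist(action: str) -> dict:
    """S^p_a: map an action name to a distribution over (delete, add) pairs of belief atoms."""
    if action == "move_product_standard1":
        return {
            (frozenset(), frozenset({"success1"})): 0.9,   # product moved to storage
            (frozenset(), frozenset({"failure1"})): 0.1,   # product dropped (negative outcome)
        }
    raise KeyError(f"no outcome distribution for {action}")

def execute(beliefs: set, action: str) -> set:
    """Sample one outcome and revise the belief base to (B \\ phi^-) | phi^+."""
    outcomes, probs = zip(*outcome_dist(action).items())
    delete, add = random.choices(outcomes, weights=probs, k=1)[0]
    return (beliefs - delete) | add

print(execute({"wrapped1"}, "move_product_standard1"))
```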

3.2.2 Probabilistic plan selection

BDI agents employ a user-provided plan library to respond to events. Each plan has i) a triggering event defining what event the plan can respond to, ii) a pre-condition defining what beliefs must hold for the plan to apply, and iii) a plan-body defining what steps should be taken to execute the plan. To address a pending event originating from the external environment, the agent retrieves a set of relevant plans, i.e. those with a matching triggering event, as specified by Can rule

Given a set of relevant plans, the agent then selects an applicable plan (one where the precondition is true):

where \(\Delta '=\Delta \setminus \{\varphi :P\}\). If there are no applicable plans a separate rule such as rule \(\rhd _{\bot }\) in Fig. 2b propagates the failure.

Notice that the preceding select rule does not specify which plan should be selected in case of multiple applicable plans, i.e. it is non-deterministic. However, in many implementations, the choice is often made deterministically by a plan selection function of the following form:

$$\begin{aligned} \qquad \qquad \mathcal {S}_{p}: 2^{\overline{\mathcal {B}}} \times 2^\Pi \rightarrow \Pi \cup \{\bot \} \end{aligned}$$

Given a belief base and a set of plans it returns an applicable plan or no applicable plan (\(\bot \)).

While a common heuristic is to select the highest-ranked plan according to some characteristic (e.g. preference), this may not lead to globally optimal behaviours due to action side-effects. We argue that it should be possible to prioritise plan choice based on plan characteristics, but not to assume a totally fixed ordering, in order to allow exploration of lower-ranked plans that might have better properties. This is akin to discrepancy search techniques [22], which go against the heuristic, and is particularly useful for declarative goals (e.g. rules \(G_{init}\) and \(G_{\rhd }\) in Fig. 2b) to avoid always repeating the same plan.

To support non-strict orderings, we can sample the choice of applicable plans based on a probability distribution, i.e. with the following plan selection function:

$$\begin{aligned} \qquad \qquad \mathcal {S}^{p}_{p}: 2^{\overline{\mathcal {B}}} \times 2^\Pi \rightarrow Dist(\Pi \cup \{\bot \}) \end{aligned}$$

Using \(\mathcal {S}^{p}_{p}\), a probabilistic select rule is defined by

where \(\Delta ' = \Delta {\setminus } \{\varphi :P\}\) and \(\mu \) is the probability distribution returned from \(\mathcal {S}^{p}_{p}\) such that any non-relevant and non-applicable plans are assigned the probability 0.

Trialling different distributions is possible by changing \(\mathcal {S}^{p}_{p}\), which could, for example, be extracted from historical data. Our approach then allows the exact probabilistic effects of different \(\mathcal {S}^{p}_{p}\) choices to be quantified.

3.2.3 Probabilistic concurrency

Can also supports the execution of concurrent programs. To execute a concurrent plan-body program, the agent can execute either part non-deterministically, as given in the following two cases:

To choose which part of a concurrent program to progress, we may also require the flexibility to choose from a distribution, via a selection function for a concurrent program \( P_{1}\Vert P_{2}\) as follows:

$$\begin{aligned} \qquad \qquad \mathcal {S}^{p}_{c}: 2^{\overline{\mathcal {B}}} \times P \rightarrow Dist(P \cup \{\bot \}) \end{aligned}$$

where \(P \subseteq \Gamma \) is the set of all possible plan-body programs and \(Dist(P \cup \{\bot \})\) is the set of discrete probability distributions over all possible plan-body programs (or a delta distribution on \(\bot \) if no part of a concurrent program can be progressed).

We note that both rules \(\Vert _{1}\) and \(\Vert _{2}\) imply that the evolution of a concurrent program depends on another intention-level transition of either part of the concurrent program, and we need to account for this in the transition probabilities. To obtain a probabilistic intention-level transition for a concurrent program, we assume, for a chosen progressable part \(P_i \in P\), that \(\langle \mathcal {B}, P_i\rangle \rightarrow _{p'_i} \langle \mathcal {B}', P'_i \rangle \) holds where \(i \in \{1,2\}\), for example, if a plan selection for the given part \(P_i\) is required based on \(select^P\). Using \(\mathcal {S}^{p}_{c}\) we can define a probabilistic extension for rules \(\Vert _{1}\) and \(\Vert _{2}\):

where \(\mu \) is the probability distribution returned from \(\mathcal {S}^{p}_{c}\) such that \(\sum _{i}p_i \cdot p'_i = 1 \), \(p_i, p'_i \in [0,1]\), and \(i \in \{1, 2\}\).

Finally, we reiterate that when there is no applicable plan available to select, or neither part of a concurrent program is progressable, a separate failure recovery rule (\(\rhd _{\bot }\) in Fig. 2b) propagates the failure so that the transition can continue. The rest of the intention-level rules in Fig. 2b are automatically extended with probability 1. The full rule set for the probabilistic extension of the Can semantics is in Fig. 15.

3.3 Constructing selection functions

The probabilistic Can semantic rules either transition with probability 1 or through probability distributions. These probability distributions are abstract, and the rules themselves do not specify how to construct them in practice. In this section, we present an extended syntax for Can programs that allows agent programmers to define the specific distributions to be used.

3.3.1 Situation value functions

We introduce additional syntax for relevant agent programs (e.g. plans) that, through a process of normalisation, determines the correct probability distributions. Following [14], we annotate programs using situation value description functions \(\theta : 2^{\overline{\mathcal {B}}} \rightarrow \mathbb {R}_{\ge 0}\). Intuitively these map the current situation, as described by the current beliefs, to a real-valued number. We allow users to define \(\theta \) functions as folds/aggregation functions as follows:

$$\begin{aligned} \qquad \qquad \langle d_0, \{(\varphi _1, d_1), \cdots , (\varphi _n, d_n)\}, f\rangle \end{aligned}$$

where \(d_0\) is a default value and values \(d_{i}\) are aggregated using function f (e.g. sum) whenever \(\mathcal {B} \models \varphi _i \) holds.

In general, situation value functions are dynamic in that they respond to the current set of beliefs representing the situation the agent is in. A special case is static values, e.g. \(\theta = \langle 0, \{(true, d_{i})\}, + \rangle \), which do not depend on the current beliefs of the agent. For ease of notation, we allow users to denote these simply as the value, e.g. \(\theta = d_i\), rather than giving the full function.
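A minimal sketch (in Python, with names of our choosing) of evaluating such a description against the current beliefs, where each condition is modelled as a predicate over the belief base:

```python
from functools import reduce

def theta(default, cases, f):
    """Situation value function <d_0, {(phi_i, d_i)}, f>: aggregate d_i whenever B |= phi_i."""
    def value(beliefs: dict) -> float:
        ds = [d for (phi, d) in cases if phi(beliefs)]   # keep d_i for each entailed phi_i
        return reduce(f, ds, default)
    return value

# Anticipating Sect. 4.2: theta_11 = <1, {(de_1 >= 3, 1)}, sum> yields 1 + 1 = 2 when de_1 >= 3
theta_11 = theta(1, [(lambda B: B.get("de1", 0) >= 3, 1)], lambda a, b: a + b)
print(theta_11({"de1": 10}))   # -> 2
```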

Situation value functions can then be attached to the following agent programs to determine the correct probabilistic selection function.

  • Plans \(e:\varphi \leftarrow P[\theta ]\)

  • Concurrency \(P_{1}[\theta _{1}] \parallel P_{2}[\theta _{2}]\)

  • Events and Intentions \(e[\theta ]\)

Intuitively, plans with a higher (current) \(\theta \) value should be selected more often. Concurrency annotations determine which branch should be preferred, while the event annotations determine which event should be adopted first (given a set of possible events). Because intentions ultimately address external events, we measure the situation value of an intention by considering the value of its related external event, which suffices for our smart manufacturing example in Sect. 4.

Finally, to construct the probabilistic selection function for agent-level operations, we add three keywords \(A_{event}\), \(A_{step}\), and \(A_{update}\) (mirroring the agent-level rule names) that allow annotations with a situation value function, i.e. \( A_{event}[\theta _{1}], A_{step}[\theta _{2}]\), and \( A_{update}[\theta _{3}]\). The syntax is shown as part of the agent configuration in lines 20 to 22 in Listing 2. Each situation value function describes the relative weight of selecting each agent-level rule. As usual, the resulting probabilities of selecting each agent-level rule are then determined through normalisation.


3.3.2 Selection functions

We now describe how the selection functions are constructed given the situation value functions. We give the mapping for plan selection as an example, and the others follow similarly. We define

$$\begin{aligned} \qquad \qquad app(\mathcal {B}, \Delta ) = \Bigg \{P \in \Delta \mid P = \varphi : Q,\, \mathcal {B} \models \varphi \Bigg \} \end{aligned}$$

as a filter that chooses the applicable plans given a specific belief set \(\mathcal {B}\) and set of plans \(\Delta \). Here we use Q to indicate a plan-body. The plan selection function is then defined as

$$\begin{aligned} \qquad \qquad \mathcal {S}^p_p(\mathcal {B}, \Delta )= {\left\{ \begin{array}{ll} \mu &{} \text {if } app(\mathcal {B},\Delta ) \ne \emptyset \\ {[}\bot \mapsto 1] &{} \text {otherwise} \end{array}\right. }\\ \mu = \left[ P_1 \mapsto \frac{\theta _{1}(\mathcal {B})}{N}\ldots , P_n \mapsto \frac{\theta _{n}(\mathcal {B})}{N}, \bot \mapsto 0 \right] \end{aligned}$$

with \(P_{i\in \{1,\ldots ,n\}} \in app(\mathcal {B},\Delta )\) and \(N = \sum _{i=1}^{n} \theta _{i}(\mathcal {B})\). That is, for a non-empty set of applicable plans we normalise the situation values into the range \(0 \le p \le 1\) allowing them to be used as a probability distribution. If there are no applicable plans, or the plan set is empty, we select \(\bot \) with probability 1, allowing different Can rules (e.g. failure recovery \(\rhd _{\bot }\) in Fig. 2b) to apply.
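A sketch of this construction in Python (the Plan record and helper names are ours; plan preconditions and situation value functions are modelled as functions of the belief base):

```python
from collections import namedtuple

Plan = namedtuple("Plan", ["name", "pre", "theta"])

def plan_dist(beliefs: dict, plans: list) -> dict:
    """S^p_p: normalise the situation values of applicable plans into a distribution."""
    applicable = [p for p in plans if p.pre(beliefs)]              # app(B, Delta)
    if not applicable:
        return {"BOT": 1.0}                                        # no applicable plan: select bottom
    total = sum(p.theta(beliefs) for p in applicable)              # N = sum_i theta_i(B)
    return {p.name: p.theta(beliefs) / total for p in applicable}

premium  = Plan("premium",  pre=lambda B: True,               theta=lambda B: 1)
standard = Plan("standard", pre=lambda B: B["temp"] == "low", theta=lambda B: 3)
print(plan_dist({"temp": "low"}, [premium, standard]))   # {'premium': 0.25, 'standard': 0.75}
```

The event, intention, and concurrency selection functions are built analogously from their respective situation value annotations.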

3.3.3 Action outcomes

Action outcomes are statically defined based on estimates of environmental effects at design time. We attach the static situation value functions, i.e. values, to each effect using the following syntax:

$$\begin{aligned} \qquad \qquad act = \varphi \leftarrow [\langle \phi _{1}^{-},\phi _{1}^{+} \rangle [\theta _1], \dots , \langle \phi _{n}^{-},\phi _{n}^{+} \rangle [\theta _n]] \end{aligned}$$

As before, the specific probabilities are determined through normalisation.

Finally, we note that assigning static values to action outcomes has been considered extensively in the planning literature and has led to, e.g., the probabilistic planning domain definition language (PPDDL) [23], which considers multiple outcomes with associated probabilities (e.g. estimated from historical data).

4 Evaluation

We demonstrate, using a smart manufacturing example and existing probabilistic model checking tools, how to quantitatively analyse BDI agent programs. Specifically, we evaluate our probabilistic plan/event/intention selection against common strategies such as always selecting the most preferred plan. The results are promising, with the intention completion probability using probabilistic distributions being \(97\%\) higher than with some strictly ordered plan and intention selection strategies.

We build on previous quantitative analysis, for the same example [20], by extending the experiments to include new agent-level operation selection strategies.

The models are freely available in BigraphER format online (Footnote 1). For quantitative analysis we use PRISM to check properties (through bigraph patterns) by importing the labelled DTMC produced by BigraphER. While we only give details of a single case study, users of the executable semantics can employ BigraphER to “run” models with different settings, e.g. external events, plan libraries, customised situation value functions.

4.1 Smart manufacturing example

We consider a robotic packaging scenario, extended from [24], where a robot packs products and moves them to a storage area. Products have specific temperatures and must be packed in a suitable wrapping bag to prevent decay. If a product stays on the production line too long, its temperature increases and it is spoiled and lost. Given multiple waiting products (i.e. events to trigger the operation) on the production line, the robot must choose which to handle first (event selection). Once chosen, the robot must then decide which wrapping to use: either premium or standard (plan selection). Premium wrapping is expensive but always stops product decay and never breaks. On the other hand, standard wrapping is cheap, only works if the product temperature remains low, and has a risk of breaking (a negative action outcome). The (partially-executed) plan-bodies for the selected plans become intentions that handle the products. Among all current intentions, the agent also needs to decide which intention to progress further, for example, moving a product to storage once it is wrapped (intention selection).

Complexity arises from the following factors: (1) the losses avoided depend on when a product is packed, (2) when a product is packed determines which wrappings are applicable; earlier packing means cheaper bags, and (3) cheaper wrappings introduce uncertainty as they may break. A formal model of the agent system allows us to quantitatively reason about the robot’s behaviours under this uncertainty and use these results as evidence, e.g. for regulatory certification. Furthermore, it can help improve the design of the robot, e.g. using the standard wrapping as often as possible while remaining within a tolerable failure threshold.

4.2 Agent design

We consider a simplified scenario with two products that are initially present on the production line, i.e. there are no dynamic events. The agent program is given in Listing 2 and we assume beliefs are in a propositional logic with numerical comparisons.

Fig. 4 Agent design employing the syntax of Sect. 2.1 with the situation value functions

Products awaiting processing are captured by external events shown in lines 9 and 10, e.g. e_product1 with its situation value function \(\theta _{13}\) (explained below). The agent responds to the events using a declarative goal on line 2 that states it wants to achieve the state success1 (i.e. wrapped and moved) through addressing the (internal) event e_process_product1, failing if failure1 (i.e. dropped or decayed) ever becomes true. Two plans (in lines 3 and 4), which represent the different wrappings, can handle the event e_process_product1, each with a different situation value function. Event e_product2 is handled in a similar way (in lines 5–7).

The descriptions of actions are given in lines 12 to 19. There is a probabilistic outcome for the move_product_standard1 action in line 13, such that it has a 10% chance of causing failure1 by dropping the product accidentally, else it succeeds (adding success1 to the beliefs), whereas the move_product_premium1 action in line 15 always succeeds. In Sect. 4.5 we investigate how varying the action success probability affects the overall outcomes.

To construct probability distributions, we encode (discrete) temporal information for progress and deadline. Progress determines how far (in terms of agent steps) an agent is through an intention, while deadline determines how many steps can be made before the product spoils. Mirroring implementations, we update timings in the background, without executing an explicit action. In this case, the progress increases whenever a specific intention is stepped, whereas the deadline decreases after a step of any intention. That is, we use agent time (i.e. agent steps, which are implicit in the semantics) rather than real time (as this would require a secondary clock) to remain agnostic to the actual time required for each agent step, which can be difficult to anticipate at the design stage due to delays or variation in the real process deployed on the hardware.

Figure 4a gives the specifications for quantitative reasoning. A short commentary follows. \(de_1 = 10\) and \(de_2= 14\) are the initial deadlines of the two external events e_product1 and e_product2. The precondition \(\varphi _{11} = de_{1} \ge 3\) indicates whether \( de_{1}\) is greater than or equal to 3. The situation value function \(\theta _{11}= \langle 1, \{(\varphi _{11}, 1)\}, \textit{sum}\rangle \) indicates that if \(\varphi _{11} \) holds, then \(\theta _{11}(\mathcal {B})= 1 + 1 = 2\). The situation value description \(\theta _{13}\) for the external event e_product1 is defined as the function \( (de_1+ pr_1)^{-3} \). Intuitively, if \(de_1 + pr_1\) is smaller relative to other products, then the product has been progressed less and its deadline is approaching, so it is more urgent. Finally, we have the situation value functions for the agent-level operations, which ensure the highest weighting for the rule \(A_{event}\) when \(pr_1 + pr_2 \) is small (i.e. relatively low overall progress). When \(pr_1 + pr_2 \) gets bigger, the power of y in \((4-y)\cdot (pr_1 + pr_2)^y\) ensures that the rules \(A_{step}\) and \(A_{update}\) have a higher weight than \(A_{event}\), and \(A_{update}\) a higher weight than \(A_{step}\). Importantly, all deadline values and the choices of situation value descriptions are made by the agent designer, i.e. \((de_1 + pr_1)^{-3}\) was their choice. Our approach enables the analysis of alternative functions quantitatively, before deploying the agent.
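To make the numbers concrete, a brief Python sketch of these situation values (function names are ours; turning the weights into probabilities follows the normalisation of Sect. 3.3.2):

```python
def urgency(de: int, pr: int) -> float:
    """Situation value of a product's event/intention: (de + pr)^-3; higher means more urgent."""
    return (de + pr) ** -3

def agent_op_weights(pr1: int, pr2: int) -> dict:
    """Weights theta_i = (4 - i) * (pr1 + pr2)^i for A_event, A_step, A_update (i = 1, 2, 3)."""
    s = pr1 + pr2
    return {"A_event": 3 * s, "A_step": 2 * s ** 2, "A_update": s ** 3}

print(urgency(de=10, pr=0), urgency(de=14, pr=0))   # product 1 is initially more urgent
print(agent_op_weights(1, 0))    # early (low progress): A_event has the highest weight
print(agent_op_weights(4, 4))    # later (high progress): A_update and A_step outweigh A_event
```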

4.3 Selection strategies

We experiment with multiple selection strategies used by the agent (in Listing 2), including: agent-level operation selection, event/intention selection, and plan selection, which are standard in work such as [7]. A summary is given in Table 1, and we are particularly interested in selection strategies that use dynamic distributions based on domain-specific information (excluding uniform random selection strategies). A short commentary on each selection mechanism is given next.

Table 1 Selection strategies

4.3.1 Agent-level operation selection strategies

At each agent step an agent can either: incorporate any pending external events through Can rule \(A_{event} \), select an intention and execute a step through Can rule \(A_{step}\) (according to the intention-level semantics), or remove unprogressable intentions from the intention set using rule \(A_{update} \).

Here, agent-level operation selection strategies control which agent-level operation will be applied, e.g. select a new event or progress an existing intention. In our smart manufacturing example, this affects when a waiting product is initially handled, and how long it takes to pack a product. We use two different selection strategies: the SIP (Select In Priority) strategy selects agent-level operations in a priority order: pending events are adopted first, then intentions are progressed, and finally unprogressable intentions are removed from the intention set. The ProD (Progress Distribution) strategy instead selects an agent-level operation from a dynamic probability distribution based on the current progress of the agent. Initially we bias towards adopting events (to give the agent work to do), and, as the agent progresses, we increase the probability of progressing intentions instead (to finish tasks before becoming overwhelmed). Finally, we garbage collect unprogressable intentions near the end of a run (when there is less work to do). The functions to compute this distribution are in lines 20–22 in Listing 2.

4.3.2 Event/intention and plan selection strategies

For event and intention selection, the SMU (Select Most Urgent) strategy always selects the intention closest to its deadline. FIFO (First-In-First-Out) and RR (Round-Robin) are fixed orders where the former always selects the intention that arrived first and the latter selects each intention in turn. The UD (Urgency Distribution) strategy selects an intention by sampling from a distribution whose situation value function is given by \( (de+pr)^{-3}\). Unlike UD, the CUD (Conditioned Urgency Distribution) only deems an intention urgent if its product is not yet packed or spoiled. As such, it will not select an intention whose product is packed when there is another intention whose product is not packed. Finally, OCUD (Optimised Conditioned Urgency Distribution) selects an intention similarly to CUD, but the situation value description is revised to be \(\vert de+pr - steps\_expected\vert ^{-3}\), which accounts for the steps remaining to pack a product (to avoid spoilage).

Fig. 5 a Example bigraph, b reaction rule, and c result after applying (b) to (a)

For plan selection, SMP always selects the highest-weighted plan, while PreD selects a plan by sampling from a distribution based on preference.

4.4 Encoding in bigraphs

In this section we show how we encode agent design, probabilistic agent semantics, different strategies, and logical predicates in bigraphs. We begin with a brief introduction to bigraphs.

4.4.1 Bigraphs

Bigraphs are a universal graph-based modelling formalism introduced by Milner [25], with conditional, priority, parameterised, and probabilistic extensions [18, 26]. They have both an algebraic and a diagrammatic form; we employ mainly the latter here.

Fig. 6 Reaction rule choose_a_event

An example bigraph is in Fig. 5a. It consists of a set of entities, e.g. A, B, drawn as (coloured) shapes (Footnote 2). Entities can be related through nesting (to arbitrary depth), e.g. the B entities inside A. Entities can also be related through hyperlinks (permitting any-to-any links rather than just one-to-one as is usual), such as the green link between the B and C entities. Entities have a fixed number of links, called the arity, although a link can be disconnected as shown by the C entity in Fig. 5c. The name x means this link is open and can connect to other (unspecified) parts of the system. Likewise, the filled grey rectangles denote that other (unspecified) entities can exist here. Dashed unfilled rectangles are regions that represent parallel parts of the system: that is, these two regions can, but do not have to, share a single parent in some larger system model.

A bigraph represents a system at a single point in time. To allow models to evolve over time we can specify reaction rules, each consisting of a left-hand side L and a right-hand side R, where L and R are bigraphs. Intuitively, a bigraph B evolves to \(B'\) by matching and rewriting an occurrence of L in B with R; we write such a reaction as \(B \rightarrow B'\). Given an initial bigraph and a set of reaction rules, we can derive a transition system capturing all possible behaviours.

An example reaction rule is in Fig. 5b, which models the disconnection of B and C and also removes the nesting of B in A. The filled grey rectangles are called sites and represent parts of the model, below some entity, that have been abstracted away. That is, it allows matching on an A with multiple children. Without the site, the rule would only match when A had a single B child. Similarly, the use of the open name x means that the B can be connected not just to the C but also elsewhere, in this case the other B. As B remains connected to x the link remains connected in the result (likewise if it had been C connected to x then it would remain connected in the result). Reaction rules can affect both linking and placement, as shown here with the B entity also moving next to C.

Priority rewriting [26] permits an ordering on rules, defined by specifying classes of rules and an ordering between the classes. A reaction of lower priority can be applied only when no reaction of higher priority is applicable. Probabilistic bigraphs [18] permit rules to be weighted, e.g. \(\mathtt {t_{1}}\) with weight 2 and \(\mathtt {t_{2}}\) with weight 1, such that if both (and only) \(\mathtt {t_{1}}\) and \(\mathtt {t_{2}}\) are applicable then \(\mathtt {t_{1}}\) is twice as likely to apply as \(\mathtt {t_{2}}\). We write \(\{\texttt{r1}\} < \{\texttt{r2}\}\) to denote that the rules in \(\{\texttt{r2}\}\) have higher priority than those in \(\{\texttt{r1}\}\).

The encoding of probabilistic Can in probabilistic bigraphs follows directly from the encoding of the non-probabilistic version [19]. For example, the encoding of the agent design remains the same, while in BigraphER there is only a syntax change (from a standard to a weighted reaction rule) for each deterministic intention-level semantics rule, which is assigned the default probability 1. Additional rules are required, and some rules must be updated, to support the different selection strategies; we describe these changes in the coming sections. Importantly, the different strategies define a family of related models rather than a single model with different strategy selection. This means the same rule might appear differently, e.g. with different parameters, depending on the specific strategies we are implementing.

Notation We use fonts to distinguish between Can semantics rules, e.g. \(A_{event}\), bigraph reaction rules, e.g. \(\mathtt {choose\_a\_{event}}\), and bigraph entities, e.g. \(\textsf{Aevent}\).

4.4.2 Encoding agent-level operation selection strategies

To encode the selection of agent-level operation, we add the following new reaction rules:

$$\begin{aligned} \qquad \qquad \{\mathtt {choose\_a\_{update}},\, \mathtt {choose\_a\_{step}}, \\ \mathtt {choose\_a\_{event}}\} \end{aligned}$$

These reaction rules determine the next agent-level operation, e.g. if \(\mathtt {choose\_a\_{event}}\) (illustrated in Fig. 6) is applied, then the agent-level rule \(A_{event}\) is applied at the next step (Fig. 7).

Fig. 7 Reaction rule a_event adds an event to the intention set if it matches token Aevent. The dashed arrows (called the instantiation map in bigraphs) force the site on the right-hand side to be a copy of the site on the left

Fig. 8 Reaction rule \(\mathtt {choose\_a\_{event}}(pr_1, pr_2)\), where we use parameterised entities, e.g. \(\textsf{Progress}(pr_1)\), to represent the number of steps an event has been progressed, and \(\theta _1 = 3\cdot (pr_1 + pr_2)\)

Priorities can be used to implement selection strategies such as SIP, which selects agent operations in a priority order. That is, we can assign rule priorities:

$$\begin{aligned} \qquad \qquad \{\mathtt {choose\_a\_{update}}\}<\,&\{\mathtt {choose\_a\_{step}}\} \\ <\,&\{\mathtt {choose\_a\_{event}}\} \end{aligned}$$

To encode the ProD strategy, which selects an agent-level operation rule from a dynamic distribution, we employ parameterised reactions that define a family of rules. For example, reaction \(\texttt{r}(k)\) generates a set of rules \(\texttt{r}(k_1),\texttt{r}(k_2),\dots \) for all values of k. We then define the ProD strategy by:

$$\begin{aligned} \qquad \qquad \{ \mathtt {choose\_a\_{event}}(pr_1, pr_2),\\ \quad \mathtt {choose\_a\_{step}}(pr_1, pr_2),\\ \qquad \mathtt {choose\_a\_{update}}(pr_1, pr_2)\} \end{aligned}$$

where \(pr_1\) and \(pr_2\) denote the number of steps by which events \(\texttt {e\_product1}\) and \(\texttt {e\_product2}\) have been progressed, respectively. Recall from Sect. 4.3.1 that we have the situation value of selecting each agent-level rule (using the same syntax as lines 20–22 in Listing 2):

$$\begin{aligned} \qquad \qquad A_{event} [\theta _{1}],\, A_{step} [\theta _{2}],\, A_{update} [\theta _{3}] \end{aligned}$$

where the situation value function \(\theta _{i}\) is \((4 - i) \cdot (pr_1 + pr_2)^i\) for \( i \in \{1, 2, 3\}\). Therefore, reaction rule \(\mathtt {choose\_a\_{event}}\) has weight \(3\cdot (pr_1 + pr_2)\), \(\mathtt {choose\_a\_{step}}\) has weight \(2\cdot (pr_1 + pr_2)^2\), and \(\mathtt {choose\_a\_{update}}\) has weight \((pr_1 + pr_2)^3\). As an example, reaction rule \(\mathtt {choose\_a\_{event}}(pr_1, pr_2)\) is illustrated in Fig. 8; it is a modification of Fig. 6 that includes the progress steps of the two products.

The weights are normalised (automatically by BigraphER) based on which reaction rules are applicable to a given (bigraphical encoding of an) agent state. For example, if \(pr_1 = 0\) (\(\texttt {e\_product1}\) is not yet adopted in the intention set) and \(pr_2 = 1\) (\(\texttt {e\_product2}\) is adopted in the intention set and ready to be progressed), both \(\mathtt {choose\_a\_{event}}\) and \(\mathtt {choose\_a\_{step}}\) are applicable. The weight of rule \(\mathtt {choose\_a\_{event}}\) is then 3 and that of \(\mathtt {choose\_a\_{step}}\) is 2. After normalisation, the probability of selecting the corresponding Can rule \(A_{event}\) is 0.6, while the probability of selecting \(A_{step}\) is 0.4. Such a distribution indicates that, at this early stage, the agent is more likely to adopt a pending event than to progress the existing intention. Importantly, these reaction rules are all in the same priority class (in the bigraph models), meaning any could be applied at each step, and the probabilities indicate relative likelihoods.
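To make the normalisation concrete, the following minimal Python sketch (illustrative only; BigraphER performs this normalisation internally) computes the ProD distribution over the agent-level operations for given progress values, reproducing the 0.6/0.4 example above. The weight functions follow \(\theta_i = (4-i)\cdot(pr_1+pr_2)^i\); the applicability flags are simplifying assumptions for this example.

```python
# Illustrative sketch of ProD weight normalisation (not BigraphER code).
def prod_distribution(pr1, pr2, has_pending_event,
                      has_progressable_intention, has_unprogressable_intention):
    s = pr1 + pr2
    # Situation value functions: theta_i = (4 - i) * (pr1 + pr2) ** i
    weights = {"A_event": 3 * s, "A_step": 2 * s ** 2, "A_update": s ** 3}
    applicable = {}
    if has_pending_event:
        applicable["A_event"] = weights["A_event"]
    if has_progressable_intention:
        applicable["A_step"] = weights["A_step"]
    if has_unprogressable_intention:
        applicable["A_update"] = weights["A_update"]
    total = sum(applicable.values())
    # Normalise the weights of the applicable rules into probabilities
    return {rule: w / total for rule, w in applicable.items()}

# Example from the text: pr1 = 0, pr2 = 1, and no unprogressable intention
print(prod_distribution(0, 1, True, True, False))
# {'A_event': 0.6, 'A_step': 0.4}
```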

Fig. 9 Reaction rule a_step encodes Round-Robin intention selection via a token entity \(\textsf{Pointer}\) that moves after each step. Entity \(\textsf{Intent}\), highlighted in red, denotes that this intention will be progressed through intention-level rules

4.4.3 Encoding event/intention and plan selection strategies

To encode the event/intention selection strategies, we modify the reaction rules corresponding to the agent-level rules \(A_{event}\), \(A_{step}\), and \(A_{update}\) from our previous work [19].

The first event/intention selection strategy is to always select the most urgent (SMU). To encode this, we continue to employ parameterised reactions. Unlike ProD, where the parameters were used to obtain weights, here the parameter controls whether the first event or its related intention should be selected. For example, the parameterised reaction rule \(\mathtt {a\_event}(1)\) only selects \(\texttt {e\_product1}\), i.e. the parameter 1 corresponds to the product identifier. Since we know from Listing 2 that the deadline of \(\texttt {e\_product1}\) is nearer (hence more urgent) than that of \(\texttt {e\_product2}\), we can assign the following priorities to implement the SMU strategy:

$$\begin{aligned} \qquad \qquad \{\mathtt {a\_{event}}(2),\, \mathtt {a\_{step}}(2),\, \mathtt {a\_{update}}(2)\} \\ \quad < \{\mathtt {a\_{event}}(1),\, \mathtt {a\_{step}}(1),\, \mathtt {a\_{update}}(1)\} \end{aligned}$$

Similarly, we can assign the following priorities for the First-In-First-Out (FIFO) event/intention selection strategy, under the assumption that \(\texttt {e\_product2}\) arrives earlier than \(\texttt {e\_product1}\) (otherwise it would be the same as SMU):

$$\begin{aligned} \qquad \qquad \{\mathtt {a\_{event}}(1),\, \mathtt {a\_{step}}(1),\, \mathtt {a\_{update}}(1)\} \\ \quad < \{\mathtt {a\_{event}}(2),\, \mathtt {a\_{step}}(2),\, \mathtt {a\_{update}}(2)\} \end{aligned}$$

To encode the Round-Robin (RR) strategy, we use an auxiliary token to control the order of selection. For instance, \(\mathtt {a\_{step}}\) selects the intention that is linked with an entity \(\textsf{Pointer}\) (a token indicating that this intention should be progressed next). In our example with two intentions (corresponding to the two products), the token \(\textsf{Pointer}\) is moved to one intention after the other has been progressed one step (through rule a_step). As illustrated in Fig. 9, executing an intention linked through \(x_1\) with an event labelled with \(\textsf{Pointer}\) results in moving the \(\textsf{Pointer}\) to the other event (linked through \(x_2\)), and vice versa.
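As an abstract illustration of this pointer mechanism (not the bigraph encoding itself), the following Python sketch alternates a pointer between two intentions after each progression step; the intention names are the two product events from the example.

```python
# Abstract illustration of Round-Robin selection via a Pointer token
# (not bigraph code); the intentions correspond to the two product events.
intentions = ["e_product1", "e_product2"]
pointer = 0  # index of the intention currently holding the Pointer token

def rr_step():
    """Progress the pointed-to intention, then move the Pointer to the other one."""
    global pointer
    progressed = intentions[pointer]
    pointer = (pointer + 1) % len(intentions)
    return progressed

print([rr_step() for _ in range(4)])
# ['e_product1', 'e_product2', 'e_product1', 'e_product2']
```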

To encode urgency distribution (UD), we have the following set of parameterised reactions for agent-level rules:

$$\begin{aligned} \qquad \qquad \{ \mathtt {a\_{event}}(pr, de),\, \mathtt {a\_{step}}(pr, de), \\ \mathtt {a\_{update}}(pr, de)\} \end{aligned}$$

where pr and de denote, respectively, how many steps an event (or its related intention) has been progressed and how many steps remain before its deadline. Each rule has weight \(1/(pr + de)^3\) (according to Sect. 4.3.2).

Conditioned urgency distribution (CUD) is the same as UD, but only deems an intention urgent if the product is neither packed nor spoiled. To encode it, we employ conditional bigraphs [26], which allow application conditions to specify contextual requirements within the bigraphical system. As such, we only need to add the contextual requirement to reaction rule \(\mathtt {a\_{step}}(pr, de)\) (illustrated in Fig. 10).

Fig. 10 Conditional reaction rule \(\mathtt {a\_step}(pr,de)\) for the CUD strategy with \(\theta = 1/(pr + de)^3 \). The symbol − indicates a negative condition, i.e. that the bigraph of the condition should not appear/be matched. Packed and Spoiled are bigraphs representing the unwanted product statuses. The \(\downarrow \) means we do not want these states to appear in (any of) the sites. The red highlighted Intent on the right-hand side means this intention will be progressed by further rules

Fig. 11 Bigraph representation of actions (\(\textsf{Act}\)) with a set of outcomes (\(\textsf{Effect}\)), each one containing a parameterised entity (\(\textsf{EffWeight}(n)\)) indicating its weight

Fig. 12 Bigraph rule for \(action^p\), executing an action with an effect carrying the parameterised entity (\(\textsf{EffWeight}(n)\)). The dashed arrows (called the instantiation map in bigraphs) force the site on the right-hand side to be a copy of the site on the left. The green circle stands for the bigraph representing the applicability of the pre-condition of this action, and the red highlighted entity Act indicates that this action is to be executed

The optimised conditioned urgency distribution (OCUD) is the same as CUD but with weight \(\vert de+pr - steps\_expected\vert ^{-3}\), where \(steps\_expected\) is the number of steps expected to pack a product (to avoid spoilage). This value can be obtained through a simulation of a single product in BigraphER.
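As a simple illustration of how the OCUD weight differs from the UD weight, the following Python sketch compares the two situation value functions for two hypothetical intentions; the (pr, de) values and steps_expected = 6 are illustrative assumptions, not values from the case study.

```python
# Illustrative comparison of UD and OCUD situation values (hypothetical numbers).
def ud_weight(pr, de):
    # UD: weight given by the situation value function 1 / (pr + de)^3
    return 1 / (pr + de) ** 3

def ocud_weight(pr, de, steps_expected):
    # OCUD: weight peaks as (pr + de) approaches steps_expected
    # (a guard would be needed if pr + de equals steps_expected exactly)
    return 1 / abs(pr + de - steps_expected) ** 3

# Hypothetical intentions: (steps progressed, steps left before the deadline)
intentions = {"e_product1": (2, 5), "e_product2": (1, 12)}
steps_expected = 6  # assumed value, e.g. from a single-product simulation

for name, (pr, de) in intentions.items():
    print(name, round(ud_weight(pr, de), 4),
          round(ocud_weight(pr, de, steps_expected), 4))
```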

Finally, a similar priority (resp. parameterised) approach can be used to encode the plan selection strategies SMP, which selects the highest weighted plan, and PreD, which selects a plan by sampling from a distribution based on preference.

4.4.4 Encoding probabilistic action outcomes

To encode probabilistic outcomes, we extend the representation of an action with a set of outcomes. Each outcome is pre-assigned a parameterised bigraph entity \(\textsf{EffWeight}(n)\). Figure 11 shows the bigraph representation of action move_product_standard1 from line 13 in Listing 2. To execute an action with probabilistic outcomes, we encode the intention-level rule \(act^p\) (Sect. 3.2.1) as a parameterised reaction, illustrated in Fig. 12, with transition weight n taken from the given \(\textsf{EffWeight}(n)\). This weight is then normalised to a probability by BigraphER.
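The following Python sketch illustrates, under assumed outcome weights, how the EffWeight values of an action's outcomes translate into a probability distribution once normalised; the outcome names and the weights 9 and 1 (i.e. a 10% chance the standard wrapping breaks) are hypothetical and for illustration only.

```python
import random

# Illustrative sketch: turning EffWeight(n) annotations into outcome probabilities.
# The outcome names and weights below are hypothetical.
outcomes = {"wrapped_ok": 9, "bag_breaks": 1}  # EffWeight(9), EffWeight(1)

total = sum(outcomes.values())
probabilities = {name: w / total for name, w in outcomes.items()}
print(probabilities)  # {'wrapped_ok': 0.9, 'bag_breaks': 0.1}

# Sample one outcome according to the normalised distribution
chosen = random.choices(list(outcomes), weights=list(outcomes.values()), k=1)[0]
print(chosen)
```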

4.4.5 Intention success and failure

Can does not indicate whether an intention has completed successfully or with a failure. That is, \(A_{update}\) (Fig. 2a) removes a completed intention from the intention base regardless of whether it completed successfully (was the nil program) or could not make any further progress (failed). Following previous work [19] (which encodes the standard Can semantics), we overcome this limitation in two ways: (1) we add identifiers to each intention using the name of the event that generated it (not possible in Can semantics); (2) we encode the Can rule \(A_{update}\) as two different bigraph reaction rules, one handling a successfully completed intention and the other a failed intention. This allows the rules to add additional entities to track intention state (either Success or Failure).

To add the labels to the output DTMC states, we use two bigraph patterns, for success and failure, as shown in Fig. 13. Once states are labelled, we can use them in eventually-style PCTL formulae for PRISM. This gives a general approach to reason about each intention individually, or about a combination of intentions (through conjunction). For example, we use the formula \(\mathcal {P}_{=?}[{\textbf {F}}\, (S(1)\wedge S(2))]\) in our packing use-case, which computes the probability that both products are processed successfully.

4.5 Analysis

Table 2 gives the probability of processing the products successfully or with a failure, under different agent-level operation selection and event/intention selection strategies, with SMP chosen for plan selection. We use the shorthand (X1, Y2) to stand for \(\mathcal {P}_{=?}[{\textbf {F}}\, (X(1)\wedge Y(2))]\), where X and Y are drawn from \(\{S,F\}\), denoting success or failure.

We see the necessity for good event/intention selection: the first three combinations never successfully process both products, i.e. (S1, S2). With UD, limited success begins to appear (\(p = 0.06\) for SIP and \(p = 0.05\) for ProD): the chance of succeeding with product 1 increases to more than 50%, whereas product 2 fails nearly 72% of the time. This indicates the weighting function is skewed toward product 1 to the detriment of product 2, which led to the improved CUD strategy. This is a key advantage of our approach: discovering potential pitfalls and trialling new strategies without changing the underlying agent programs and semantics. Similar reasoning, observing how often product 2 now succeeded, led to the OCUD strategy being trialled, with extremely good success rates, e.g. \(p = 0.97\). We should never expect the probability of (S1, S2) to be 1 due to the action outcome uncertainty (e.g. the wrapping bag breaks).

Fig. 13 Bigraph patterns for checking intention success and failure. a S(i): product i completes successfully. b F(i): product i completes with failure

Table 2 Probability of (product 1, product 2) completing successfully/with failure for different agent-level operation selection and event/intention selection strategies, using SMP (Select Most Preferred) plan selection

We also see better performance of the SIP strategy for agent-level operation selection. For example, the probability of successfully processing both products with the last three event/intention strategies under ProD is consistently lower (though by a small margin) than under SIP. In general, any success rate improvement, even a marginal one, should be used as it can result in great savings, particularly in large-scale processes, e.g. an expected two-product successful behaviour occurring 97% of the time instead of 90%. This shows that it is better to adopt events at the beginning of agent operation, update unprogressable intentions at the end, and progress intentions in between. In particular, ProD has a significantly detrimental effect when RR is used for event/intention selection, with \(p = 0.8\) for (S1, S2). The reason is that under ProD the agent may continue to progress the same event just after it has been adopted, or remove an unprogressable intention just after progressing it. Both situations can leave the other product waiting too long, so that it spoils before it is packed. Interestingly, agent-level operation selection seems to make no difference for the ordered event/intention selection strategies (i.e. SMU and FIFO).

Table 3 Probability of (product 1, product 2) completing successfully/with failure, e.g. (S1, S2), for different event/intention selection strategies and plan selection strategies, with SIP (Select In Priority) for agent-level operation selection as given in Table 1
Fig. 14 Probability of reaching the final state (product 1, product 2) with increasing failure probability in (SIP, SMU, PreD), (SIP, RR, PreD), and (SIP, OCUD, PreD)

Table 3 gives the probability of processing the products either successfully or with a failure, under different event/intention selection strategies and plan selection strategies, with SIP for agent-level operation selection as listed in Table 1. In this example, we find that plan selection has a limited, though nevertheless positive, effect compared to event/intention selection, which is key to this application. This is itself a valuable insight; given the complexity of agent behaviours, determining this expected probability precisely, without such a model, would be difficult. In particular, we can see that when event/intention selection proceeds in a one-by-one manner, selecting plans from a dynamic distribution is much more useful, e.g. an expected two-product failure behaviour occurring 7% of the time instead of 10%.

The effects of different action outcomes are shown in Fig. 14, where the probability of the standard wrapping failing is increased from 10% to 90% for three strategy combinations: (SIP, SMU, PreD), (SIP, RR, PreD), and (SIP, OCUD, PreD).

We can see that negative action outcomes have a much larger effect on strictly ordered intention selection (SMU), e.g. the probability of (S1, F2) decreases from over 90% to below 40%. Meanwhile, (SIP, OCUD, PreD) is more robust to action outcome changes: for example, the probability of (S1, S2) under it decreases only slightly, by no more than \(20\%\). This is due to the increased interleaving of the two intentions, which renders the standard wrapping inapplicable more often. In particular, in (SIP, RR, PreD), the round-robin strategy renders the standard wrapping inapplicable at all times when handling product 2. As such, the change in the probability of the standard wrapping failing has no impact on the final result in this case.

5 Discussion

We reflect on the insights gained by constructing a probabilistic extension of the Can language, including the process of building the bigraph models. We detail our first-hand experience of the value and limits of the bigraph approach applied to agent languages and their policies, which is not included in our previous work, e.g. [20].

By building on an existing encoding of Can in bigraphs [19], much of the probabilistic extension required limited effort. For example, most (deterministic) Can semantics rules simply become probabilistic rules with probability 1; for bigraphs, this requires only changing a reaction rule from the standard form to its weighted counterpart in BigraphER. The design of each agent (e.g. the plan library) remains unchanged. This is because we focus on probabilistic behaviours rather than probabilistic knowledge (e.g. probabilistic belief bases [27], which we discuss in Sect. 6). Although it would be promising to analyse an agent with both probabilistic behaviours and probabilistic knowledge, this remains a challenging task in the context of verification given the computational complexity of uncertainty theories and their revision strategies. A feasible starting point for this integration would be an executable encoding in bigraphs of a BDI agent's belief base in one of these uncertainty theories, together with its revision strategies, before the final step of verification analysis; we leave this as future work.

A characteristic of Can is that the transition rules are given incrementally, i.e. it has a modular operational semantics. In this case, the modularity of Can separates how to evolve an intention (the intention-level semantics) from how to evolve the whole agent (the agent-level semantics). This approach has its merits: for example, we can easily extend or modify one side of the semantics (e.g. the agent level) without altering the other. As such, it allows us to separate concerns efficiently (decreasing errors) when modelling plan/event/intention selection strategies. In particular, such a modular operational semantics turned out to be beneficial when analysing various combinations of these plan/event/intention selection strategies under different action outcomes (seen in Sect. 4.5).

When modelling different plan/event/intention selection strategies, we found that BigraphER provides expressive and highly flexible features for constructing them. For example, the priority classes of BigraphER naturally support ordered selection strategies (e.g. First-In-First-Out), while conditional rules allow fixed schedules (e.g. Round-Robin) through auxiliary token entities. These strategies are often domain-independent (regardless of agent designs) as they do not require the details of a specific intention to make a decision.

The parameterised reaction rules in BigraphER also play a key role in probabilistic selection (especially sampling from a dynamic distribution based on run-time domain-specific information). These probabilistic selection strategies are likely to be domain-dependent as they require domain-specific knowledge, e.g. deadlines. In practice, we often needed a set of new rules to capture updates to such domain-specific information. For example, the temporal information of progress and deadline in the smart manufacturing case is maintained separately: the progress of an event is increased whenever that event or its related intention is stepped, whereas the deadlines of all events are decreased after each application of an agent-level rule. Mirroring implementations, we update these timings in the background by applying instantaneous reaction rules [15] to the bigraphs. To write these new instantaneous reaction rules, it is often sufficient to have some form of discrete-step update/manipulation rules. As these instantaneous rules do not show up in the resulting transition system, they do not affect our analysis of agent behaviours, which is the focus of our work. We also note that these timings are measured in agent steps (rather than real time), which suffices in our case; however, a domain-specific mapping from agent steps to real time for different agent programs could be specified.

Our modelling approach comes with some limitations. Firstly, it does not naturally support mixed strategies, e.g. starting with an ordered strategy before swapping to a probabilistic distribution. Modelling mixed strategies could be possible by providing conditioned selection strategies, in which certain strategies are only applicable if related conditions hold in the environment. To model this in bigraphs, we could constrain strategies using pre-defined ranges of agent-operation parameter steps, e.g. use strategy x when the deadline is less than 5. Secondly, there is a growing amount of work employing external advanced decision-making tools to solve selection problems (which we detail in Sect. 6). Our approach cannot be compared directly with such complex external decision-making techniques, as these tools are often black boxes from which we cannot easily derive a discrete formal model. For example, [28] formalises the intention selection problem in BDI agents as a planning problem in the PDDL description language [29]. A possible way to compare with the selection strategies offered by these advanced decision-making tools is to employ models at runtime, taking model updates as inputs from the external decision-making tools. Finally, we do not model costs/rewards, e.g. the price of the wrapping bags in Sect. 4.2. Utilising costs/rewards, we could perform multi-objective optimisation, e.g. achieving different success rates and robustness to action outcomes while keeping the overall cost low.

Besides ensuring the agent model is correct, it is important to make sure the model deals with the right issue and helps ask the right questions. Our computational models can help agent programmers understand the behaviours of the agents they design before they are ever deployed. In other words, they allow agent programmers to run virtual "what if?" experiments, even changing the rules of how the agent operates, before trying things out for real. For example, the agent designer can use our models to understand, quantitatively, the potential consequences of different selection strategies, which selection strategy has a dominant effect with regard to task completion in the given scenario, and how to parameterise the best probabilistic distribution, all of which our approach supports and has demonstrated in Sect. 4. As such, the designer can have quantitative assurance about the behaviours of the agents they programmed.

6 Related work

Verifying BDI agents through model checking and theorem proving has been well explored (see the survey [30]). For example, the work [31] (resp. [32]) applies the Java PathFinder model checker (resp. the Isabelle/HOL proof assistant) to verify BDI programs in a non-deterministic fashion. Recent work has also started considering probabilistic verification of BDI agents. The work [33] uses a two-stage verification method that first generates a model through program model checking (of a system implementation), and then converts this model to the PRISM input format for analysis. However, unlike our focus on probabilistic extensions of the BDI semantics itself, the BDI agent used in [33] does not contain any probabilistic aspects; instead, it is the environment in which the agent executes that enables the probabilistic reasoning. Similarly, the work of [34] facilitates probabilistic verification of BDI agents by encoding them in PRISM. In this case, instead of generating the model from an implementation, they implement a significantly simplified version of AgentSpeak directly in PRISM. The simplifications deviate from realistic BDI agents, e.g. enabling truly concurrent intentions (and no intention selection) and treating plan selection as non-deterministic. Our approach faithfully models the full Can semantics, with support for developing various selection strategies, while still providing PRISM verification capabilities. One of the closest works to ours is perhaps [35], which introduces probabilistic state transitions in BDI agents. Like us, they are motivated to capture situations such as "if an agent at state \(s_1\) executes an action, then it transfers to state \(s_2\) with probability 0.7, or transfers to state \(s_3\) with probability 0.3", which are difficult to reason about in standard BDI agents. Unlike our work at the level of a BDI programming language, however, their work proposes a modal logic system, and proofs of properties are obtained from the resulting deduction system. Notably, a main disadvantage of their work is that the probabilistic description is restricted to the transition between the current time and the next time, due to the next-time-like temporal operator used to construct a proof system based on the tableau method.

Besides BDI agents, quantitative verification techniques have also been applied to other types of agent systems. For example, the work of [36] considers uncertain communication channels between systems of interacting agents. For verification, the multi-agent system is transformed into finite-state Markov chains for establishing quantitative temporal properties of the system. Similar to our evaluation of plan/event/intention selection strategies, the work of [37] provides a quantitative assessment of decentralised control policies in multi-vehicle scenarios. Specifically, they study conflict resolution policies to ensure that a policy never causes collisions under some mild assumptions on the initial conditions. For general agent-based verification (which is beyond the scope of this work), we refer the interested reader to [38].

Plan and intention selection strategies have also been well investigated separately within the BDI community. In fact, most BDI platforms provide some form of hook that allows agent developers to control which plan is adopted. For example, the plan selection function in [7] is a user-defined function to customise plan/intention selection for a particular application domain. Meanwhile, various plan selection strategies, such as precedence-based selection (e.g. by preference), are also studied in [39] to select more preferred plans (according to some domain-specific plan characteristics).

Unlike plan selection, which chooses the "best" means to achieve an event, intention selection, which decides which intention is best to execute next, often comes down to managing interleaving. The interleaving of steps in different intentions may result in undesired outcomes, such as an overlooked product being left to spoil in our smart manufacturing scenario. To manage intention interleaving, researchers tend to employ external tools to help the agent pursue multiple intentions in parallel. For example, the work of [24] compiles agent programs to the TÆMS (Task Analysis, Environment Modelling, and Simulation) framework to represent the coordination aspects of problems, such as "enables" and "hinders" relations between tasks; a Design-To-Criteria scheduler is then used for intention selection to determine the full set of decisions the agent needs to perform. The work [40] applies Single-Player Monte Carlo Tree Search [41] to select which intention to progress at the current step. The work [28] showed that many intention selection issues can be modelled in the planning domain definition language (PDDL) [42] (the de-facto standard planning language) and resolved through suitable planners, such as a modern, highly efficient (online) planner [43]. In fact, an increasingly popular topic in the BDI community is intention progression [44], e.g. the Intention Progression Contest.

However, the goal of the plan and intention selection studies above is to help the agent make better decisions, by modifying or entirely replacing the original BDI reasoning, either through extra bookkeeping of domain-specific information or through other advanced decision-making techniques. In contrast, the focus of our work (arguably largely complementary to them) is to provide automated quantitative analysis of BDI agents under different common selection strategies (though excluding the strategies provided by external tool-based approaches).

Existing work provides "what-if" analysis capability for BDI agents through simulation. For example, [45, 46] propose an evacuation model using BDI agents and other network-oriented modelling approaches (e.g. [47]). This model simulates crowd behaviour to evaluate the effects of changing psychological and socio-cultural factor parameters. While useful, simulations and experiments only examine a subset of all the possible behaviours of the given system; if the resulting system is to be used in safety-critical areas, such approaches guarantee little about actual system behaviour. Instead, we reason about systems through formal reasoning and verification, analysing all the possible behaviours of the system against pre-defined requirements.

There is much existing work on providing probabilistic capabilities to BDI agents for various reasons. For example, the work [27] addresses uncertainty in the belief base of a BDI agent, e.g. due to sensor noise (the agent believes a is true with probability \(80\%\) and \(\lnot a\) with probability \(20\%\)). To achieve this, they model the beliefs of an agent as a set of epistemic states, where each state can use a distinct underlying uncertainty theory (e.g. probabilities or possibilities) with its own belief revision strategy. Similarly, it is possible to use Bayesian networks to represent probabilistic knowledge in BDI agents [48]. Contrary to our approach, in which probability models the transitions of agent behaviour for quantitative behaviour analysis, their focus is to provide a quantitative approach to representing the knowledge (e.g. probabilistic beliefs) of the agent for, at best, standard qualitative agent testing. In particular, we note that a plan selection strategy has been proposed for a BDI agent under probabilistic beliefs in [49]. However, such a plan selection strategy is abstract, i.e. there is no actual implementation, and, importantly, its feasibility and computational cost (e.g. tractability) remain unclear, particularly in the context of practical formal verification. The work [50] presents an implementation of the appraisal process of emotions using add-on probabilistic reasoning (specifically Bayesian networks) in BDI agents. According to appraisal theory [51], appraisal depends on one's goals and values, which can be represented as a BDI agent's events and beliefs, and is calculated by Bayesian networks to estimate, e.g. the undesirability (a value) of being in a smashed state (representing the emotion of fear) for a robot.

7 Future work

Once we accept probabilistic reasoning inside an agent, it quickly becomes apparent that we could also consider an external uncertain environment. Currently, only some aspects of an uncertain environment are addressed, i.e. interactions between the agent and the environment can be probabilistic; for example, the agent tries to open a door but may fail to open it. However, the environment may also change by itself due to, e.g., natural phenomena: for example, a \(30\%\) chance that it will rain tomorrow. We may need to assess whether the agent behaves as required under all possible environmental changes. The difficulty is to obtain a realistic environment abstraction that can be integrated with the existing BDI semantics whilst avoiding state explosion due to branching in both environment changes and agent reasoning. We have previously considered self-dynamic environments (without probability distributions) for BDI agents [52]. One way forward is to extend this with probabilities and integrate it with our new probabilistic semantics.

As discussed, BDI agents have several key decisions to make when operating: which event to handle first (event selection), and which intention to progress next (intention selection). Given the number of decisions faced by an agent, we may want to synthesise a strategy to determine ahead-of-time the decisions an agent should make e.g. to avoid the worst-case execution. Though we cannot replicate some advanced decision-making techniques e.g. from the planning community, formal verification does offer some strategy synthesis capabilities. For example, model checkers can give a trace of evolution that makes some reachability-related properties hold (e.g. some goals are achieved). To allow this, instead of using pre-defined selection strategies (fixed, round-robin, probabilistic choice), we can keep the non-determinism explicit and ask a model checker for a good strategy. This is our current ongoing work.

In principle, our current framework can support reasoning about multi-agent systems naively: each agent as a thread. Though feasible, we expect the state space would very quickly increase and become practically infeasible. A way forward is to enforce a schedule by which agents can progress. Before we consider multi-agent settings, a fundamental question is "what are the interesting properties of agent behaviours in a multi-agent setting that we can obtain from analysing only a single agent?" If these properties relate to competition between agents, then a game-theoretical approach (e.g. [53]) might be more suitable than a full multi-agent setting.

BDI agents draw heavily from logic programming, e.g. Prolog [54], and feature similar syntax and semantics. Given the probabilistic extensions to logic languages, e.g. Prolog [55], a comparative study would allow research ideas to flow between the probabilistic BDI agent and probabilistic logic programming domains. We leave this investigation as future work.

8 Conclusions

A quantitative evaluation and comparison framework can aid design-time specification of agents by allowing us to reason about agents that exhibit probabilistic behaviours from uncertain or failed actuators, and probabilistic decision policies.

We have extended the Can language—which formalises the behaviour of classical BDI agents, including advanced features such as failure recovery and declarative goals—to a probabilistic setting, allowing both probabilistic action outcomes and probabilistic selections, e.g. of plans. The extended semantics is executable through an encoding into probabilistic bigraphs, which enables quantitative analysis using BigraphER and the probabilistic model checker PRISM. Importantly, this approach allows examination of the potential consequences of different selection strategies.

Through a smart manufacturing example we have shown that it is possible to reason about different combinations of selection strategies, and that probabilistic selection strategies can reduce the impact of undesirable outcomes, compared with ordered or fixed strategies. In this example, we found that plan selection has limited effect compared to intention selection, which is a valuable insight. In particular, due to the agent making smarter intention selection choices, the impact of action outcomes can be marginal—even when the failure probabilities are large.

Fig. 15 Probabilistic extension of Can semantics from [3]