1 Introduction

Artifacts are structures that “combine data and process in an holistic manner” to describe business interactions, typically in a service-oriented architecture [1]. The data component is given by the relational databases underpinning the artifacts in a system, whereas the workflows are described by “lifecycles” associated with each artifact schema. Artifact systems define complex workflow schemes based on artifacts. The system’s participants, or agents, interact with the artifact system by performing events on it.

Unlike services, where typically only the process interfaces are advertised, artifact-centric systems also make the data structures public. Due to their expressiveness and flexibility, artifact-centric architectures are increasingly being used in a variety of application areas, including case management systems [2]. Artifact-centric systems are executed in a hub, which provides the functionality for service execution. A flexible and powerful language for modelling and executing artifact-centric systems is the Guard-Stage-Milestone programming language (GSM). The open-source design and runtime engine Acsi Hub [3, 4] is an environment in which system orchestration and choreography are executed.

If artifact-centric environments are to fulfil their promise to drive the future generation of data-intensive services, they need to be verifiable. This should involve not only the hub itself governing the interactions between artifact calls, but also, and crucially, the agents implementing the services in the system, as is normally done when reasoning about services [5]. In addition to providing correctness guarantees and rapid prototyping, techniques such as model checking can form the underpinnings for the implementation of automatic service orchestration and choreography [6].

In this paper we develop verification methodologies for artifact-centric systems implemented in GSM. Since GSM programs include data models, they are infinite state programs; it follows that traditional model checking methods based on finite-state machines cannot be applied to them. To address this problem we develop a novel predicate abstraction methodology [7] for GSM defined on a three-valued semantics to account for over- and under-approximation of the models. We also present GSMC, the first model checker for GSM, that implements the technique discussed. We evaluate the technique on a large industrial scale example.

Related Work. Several techniques for the verification of artifact-centric systems have been put forward [8–13]. While these provide considerable insight into the decidability and complexity of the verification problem, they do not provide a concrete verification technique for actual systems. The first contributions concerning the practical verification of GSM systems appeared in [14, 15]. These, however, are defined on coarse, user-given abstractions of GSM models where little data is present and ad-hoc restrictions on variable ranges are applied to obtain finite state systems. Additionally, the specification language used is limited.

Incomplete verification methodologies operating directly on the source code have been developed in software verification. The abstraction techniques developed in this context normally target reachability properties only. However, 3-valued abstraction can be applied to specifications based on the \(\mu \)-calculus [16].

This paper extends existing work by providing 3-valued abstractions for GSM programs specified by a first-order version of the epistemic \(\mu \)-calculus. This enables us to specify services not in purely propositional terms, as is traditionally done, but by referring to the underlying databases.

2 The Guard-Stage-Milestone Language and Multi-agent Systems

While GSM provides a language for the realisation of artifact-centric systems, GSM on its own is not equipped with constructs for the implementation of external actors operating on the system. In GSM these are abstracted by events reaching the system.

However, to verify the possible executions of the system we need to represent how the agents interact with it. Artifact-centric Multi-Agent Systems (AC-MAS) were put forward in [15] to provide a semantics for GSM and the behaviours of external agents. We summarise these concepts below but refer to the cited literature for more details.

Fig. 1. A lifecycle model.

Guard-Stage-Milestone (GSM) has recently been put forward as a declarative language for implementing artifact systems [17]. GSM describes an artifact system \({\varGamma }\) that depends on a number of artifact types that correspond to classes of key business entities. A system comprises a number of artifact instances of artifact types. Each type has an information model, which gives an integrated view of the business data, and a hierarchical lifecycle model, which describes the structure and evolution of the business process. The artifact system interacts with its environment via events. The information model is partitioned into the set of data attributes, which hold business data, and the set of status attributes, which capture the state of the lifecycle model. Figure 1 illustrates a portion of the lifecycle of a manufacturing process and represents the core concepts: the boxes denote stages, which represent clusters of activity designed to achieve milestones (\(\circ \)) that represent operational objectives. A guard (\(\diamond \)) triggers activities in a stage when a certain condition is fulfilled. Both milestones and guards are controlled declaratively through sentries. A sentry of an artifact instance \(\iota \) is an expression \(\chi (\iota )\) in terms of incoming events, guards and milestones, and the status of the instance. In the example above, the stage ‘Collecting Parts’ contains ‘Research & Order’, which is triggered by an external event; upon reaching the milestone ‘parts ordered’ the next stage ‘Receiving’ is activated.

The operational semantics for GSM is based on the notion of a business step (B-step). This is an atomic unit that corresponds to the effect of processing one incoming event. A B-step has the form of a tuple \(\sigma = ({\varSigma }, e, {\varSigma }')\), where e is an incoming external event and \({\varSigma }\), \({\varSigma }'\) are snapshots that capture the current and next state of the information model respectively.

The programming language GSM [4] provides the constructs for the realisation of GSM systems; the semantics of GSM programs is given in terms of B-steps.

Artifact-Centric Multi-agent Systems. While GSM models the business artifacts, agents model the possible interactions that external actors and services may have with the artifact system. Below we summarise the key elements from [15], where a formalism for defining the behaviour of the agents, and their access to the artifact system, is described. The concepts of views and windows are used to define which attributes and artifact instances are visible to an agent; events represent external actions that cause a change in the system. In the example above, views can be used to hide details like procurement of parts from a customer, while allowing access to higher level information, e.g., the start and end of the parts assembling process. Windows, instead, can be used to hide orders that do not belong to a particular customer. While a view \(\nu \) and an event \(\epsilon \) are simple lists, a window \(\omega _i(\iota )\) is a formula that is evaluated for a specific artifact instance \(\iota \) and an agent i. The instance is exposed to the agent only if \(\omega _i(\iota )\) evaluates to true. The behaviour of an agent is given by its protocol \(\wp \) in terms of the visible state of the artifact system, the agent’s unique ID, and its set of private variables var.

We formalise an agent-based GSM system for a set of agents \(\mathcal {A}\) operating on an environment given by the artifact system E as an Artifact-centric Multi-Agent System (AC-MAS) [9]. An AC-MAS is a tuple \(\mathcal {P} = \langle S, \mathcal {I}, Act, \tau , {\varLambda }\rangle \), where \(S \subseteq L_E \times L_1 \times \dots \times L_n\) is the set of reachable global states, \(\mathcal {I}\) is the initial state, \(Act = Act_E \times Act_1 \times \dots \times Act_n\) is the set of actions, \(\tau : S \times Act \rightarrow 2^S\) is the global transition relation, and \({\varLambda }: S \rightarrow 2^{AP}\) is the evaluation relation for a set of propositions AP. A global state \((l_E, l_1, \dots , l_n) \in S\) for the system is given in terms of the snapshot \({\varSigma }\) of the artifact system for \(l_E\), and the accessible variables of each agent for \(l_1, \dots , l_n\). We also write \(l_i(s)\) to extract the visible state for agent i from a global state \(s\in S\). The sets of actions \(Act_E\) and \(Act_i\) are directly defined by the events the system provides and the permissions of the agents. The global transition relation \(\tau (s, \alpha )\) with \(s\in S\) and \(\alpha \in Act\) is given by the corresponding B-steps defined by GSM in combination with the protocols \(\wp \) of the agents, where only one agent can interact with the artifact system at a time while the others are idle.

The initial state \(\mathcal {I}\) is a global state with no artifact instances in \({\varSigma }\) and with all private variables set to their initial value. We write \(s \rightarrow s'\) iff there exists an \(\alpha \), such that \(s' \in \tau (s, \alpha )\); in this case \(s'\) is a successor of s. A run r from s is an infinite sequence \(s^0 \rightarrow s^1 \rightarrow \ldots \) with \(s^0 = s\). We write r[i] for the i-th state in the run and \(r_s\) for the set of all runs starting from s. A state \(s'\) is reachable from s if there is a run from s that contains \(s'\), formally \(\exists r' \in r_s: \exists i \ge 0 :r'[i]=s'\). Note that portions of the global state may not be visible to an agent. In line with the standard semantics of epistemic logic [18], we say that the states s and \(s'\) are epistemically indistinguishable for agent i, written \(s \sim _i s'\), iff \(l_i(s) = l_i(s')\), i.e., if agent i’s local state is the same in s and \(s'\).
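The notions of reachability and epistemic indistinguishability can be illustrated with a minimal Python sketch. The global states, the successor map and the helper names `reachable` and `indistinguishable` are hypothetical and chosen only for illustration; they are not taken from any GSM program or from GSMC.

```python
from collections import deque

def reachable(initial, successors):
    """Return all states reachable from `initial` via the successor map."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for t in successors.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def indistinguishable(s, t, agent):
    """Two global states are indistinguishable for an agent iff the
    agent's local component is identical in both."""
    return s[agent] == t[agent]

# Toy global states (l_E, l_1, l_2); all values are hypothetical.
s0 = ("empty", "idle", "idle")
s1 = ("order", "waiting", "idle")
s2 = ("shipped", "waiting", "busy")
s3 = ("lost", "idle", "idle")  # never produced by any transition

# s -> s' abstracts away the action that causes the transition.
successors = {s0: [s1], s1: [s2], s2: [s2]}
reach = reachable(s0, successors)
```

In this toy system agent 1 cannot tell `s1` from `s2`, because its local component is `"waiting"` in both, whereas agent 2 can.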

3 Three-Valued Abstraction for AC-MAS

Predicate abstraction [7] is a technique used to generate sound approximations of infinite state systems by grouping together system states satisfying certain properties into abstract states. May transitions between abstract states correspond to possible transitions between some of the corresponding concrete states. This leads to an over-approximation of the possible behaviour that is conservative for safety properties but may lead to unsound results otherwise. Three-valued abstraction has been employed [16, 19] to overcome these limitations. In three-valued abstraction a second transition relation (or must relation) is introduced to encode when a change in the corresponding concrete states must happen. This makes it possible to maintain, at the same time, over- and under-approximations that are conservative for both positive and negative specifications, and to detect when a result cannot be determined.

To extend this technique to AC-MAS, we introduce the three-valued semantics for the epistemic \(\mu \)-calculus and replace \(\tau \) with \(\tau _m\), the global may transition relation, and \(\tau _M\), the global must transition relation, to get \(\mathcal {P} = \langle S, \mathcal {I}, Act, \tau _m, \tau _M, {\varLambda }\rangle \). Analogously to the concrete case, we write \(s \rightarrow _m t\) (\(s \rightarrow _M t\)) if \(t \in \tau _m(s,a)\) (\(t \in \tau _M(s,a)\)) for some action a. Over- and under-approximations for the epistemic relations are denoted as \(\sim _i^m\) and \(\sim _i^M\) respectively. This extended definition of AC-MAS allows us to define abstraction formally as:

Definition 1

(Abstraction). Let \(\mathcal {P} = \langle S, \mathcal {I}, Act, \tau _m, \tau _M, {\varLambda }\rangle \) and \(\mathcal {P}' = \langle S', \mathcal {I}', Act', \tau '_m, \tau '_M, {\varLambda }' \rangle \) be AC-MAS over the same set \(\mathcal {A}\) of agents and sets \(AP' \subseteq AP\) of propositions. We say that \(\mathcal {P'}\) is an abstraction of \(\mathcal {P}\) if:

  1. \(s' \in \mathcal {I}'\) iff there exists \(s \in \mathcal {I}\), such that \(s \in \gamma (s')\);

  2. \(s' \rightarrow '_m t'\) iff there exist \(s \in \gamma (s')\) and \(t \in \gamma (t')\), such that \(s \rightarrow _m t\);

  3. \(s' \rightarrow '_M t'\) iff for each \(s \in \gamma (s')\) there exists \(t \in \gamma (t')\), such that \(s \rightarrow _M t\);

  4. \(s' \sim '^{m}_{i} t'\) iff there exist \(s \in \gamma (s'), t\in \gamma (t')\) such that \(s \sim _i^m t\) or there exists \(u'\) such that \(s' \sim '^{m}_{i} u'\) and \(u' \sim '^{m}_{i} t'\);

  5. \(s' \sim '^{M}_{i} t'\) iff for each \(s \in \gamma (s')\) there exists \(t \in \gamma (t')\), such that \(s \sim _i^M t\), and for each \(t \in \gamma (t')\) there exists \(s \in \gamma (s')\), such that \(t \sim _i^M s\);

  6. \(p \in {\varLambda }'(s')\) iff \(p \in {\varLambda }(s)\) for each \(s \in \gamma (s')\);

where \(\gamma : S' \mapsto 2^{S}\) is the concretisation function that maps each abstract state \(s' \in S'\) to the non-empty set of concrete states \(S_{s'} \subseteq S\) it represents; \(\rightarrow '_m\) and \(\rightarrow _m\) are the may transition relations in \(\mathcal {P}'\) and \(\mathcal {P}\) respectively; \(\rightarrow '_M\) and \(\rightarrow _M\) are the must transition relations; \(\sim '^{m}_{i}\) and \(\sim _i^m\) are the may epistemic relations; and \(\sim '^{M}_{i}\) and \(\sim _i^M\) are the must epistemic relations.

May transition relations in the abstract model \(\mathcal {P'}\) over-approximate may transition relations in the concrete model \(\mathcal {P}\): whenever there is a may transition between two states in \(\mathcal {P}\), there is a may transition between the corresponding abstract states of \(\mathcal {P'}\). Conversely, must transition relations in the abstract model \(\mathcal {P'}\) under-approximate must transition relations in the concrete model \(\mathcal {P}\); they are only created for concrete transitions that are common to all of the states of \(\mathcal {P}\) represented by the source abstract state.

We define may and must epistemic possibility relations in the abstract system similarly to the temporal case; however, there are additional constraints due to the nature of the relations. Specifically, we require both to be equivalence relations. This is achieved by building the transitive closure for the may epistemic relation, while pairs in the must epistemic relation that are not symmetric are removed. By insisting on equivalence relations, we ensure that the usual KT45 axioms [18] for knowledge are satisfied in the abstract model.

Note that if the abstract may epistemic possibility relation were defined analogously to abstract may transition relations, it would not necessarily be transitive. Therefore, we define the abstract may epistemic possibility relation as the transitive closure of this relation. Similarly, if the abstract must epistemic possibility relation were defined analogously to abstract must transition relations, it would not be necessarily symmetric. Therefore, we remove the abstract must epistemic possibility relations that are not symmetric. The labelling of an abstract state is defined so that it is consistent with the labelling of all the concrete states it represents. The bi-implication ensures that the abstract labelling function is exact.
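The two repairs described above can be sketched as follows. This is a minimal Python illustration on explicit sets of pairs, with hypothetical state names `a`, `b`, `c`; it shows only the transitive closure and the removal of non-symmetric pairs, and makes no claim about how a BDD-based tool would implement them.

```python
def transitive_closure(rel):
    """Smallest transitive relation containing `rel` (naive fixpoint)."""
    closure = set(rel)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def symmetric_part(rel):
    """Keep only the pairs whose converse is also in `rel`."""
    return {(a, b) for (a, b) in rel if (b, a) in rel}

reflexive = {("a", "a"), ("b", "b"), ("c", "c")}

# Raw may relation: a~b and b~c hold, but a~c is missing.
may_raw = reflexive | {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")}
may = transitive_closure(may_raw)

# Raw must relation: (a, c) has no converse and is dropped.
must_raw = reflexive | {("a", "b"), ("b", "a"), ("a", "c")}
must = symmetric_part(must_raw)
```

Closing the may relation can only add pairs, so it remains an over-approximation; filtering the must relation can only remove pairs, so it remains an under-approximation.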

We use an extension of the epistemic \(\mu \)-calculus [20] as our specification language. We use the observational semantics for the epistemic component \(K_i\) in addition to the standard \(\mu \)-calculus [21] and define the language \(\mathcal {L}\) in BNF notation as follows. Let AP be a finite set of atomic propositions and \(\mathcal {V}\) a set of propositional variables, then:

$$\begin{aligned} \varphi \;::=\; \top \mid p \mid Z \mid \lnot \varphi \mid \varphi \wedge \varphi \mid \square \varphi \mid K_i\varphi \mid \mu Z.\varphi \mid \nu Z.\varphi \end{aligned}$$

where \(p \in AP\) and \(Z \in \mathcal {V}\). Here \(K_i \varphi \) means agent i knows \(\varphi \) [18].

The syntactic combinations \(\mu Z\) and \(\nu Z\) are the least and greatest fix-point operators respectively. An interpretation \(\rho : \mathcal {V} \rightarrow 2^S\) assigns a set of states to each free propositional variable Z. We require that any occurrence of Z in \(\varphi \) falls within an even number of negations. Furthermore, we assume that formulas are closed and well-named, i.e., all propositional variables are bound exactly once in any formula.

To evaluate a formula \(\varphi \), we compute sets of states such that a state s satisfies \(\varphi \) if \(s \in [\! [ \varphi ]\!]^{\mathcal {P},\rho }_{\text {tt }}\); a state s refutes \(\varphi \) if \(s \in [\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff }}\). In addition to satisfaction (tt) and refutation (ff), we write \(\bot \) to express that the truth value is unknown. We define the three-valued semantics for \(\mathcal {L}\) in line with [16] and extend it by the epistemic operator \(K_i\) as follows:

Definition 2

(Three-Valued Semantics). Let \(\mathcal {P}\) be an AC-MAS. The three-valued semantics of \(\varphi \in \mathcal {L}\) in \(\mathcal {P}\) for an environment \(\rho \), denoted \([\![\varphi ]\!]^{\mathcal {P},\rho }_{3}\), is defined by a mapping \(S \rightarrow \{\text {tt}, \text {ff}, \bot \}\) such that:

$$\begin{aligned}{}[\![\varphi ]\!]^{\mathcal {P},\rho }_{3} (s) = \left\{ \begin{array}{ll} \text {tt}, & \text {if } s \in [\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}} \\ \text {ff}, & \text {if } s \in [\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}} \\ \bot , & \text {otherwise} \end{array} \right. \end{aligned}$$

The sets \([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}} \subseteq S\) and \([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}} \subseteq S\) for \(\varphi \in \mathcal {L}\) over \(\mathcal {P}\) are defined as:

$$\begin{aligned}{}[\![\top ]\!]^{\mathcal {P},\rho }_{\text {tt}}&= S&[\![\top ]\!]^{\mathcal {P},\rho }_{\text {ff}}&= \emptyset \\ [\![p ]\!]^{\mathcal {P},\rho }_{\text {tt}}&= \{s \in S : p \in {\varLambda }(s)\}&[\![p ]\!]^{\mathcal {P},\rho }_{\text {ff}}&= \{s \in S : p \notin {\varLambda }(s)\} \\ [\![Z ]\!]^{\mathcal {P},\rho }_{\text {tt}}&= \rho (Z)&[\![Z ]\!]^{\mathcal {P},\rho }_{\text {ff}}&= \rho (Z) \\ [\![\lnot \varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}}&= [\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}&[\![\lnot \varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}&= [\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}} \\ [\![\varphi _1 \wedge \varphi _2 ]\!]^{\mathcal {P},\rho }_{\text {tt}}&= [\![\varphi _1 ]\!]^{\mathcal {P},\rho }_{\text {tt}} \cap [\![\varphi _2 ]\!]^{\mathcal {P},\rho }_{\text {tt}}&[\![\varphi _1 \wedge \varphi _2 ]\!]^{\mathcal {P},\rho }_{\text {ff}}&= [\![\varphi _1 ]\!]^{\mathcal {P},\rho }_{\text {ff}} \cup [\![\varphi _2 ]\!]^{\mathcal {P},\rho }_{\text {ff}} \\ [\![\square \varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}}&= ax([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}})&[\![\square \varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}&= ex([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}) \\ [\![\mu Z.\varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}}&= \text {lfp}(\lambda g.[\![\varphi ]\!]^{\mathcal {P},\rho [Z\mapsto g]}_{\text {tt}})&[\![\mu Z.\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}&= \text {gfp}(\lambda g.[\![\varphi ]\!]^{\mathcal {P},\rho [Z\mapsto g]}_{\text {ff}}) \\ [\![\nu Z.\varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}}&= \text {gfp}(\lambda g.[\![\varphi ]\!]^{\mathcal {P},\rho [Z\mapsto g]}_{\text {tt}})&[\![\nu Z.\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}&= \text {lfp}(\lambda g.[\![\varphi ]\!]^{\mathcal {P},\rho [Z\mapsto g]}_{\text {ff}})\\ [\![K_i \varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}}&= ax_i([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}} )&[\![K_i \varphi ]\!]^{\mathcal {P},\rho 
}_{\text {ff}}&= ex_i([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}) \cup [\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}} \end{aligned}$$

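where for \(X \subseteq S\): \(ax(X) = \{s \in S : \forall t \in S.\ s \rightarrow _m t \Rightarrow t \in X\}\), \(ex(X) = \{s \in S : \exists t \in X.\ s \rightarrow _M t\}\), \(ax_i(X) = \{s \in S : \forall t \in S.\ s \sim _i^m t \Rightarrow t \in X\}\), and \(ex_i(X) = \{s \in S : \exists t \in X.\ s \sim _i^M t\}\), with \(\rightarrow _m\) and \(\rightarrow _M\) the may and must transition relations and \(\sim _i^m\) and \(\sim _i^M\) the may and must epistemic relations of agent i. Intuitively, ax returns states whose may successors are all in X. In contrast, ex computes all states for which at least one must transition into X exists. Similarly, \(ax_i\) and \(ex_i\) are the corresponding operators for the epistemic relations for a given agent i and give the set of the respective indistinguishable states. The definition for \([\![K_i \varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}\) allows for a tighter under-approximation since agents do not know \(\varphi \) in states where \(\varphi \) is false.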

An AC-MAS \(\mathcal {P}\) satisfies a formula \(\varphi \) if all its initial states are in \([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}}\). An AC-MAS \(\mathcal {P}\) refutes \(\varphi \) if at least one initial state is in \([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}\). Otherwise we say that the truth value of \(\varphi \) in \(\mathcal {P}\) is unknown (\(\bot \)). Note that the abstraction for AC-MAS models \(\mathcal {P}\) as defined above is consistent, i.e., \([\![\varphi ]\!]_{\text {tt}} \cap [\![\varphi ]\!]_{\text {ff}} = \emptyset \) for any \(\varphi \in \mathcal {L}\). Therefore the set \([\![\varphi ]\!]^{\mathcal {P},\rho }_{\bot }\) can be computed as \(S \backslash ([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}} \cup [\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}})\).
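To make the three truth values concrete, the following Python sketch evaluates the fixpoint-free, non-epistemic fragment of \(\mathcal {L}\) (\(\top \), atoms, \(\lnot \), \(\wedge \), \(\square \)) over an explicit model with separate may and must relations. The tuple encoding of formulas, the function names and the three-state model are assumptions made for this illustration only; the model mirrors a counter abstracted by the predicates \(x < 3\) and \(x = 3\).

```python
def ax(X, states, may):
    """States all of whose may successors lie in X."""
    return {s for s in states if all(t in X for (u, t) in may if u == s)}

def ex(X, states, must):
    """States with at least one must successor in X."""
    return {s for s in states if any(t in X for (u, t) in must if u == s)}

def sat(phi, model):
    """Return the pair (tt, ff) of state sets for formula `phi`."""
    states, may, must, label = model
    op = phi[0]
    if op == "true":
        return set(states), set()
    if op == "atom":
        tt = {s for s in states if phi[1] in label[s]}
        return tt, set(states) - tt
    if op == "not":
        tt, ff = sat(phi[1], model)
        return ff, tt
    if op == "and":
        tt1, ff1 = sat(phi[1], model)
        tt2, ff2 = sat(phi[2], model)
        return tt1 & tt2, ff1 | ff2
    if op == "box":
        tt, ff = sat(phi[1], model)
        return ax(tt, states, may), ex(ff, states, must)
    raise ValueError(f"unsupported operator: {op}")

def value(phi, s, model):
    """Three-valued truth value of `phi` in state `s`."""
    tt, ff = sat(phi, model)
    if s in tt:
        return "tt"
    if s in ff:
        return "ff"
    return "unknown"

# Hypothetical abstract model: A is x < 3, B is x = 3, C is x > 3.
STATES = {"A", "B", "C"}
MAY = {("A", "A"), ("A", "B"), ("B", "C"), ("C", "C")}
MUST = {("B", "C"), ("C", "C")}
LABEL = {"A": {"p"}, "B": {"q"}, "C": set()}
MODEL = (STATES, MAY, MUST, LABEL)

BOX_P = ("box", ("atom", "p"))
BOX_NOT_P = ("box", ("not", ("atom", "p")))
```

In state `A` the formula `BOX_P` is neither satisfied nor refuted: one may successor violates `p`, but no must transition witnesses the violation, so the result is unknown and the abstraction would have to be refined.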

Abstracting GSM. To instantiate the theory above, we now outline a methodology for constructing abstract AC-MAS models from concrete GSM programs. This process includes abstracting the data to build a finite model using predicates, as well as the computation of the temporal and epistemic may and must relations. Observe that GSM programs only regulate the evolution of the artifact-centric system in the presence of external events and do not include a description of the agents’ interaction with the system. To account for the evolution of both we combine GSM programs with procedural agent descriptions, thereby obtaining a GSM-MAS program. We do not present the agent descriptions here; we simply assume that they define the local states for the agents and define their evolution, both in terms of the actions performed on the artifact-centric system (or events) and the changes to their local state in the presence of actions. By GSM-MAS we refer to the combined programs consisting of the GSM code and the agent descriptions. It can be checked that AC-MAS provide a semantics for GSM-MAS programs.

Given a GSM-MAS program \(\mathcal {P}\) and a specification \(\varphi \) as input, we generate an abstract model \(\mathcal {P'}\) such that if checking \(\mathcal {P}'\;\models \;\varphi \) returns either true or false, then the same result also applies to \(\mathcal {P}\); if \(\mathcal {P}'\;\models \;\varphi \) returns undefined, then no conclusion can be drawn on \(\mathcal {P}\) and the abstraction needs to be refined.

States in the abstract system are represented by predicates, which are Boolean variables that represent the validity of expressions in the concrete system. Predicates are selected by analysing the GSM-MAS program and the specification to be verified. In doing so we retain the status attributes of the lifecycles, as these are already Boolean, but replace the potentially unbounded data attributes. To capture key conditions in the system, binary relations (\(=,\ne ,<,\le ,>,\ge \)) or quantifications over sets of data (\(\exists ,\forall \)) are selected by syntactically analysing the GSM-MAS program to get an initial set of predicates \(p_i\).

In contrast to classical approaches, which build abstractions locally to single execution blocks, the declarative nature of GSM-MAS programs and the quantification over artifact instances result in predicates that are shared between instances or agents. While predicates that are local to an artifact instance or agent can be treated as instance variables, shared predicates need to be treated carefully to avoid incorrect abstractions for the local states of the agents. Building the abstract state using data predicates along with the original status attributes guarantees that the abstract system retains the same structure, while maintaining an over-approximation of the data space of the concrete system.

Fig. 2. Concrete and abstract transitions of a non-negative integer counter.

Fig. 3. Indistinguishable states of an agent given \(y \in \nu \).

Since several concrete states correspond to an abstract state, temporal changes in the abstract system can only approximate the corresponding changes in the concrete data. Rather than giving the full procedure, we illustrate the computation of the may and must transition relations on a simple example. Consider the abstraction of a non-negative integer counter with a single integer variable x that is initialised to 0 and gets incremented by 1 at each step using the assignment \(x := x+1\). If we base our abstract states on the predicates \(p: x < 3\) and \(q: x = 3\), we have three possible abstract states, which are shown in Fig. 2. Between the abstract states \(p\overline{q}\) and \(\overline{p}q\) we have a may transition because the concrete system can transition to a state that is in \(\overline{p}q\). There is no must transition, however, because from a state in \(p\overline{q}\) the concrete system can also transition to a state that is still in \(p\overline{q}\). In contrast, all concrete states in \(\overline{p}q\) transition to \(\overline{p}\,\overline{q}\), which means that we have both may and must transitions.
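The counter example can be reproduced with a short Python sketch. It enumerates a finite sample of concrete counter values, which is an assumption made only to keep the example executable (the concrete system is infinite); the names `alpha`, `step` and `SAMPLE` are likewise hypothetical.

```python
def alpha(x):
    """Map a concrete counter value to its abstract state (p, q)."""
    return (x < 3, x == 3)

def step(x):
    """The concrete transition x := x + 1."""
    return x + 1

# Finite sample of concrete states; large enough to populate every
# abstract state. Successors outside the sample are still abstracted.
SAMPLE = range(0, 50)

# gamma maps each abstract state to the sampled concrete states in it.
gamma = {}
for x in SAMPLE:
    gamma.setdefault(alpha(x), []).append(x)

may, must = set(), set()
for a, xs in gamma.items():
    for b in {alpha(step(x)) for x in xs}:
        # May: some concrete state in a steps into b.
        may.add((a, b))
        # Must: every concrete state in a steps into b
        # (the counter is deterministic, so one successor each).
        if all(alpha(step(x)) == b for x in xs):
            must.add((a, b))

P_NOT_Q = (True, False)        # x < 3
NOT_P_Q = (False, True)        # x = 3
NOT_P_NOT_Q = (False, False)   # x > 3
```

The sketch yields four may transitions and two must transitions; the transitions leaving `P_NOT_Q` are may-only because the values 0, 1 and 2 do not all move to the same abstract state.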

In line with existing literature in epistemic logic [18], the agents’ knowledge is computed on the basis of the equality of their local components. In our case, however, the agents’ local states are given not only by their private variables but also by their view \(\nu \) and window \(\omega \). In the labelling algorithm for computing the sets in which an epistemic formula holds, the existential pre-image of the set of global states X with respect to the appropriate epistemic relation (may or must) is computed by existential quantification of variables outside of the view, and restriction to the window. The pre-image can be directly used to compute \([\![K_i \varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}\), since \([\![K_i \varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}} = ex_i([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}) \cup [\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}\). This is not the case for \([\![K_i \varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}}\), where \([\![K_i \varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}} = ax_i([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {tt}})\); in this case we first compute the pre-image of \([\![\varphi ]\!]^{\mathcal {P},\rho }_{\text {ff}}\) and then take its complement.

To build the abstract epistemic relations, views and windows have to be defined in terms of the predicates for the abstract states. The window \(\omega \) can be expressed as a formula using relations between variables. Since we build our set of predicates using exactly those relations, we can build a direct mapping to an abstract function \(\omega '\). In other words, the abstract and concrete window functions represent the exact same states and \(\omega (\gamma (x)) = \omega '(x)\) for any abstract state x.

The abstraction of the view \(\nu \) is less straightforward, however, as predicates may use sets of variables that do not coincide with \(\nu \), and in the case of shared predicates may even relate to different instances and agents. This implies that an agent may be able to determine the value of a predicate only for some states. To avoid computing \(\nu '\) depending on the state, we compute two sets of predicates, one for the may and one for the must epistemic relation, that give correct over- and under-approximations of the epistemic relation.

For the over-approximation (the may epistemic relation), we select for the abstract view only the local predicates that exclusively refer to visible variables in \(\nu \). This ensures that an agent can distinguish two states in the abstract system only if it has enough visibility in the concrete system to determine the value of the predicates. We exclude shared predicates since one or more of the referenced instances might be outside the window \(\omega \) and thus the predicate may be unknown. Note that fewer predicates in the abstract view result in a larger relation, thereby ensuring that an over-approximation is generated. This relation is then restricted to the set of reachable states computed with the may transition relation, which represents the states possibly reachable in the abstract model.

For the must epistemic relation, we need to ensure under-approximation; we stipulate that two abstract states s and t are related if for each of the concrete states in s there is a concrete state in t such that there is an epistemic relation for agent i between them. Intuitively, this means that we need to consider every predicate that encodes at least one variable visible in the concrete system. Note, however, that this may not be sufficient: if the predicates are not independent of each other, they may allow information about a value to be inferred even if it is not visible to the agent. Consider the example in Fig. 3 with \(p: x = 1\) and \(q: x > y\) with the visible variable y. In the concrete system, \((x,y) = (1,1)\) is distinguishable from (0, 0), but not from (0, 1). Computing the relation with visible predicate q and quantifying only p would result in a relation between \(p\overline{q}\) and \(\overline{p}\overline{q}\), which is not a proper under-approximation because of the missing epistemic relation between (0, 0) and (1, 1) in the concrete system. To ensure a correct under-approximation is generated, we transitively select all predicates that share variables with predicates already selected and also include shared predicates. Finally, we restrict the relation to the set of reachable states computed with the must transition relation, which corresponds to the set of states that are known to be reachable in the concrete system.
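The problem with dependent predicates can be checked on a small instance. The Python sketch below restricts x and y to the values 0 and 1, which is an assumption made for illustration, and compares a naive relation that only inspects the predicate q with the must condition requiring matching concrete partners in both directions. All function names are hypothetical.

```python
from itertools import product

DOMAIN = range(2)  # assumption: x and y range over {0, 1}
CONCRETE = list(product(DOMAIN, DOMAIN))  # concrete states (x, y)

def alpha(state):
    """Abstract a concrete state to (p, q) with p: x = 1 and q: x > y."""
    x, y = state
    return (x == 1, x > y)

def concrete_indist(s, t):
    """The agent sees only y, so states with equal y are indistinguishable."""
    return s[1] == t[1]

gamma = {}
for s in CONCRETE:
    gamma.setdefault(alpha(s), []).append(s)

def naive_must(a, b):
    """Naive relation: treat q as visible and quantify p away."""
    return a[1] == b[1]

def proper_must(a, b):
    """Each concrete state on either side needs an indistinguishable
    partner on the other side."""
    forward = all(any(concrete_indist(s, t) for t in gamma[b]) for s in gamma[a])
    backward = all(any(concrete_indist(t, s) for s in gamma[a]) for t in gamma[b])
    return forward and backward

P_NOT_Q = (True, False)       # contains (1, 1)
NOT_P_NOT_Q = (False, False)  # contains (0, 0) and (0, 1)
```

The naive relation links `P_NOT_Q` and `NOT_P_NOT_Q`, but the concrete state (0, 0) has no indistinguishable partner in `P_NOT_Q`, so the pair must not appear in an under-approximation.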

4 Implementation and Experimental Results

GSMC is an open source model checker that implements the technique described above [22]. It is operated via a command line application written in C++ that uses the CUDD library [23] for BDD operations and the SMT solver CVC4 [24] to help compute the abstractions. GSMC uses binary decision diagrams (BDDs) to represent the sets of states and the transition relations of the abstract model.

GSMC operates directly on GSM programs developed in the Acsi Hub [4], a web-based application that supports the design and implementation of artifact systems. By using the Acsi Hub, users can design business artifacts with GSM lifecycles through a design editor and then immediately deploy these programs on an execution engine. The description of the agents and specification properties are supplied in plain text files.

GSMC supports specifications written in a temporal-epistemic logic with quantification over artifact instances. The language, called Instance Quantified CTLK [15], or IQ-CTLK, extends the usual epistemic branching time logic CTLK and has the following syntax:

$$\begin{aligned} \varphi \;::=\;p \mid \lnot \varphi \mid \varphi \vee \varphi \mid EX \varphi \mid EG \varphi \mid E(\varphi U \varphi ) \mid K_i\varphi \mid \forall x:R\ \varphi \mid \exists x:R\ \varphi \end{aligned}$$

where R is the name of an artifact type and p is an atomic proposition over the agents’ private data and the attributes of active instances that are specified in terms of instance variables bound by the quantification operators. The quantified instance variables range over the active instances of a given artifact type R in the state where the quantification is evaluated and must be bound.

We introduce a bound on the number of instances that can be generated and use an overflow flag that indicates if the bound was reached during a run. The bound on the number of instances restricts the possible behaviour of the system and may lead to loss of soundness or completeness when the limit is reached. The bound can be revised before any execution. Any IQ-CTLK formula to be verified is first rewritten into a CTLK formula by replacing the quantification operators as follows:

$$\begin{aligned} \forall x: \varphi&\Rightarrow \bigwedge _{\iota \in {\varGamma }} created(\iota ) \rightarrow \varphi&\exists x: \varphi&\Rightarrow \bigvee _{\iota \in {\varGamma }} created(\iota ) \wedge \varphi \end{aligned}$$

where the expression \(created(\iota )\) checks if instance \(\iota \) was created. This is required since the new formula ranges over the actual instances, which are created dynamically at run-time, and the number of active instances is not a priori known. The CTLK formula is then translated to an epistemic \(\mu \)-calculus formula using the fixed point characterisation of CTL [25]; the resulting specification is checked on the abstract model.
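The quantifier rewriting can be sketched as a simple expansion over a bounded list of instance names. The Python functions below, the textual formula syntax (`->`, `&`, `|`) and the instance names `o1` and `o2` are assumptions for illustration and do not reflect the input format of GSMC.

```python
def expand_forall(instances, body):
    """Rewrite a universal instance quantifier as a finite conjunction.
    `body` maps an instance name to the formula text for that instance."""
    return " & ".join(f"(created({i}) -> {body(i)})" for i in instances)

def expand_exists(instances, body):
    """Rewrite an existential instance quantifier as a finite disjunction."""
    return " | ".join(f"(created({i}) & {body(i)})" for i in instances)

# Hypothetical bound of two instances of an order artifact type.
instances = ["o1", "o2"]
all_received = expand_forall(instances, lambda i: f"EF {i}.Received")
some_received = expand_exists(instances, lambda i: f"EF {i}.Received")
```

Because the expansion ranges over every instance slot allowed by the bound, the guard `created(...)` is what restricts each conjunct or disjunct to the instances that actually exist in the state under evaluation.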

In the rest of the section we evaluate the tool on two use cases, both of which are complete Acsi Hub applications. We verify temporal-epistemic properties of the systems and discuss the performance of the implemented techniques. All tests were conducted on a 64-bit Fedora 17 Linux machine with a 2.10 GHz Intel Core i7 processor and 4 GB RAM.

Evaluation: The Order-to-Cash Scenario. This is an application in which a seller schedules the assembly of a product based on a confirmed purchase order from a buyer. The product requires several components, which are sourced from different suppliers. When the product is assembled, a carrier ships the order to the buyer. The buyer can cancel a purchase order at any time before the delivery. We refer to [17] for more details. The GSM program consists of a single-artifact Acsi Hub application with 10 data attributes, 9 stages, 11 milestones, and 12 events. We model a collection of components by introducing an integer counter. The process is considered complete when 3 components have arrived. The following three agent roles interact with the artifact system: (1) a Buyer who creates an artifact instance that represents the order; (2) a Seller who fulfils the order; and (3) a Carrier who ships the finished product to the Buyer.

We constructed several GSM-MAS with different numbers of agents and bounds on artifact instances. We report on the verification of these systems against four temporal-epistemic specifications. In the following, Diogenes is an agent of role Buyer. The first specification, Property 1, states that Diogenes knows that the product might be received via any of his orders as long as these are not cancelled, i.e., that there is no deadlock in processing the order:

$$\begin{aligned} AG\ \forall x:\textit{CustomerOrder} ((\textit{x.BuyerId} = \textit{Diogenes} \wedge \lnot \textit{Diogenes.Cancelled})\nonumber \\ \rightarrow K_{\textit{Diogenes}}\ EF\ \textit{x.Received}) \end{aligned}$$
(1)

Property 2 states that Diogenes may come to know that a product is received for an order with a different owner. This can be used to ascertain whether the orders are private to the buyers:

$$\begin{aligned} EF\ \exists x:\textit{CustomerOrder} (\textit{x.BuyerId} \ne \textit{Diogenes} \wedge K_{\textit{Diogenes}}\ \textit{x.Received}) \end{aligned}$$
(2)

Property 3 encodes the ability of an agent to deduce information it cannot directly observe by checking whether Diogenes always knows there are 3 PurchaseOrders collected in all of his orders when the milestone Ready is achieved:

$$\begin{aligned} AG\ \forall x:\textit{CustomerOrder}((\textit{x.Ready} \wedge \textit{x.BuyerId} = \textit{Diogenes}) \qquad \qquad \qquad \nonumber \\ \rightarrow K_{\textit{Diogenes}}\ (\textit{x.PurchaseOrders} = 3)) \end{aligned}$$
(3)

The last specification, Property 4, encodes the ownership of the order. It states that an agent other than Diogenes can cancel an order that belongs to Diogenes. This is expressed by using a private variable of Diogenes, which is true only if Diogenes executed the Cancelled event. We thus require that an order that belongs to Diogenes cannot be cancelled if this variable is false, i.e., we expect the formula to be false:

$$\begin{aligned} EF\ \exists x:\textit{CustomerOrder}(\textit{x.BuyerId} = \textit{Diogenes} \wedge \textit{x.Cancelled}\qquad \qquad \nonumber \\ \wedge \;\textit{Diogenes.cancelled} \ne 1) \end{aligned}$$
(4)
Table 1. Performance for different numbers of artifact instances \(\iota \) and agents.

We first verified the properties in the abstract system and measured the number of may and must reachable states, memory used, and CPU time required. GSMC evaluated Property 1 to be unknown, Properties 2 and 4 to be false, and Property 3 to be true in the abstract model. Table 1 reports the performance for a system with 1 agent per role and a system of 15 agents (6 Buyers, 5 Sellers, and 4 Carriers). We observe that there is an order of magnitude of difference in the number of may and must reachable states; this implies that there are specifications, such as Property 1, that cannot be determined. However, the tool is still able to find answers to the other three properties. The results are in line with our expectations, confirming the correctness of the GSM program against said specifications.
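The way a gap between the may and must reachable states leads to an unknown verdict can be sketched on a toy explicit-state model. The sketch follows the usual reading of three-valued abstraction, where must-transitions under-approximate and may-transitions over-approximate the concrete behaviour; it is an assumption-laden illustration for a single reachability formula `EF p`, not the symbolic algorithm implemented in GSMC. The state names and the function `check_EF` are hypothetical.

```python
def reachable(init, edges):
    """States reachable from init along the given edge relation (dict: state -> successors)."""
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        for t in edges.get(s, ()):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def check_EF(init, may, must, label):
    """Three-valued check of EF p.

    label[s] is True, False, or None (None = value of p unknown in abstract state s).
    Assumes every must-transition is also a may-transition.
    """
    # Definitely true: p is definitely true somewhere on the under-approximation
    if any(label[s] is True for s in reachable(init, must)):
        return "true"
    # Definitely false: p is definitely false everywhere on the over-approximation
    if all(label[s] is False for s in reachable(init, may)):
        return "false"
    return "unknown"


label = {"a0": False, "a1": False, "a2": True}
may = {"a0": ["a1"], "a1": ["a2"]}
must_coarse = {"a0": ["a1"]}                 # a1 -> a2 is only a may-transition
must_precise = {"a0": ["a1"], "a1": ["a2"]}  # may and must coincide

verdict_coarse = check_EF("a0", may, must_coarse, label)
verdict_precise = check_EF("a0", may, must_precise, label)
```

In the coarse model the target state is reachable only through a may-transition, so the formula can be neither confirmed nor refuted; once may and must coincide the verdict becomes definite.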

For comparison, we disabled the predicate abstraction feature and verified the same Order-to-Cash system under the same conditions. In this case GSMC evaluated Properties 1 and 3 to be true and Properties 2 and 4 to be false in the model, which is consistent with the abstraction results. Note that the previously unknown Property 1 is returned as true when predicate abstraction is disabled.

Table 2 presents the performance of the tool executed on the same machine, under the same conditions. By comparing this table to Table 1, we see that verification of the concrete model initially outperforms abstraction. This is because there is a constant overhead from building the may and must temporal transitions by calls to the SMT solver. However, as the model grows we clearly see the benefits of the abstraction methodology as it reduces the number of states to be considered. For example, for 15 agents and 5 instances we have over two orders of magnitude reduction in the number of states to be considered and an order of magnitude reduction in the verification time.

Although the tool does not support automatic refinement for the abstraction methodology, by manually adding the predicates \(x. PurchaseOrders = 0\), \(x. PurchaseOrders = 1\), and \(x. PurchaseOrders = 2\) we could refine the abstract model in such a way that may and must reachable state spaces become equal to those of the concrete model. In doing so Property 1 is no longer returned as unknown but true; this is in line with the results obtained by verifying the concrete system.
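The effect of adding predicates can be reproduced on a toy version of the component counter. The sketch below assumes a counter ranging over 0 to 3 that is incremented by one per step, and a hypothetical initial predicate set containing only `PurchaseOrders = 3`; the source does not state which predicates were present before refinement. The may and must transitions are computed by brute-force enumeration rather than by SMT calls.

```python
VALUES = range(4)                          # toy domain of the counter
STEP = {(n, n + 1) for n in range(3)}      # concrete transitions: increment by one


def abstract(preds):
    """Group concrete values by the truth values they give to the predicates."""
    blocks = {}
    for v in VALUES:
        key = tuple(p(v) for p in preds)
        blocks.setdefault(key, set()).add(v)
    return list(blocks.values())


def transitions(blocks):
    """May: some member of A steps into B. Must: every member of A steps into B."""
    may, must = set(), set()
    for i, A in enumerate(blocks):
        for j, B in enumerate(blocks):
            if any((a, b) in STEP for a in A for b in B):
                may.add((i, j))
            if all(any((a, b) in STEP for b in B) for a in A):
                must.add((i, j))
    return may, must


coarse = [lambda v: v == 3]
refined = coarse + [lambda v, k=k: v == k for k in range(3)]

coarse_blocks = abstract(coarse)      # [{0, 1, 2}, {3}]
refined_blocks = abstract(refined)    # [{0}, {1}, {2}, {3}]
coarse_may, coarse_must = transitions(coarse_blocks)
refined_may, refined_must = transitions(refined_blocks)
```

With the single predicate, the values 0, 1, and 2 collapse into one abstract state that has may-transitions but no must-transitions, so reaching the value 3 cannot be guaranteed. With the three extra predicates every abstract state is a singleton and the may and must relations coincide.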

Table 2. Performance for different settings of the concrete system.

The Second Evaluation Scenario focuses on the management of research programs. The scenario consists of three conceptual entities modelled as business artifacts: CallForProposals represents the annual call of a funding program; Project encodes one project which starts as a proposal and, if successful, becomes a funded research project; ReviewBoard governs the assembling of a review board for a specified research topic and the reviews of all competing proposals. We focus on three roles: the Program Manager initiates the process and confirms the board; the Program Staff Member supervises projects on behalf of the funding agency; the Project Leader is responsible for a particular proposal. The scenario was implemented in the Acsi Hub. We refer to [26] for details.

The GSM program for this scenario is a significantly larger application than the Order-to-Cash one, as it consists of 45 stages, 56 milestones, and 19 events. For this reason we report here only the interactions between the agents and the ReviewBoard artifact type, i.e., the types CallForProposals and Project are not analysed here. We also restrict the number of agents to one per role. Nevertheless, GSMC builds the transition relations for the whole GSM program.

An artifact instance is created when the agent Manager decides to set up a review board. When the Manager confirms the assembled board, the lifecycle of the ReviewBoard instance terminates. The agent Staff carries out several administration tasks, including assembling and updating the review board. Both Manager and Staff can access all artifact instances. In contrast, the agent Leader cannot observe any of them. Agents do not set specific payloads; this implies we can examine all the possible non-deterministic behaviours.

The first two specifications we analyse concern the simple reachability of stages and milestones. Property 5 states that there is an instance of the ReviewBoard artifact type in which eventually the stage SendProposalsToReviewers is open:

$$\begin{aligned} EF\ \exists x: \textit{ReviewBoard}(\textit{x.SendProposalsToReviewers}) \end{aligned}$$
(5)

Property 6 encodes that there is an instance of ReviewBoard in which eventually the milestone ReviewsTerminated is achieved. This means that an instance can terminate:

$$\begin{aligned} EF\ \exists x: \textit{ReviewBoard}(\textit{x.ReviewsTerminated}) \end{aligned}$$
(6)

The next two specifications demonstrate the use of 3-valued abstraction on sets of data. These formulas cannot be verified on concrete systems as sets of data cannot be represented on concrete models. Property 7 states that there is an instance of ReviewBoard in which eventually the number of active reviewers is equal to the specified number of reviewers required:

$$\begin{aligned} EF\ \exists x: \textit{ReviewBoard}(\textit{x.Reviewers.size}() = \textit{x.ReviewBoardSize}) \end{aligned}$$
(7)

Property 8 states that there is an instance of ReviewBoard in which eventually the set of active reviewers contains a reviewer called Diogenes:

$$\begin{aligned} EF\ \exists x: \textit{ReviewBoard}(\textit{x.Reviewers.exists}(\textit{FirstName} = \textit{Diogenes})) \end{aligned}$$
(8)

The last two specifications concern reasoning about the knowledge of the agents. Property 9 says that agent Manager knows there is a path where eventually the milestone ReviewsTerminated is achieved:

$$\begin{aligned} K_{\textit{Manager}}\ (EF\ \exists x: \textit{ReviewBoard}(\textit{x.ReviewsTerminated})) \end{aligned}$$
(9)

Finally, Property 10 encodes that agent Leader knows there is a path where eventually the milestone ReviewsTerminated is achieved:

$$\begin{aligned} K_{\textit{Leader}}\ (EF\ \exists x: \textit{ReviewBoard}(\textit{x.ReviewsTerminated})) \end{aligned}$$
(10)
Table 3. Performance results for 1 instance of the ReviewBoard artifact type.

The data attributes of the concrete model are represented by 10 predicates in the abstract model. The abstract model is then encoded by GSMC into BDDs by using 142 Boolean variables. As the construction of the transition relations requires three distinct sets of Boolean variables, there are 426 Boolean variables in total. The may reachable state space of the model spans over approximately \(7.1 \times 10^{9}\) states, and its construction requires 30 iterations. The must reachable state space has \(8.4 \times 10^{7}\) states and it is built in 12 iterations. The total time for the verification was 43.88 s and the memory usage peaked at 395 MB.
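The figures above combine a simple variable count with an iterative fixed-point construction of the reachable states. The following sketch checks the arithmetic and shows an explicit-state version of such an iteration on a toy chain of states. GSMC performs this symbolically on BDDs; the function `reach`, the toy successor function, and the convention of counting the final convergence check as an iteration are assumptions made for illustration.

```python
VARS_PER_COPY = 142            # Boolean variables encoding one abstract state
COPIES = 3                     # distinct variable sets used for the transition relations
TOTAL_VARS = VARS_PER_COPY * COPIES


def reach(init, post):
    """Least fixed point of R = init | post(R), with the number of iterations used."""
    reached = set(init)
    iterations = 0
    while True:
        iterations += 1
        new = reached | {t for s in reached for t in post(s)}
        if new == reached:     # nothing new was added: fixed point
            return reached, iterations
        reached = new


# Toy successor function: a chain 0 -> 1 -> 2 -> 3
chain_post = lambda s: [s + 1] if s < 3 else []
states, iterations = reach({0}, chain_post)
```

Each iteration extends the set by one step of the transition relation, so the number of iterations is tied to the depth of the model rather than to the number of states, which is why a space of several billion states can be built in a few dozen iterations.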

Table 3 presents the performance of the individual operations undertaken by GSMC, as well as the verification results. The first row reports the construction of the transition relations, the second row shows the construction of may and must reachable state spaces, and the remaining rows give the performance for the properties verified in this section. Properties 5 to 9 are true in the model. Property 10 is false in the model since the agent Leader cannot observe the ReviewBoard lifecycle.
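The difference between Properties 9 and 10 comes from what each agent can observe. A minimal two-valued sketch of the standard observation-based reading of the knowledge operator is given below: an agent knows a formula at a state if the formula holds at every reachable state that gives the agent the same observation. The three states, the set `sat`, and the observation functions are a hypothetical toy example, not the ReviewBoard model, and the three-valued treatment used on the abstract model is not covered.

```python
def knows(state, obs, reachable_states, sat):
    """K_i phi at state: phi holds in all reachable states with the same observation."""
    return all(t in sat for t in reachable_states if obs(t) == obs(state))


reachable_states = {"s0", "s1", "s2"}
sat = {"s0", "s1"}               # toy set of states where the inner formula holds

manager_obs = lambda s: s        # observes everything: every state looks different
leader_obs = lambda s: None      # observes nothing: all states look the same

manager_knows = knows("s0", manager_obs, reachable_states, sat)
leader_knows = knows("s0", leader_obs, reachable_states, sat)
```

An agent that observes the whole state only needs the formula to hold at the current state, whereas an agent that observes nothing needs it to hold at every reachable state, so a single reachable counterexample state is enough to falsify its knowledge.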

5 Conclusions

Artifact-centric systems have been put forward as an intuitive paradigm to model applications for businesses and services. Differently from process models, artifact-centric systems give equal prominence to both the process model (i.e., the lifecycles) and the information model (i.e., the data structures). GSM has been introduced as a programming framework for artifact-centric systems and recently adopted as part of the OMG Case Management Model and Notation standard [27]. This suggests its use may increase considerably in the future.

In this paper we introduced a methodology for the verification of GSM systems. The technique extends state-of-the-art methods in verification by providing a predicate abstraction methodology to GSM. In addition to catering for GSM programs directly, we support first-order quantification to refer to the data referenced by artifacts. Differently from any other mainstream predicate abstraction technique we also support operators expressing the knowledge of the agents in the system.

We implemented the technique in GSMC, the first model checker for GSM that supports GSM’s information model. The checker supports GSM’s infinite models and automatically generates, via SMT calls, finite abstract models that can be efficiently encoded as BDDs and then verified. To evaluate the efficiency of the approach we have discussed the experimental results obtained by using the checker on sophisticated use cases generated by third parties in the EU project ACSI. The approach as currently implemented does not support recursion in the GSM programs. In the future we plan to add partial support for basic recursive data types and automatic refinement.