1 Introduction

The environment of the web is finally at a stage where hypermedia agents could be applied [9]: dynamic, open, and long-lived systems are commonplace on the web, together forming a highly distributed system. For example, microservices [31] build on the web architecture and provide fine-grained read-write access to business functions. Moreover, Internet of Things devices are increasingly equipped with web interfaces, see, for example, the W3C’s Web of Things effort.Footnote 1 Furthermore, users’ awareness of privacy issues leads to the decentralisation of social networks from monolithic silos to community- or user-hosted systems like SoLiD,Footnote 2 which builds on the web architecture. The web architecture offers REST [14], or its implementation HTTP,Footnote 3 as a uniform way for system interaction, and RDFFootnote 4 as a uniform way for knowledge representation, where we can employ semantic reasoning to integrate data. To facilitate software agents in this environment, called Read-Write Linked Data [4], we need to embrace the web architecture and find a suitable way to specify behaviour. As REST puts the exchange of state information at the centre of interaction on the web, we investigate a data-driven approach to specifying behaviour. Moreover, data-driven approaches to workflow modelling can be both intuitive and actionable, and hence are suited to a wide range of audiences with different experience with information technologies [20]. Hence, we tackle the research question of how to specify and execute agent behaviour in the environment of Read-Write Linked Data in a data-driven fashion.

Echoing these application areas for web technologies, we envision our approach to be useful for defining software agents that orchestrate services in microservice deployments, act as assistance systems in Internet of Things or Cyber-Physical Systems deployments, or manage the lifecycle of personal data. Deployments in which REST, semantic technologies, and some notion of behaviour (e.g. flow-driven workflows) play a role can be found in various industries; for academic descriptions see, for example, [6] (automotive), [25] (aviation), [10] (manufacturing), and [26] (Internet of Things). Having been involved in the development of some of these deployments, we see the need for an approach to specifying behaviour that embraces the data-driven nature of the environment.

As the environment determines why different workflow approaches are used in different circumstances [13], we need to look at the particularities of Read-Write Linked Data, whose basic assumptions differ fundamentally from the traditional environments in which workflow technologies are applied, e.g. databases:

  • The absence of events in HTTP: The constrained set of HTTP methods does not include a method to subscribe to events. Hence, for our approach native to Read-Write Linked Data, we rely on state data and resort to polling to get informed about changes in the environment.

  • Reasoning and querying under the OWA: While databases typically make the closed-world assumption, i.e. we conclude from the absence of information that it is false, reasoning in ontology languages for RDF is typically based on the open-world assumption (OWA). Hence, we have to explicitly model all options.

Mitigation strategies would introduce complexity or restrict the generality of our approach: The absence of events could, for example, be addressed by (1) generating events from differences between state snapshots and processing these events, which would add unnecessary complexity if we can do without; or (2) assuming server implementations that provide change events using the WebSocket protocol, which would restrict the generality of our approach and, for uniform processing, would require clear message semantics, which, in contrast to HTTP, event-based systems do not have [14]Footnote 5. The open-world assumption could, for example, be addressed by introducing assumptions such as negation-as-failure once a certain completeness class [18] has been reached, which would also add complexity.

In contrast to previous work by Pautasso [32], who presented an approach to retrofit REST into the BPEL approach to workflow modelling, our approach rather retrofits a workflow modelling approach, here GSM, into REST. In previous work, we defined ASM4LD, a model of computation for the environment of Read-Write Linked Data [23], which allows for the rule-based specification of agent behaviour. Based on this model of computation, we provided an approach to specify flow-driven workflows [24]. In contrast, we present a data-driven approach in this paper. The Guard-Stage-Milestone (GSM) approach, which serves as the basis for our work, was first presented in [20]. While GSM builds on events sent to a database that holds the information model, consisting of status and data attributes, in our approach the information model is held by distributed components with web interfaces that supply state information.

In [22], we described a previous version of this paper, which we presented at the 3rd International Workshop on Artificial Intelligence for Business Process Management (AI4BPM) at the 17th International Conference on Business Process Management (BPM). In this paper, we extend the aforementioned work by providing a description of how we process queries, including a theoretical insight into the required rule language, an extended formal evaluation from which we derive modelling requirements, a performance evaluation, updates to the operational semantics, and an extended discussion of related work.

Our approach consists of two main parts:

  • GSM Ontology: We present an ontology to specify GSM workflows and instances in the ontology language RDFS. Using this ontology, we can specify, reason over, and query workflow models and instances at run-time.

  • Operational Semantics: We present ASM4LD rules to execute workflow instances specified using our GSM ontology. To this end, we build on a Linked Data Platform containerFootnote 6, i.e. a writable RESTful RDF data store, to store the status attributes, i.e. the workflow instances expressed in our ontology.

The paper is structured as follows: In Sect. 2, we survey related work. Next, in Sect. 3, we give basic definitions on which we build our approach with the help of examples. In Sect. 4, we present our main contributions: the ontology and the operational semantics, together with the modelling requirements. Then, in Sect. 5, we present how we process queries on the information model using rules. Next, we evaluate our approach in Sect. 6 regarding correctness and performance. Last, in Sect. 7, we conclude.

2 Related Work

As we work in the intersection of data-driven workflow management, knowledge representation using semantic technologies, and systems built using the web, we survey related work in those fields and intersections.

2.1 Data in Workflow Management

Execution of workflows is typically driven by either flow or data. Flow-driven approaches include the popular BPMN languageFootnote 7. If we want to make use of data during workflow execution, we can either extend a flow-driven approach to make it data-aware, or use an approach where data are a first-class citizen. In this paper, we investigate an entirely data-driven approach, but we first give a brief overview of works in the former category.

Semantics for the flow aspects in flow-based approaches are typically given using Petri nets [36] in an event-driven fashion. For instance, Dijkman et al. [12] provide such semantics for BPMN. Thus, approaches that make flow-based approaches more data-aware on a formal level are often based on Petri nets, e.g. [30, 37]. They take an extension of Petri nets for the dynamic part of the system, and a formal view on databases for the data aspect. The gap between flow-based and data-based approaches becomes obvious in [38], which presents an approach for deriving GSM models from Petri nets. In the conversion, not all conditions in the GSM model can be filled automatically: Those sentries that build on data require additional sources of input, e.g. a human expert. We see that the data aspects need to be added to the model when viewing a flow-driven model from a data-centric perspective, which provides additional motivation for us to investigate entirely data-driven approaches.

Semantics for data-driven workflow languages are often specified using Event-Condition-Action (ECA) rules to be executed on databases, see, e.g. GSM [20] for an artefact-centric approach and [8] for a flow-centric approach. Another data-centric approach is RESEDA [39], whose semantics have been given using the event-driven reactive paradigm. Our approach is built for the environment of REST, where there are no events that could inform the enactment of the workflow instance. We therefore make use of the GSM workflow language and transfer it to the environment of Read-Write Linked Data by specifying semantics in Condition-Action rules.

Other approaches assume processes to be given as Condition-Action rules, noting that workflow languages can be layered on top, i.e. the following approaches do not talk about the semantics of such languages. For instance, the Daphne approach [7] provides actions as abstractions of SQL queries or external service calls, both of which change the contents of a relational database. With its roots in the formal investigations of data-centric dynamic systems [2], Daphne is closely related to [3], where a similar approach is investigated that works on a Description Logic knowledge base instead of a database, and also provides actions as abstractions for changes. In contrast, the knowledge base in our approach is the web of Read-Write Linked Data, i.e. our approach is built for a distributed hypermedia setting, where REST calls are the means to enact change. In terms of expressivity, we use an ontology language that is not based on Description Logics but on RDFS, which is typically less expressive than members of the Description Logic family of languages [19]. However, the transition systems in those Condition-Action rule-based approaches can be related to ASM4LD [23], the model of computation based on Abstract State Machines [16] on which we build our work.

2.2 Workflows and Web Services

Workflows applied to the web have been investigated under the headline of Web Services, which produced a set of WS-* standards, most prominently SOAPFootnote 8 and WSDLFootnote 9. These standards allow for the composition of services, i.e. arbitrary functions called via HTTP as a transport protocol, using the flow-based BPELFootnote 10 workflow language. Semantic descriptions of those functions, e.g. in OWL-S [28] and WSMO [17], should allow for automated composition. In contrast, REST [14] constrains this set of arbitrary functions and emphasises the processing of state information instead of return values of function calls [35, 42]. Thus, extensions for BPEL have been proposed to include RESTful services [32, 33]. We built our approach with the same aim in mind, i.e. to make use of RESTful services in processes. However, as REST builds a web around resources and data about their state, we want to use a data-centric approach and exploit the semantics of the constrained set of REST operations.

2.3 Ontologies for Processes on the Web

Ontologies for processes have been proposed on numerous occasions. For instance, OWL-S [29] contains a process language, scientific workflows are often described using an ontology [15, 41], and publicly funded research projects including SuperFootnote 11 and Adaptive Services GridFootnote 12 defined process ontologies. In contrast to our approach, those approaches have been built for flow-based processes and function-call-style web service interaction, cannot describe process instances, and do not come with operational semantics.

In previous work, we developed the WiLD ontology to describe flow-based workflow models and instances, together with operational semantics to specify and execute workflows on Read-Write Linked Data [24]. In contrast, in this paper, we investigate data-centric behaviour descriptions.

2.4 Information and Processes in Information Systems

In the broader scope of object-aware business process systems, [27] have developed 20 requirements around data, activities, processes, user integration, and monitoring. Based on these requirements, they evaluate imperative, declarative, and data-driven approaches to workflow management. As we work with Read-Write Linked Data, our approach can contribute to filling the gaps left by other approaches: it fulfils requirement R1 (data integration) and R2 (access to data). As our approach is based on GSM, we inherit its requirement fulfilment in other areas, including R10 (object behaviour), R13 (flexible process execution), R14 (re-execution of activities), and R15 (explicit user decisions).

3 Preliminaries

In this section, we introduce the technologies on which we build our approach.

3.1 Uniform Resource Identifier (URI)

On the web, we identify things (called resources on the web) using URIsFootnote 13. URIs are character strings consisting of a scheme and a scheme-specific part, separated by a colon. In this paper, we only consider URIs of the HTTP scheme. For instance, consider the following URI, which identifies the relation to assign a thing to a class:

$$\begin{aligned} \texttt {http://www.w3.org/1999/02/22-rdf-syntax-ns\#type} \end{aligned}$$

In this URI, the URI scheme is “http”, which refers to the HTTP protocol, which we discuss in the next section. Note that we use the term URI when we actually mean the internationalised form, IRI.Footnote 14 We do so as the former is the more popular term, and the URI RFC provides more information about the meaning and the composition of URIs/IRIs than the IRI RFC.

3.2 The Hypertext Transfer Protocol (HTTP)

HTTP\(^{3}\) is a stateless application-level protocol, where clients and servers exchange request/response message pairs about resources that are identified using URIs on the server. Requests are typed, and the type (i.e. the HTTP method) determines the semantics of both the request and the optional message body. We make extensive use of the GET request to request a representation of a resource, the PUT request to overwrite the representation of a resource, and the POST request to append to an existing collection resource. Notably, the constrained set of HTTP methods does not include a way to subscribe to events. Therefore, polling, i.e. the repeated retrieval of state information, is the way to get informed about changes to resource state.

3.3 The Resource Description Framework (RDF)

RDF\(^{4}\) is a graph-based data model for representing and exchanging data based on logical knowledge representation. In RDF, we represent data as triples that follow the form:

$$\begin{aligned} (\textit{subject}, \textit{predicate}, \textit{object}) \end{aligned}$$

Such a triple defines a relation of type predicate between the graph nodes subject and object. Multiple triples form an RDF graph, where the triples can be regarded as the edges connecting the vertices of the graph. Things in RDF are identified globally using URIs, or document-locally using so-called blank nodes. Literals can be used to express values. RDF can be serialised in different formats. Those include TurtleFootnote 15, which is probably the easiest for humans to read, RDF/XMLFootnote 16, which was one of the first formats specified and works well in XML-based environments, and JSON-LDFootnote 17, which works well in JSON-based environments. In this paper, we use the following notation for RDF: as triples encode binary predicates, we write

$$\begin{aligned} {rdf{:}type}({{:}active},{{:}State}) \end{aligned}$$

to mean the following triple in Turtle notation:

$$\begin{aligned} \texttt {{:active} {rdf:type} {:State}~.} \end{aligned}$$

Here, :active is the subject, rdf:type is the predicate, and :State is the object. Triples are asserted conjunctively. Note the use of URI abbreviations using the CURIE syntaxFootnote 18, where a colon separates the (possibly empty) abbreviating prefix from the local name; here, rdf is short for http://www.w3.org/1999/02/22-rdf-syntax-ns#Footnote 19. For the special case of class assignments, we use unary predicates. That is, in our notation, the above triple becomes:

$$\begin{aligned} {:State}({:active}) \end{aligned}$$

For boolean constants, we use typewriter font, e.g. true.

3.4 The SPARQL Protocol and Query Language

SPARQLFootnote 20 is a protocol and a query language for RDF. SPARQL queries can have different forms, most prominently SELECT queries; in this paper, we use ASK queries. The basic construct of a SPARQL query is a so-called graph pattern, where variables are allowed in all positions of a triple. For instance, a query to select all states could look as follows (assuming appropriate prefix definitions):

SELECT ?state WHERE {
  ?state rdf:type :State .
}

The ASK query

ASK WHERE {
  ?state rdf:type :State .
}

returns true if the SELECT query above has a non-empty result.

3.5 Ontologies, Reasoning, and Rules

Besides encoding knowledge in a graph, RDF is the basis for a set of knowledge representation technologies, including the languages RDFSFootnote 21 and OWLFootnote 22 to express ontologies. By and large, RDFS is less expressive than most members of the OWL family, which comes with computational benefits [19]. For instance, RDFS entailment can be implemented using monotonic deduction rules. A rule language in the context of RDF is Notation3Footnote 23. For example, if we know that:

:active rdf:type :NonFailureState .
:NonFailureState rdfs:subClassOf :State .

or in the notation of our paper:

$$\begin{aligned}&\textit{:NonFailureState}(\textit{:active})\\&\wedge \textit{rdfs:subClassOf}(\textit{:NonFailureState},\textit{:State}) \end{aligned}$$

the rule (with terms without colons being variables)

$$\begin{aligned} \textit{rdfs:subClassOf}(x,y)\wedge x(z)\rightarrow y(z) \end{aligned}$$

allows us to entail the triple from the RDF example. We call the part before the arrow the antecedent or body, and the part after the arrow the consequent or head.
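For illustration, the same derivation rule could be written in Notation3 roughly as follows (a sketch; the exact prefix declarations depend on the setting):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# If x is a subclass of y and z is of type x, then z is also of type y.
{ ?x rdfs:subClassOf ?y . ?z rdf:type ?x . } => { ?z rdf:type ?y . } .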

In this paper, we additionally use a special kind of rule: request rules, where the consequent is an HTTP request to be executed. For instance:

$$\begin{aligned} \textit{:hasState}(s,\textit{:active})\rightarrow \text {PUT}(s,\textit{:hasState}(s,\textit{:done})) \end{aligned}$$

would mean that a PUT request should be sent to every s that has the state :active, with the payload stating that this s shall have the state :done.
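In Notation3, such a request rule could be sketched as follows. We assume here, for illustration, the W3C HTTP vocabulary in RDF with properties such as http:mthd, http:requestURI, and http:body, a quoted graph as request payload, and a placeholder namespace for the empty prefix; the exact vocabulary and payload handling depend on the interpreter:

@prefix http:  <http://www.w3.org/2011/http#> .
@prefix httpm: <http://www.w3.org/2011/http-methods#> .
@prefix :      <http://workflow.example/vocab#> .   # placeholder namespace

# For every resource ?s in state :active, issue a PUT that sets its state to :done.
{ ?s :hasState :active . }
=>
{ [] http:mthd       httpm:PUT ;
     http:requestURI ?s ;
     http:body       { ?s :hasState :done . } . } .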

3.6 Abstract State Machines for Linked Data (ASM4LD)

ASM4LD is an operational semantics based on Abstract State Machines and given to Notation3 [23]. In ASM4LD, we can encode two types of rules: derivation rules (to derive new knowledge) and request rules (which cause HTTP requests). Moreover, ASM4LD supports RDF assertions. In [23], we derived the operational semantics from the semantics of HTTP requests, first-order logic, and Abstract State Machines. The operational semantics can be summarised in four steps to be executed in a loop, thus implementing polling:

  1. Initially, set the working memory to be empty.

  2. Add the assertions to the working memory.

  3. Until no further data can be retrieved from URIs and no further data can be deduced, evaluate on the working memory:

     (a) Request rules from which HTTP GET requests follow. For the rules whose condition holds, make the HTTP requests and add the data from the responses to the working memory.

     (b) Derivation rules. Add the thus derived data to the working memory.

     This way, we (a) obtain and (b) reason on data about the world state.

  4. Evaluate all other request rules on the working memory, i.e. those rules from which PUT/POST/DELETE requests follow. Make the corresponding HTTP requests.

     This way, we enact changes on the world’s state.

In this paper, we use ASM4LD as formal basis to give operational semantics to our Guard-Stage-Milestone ontology.

3.7 The Guard-Stage-Milestone Approach

The Guard-Stage-Milestone approach is an artefact-centric workflow meta-model, presented in [20]. The key modelling elements of Guard-Stage-Milestone workflows are the following: The Information Model contains all relevant information for a workflow instance: data attributes maintain information about the system controlled by the workflow instance, and status attributes maintain control information such as how far the execution has already progressed. Stages can contain a task (i.e. the actual activity, a unit of work to be done by a human or machine) and may be nested. Guards control whether a stage gets activated, i.e. whether its activity may execute. The conditions of a guard are given as sentries. Sentries are boolean expressions in a condition language. They come in the form Event-Condition (on <event> if <condition>). Here, events may be incoming from the system, or be changes to status attributes. Milestones are objectives that can be achieved during execution, and are represented using boolean values. Milestones have achieving and invalidating sentries associated: If an achieving sentry evaluates to true, the milestone is set to achieved. An invalidating sentry can set a milestone back to unachieved.

An example can be found in Fig. 1.

Fig. 1 A small example of two stages with associated guards and milestones

The example is set in an Internet of Things scenario and shows two stages, “Start Fire Alarm” and “Close doors”, whose activation is controlled using guards. Those stages have milestones associated that can be achieved and invalidated. The informal definition of those guards and milestones can be found in Fig. 2.

To specify the operational semantics, Hull et al. [20] provide a set of six PAC rules. PAC rules are a variation of Event-Condition-Action rules; they consist of a prerequisite, an antecedent, and a consequent. Both the prerequisite and the antecedent range over the entire information model, and the consequent is an update to the status attributes. The rules can be subdivided into two categories: explicit rules, which accomplish the actual progress in a workflow instance, and invariant-preserving rules, which perform “housekeeping” by, e.g. deactivating child stages if the parent has been deactivated. The operational semantics of GSM are defined in an event-triggered fashion. There are three types of events: outgoing events for task invocation, sent from stages; incoming events sent from the environment, e.g. one-way messages and task terminations; and status changes, which fire when a milestone or stage changes its state. If an event triggers the execution, all that follows from the rules gets incorporated into the information model. This full incorporation is called a Business Step (B-Step).

We illustrate the operational semantics using an example. It demonstrates a workflow of a fire alarm process in a public building. Figure 2 shows how the workflow execution proceeds triggered by changes in the information model.

Fig. 2 Example of a workflow execution. Time progresses from top to bottom

Every line represents a B-Step.

  1. Line 2

    Once smoke has been detected, the sentry of g1 becomes true, which leads to g1 being active and thus the execution of the left stage.

  2. Line 3

    Guard g1 is still active. As the fire alarm has been started, the achieving sentry of milestone m1 is true and thus the milestone is set to achieved. In consequence, guard g2 becomes active, which triggers the execution of the right stage.

  3. Line 4

    As the invalidating sentry of m1 becomes true, i.e. the fire alarm stops when all doors are closed, the dependent guard g2 becomes inactive as well. This does not affect the state of milestone m2b.

  4. Line 5

    After m1 has been invalidated, the smoke cannot further circulate through open doors.

  5. Line 6

    At some point after all doors had already been closed, somebody re-opens a door. This triggers the invalidation of the corresponding milestone m2b (all doors are closed). As a consequence, the left stage is re-triggered.

We see that changes in the information model trigger further changes in the information model.

4 Proposed Approach

To transfer the GSM approach to the environment of Read-Write Linked Data, we need to transfer both the modelling and the operational semantics to this environment. Our approach therefore consists of an ontology to model GSM workflows and instances (the latter corresponding to the status attributes), and of operational semantics in the form of rules with ASM4LD semantics, which interpret those workflows and instances. The rules can be directly deployed on a corresponding interpreter.

To put our approach into practice, a workflow model has to be modelled using our ontology and made available as Linked Data. A workflow instance has to be defined using our ontology, with a link to a workflow model, and deployed as Read-Write Linked Data in a Linked Data Platform container. Then, an interpreter (with the operational semantics rules deployed) has to be pointed to the workflow instance and can execute the instance according to the model. The interpreter creates the necessary instances for the elements within the workflow model, also as Linked Data in the said container, using specialised set-up rules. The interpreter then changes the state of the instances according to the GSM lifecycle. As the rule language supports both derivation rules and request rules, we can deploy derivation rules that implement the reasoning necessary to fulfil the semantics of ontology languages (e.g. RDFS) next to the rules required for the operational semantics. Thus, we can execute the workflow and perform semantic data integration at the same time.
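For illustration, the uninitialised workflow instance that a user publishes to the container could look roughly as follows in Turtle; the instance and model URIs are made up for the example, and the empty prefix is a placeholder for our ontology’s namespace:

@prefix : <http://workflow.example/vocab#> .   # placeholder for our ontology namespace

<http://ldpc.example/instance1>
    a :StageInstance ;                                      # typed according to the set-up condition in Sect. 4.2.1
    :isInstanceOf <http://models.example/fire-alarm#top> ;  # link to the top-level stage of the workflow model
    :hasState :uninitialized .

The interpreter’s set-up rules (Sect. 4.2.1) then take over and create the remaining status attribute resources.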

We first present the ontology (Sect. 4.1) and then the operational semantics (Sect. 4.2). For the presentation of the operational semantics, we use the rule syntax described in Sect. 3. Last (Sect. 4.3), we present modelling requirements that need to be fulfilled for a correct workflow execution.

4.1 Ontology for Modelling Entities

We built an ontologyFootnote 24 to describe the core modelling primitives from [20]. We depict the ontology in Fig. 3. We stay as close as possible to the original definitions and deviate only where demanded by the environment of Read-Write Linked Data: Tasks are HTTP requests, serving as atomic activities. Sentries contain a SPARQL ASK query in SPIN notation.Footnote 25 Correspondingly, we use SPARQL’s true boolean query result with sentries and guards (we cannot use false, see Sect. 1). We introduce the class State of all states to, e.g. model the states of a stage: active and inactive.

Fig. 3 Our ontology as a UML class diagram, with the following correspondence to RDF Schema: UML classes depict RDFS classes; UML associations depict domain and range of RDF properties; UML inheritance depicts RDFS subclassing. The core classes to describe models are depicted in bold; the core classes during execution are depicted dashed

In our approach, the information model is not contained in a database, but is spread over Read-Write Linked Data resources. Hence, we do not maintain the data attributes ourselves: we do not store information about the system we control, but instead retrieve the system state live in RDF over HTTP from the system itself. However, we do maintain the status attributes, for uniform access in RDF over HTTP. Thus, from now on, we call all information regarding the state of the workflow status information. All other information about the system under control, and other relevant information from the environment, e.g. external services, we call environment information.
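As an illustration, a fragment of the fire-alarm example from Fig. 1 could be described with our ontology roughly as follows; the concrete resource names (e.g. ex:startFireAlarm, ex:startAlarmRequest, the query resources) and the ex: namespace are made up for the example, the empty prefix is a placeholder for our ontology’s namespace, and the property names follow the rules in Sect. 4.2:

@prefix :   <http://workflow.example/vocab#> .     # placeholder for our ontology namespace
@prefix ex: <http://models.example/fire-alarm#> .  # made-up namespace for the example model

ex:startFireAlarm a :Stage ;
    :hasGuard       ex:g1 ;
    :hasMilestone   ex:m1 ;
    :hasHttpRequest ex:startAlarmRequest .         # the stage's task, an HTTP request

ex:g1 :hasCondition ex:smokeDetectedQuery .        # a SPARQL ASK query in SPIN notation

ex:m1 :hasValidatingSentry   ex:alarmStartedQuery ;
      :hasInvalidatingSentry ex:alarmStoppedQuery .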

4.2 Operational Semantics

The following operational semantics are based on the PAC-rule-based semantics from [11]. We distinguish between set-up and flow-conserving (FC) rules. Our rules can be found onlineFootnote 26 in N3 notation. For the sake of the example, we assume all status information to reside in a fictitious collection resource at http://ldpc.example/. To improve legibility, we sometimes use “\(\cdots \)” to denote that the values of other predicates stay the same.

4.2.1 Instance Set-Up Rules

The basic condition for all setup rules is:

$$\begin{aligned}&CS:=\textit{:StageInstance}(i) \wedge \textit{:isInstanceOf}(i,s) \\&\wedge \textit{:hasState}(i, \textit{:uninitialized}) \end{aligned}$$

Workflow  The setup can be requested by publishing an uninitialised instance:

$$\begin{aligned}&CS \longrightarrow \text {PUT}(i, \textit{:ArtifactInstance}(i)\\&\wedge \dots \\&\wedge \textit{:hasState}(i, \textit{:uninitialized})) \end{aligned}$$

Stage  Then, resources are created for all the model’s sub-stages, linked to their counterpart in the model, and with state inactive.

$$\begin{aligned}&CS \wedge \textit{:hasDescendantStage}(s, s_{child})\\&\longrightarrow \text {POST}(\texttt {http://ldpc.example/},\\&\textit{:isInstanceOf}(\text {\texttt {new}},s_{child})\wedge \textit{:inArtifactInstance}(\text {\texttt {new}},i)\\&\wedge \textit{:hasState}(\text {\texttt {new}}, \textit{:inactive})) \end{aligned}$$

Milestone  Also, resources are created for all the model’s milestones, linked to their counterpart in the model, and with state unachieved.

$$\begin{aligned}&CS \wedge \textit{:hasDescendantStage}(s, s_{child})\\&\wedge \textit{:hasMilestone}(s_{child},m)\\&\longrightarrow \text {POST}(\textit{http://ldpc.example/}, \\&\textit{:isInstanceOf}(\text {\texttt {new}},m)\wedge \textit{:inArtifactInstance}(\text {\texttt {new}},i) \\&\wedge \textit{:isAchieved}(\text {\texttt {new}}, \text {\texttt {false}})) \end{aligned}$$
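After set-up, the container at http://ldpc.example/ could, for instance, hold a milestone instance resource roughly like the following Turtle; the resource URI is assigned by the server, and the prefixes are placeholders as before:

@prefix :   <http://workflow.example/vocab#> .
@prefix ex: <http://models.example/fire-alarm#> .

<http://ldpc.example/milestone-m1-1>
    :isInstanceOf       ex:m1 ;                            # link to the milestone in the model
    :inArtifactInstance <http://ldpc.example/instance1> ;
    :isAchieved         false .                            # xsd:boolean literal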

4.2.2 Flow-Conserving Rules

All FCRs share the following conditions, which we abbreviate with FC:

$$\begin{aligned}&FC:=\textit{:ArtifactInstance}(S^I)\wedge \textit{:Stage}(s^M)\\&\wedge \textit{:StageInstance}(s^{I})\wedge \textit{:isInstanceOf}(s^I, s^M)\\&\wedge \textit{:inArtifactInstance}(s^{I}, S^{I}) \\&\wedge \textit{:hasState}(S^I,\textit{:active}) \end{aligned}$$

For better readability when writing about sentries, we define the term \(\textit{:hasBooleanResult}\) as abbreviation:

$$\begin{aligned}&\textit{:hasBooleanResult}(c, \text {\texttt {true}}) \\&:= \textit{:hasInstanceRelativeResult}(c, r) \\&\wedge \textit{:hasArtifactInstance}(c, S^I) \\&\wedge \textit{sparql-results:boolean}(r, \text {\texttt {true}}) \end{aligned}$$

FCR-1 For inactive stages, we check the guards. If a guard holds, the state of the stage is set to active and the task of the stage is executed.

$$\begin{aligned}&FC \wedge \textit{:hasGuard}(s^{M}, g)\wedge \textit{:hasHttpRequest}(s^{M}, \text {req}) \\&\wedge \textit{:allAncestorsActive}(s^{I}, \text {\texttt {true}})\\&\wedge \textit{:hasState}(s^{I}, \textit{:inactive})\wedge \textit{:hasCondition}(g, c)\\&\wedge \textit{:hasBooleanResult}(c, \text {\texttt {true}})\\&\longrightarrow \text {PUT}(s^{I}, \dots \textit{:hasState}(s^{I}, \textit{:active})) \\&\longrightarrow \text {req}(\cdot ,\cdot ) \end{aligned}$$

FCR-2 If the validating (i.e. achieving) sentry of one of an active stage’s milestones holds, the milestone is set to achieved, and the stage as well as its descendant stages are set to inactive.

$$\begin{aligned}&FC\wedge \textit{:hasState}(s^{I}, \textit{:active})\wedge \textit{:hasMilestone}(s^{M}, m^{M})\\&\wedge \textit{:hasValidatingSentry}(m^{M}, c)\wedge \textit{:isInstanceOf}(m^{I}, m^{M})\\&\wedge \textit{:hasBooleanResult}(c, \text {\texttt {true}}) \\&\wedge \textit{:hasDescendantStage}(s^{M}, d^{M})\wedge \textit{:isInstanceOf}(d^{I}, d^{M})\\&\longrightarrow \text {PUT}(m^{I}, \dots \textit{:isAchieved}(m^{I}, \text {\texttt {true}}))\\&\longrightarrow \text {PUT}(s^{I}, \dots \textit{:hasState}(s^{I}, \textit{:inactive}))\\&\longrightarrow \text {PUT}(d^{I}, \dots \textit{:hasState}(d^{I}, \textit{:inactive})) \end{aligned}$$

FCR-3 When the invalidating sentry of an achieved milestone of a completed stage becomes true, the milestone is set to unachieved.

$$\begin{aligned}&FC\wedge \textit{:hasMilestone}(s^{M}, m^{M})\wedge \textit{:isInstanceOf}(m^{I}, m^{M})\\&\wedge \textit{:isAchieved}(m^{I}, \text {\texttt {true}})\wedge \textit{:hasInvalidatingSentry}(m^{M}, c)\\&\wedge \textit{:hasBooleanResult}(c, \text {\texttt {true}})\\&\longrightarrow \text {PUT}(m^{I}, \dots \textit{:isAchieved}(m^{I}, \text {\texttt {false}})) \\ \end{aligned}$$

FCR-4 Upon activation of a stage, set all of the stage’s and its descendant stages’ milestones to unachieved.

$$\begin{aligned}&FC\wedge \textit{:hasGuard}(s^{M}, g)\wedge \textit{:allAncestorsActive}(s^{I}, \text {\texttt {true}})\\&\wedge \textit{:hasState}(s^{I}, \textit{:inactive})\\&\wedge \textit{:hasCondition}(g, c)\wedge \textit{:hasBooleanResult}(c, \text {\texttt {true}})\\&\wedge \textit{:hasMilestone}(s^{M}, m^{M})\wedge \textit{:isInstanceOf}(m^{I}, m^{M})\\&\wedge \textit{:isAchieved}(m^{I}, \text {\texttt {true}})\wedge \textit{:hasDescendantStage}(s^{M}, d^{M}) \\&\wedge \textit{:hasMilestone}(d^{M}, m_d^{M})\wedge \textit{:isInstanceOf}(m_d^{I}, m_d^{M})\\&\longrightarrow \text {PUT}(m^{I}, \dots \textit{:isAchieved}(m^{I}, \text {\texttt {false}})) \\&\longrightarrow \text {PUT}(m_d^{I}, \dots \textit{:isAchieved}(m_d^{I}, \text {\texttt {false}})) \\ \end{aligned}$$
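To give an impression of how such a rule looks when deployed, FCR-3 could be sketched in Notation3 roughly as follows; the prefixes are placeholders as before, the HTTP request vocabulary is assumed as in Sect. 3.5, and we omit the FC condition and the “\(\cdots \)” predicates, which the published rulesFootnote 26 handle in full:

@prefix http:  <http://www.w3.org/2011/http#> .
@prefix httpm: <http://www.w3.org/2011/http-methods#> .
@prefix :      <http://workflow.example/vocab#> .

# FCR-3 (simplified): if the invalidating sentry of an achieved milestone instance
# evaluates to true, overwrite the milestone instance with :isAchieved false.
{ ?mm :hasInvalidatingSentry ?c .
  ?mi :isInstanceOf ?mm ;
      :isAchieved true .
  ?c  :hasBooleanResult true .        # abbreviation from Sect. 4.2.2
}
=>
{ [] http:mthd       httpm:PUT ;
     http:requestURI ?mi ;
     http:body       { ?mi :isAchieved false . } . } .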

4.3 Requirements to Workflow Modelling

We now add two conditions that a workflow needs to fulfil. These disjointness conditions specify sets of sentries that need to be disjoint. If the workflow modeller violates one of them, a correct workflow execution cannot be guaranteed. In the following, “\(\lnot \)” denotes logical negation. The reader can find the rationale behind those requirements in Sect. 6.2.1, where we evaluate our approach regarding correctness.

4.3.1 Disjointness of Achieving and Invalidating Sentry

For each milestone, the achieving and the invalidating sentry must be disjoint. More formally, for all bindings satisfying:

$$\begin{aligned}&FC\wedge \textit{:hasMilestone}(s^{M}, m^{M})\\&\wedge \textit{:hasInvalidatingSentry}(m^{M}, c_I)\\&\wedge \textit{:hasAchievingSentry}(m^{M}, c_A)\\ \end{aligned}$$

the workflow modeller must ensure that \(\lnot (c_A \wedge c_I)\) always holds.

4.3.2 Disjointness of Guards and Milestones

For each stage, the conditions of its guards must be disjoint with the achieving sentries of all milestones of its parent stages. More formally, for all bindings satisfying:

$$\begin{aligned}&FC\wedge \textit{:hasDescendantStage}(s^M, d^M)\\&\wedge \textit{:hasMilestone}(s^{M}, m^{M})\\&\wedge \textit{:hasAchievingSentry}(m^{M}, c_A)\\&\wedge \textit{:hasGuard}(d^M, g)\\&\wedge \textit{:hasCondition}(g, c_g)\\ \end{aligned}$$

the workflow modeller must ensure that \(\lnot (c_A \wedge c_g)\) always holds.

5 Rule-Based Querying of Data and Status Attributes

In our approach, we use SPARQL ASK queries for sentries. As the rule language we employ does not have special features to incorporate query results into conditions, we have to use the rule language to do query processing. We build on the SPINFootnote 27 notation to describe SPARQL queries in RDF.

For queries on the data attributes, SPARQL features are sufficient. Consider the following query that could, e.g. be a part of the sentry of g1 in Fig. 1:

ASK WHERE {
  <http://iot-house.example/front#door> rdf:type smarthome:Door .
  <http://iot-house.example/front#door> smarthome:isOpen true .
}

If we identify the query using q, we can write (imagine \(bx_y\) as blank nodes):

$$\begin{aligned}&\textit{sp:Ask}(q) \wedge \textit{sp:where}(q,bl_1) \wedge \textit{rdf:first}(bl_1,bt_1)\\&\wedge \textit{sp:subject}(bt_1,\textit{http://iot-house.example/front\#door})\\&\wedge \textit{sp:predicate}(bt_1,\textit{rdf:type})\\&\wedge \textit{sp:object}(bt_1,\textit{smarthome:Door})\\&\wedge \textit{rdf:rest}(bl_1,bl_2) \wedge \textit{rdf:first}(bl_2,bt_2)\\&\wedge \textit{sp:subject}(bt_2,\textit{http://iot-house.example/front\#door})\\&\wedge \textit{sp:predicate}(bt_2,\textit{smarthome:isOpen})\\&\wedge \textit{sp:object}(bt_2,\text {\texttt {true}})\\&\wedge \textit{rdf:rest}(bl_2,\textit{rdf:nil}) \end{aligned}$$

SPIN makes use of reification, similar to RDF reification,Footnote 28 but with its own RDF terms, to describe the triple patterns in the query. Using rules of the following form (with tp, s, p, o being variables), we can annotate the triple patterns using deductions that hold if a triple pattern is fulfilled:

$$\begin{aligned}&\textit{sp:subject}(tp,s)\wedge \textit{sp:predicate}(tp,p) \wedge \textit{sp:object}(tp,o)\\&\wedge p(s,o) \rightarrow \textit{sparql-results:boolean}(tp,\text {\texttt {true}}) \end{aligned}$$

Using further deductions, we can check whether all triple patterns in a query in SPIN notation are fulfilled, and eventually yield triples that serve in the conditions of the operational semantics, e.g. see the rule FCR-2 in Sect. 4.2.
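For illustration, the annotation rule above could be sketched in Notation3 as follows; we assume the sp: prefix for the SPIN vocabulary and the sparql-results: prefix for the SPARQL result vocabulary used in this section (variables in predicate position are permitted in N3):

@prefix sp:             <http://spinrdf.org/sp#> .
@prefix sparql-results: <http://www.w3.org/2005/sparql-results#> .

# A reified triple pattern is marked as fulfilled if the pattern matches the data.
{ ?tp sp:subject ?s ; sp:predicate ?p ; sp:object ?o . ?s ?p ?o . }
=>
{ ?tp sparql-results:boolean true . } .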

For queries on the status attributes, this technique of annotating existing resources that represent triple patterns is not sufficient: when giving queries to be evaluated on the status attributes, we need a notion of the current artefact instance, and thus we need new resources (and new terms in the vocabulary) with which we can annotate the triple patterns with the artefact instances in which they hold. For example, a guard g that requires a milestone \(m_1\) to be achieved should, first, only look into the same artefact instance to check whether \(m_1\) has been achieved, and should, second, be definable at workflow design time, when the exact URI of the milestone instance for \(m_1\) is not yet known. For the new resources, we need to extend the expressivity of the rule language: For the considerations so far, a rule language without existentials in the head was sufficient. As we need to create new resources in the deductions to annotate triple patterns relative to artefact instances, we need existentials in the head.

To annotate sentries relative to artefact instances, we thus deduce blank nodes for each sentry in each active artefact instance (bn as blank node):

$$\begin{aligned}&\textit{:ArtifactInstance}(a)\wedge \textit{:hasState}(a,\textit{:active})\\&\wedge \textit{:inArtifactInstance}(si,a)\wedge \textit{:isInstanceOf}(si,sm)\\&\wedge \textit{:hasGuard}(sm,g)\wedge \textit{:hasCondition}(g,s)\\&\rightarrow \textit{:hasArtifactInstanceRelativeResult}(s,bn)\\&\wedge \textit{:hasArtifactInstance}(bn,a) \end{aligned}$$

Now, we can follow the same pattern when annotating triple patterns.
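In Notation3, blank nodes in the rule head provide exactly these existentials; the rule above could thus be sketched as follows, with the empty prefix again a placeholder for our vocabulary:

@prefix : <http://workflow.example/vocab#> .

# For every sentry of a guard in an active artefact instance, deduce a fresh
# (blank-node) result resource that is relative to that artefact instance.
{ ?a  a :ArtifactInstance ; :hasState :active .
  ?si :inArtifactInstance ?a ; :isInstanceOf ?sm .
  ?sm :hasGuard ?g .
  ?g  :hasCondition ?s .
}
=>
{ ?s :hasArtifactInstanceRelativeResult [ :hasArtifactInstance ?a ] . } .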

Last, we need a way to relate the SPARQL ASK query to the artefact instance. To this end, we introduce special triple patterns to be used in SPARQL ASK queries in SPIN notation. Those triple patterns have the form:

$$\begin{aligned}&\textit{:inArtifactInstance}(bt_1,\textit{:thisArtifactInstance})\\&\wedge \textit{:hasAchievedMilestone}(bt_1, m_1) \end{aligned}$$

where \(bt_1\) is a triple pattern, similar to those in the presented SPARQL query in SPIN notation, \(m_1\) is the URI of the milestone that the sentry is supposed to consider, and \(\textit{:thisArtifactInstance}\) is a magic term that represents the current artefact instance.

6 Evaluation

In this section, we evaluate our approach. We show the correctness of our approach using theoretical considerations and proofs in Sect. 6.2, after introducing some notation in Sect. 6.1. Then, we briefly report on the applicability of our approach in Sect. 6.3. Last, we provide a performance evaluation in Sect. 6.4.

6.1 Notation

To improve readability, we introduce a new notation and simplify the concept of models and their instances. In the previous section, we distinguished between stage instances (\(s^{I}\)) and stage models (\(s^{M}\)). The FCRs, however, only consider one stage instance of one stage model; hence, we can simplify the notation by conflating the two. Therefore, in the new notation, we choose definitions that build on the instances only. We also compact the notation by defining sets that contain all resources that fulfil certain conditions from the rules in Sect. 4. Table 1 contains the relevant definitions of sets and functions we use in the following.

Table 1 Notation used in the proofs

On top of the sets and functions that describe one instant in time, we also need to define the progression of time. To describe the progressing execution of a workflow, we look at it as a series of snapshots. A snapshot contains the assignments of all status information variables of a workflow. The sequence of snapshots then describes the stepwise execution. To link a snapshot to its successor, we define a state transition function. The set of all possible snapshots is \(\Sigma \), and the state transition function is

$$\begin{aligned} f : \Sigma \rightarrow \Sigma \end{aligned}$$

Given a snapshot \(\sigma \in \Sigma \), \(f(\sigma )\) determines the subsequent state of the workflow, derived by applying the proposed rules. We abbreviate \(f(\sigma )\) with \(\sigma '\). \(\sigma (x)\) denotes the value of the status information variable x (i.e. for a milestone or stage) in snapshot \(\sigma \). Correspondingly, \(\phi _{\sigma }\) determines the result of a sentry in snapshot \(\sigma \).

We now apply this notation to the FCRs. Table 2 lists the FCRs using the new notation. Next, we briefly recap the intuition behind each rule:

FCR-1:

activates stages. It requires that a stage is inactive, the stage’s ancestors are active, and the stage’s guard is true.

FCR-2:

sets milestones to achieved and inactivates the corresponding stage as well as its substages. It requires a stage to be active and the achieving sentry of a milestone of the stage to be true.

FCR-3:

invalidates milestones. It requires a milestone to be achieved and the milestone’s invalidating sentry to be true.

FCR-4:

invalidates all achieved milestones when a stage gets activated again. It requires all parent stages to be active, the stage itself to be inactive and the guard of the stage to be true.

Table 2 Flow-conserving rules: conditions and actions

We also reformulate the two disjointness conditions using the new notation:

Disjointness of Achieving and Invalidating Sentries. A workflow must be modelled in a way that the achieving and the invalidating sentry of a milestone are never satisfied at the same time.

$$\begin{aligned} \begin{aligned} \forall s \in S, m_s \in \mathcal {M}_{s}: \lnot (\phi _{\sigma }^{+}(m_{s}) \wedge \phi _{\sigma }^{-}(m_{s})) \end{aligned} \end{aligned}$$
(1)

Disjointness of Guards and Milestones. The condition of a stage’s guard and the achieving sentries of the milestones of its parent stages must be disjoint.

$$\begin{aligned} \begin{aligned}&\forall s \in S, m_s \in \mathcal {M}_{s}, s_d \in D_{s}, g_{s_d} \in G_{s_d}: \\&\quad \lnot (\phi _{\sigma }(g_{s_{d}}) \wedge \phi _{\sigma }^{+}(m_{s})) \end{aligned} \end{aligned}$$
(2)

6.2 Correctness

To show the correctness of our approach, we need to discuss two aspects in the light of the environment of Read-Write Linked Data: First, we need to have a look at the well-formedness criterion of [11] for workflows, which is a prerequisite for correct handling of workflows by their rules. Second, we need to investigate whether the invariants of [11] hold for our approach.

We therefore present in Sect. 6.2.1 the rationale behind the well-formedness criterion and show why the well-formedness criterion is not directly applicable to our approach. Then, we discuss the well-formedness criterion’s implications and why they are valid in our approach nevertheless. Next, in Sect. 6.2.2, we present and formalise the invariants for our environment and show using mathematical induction that our FCRs do not violate those invariants.

6.2.1 Well-Formedness

In this section, we discuss the well-formedness criterion of [11]. The authors of said paper formulated this criterion to address non-intuitive behaviour resulting from different linearisations of the rules to be applied on incoming events. In contrast, our approach works without events and with parallel evaluation of rules. Therefore, we first explain the necessity for the well-formedness criterion in the case of event data. Next, we show why we do not need to linearise the application of our rules with regard to the condition part, as we only work with state data. However, new conflicts on the action part can arise from our parallel processing, which we discuss subsequently. We show that most conflicts cannot occur due to the nature of the conditions, and provide a rationale why the remaining conflicts are minor.

Well-formedness to Order M-Steps in B-Steps Remember that the execution in the GSM approach as described by [11] is event-triggered. A business step (B-Step) is the full incorporation of the implications of one event into the current state, i.e. the variable assignment on the data and status attributes. This incorporation includes the sequential evaluation of all sentries. If, upon an event, a sentry’s condition is met, its value is true. Corresponding updates to the variable assignment are enacted immediately. Thus, variables change over time, and the conditions of other sentries that would have held at the beginning of the incorporation may not hold any more once it is their turn. This may lead to non-intuitive behaviour of the workflow, which [21] address using a well-formedness criterion.

We illustrate the non-intuitive behaviour with an example in Fig. 4. Think of two guards \(g_{1}\) and \(g_{2}\) with corresponding stages \(s_{1}\) and \(s_{2}\). Guards \(g_{1}\) and \(g_{2}\) only depend on event e. Executing \(s_{1}\) before \(s_{2}\), or vice versa, leads to the same result. This is not always the case though. Now, \(g_{1}\) depends on event e and on \(s_{2}\) being active. Let event e fire. If \(g_{1}\) is evaluated before \(g_{2}\), \(s_{1}\) does not become active, but \(s_{2}\) becomes active. If \(g_{2}\) is evaluated first, both become active. Intuitively, only \(s_{2}\) should become active. Without a fixed order of execution, the incorporation of e is ambiguous.

Fig. 4 Example for sensitivity to the order of guard evaluation in a database environment. Case a) is not sensitive, while b) is sensitive

To address this ambiguity, Hull et al. [21] introduce the micro-step (M-Step), into which a B-Step can be subdivided. Evaluating \(g_{1}\) and \(g_{2}\) are each M-Steps; a B-Step then consists of multiple M-Steps. To handle the ordering of the execution of M-Steps, [21] introduce the dependency graph. The dependency graph models dependencies between guards and milestones. A guard or a milestone, say \(g_1\) or \(m_1\), depends on another milestone \(m_2\) if the truth of the sentry of \(g_1\) or \(m_1\) depends on \(m_2\) being achieved or not. The dependency graph is a directed graph with sentries as nodes and dependencies between them as edges. The graph induces a topological sorting for execution that eliminates ambiguity and maintains an “intuitive” order of execution. This order requires the graph to be acyclic. Hence, Hull et al. [21] define that a GSM model is well-formed iff its dependency graph is acyclic.

Parallel Processing of State Data In our approach, we use time-triggered operational semantics. Instead of using events, we query the state of the workflow and the environment periodically to determine the current state. Correspondingly, we apply updates in bulk instead of updating the status information variables as we go. Each state in the sequence is the result of the application of our entire ruleset. Above, we defined the function f to describe this state transition.

Therefore, the B-Step concept is not directly applicable to our approach. A B-Step includes every action that depends on an event e; if there are no events, we cannot associate any actions with them. Looked at from the perspective of the event-triggered GSM semantics, our approach rather performs M-Steps periodically and in parallel. Still, we need to discuss the ramifications of non-intuitive behaviour and updates applied in bulk.

As an illustration, consider again the example from Fig. 4, which we treat using our notation. With the example’s stages \(s_{1}\) and \(s_{2}\) and their guards \(g_{1}\) and \(g_{2}\), we define:

$$\begin{aligned}&\gamma \in \{{\textit{true, false}}\}\text { a boolean variable}\\&\quad \gamma _{\sigma } \text { the value of } \gamma \text { in state } \sigma \\&\quad g_{1} = \textit{true} \text { if } s_{2}=\textit{active} \text { and }\gamma , \text {else }{} \textit{false}\\&\quad g_{2} = \textit{true} \,\text {if } \gamma ,\text { else }{} \textit{false} \end{aligned}$$

If we now assume \(\gamma _{\sigma } = true\), then

$$\begin{aligned} \phi _{\sigma }(g_{1}) = {\textit{false}} \wedge \phi _{\sigma }(g_{2}) = {\textit{true}} \end{aligned}$$

and

$$\begin{aligned} \sigma '(s_{1}) = {inactive} \wedge \sigma '(s_{2}) = {active} \end{aligned}$$

We observe that our approach does not reproduce the “non-intuitive” behaviour. The reason is that each state \(\sigma \) is immutable and \(\sigma '\) is constructed independently, but based on \(\sigma \). Therefore, there is no need to order the execution of sentry evaluations and their corresponding consequences. In other words, there are no conflicts between changing the value of a status information or environment variable before or after observing it. There is also no need to order the M-Steps using a dependency graph, because \(\sigma \) stays the same for all g.

Write Conflicts Above, we reason why an order for reading and changing values of status information variables is not necessary in our approach, with the immutability of states as the main reason. Still, there is a possibility that we produce ambiguous state derivations. In this section, we illustrate the issue of such ambiguity, where and whether it occurs, and its ramifications if it does occur.

Our approach consists of multiple rules that, depending on the truth of a condition, derive new values for a set of status information variables as their action. We do not know in which order the rules are applied. In the previous section, we showed that the order of rules is not relevant if one rule derives a value for a status information variable and another rule accesses it. If, however, two rules each derive a different value for the same status information variable in the subsequent state, the order becomes relevant: the value of this status information variable is ambiguous. We call this a conflict in the ambiguous variable. We identify all conflicts that may occur and discuss each one in detail.

Recall that the function f constructs for a state \(\sigma \) its subsequent state \(\sigma '\), and that Table 2 specifies all rules and their corresponding actions in f. We now list all combinations of rules that may derive ambiguous results. We skip derivations that are equal, since they are not ambiguous. In the following, we write FCR-1 \(\rightarrow \sigma '(m)=achieved\) if it follows from FCR-1 that milestone m is achieved in the next snapshot.

Conflicting Write by FCR-2 + FCR-3: We look at a possible conflict in \(\sigma '\) for a milestone’s state, \(\sigma '(m_{s})\), with FCR-2 \( \rightarrow \sigma '(m_{s}) = achieved\) and FCR-3 \( \rightarrow \sigma '(m_{s}) = unachieved\). This requires that the conditions of both rules hold:

$$\begin{aligned} \begin{aligned} \forall s \in S, m_{s} \in \mathcal {M}_{s}: \sigma (s)&= {\textit{active}} \wedge \phi _{\sigma }^{+}(m_{s}) \\ \wedge \sigma (m_{s})&= {\textit{achieved}} \wedge \phi _{\sigma }^{-}(m_{s}) \end{aligned} \end{aligned}$$

Besides the fact that an overlapping achieving (validating) and invalidating sentry is semantically questionable, all logical operators are conjunctions. Therefore, all operands must hold individually, and so must every combination of them. This requires:

$$\begin{aligned} \forall s \in S, m_{s} \in \mathcal {M}_{s}: \phi _{\sigma }^{+}(m_{s}) \wedge \phi _{\sigma }^{-}(m_{s}) \end{aligned}$$

Under the disjointness condition of achieving and invalidating sentries (Eq. 1), this is impossible. \(\implies \) There is no \(\sigma \) that leads to a conflict of FCR-2 and FCR-3 in \(m_{s}\).

Conflicting Write by FCR-1 + FCR-2 FCR-1 and FCR-2 induce two possible conflicts in \(\sigma '\): The activation of a stage by FCR-1 \(\rightarrow \sigma '(s)=active\) can conflict with both consequences of FCR-2 \(\rightarrow \sigma '(s)=inactive \wedge \sigma '(s_d)=inactive\).

We discuss the first part of the conflict: FCR-1 \(\rightarrow \sigma '(s) = active\) and FCR-2 \(\rightarrow \sigma '(s) = inactive \wedge \sigma '(s_{d}) = inactive\). By combining the conditions of both rules, we get:

$$\begin{aligned} \begin{aligned}&\forall s \in S, s_{a}\in A_s, g_{s} \in G_{s}, m_{s} \in \mathcal {M}_{s}: \sigma (s) = inactive \\&\wedge \sigma (s_{a}) = active \wedge \phi _{\sigma }(g_{s}) \wedge \sigma (s) = active \wedge \phi _{\sigma }^{+}(m_{s}) \end{aligned} \end{aligned}$$

Again, all operators are conjunctions and thus all combinations of operands must be satisfied. The following combination contains a contradiction:

$$\begin{aligned} \begin{aligned} \forall s \in S: \sigma (s) = inactive \wedge \sigma (s) = active \end{aligned} \end{aligned}$$

\(\implies \) There is no \(\sigma \) that leads to a conflict of FCR-1 and FCR-2 in s

Next, we consider the second part of the conflict of FCR-1 and FCR-2: the case of the descendant stage \(s_{d}\), to which FCR-2 intends to write. We thus assume that FCR-1 applies to stage \(s_{d}\), i.e. FCR-1 \(\rightarrow \sigma '(s_{d}) = active\) and FCR-2 \(\rightarrow \sigma '(s) = inactive \wedge \sigma '(s_{d}) = inactive\). By combining the conditions of FCR-1 and FCR-2, we get:

$$\begin{aligned} \begin{aligned}&\forall s\in S, s_{d}\in D_s, s_{d_a}\in A_{s_d}, g_{s_{d}} \in G_{s_{d}}, m_{s} \in \mathcal {M}_{s}: \\&\sigma (s_{d}) = inactive \wedge \sigma (s_{d_{a}}) = active \\&\wedge \phi _{\sigma }(g_{s_{d}}) \wedge \sigma (s) = active \wedge \phi _{\sigma }^{+}(m_{s}) \end{aligned} \end{aligned}$$

Note that \(A_{s_d}\) is the set of all ancestors of \(s_d\), and therefore \(A_{s_d} \supset A_{s}\) holds. The contradiction we observed when applying FCR-1 to the first part of the consequent of FCR-2 does not occur anymore. Given this, we cannot ensure the absence of write conflicts just by contradicting conditions. Instead, we have to avoid this conflict when modelling the workflow. The disjointness condition of guards and milestones (Eq. 2) prescribes that a guard and its parent’s milestones have disjoint sentries. That is, \(\phi _{\sigma }^{+}(m_{s})\) and \(\phi _{\sigma }(g_{s_{d}})\) cannot be true at the same time. Hence, the equation above is never satisfied if we impose the disjointness condition.

\(\implies \) There is no \(\sigma \) that leads to a conflict of FCR-1 and FCR-2 in \(s_d\)

We remark that the effects of the described ambiguity may be considered minor. The consequence of the ambiguity is that a child stage \(s_{d}\) is either triggered once again or not. To put it in context, the stage \(s_{d}\) is supposed to be deactivated because its parent is deactivated. The consequence is that \(s_{d}\) may perform its task without being aborted. After finishing, its state changes to inactive and its milestone is set to achieved. As soon as the parent stage becomes active again, the milestone is set back to unachieved. Of course, this requires the task of \(s_{d}\) to finish. Other assumptions would require the ability to abort tasks, which is beyond the scope of this work. For the sake of completeness, we provide the disjointness condition as a modelling constraint.

Conflicting Write by FCR-2 + FCR-4: Lastly, we discuss a possible conflict in FCR-2 and FCR-4. There is a conflict in the state of the milestone by FCR-2 \( \rightarrow \sigma '(m_{s}) = achieved\) and FCR-4 \(\rightarrow \sigma '(m_{s}) = unachieved \wedge \sigma '(m_{s_{d}}) = unachieved\). Again, we combine the conditions:

$$\begin{aligned} \begin{aligned}&\forall s \in S, s_a \in A_s, s_d \in D_s, m_{s} \in \mathcal {M}_{s}, g_{s} \in G_{s} : \\&\quad \sigma (s) = active \wedge \phi _{\sigma }^{+}(m_{s})\wedge \sigma (s_{a}) = active \\&\qquad \wedge \sigma (s) = inactive \wedge \phi _{\sigma }(g_{s}) \wedge \phi _{\sigma }^{+}(m_{s}) \end{aligned} \end{aligned}$$

Since \(\sigma (s)=active\) contradicts \(\sigma (s)=inactive\), this conflict cannot occur for the same stage s.

\(\implies \) There is no \(\sigma \) that leads to a conflict of FCR-2 and FCR-4 in s

As in the previous rule combination with FCR-2, we also examine the case where FCR-2 applies to the descendant stage \(s_{d}\). Then,

$$\begin{aligned} \begin{aligned}&\forall s \in S, s_{d}\in D_s, s_{a}\in A_s, m_{s} \in \mathcal {M}_{s}, m_{s_{d}} \in \mathcal {M}_{s_{d}},\\&\quad g_{s} \in G_{s} : \\&\quad \sigma (s_{d}) = active \wedge \phi _{\sigma }^{+}(m_{s_{d}}) \wedge \sigma (s_{a}) = active \\&\quad \wedge \sigma (s) = inactive \wedge \phi _{\sigma }(g_{s})\wedge \phi _{\sigma }^{+}(m_{s}) \end{aligned} \end{aligned}$$

Now we observe that \(\sigma (s) = inactive\) and \(\sigma (s_{d}) = active\) do not contradict each other. For the combined condition to be satisfied, a stage s would have to be inactive while its descendant \(s_{d}\) is active. We discussed a situation where this may happen in the previous paragraph on the combination FCR-1 + FCR-2, but under the disjointness condition this inconsistency is eliminated. Hence, we can conclude that, given a workflow that satisfies the disjointness condition, no conflicts occur.

\(\implies \) There is no \(\sigma \) that leads to a conflict of FCR-2 and FCR-4 in s and/or \(s_{d}\)

As a result, we know that under the disjointness condition, for each state \(\sigma \), the successor state \(\sigma '\) is unambiguously determined. Formally, we can say:

$$\begin{aligned} \sigma \implies \sigma ' \end{aligned}$$

In addition, we can even conclude that if any rule derives a new value for a status information variable, there exists no other rule that derives a different value, even if we do not know the order of rule execution. That is, if there is any status information variable with \(\sigma (x)=y\), and any individual FCR r derives a value \(\sigma _r(x)=y'\) (where \(\sigma _r(x)\) denotes the value of x if only r is applied), then \(\sigma '(x)=y'\). Thus, we can write:

$$\begin{aligned} \begin{aligned} \sigma (x)=y \wedge \sigma _{r}(x)=y' \implies \sigma '(x)=y' \end{aligned} \end{aligned}$$
(3)
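To make the property in Eq. 3 concrete, the following sketch (purely illustrative Python, not our N3/Linked Data-Fu implementation; the rule encoding, variable names, and the sentry flag are hypothetical) applies a set of condition/action rules to a snapshot and raises an error if two applicable rules were to derive different values for the same status variable:

from typing import Callable, Dict, Tuple

Snapshot = Dict[str, str]  # status variable -> value, e.g. {"s1": "active", "m1": "unachieved"}
Rule = Tuple[Callable[[Snapshot], bool],      # condition evaluated on sigma
             Callable[[Snapshot], Snapshot]]  # action: the assignments the rule derives

def apply_rules(sigma: Snapshot, rules: Dict[str, Rule]) -> Snapshot:
    """Compute sigma' from sigma; raise an error on a write conflict (cf. Eq. 3)."""
    derived: Dict[str, Tuple[str, str]] = {}  # variable -> (derived value, deriving rule)
    for name, (condition, action) in rules.items():
        if condition(sigma):
            for var, value in action(sigma).items():
                if var in derived and derived[var][0] != value:
                    raise ValueError(f"write conflict on {var}: {derived[var][1]} vs {name}")
                derived[var] = (value, name)
    sigma_prime = dict(sigma)  # variables without a derived value keep their old value
    sigma_prime.update({var: value for var, (value, _) in derived.items()})
    return sigma_prime

# Hypothetical example: stage s1 is active and the sentry of its milestone m1 holds,
# so a rule in the spirit of FCR-2 deactivates s1 and sets m1 to achieved.
rules = {"FCR-2(s1)": (lambda s: s["s1"] == "active" and s["sentry_m1"] == "true",
                       lambda s: {"s1": "inactive", "m1": "achieved"})}
print(apply_rules({"s1": "active", "m1": "unachieved", "sentry_m1": "true"}, rules))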

6.2.2 GSM Invariants

The invariants GSM-1 and GSM-2 define combinations of status attributes that are inconsistent. If one of the invariants is violated, the workflow state is inconsistent.

  1. GSM-1

    “If a stage S owns a milestone m, then it cannot happen that both S is active and m has status true. In particular, if S becomes active, then m must change status to false, and if m changes status to true, then S must become inactive.” [11]

    $$\begin{aligned}&\forall s \in S, m \in \mathcal {M}_{s}: \\&\left( \sigma (s) = active \implies \sigma (m) \ne achieved \right) \\&\wedge \left( \sigma (m) = achieved \implies \sigma (s) \ne active \right) \end{aligned}$$
  2. GSM-2

    “If stage S becomes inactive, the executions of all substages of S also become inactive.” [11]

    $$\begin{aligned}&\forall s\in S, s_{d}\in D_s: \\&\sigma (s) = inactive \implies \sigma (s_{d})= inactive \end{aligned}$$

Additionally, we define a function \(c : \Sigma \rightarrow \{ true, false \}\) that determines whether both GSM invariants hold on a snapshot \(\sigma \).
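To illustrate, the function c can be sketched as follows (an illustrative Python sketch under the simplifying assumption that stage and milestone states are recorded in plain dictionaries; this is not the RDF-based representation used by our implementation):

from typing import Dict, List

def c(stage_state: Dict[str, str],          # stage -> "active" / "inactive"
      milestone_state: Dict[str, str],      # milestone -> "achieved" / "unachieved"
      milestones_of: Dict[str, List[str]],  # stage -> its milestones
      descendants_of: Dict[str, List[str]]  # stage -> its descendant stages
      ) -> bool:
    """Return True iff the snapshot satisfies GSM-1 and GSM-2."""
    # GSM-1: no stage may be active while one of its milestones is achieved.
    for s, milestones in milestones_of.items():
        if stage_state[s] == "active" and any(milestone_state[m] == "achieved" for m in milestones):
            return False
    # GSM-2: if a stage is inactive, all of its descendants must be inactive.
    for s, descendants in descendants_of.items():
        if stage_state[s] == "inactive" and any(stage_state[d] == "active" for d in descendants):
            return False
    return True

# Hypothetical two-stage example: s2 is nested in s1, each stage has one milestone.
print(c({"s1": "active", "s2": "inactive"},
        {"m1": "unachieved", "m2": "achieved"},
        {"s1": ["m1"], "s2": ["m2"]},
        {"s1": ["s2"], "s2": []}))  # True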

None of these invariants may be violated at any time. As the first step of our proof, we use mathematical induction to show that none of the states induced by the FCRs violates the GSM invariants.

Theorem: GSM-1 and GSM-2 are not violated throughout the workflow execution.

Proof: We apply the set of FCRs to the information model. A snapshot \(\sigma \) contains all data concerning the workflow’s state, as well as the environment values, at the beginning of each iteration. The state of the workflow and the environment values correspond to status and data attributes, respectively. \(\sigma _{0}\) represents our initial workflow state after the initialisation. We now prove the theorem using mathematical induction.

Base case: \(\sigma _{0}\) : No stage is activated yet \(\implies c(\sigma _0)=true\). \(\checkmark \)

Step case: \(\sigma \rightarrow \sigma '\) : In order to get into an inconsistent state one of the invariants must be violated. We distinguish two cases—a violation of GSM-1 and a violation of GSM-2:

  • GSM-1: To infer the consistency of \(\sigma '\), we assume the contrary, i.e. that there is a set of rules that derives at least one triple such that \(c(\sigma ') = {\textit{false}}\). To achieve that, given \(c(\sigma ) = {\textit{true}}\), there must be a milestone m and a stage s that satisfy either (C1) or (C2):

    $$\begin{aligned}&(\text {C1}): \sigma (m) = unachieved \wedge \sigma '(m) = achieved \\&\wedge \sigma '(s) = active\\&(\text {C2}): \sigma (s) = inactive \wedge \sigma '(s) = active \\&\quad \wedge \sigma '(m)=achieved \end{aligned}$$

    In words, case (C1) describes the case where a milestone becomes achieved while its stage stays active. Case (C2) corresponds to the case where a stage becomes active although one of its milestones is still achieved.

  • Case (1): We assume that condition (C1) holds. Since only FCR-2 leads to the achievement of a milestone, \(\sigma (m) = unachieved\) and \(\sigma '(m) = achieved\) can only be true if the condition of FCR-2 is satisfied. Hence, we conclude that the action of FCR-2 applies as well, which derives \(\sigma '(s) = inactive\). From Eq. 3, we know that \(\sigma '(s) = inactive\). This contradicts \(\sigma '(s) = active\) in our condition (C1).

  • Case (2): We assume that condition (C2) holds. Because (C2) requires a stage to become active in \(\sigma '\), we conclude that the condition of FCR-1 must be satisfied. Considering Eq. 3, we know that

    $$\begin{aligned}&\sigma (s) = inactive \wedge \sigma (s_a) = active \wedge \phi _{\sigma }(g_s) \\&\wedge \sigma '(s) = active \end{aligned}$$

    If we consider the assumption that there exists a milestone \(m_s \in \mathcal {M}_s\) with \(\sigma '(m_s)=achieved\), either the milestone must have been already achieved in \(\sigma \) or it became achieved in \(\sigma '\). FCR-1 and FCR-4 share almost the same condition; the difference is that FCR-4 sets all corresponding achieved milestones to unachieved. Hence, in the first case, if \(\sigma (m_s)={\textit{achieved}}\), FCR-4 implies \(\sigma '(m_s)={\textit{unachieved}}\). By Eq. 3, we know that \(\sigma '(m_s)={\textit{unachieved}}\). This contradicts assumption (C2). In the latter case, FCR-2 must hold, since it is the only rule that implies an achieved milestone. However, FCR-2 also implies \(\sigma '(s)=inactive\), which also contradicts our assumption.

  • GSM-2: Similar to GSM-1, we assume the contrary, i.e. that there are rules that derive at least one triple in such a way that \(c(\sigma ') = {\textit{false}}\). This requires

    $$\begin{aligned}&\exists s \in S, s_{d} \in D_s: \\&\sigma '(s) = inactive \wedge \sigma '(s_{d}) = active \end{aligned}$$

    Because FCR-1 requires all ancestors of \(s_{d}\) to be active in \(\sigma \), the stage \(s_{d}\) cannot become active while its parent stage s is inactive. Hence, a violation is only possible if s is set to inactive while its descendant \(s_{d}\) stays active. Only FCR-2 leads to a stage being set to inactive. While it implies \(\sigma '(s)=inactive\), it also implies \(\sigma '(s_d)=inactive\) for all \(s_d \in D_{s}\). Again, by Eq. 3, we know that \(\sigma '(s_d)=inactive\) holds for all \(s_d \in D_{s}\), which contradicts the assumption.

As a result, we conclude that, under the disjointness condition, f does not imply a transition from a state \(\sigma \) with \(c(\sigma )=true\) to a state \(\sigma '\) with \(c(\sigma ')=false\). \(\implies \) Step case \(\checkmark \)

By the principle of mathematical induction, we have shown that the invariants GSM-1 and GSM-2 are not violated. \(\square \)

6.3 Applicability

In Sect. 1, we described different scenarios and deployments from automotive, avionics, manufacturing, and the Internet of Things, where our approach could be applied. To publicly showcase the approach presented in this paper, we built a small conference demonstrator [1] based on Internet of Things devices with Read-Write Linked Data interfaces. The implementation of this demonstrator can be found online.Footnote 29 For the status attributes and the rule interpreter, the demonstrator uses LDBBCFootnote 30 as Linked Data Platform Container implementation, and Linked Data-FuFootnote 31 [40] as N3 rule interpreter with ASM4LD [23] operational semantics.

6.4 Performance

Our approach takes the GSM approach and brings it to Read-Write Linked Data. Therefore, we contrast our approach with the converse, namely bringing Read-Write Linked Data to GSM. We compare our operational semantics deployed on Linked Data-Fu, with status attributes maintained in LDBBC, against the CMMN implementation in Camunda,Footnote 32 a commercial process and case management suite (CMMN,Footnote 33 the Case Management Model and Notation, is a standard that is closely related to GSM [11]).

Both engines therefore need to make HTTP requests to sources that provide RDF, and to reason over the retrieved data using rules that implement the RDFS semantics and that are given to the engines for data integration.
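As an illustration of the retrieval part, one polling cycle could be sketched as follows (an illustrative Python sketch, not part of either implementation; it assumes, for simplicity, that each source serves N-Triples with IRIs only, and the URLs are hypothetical):

import urllib.request

def poll_sources(urls):
    """Fetch the current state of each resource and return it as a set of triples.

    Simplifying assumptions: each source serves N-Triples on request, and all
    terms are IRIs (no literals, blank nodes, or lists), in line with the
    restrictions we state below for the RDF/XML mapping.
    """
    triples = set()
    for url in urls:
        request = urllib.request.Request(url, headers={"Accept": "application/n-triples"})
        with urllib.request.urlopen(request) as response:
            for line in response.read().decode("utf-8").splitlines():
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                s, p, o = line.rstrip(" .").split(" ")[:3]
                triples.add((s.strip("<>"), p.strip("<>"), o.strip("<>")))
    return triples

# Hypothetical usage in one execution cycle: poll the resources, then reason
# over the triples and evaluate the sentries.
# state = poll_sources(["http://localhost:8080/stages/s1", "http://localhost:8080/data"])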

As Camunda is, of course, not built for that workload, we now describe our assumptions and implementation: For data processing, Camunda has SPIN,Footnote 34 a library to extend the process languages it supports with data processing of XMLFootnote 35 and JSON in a scripting fashion, such that we can use a classless, declarative approach to define the processing steps, in line with the declarative input of rules to Linked Data-Fu. We chose the tree-based XML as data format, as one can serialise RDF graphs in XML trees using the RDF/XML\(^{16}\) serialisation format, and querying XML is better supported in the extensions of Camunda. We then specified the processing steps as a BPMNFootnote 36 process, which we plugged into our CMMN diagram as a process task. Our BPMN process mimics Linked Data-Fu’s operational semantics (cf. Sect. 3.6) by parallel downloads of RDF/XML, and subsequentFootnote 37 parallel rule evaluation of the RDFS semantics. As the processing of arbitrary RDF/XML using the XML querying stack is a research endeavour of its own, one reason being that different XML trees can represent the same RDF graph [5], we assume a deterministic mapping from triples to XML trees, where each triple gets a dedicated rdf:Description node, as employed by many popular RDF/XML serialisers.Footnote 38 Furthermore, we assume ground triples without literals and lists, as blank nodes, datatypes, and lists would require specialised treatment in RDFS and RDF/XML. Then, we can express the rules that implement RDFS as BPMN Script Tasks written declaratively in XQuery 3.1.Footnote 39 Those parallel rule tasks are placed in a loop that runs until no further deductions are derived (and thus, the fixpoint is calculated).
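For intuition, the fixpoint computation that our BPMN loop mimics can be sketched as follows (an illustrative Python sketch over triples as tuples; it covers only the two RDFS subclass rules rdfs9 and rdfs11, uses hypothetical example data, and stands in for the actual XQuery script tasks over RDF/XML):

RDFS_SUBCLASSOF = "rdfs:subClassOf"
RDF_TYPE = "rdf:type"

def rdfs_fixpoint(triples):
    """Apply the two RDFS subclass rules until no new triples are derived."""
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            if p == RDFS_SUBCLASSOF:
                # rdfs11: rdfs:subClassOf is transitive
                new |= {(s, RDFS_SUBCLASSOF, o2) for (s2, p2, o2) in graph
                        if s2 == o and p2 == RDFS_SUBCLASSOF}
                # rdfs9: instances of a subclass are also instances of the superclass
                new |= {(i, RDF_TYPE, o) for (i, p2, c) in graph
                        if p2 == RDF_TYPE and c == s}
        if new <= graph:  # fixpoint reached: nothing new was derived
            return graph
        graph |= new

# Hypothetical data: ex:order1 is an ex:Order, and ex:Order is a subclass of ex:Document.
data = {("ex:order1", RDF_TYPE, "ex:Order"),
        ("ex:Order", RDFS_SUBCLASSOF, "ex:Document")}
print(("ex:order1", RDF_TYPE, "ex:Document") in rdfs_fixpoint(data))  # True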

Fig. 5 The workflow we use in our evaluation

For our tests, we crafted a workflow, which we designed to include different constellations of stages, guards, and milestones. Specifically, the workflow includes nested stages and milestones whose sentries check whether other milestones have been reached. The workflow contains 8 stages, starts with two sequentially arranged stages, and leads into a combination of stages with 2 as the maximum level of nesting. This gives rise to some interesting situations, such as the parallel execution of s2 and s4, where s4 is nested in another stage s3. For a deterministic evaluation, the sentries are designed to be always fulfilled, but their queries still need to be evaluated. We designed the data part of the sentries to check for data that needs to be derived using reasoning. In the case of Linked Data-Fu, those are SPARQL ASK queries in SPIN notation; in the case of Camunda, those are XPath 1.0Footnote 40 queries within an ELFootnote 41 expression. We did not include invalidating sentries for the milestones in order not to have to build the equivalent of dynamic data attributes, which could also reduce the determinism and the repeatability of the evaluation. Although there is no specific story behind this workflow, it can be thought of as a simplification of the Order-to-Design workflow example from [11]. The workflow model in both implementations, the BPMN for Linked Data processing, and the data to be retrieved can be found online.Footnote 42

We ran our experiments on a MacBook Pro with an Intel Core i5-5257U processor (2.7 GHz, 4 threads) and 8 GB RAM, running OpenJDK 1.8.0_131 on Mac OS X Version 10.14.6, and allowed for a warm-up time of 10 s. We repeated our experiments 5 times. For our set-up, we used Linked Data-Fu 0.9.13pr1 and LDBBC 0.0.5. For the Camunda set-up, we used Camunda 7.14 Community Edition, where we needed to remove the limit of 4000 characters in certain columns of the database Camunda runs on.

We show our results in Fig. 6. We see that the declarative Read-Write Linked Data processing approach in Linked Data-Fu outperforms the approach with a declarative implementation of data retrieval and reasoning from within the Camunda process engine. Moreover, the Linked Data-Fu approach retrieves and reasons in every execution cycle such that the queries run on fresh data, whereas the Camunda-based approach only retrieves and reasons in 3 of the stages. This limitation of our implementation is due to difficulties we faced when connecting the retrieval and reasoning part to the CMMN workflow lifecycle. Therefore, our results for Camunda only serve as a lower bound. On the other hand, we see for Linked Data-Fu that the handling of workflow instances in data structures and code not optimised for that purpose puts considerable load on the system: already between 1 and 9 parallel instances, the runtime doubles.

Fig. 6 Results from 5 runs of our evaluation: time in ms until the termination of the last workflow when starting n workflows in parallel

7 Conclusion and Discussion

We presented an approach to specify and execute agent behaviour in the form of data-centric workflows in Read-Write Linked Data, i.e. an environment of semantic knowledge representation and reasoning. To this end, our approach consists of an ontology and operational semantics. We gave the operational semantics in a rule language for Read-Write Linked Data, derived requirements for modelling, and discussed the rule expressivity required for querying data and status attributes. We showed the correctness of our rules for the operational semantics and provided a performance evaluation.

While we envision our approach to be particularly useful in small-scale scenarios where agents interact with a handful of resources and only a few workflow instances, we evaluated our approach against a workflow engine that is built for scenarios with many instances. In the evaluation, our general-purpose Read-Write Linked Data processor, made to maintain workflow instance state, outperformed a workflow engine, made to perform Read-Write Linked Data processing with data retrieval and reasoning. This is despite the overhead that is often feared when considering polling-based approaches. Viewed from a broader perspective, we remark that, in essence, our agent behaviour specification encodes the behaviour part missing in the integration standards from the web architecture, such as HTTP for interaction and symbolic reasoning in RDFS for data integration. If we take BPM solutions as a way of doing integration in practice, by specifying the behaviour that orchestrates system components, our evaluation agrees with [34] that there is still a way to go until the integration capabilities of the web architecture can be fully exploited in practice.