1 Introduction

Modelling is a crucial stage in the design of a system; models obtained from specifications of the system's functioning are helpful for synthesising control or management systems for discrete event processes. Conversely, for analysing an existing system, obtaining a model of that system is worthwhile. In this case, models are built manually by an expert on the process, or automatically by a computer tool that handles the behaviour exhibited by the process.

Computer aided modelling

The automated modelling of discrete event processes from event data, in the form of event sequences issued by the process, is a challenging approach to performing reverse engineering analysis. Nowadays, several research groups in the areas of discrete event systems (DES) and workflow management systems (WMS) are addressing this problem.

The first publications on automated modelling, concerning so-called language learning techniques, appeared in computer science. The goal was to obtain formal models (finite automata or grammars) representing languages from positive samples of accepted words [1, 2].

Process identification

In the DES area, the problem is named process identification; several approaches and methods have been proposed to build models that represent the behaviour of automated manufacturing processes, exhibited as sequences of events recorded during the process execution. The incremental approach proposed in [3, 4] obtains 1-bounded interpreted Petri nets (PN) from a large stream of the process output signals. In Giua and Seatzu [5], a method based on the statement and solution of an integer linear programming problem is proposed; it allows building a PN from a set of event sequences. Extensions to this method have been proposed in [6, 7]. In Klein et al. [8], a technique to determine finite automata from input–output sequences is presented and applied to fault detection in industrial processes; an extension of this method allows obtaining distributed models [9]. In Estrada-Vargas et al. [10], the input–output identification of automated manufacturing processes is addressed; the identification method builds an interpreted PN from a set of sequences of input–output vectors sampled from the controller during the cyclic operation of the system. This method has been extended to incrementally update the model when new sequences are processed [11]. Surveys on DES identification are presented in [12, 13].

Process discovery

In the research area of WMS [14,15,16], the equivalent concern is named process discovery; in its statement, the systems dealt with are business processes whose behaviour is represented by a multiset of task sequences over a finite alphabet. Early methods were proposed in [17, 18]; in [19], Agrawal proposed a method in which a finite automaton, called the conformal graph, is obtained. Cook [18] presented a probabilistic technique to determine the concurrent and direct relations between tasks; the obtained model is a graph akin to a PN. Later, in [20], a technique to discover finite automata from task sequences was presented. In Wang et al. [21], a discovery method called Algorithm Alpha is presented; in this method, an event log composed of several traces is mined, yielding a subclass of PN called workflow nets (WFN). Numerous publications present extensions of this algorithm, namely [21,22,23]. Wide overviews of discovery techniques are [24,25,26,27,28].

Problem and approach

The aim of discovery/identification methods is to build models that represent the behaviour captured in the event log. However, a common issue with current discovery techniques is that the obtained models represent more behaviour than that exhibited by the process; that is, the language of the built model is larger than that represented by the event log. This problem arises when concurrent iterative processes are dealt with.

In this paper, a novel model discovery method is proposed. The technique allows synthesising a suitable WFN from a large log \(\lambda \) of task traces, which includes the iterative behaviour drawn from complex business processes exhibiting concurrency and iterations. The obtained WFN has a reduced surplus language with respect to the event log.

The discovery method follows the approach adopted in [29, 30] for DES, adapted to the WMS field for dealing with WFN; furthermore, an important extension is proposed that allows addressing more complex behaviours, such as causal dependencies between tasks that do not appear consecutively in the traces; this feature allows reducing the exceeding language. The method determines, from a log \(\lambda \), causal and concurrency relations between tasks and the t-invariants Y of the PN to discover. Then, the obtained t-invariants allow determining a first structure of a PN \(N_1\). Afterwards, \(N_1\) can be adjusted if some of its t-invariants J do not agree with those derived from \(\lambda \) (Y). This paper is a revised extension of the conference paper [31], in which the method for inferring the t-invariants was presented.

All the algorithms derived from this method have polynomial-time complexity. They have been implemented and tested with artificial logs obtained from known WFN inspired by models reported in the literature. The results on artificial logs and the computational complexity are compared with those obtained by the process mining method named the alpha++ algorithm [32].

Paper organization

The paper is organised as follows. In Sect. 2, the basic notions on PN are recalled. Section 3 formulates the discovery problem. Section 4 introduces basic relations derived from the task sequences. Section 5 proposes a technique for determining the t-invariants from \(\lambda \). In Sect. 6, the WFN discovery method is presented. Section 7 outlines the implementation and presents the tests. Finally, Sect. 8 discusses the main features and limitations of the proposed method in the scope of relevant related works.

2 Background

This section recalls the basic concepts and notation of ordinary PN and WFN used in this paper.

Definition 1

An ordinary Petri Net structure G is a bipartite digraph represented by the 3-tuple G = (P, T, F); where:

  • \(P = \{ p_1, p_2,..., p_{|P|} \}\) and \(T = \{t_1, t_2,..., t_{|T|}\}\) are finite sets of nodes named places and transitions, respectively;

  • \(F \subseteq (P \times T) \cup (T\times P)\) is a relation representing the arcs between the nodes. For any node \(x \in P\cup T\), \({}^{\varvec{\cdot }}x = \{y \mid (y, x) \in F \} \) and \( x^{\varvec{\cdot }} = \{ y \mid (x, y) \in F \}\).

  • The incidence matrix of G is \(C = C^+ - C^- \), where \(C^- = [c_{ij}^-]\), with \(c_{ij}^- = 1\) if \( (p_i, t_j) \in F\) and \(c_{ij}^-=0\) otherwise; and \(C^+ = [c_{ij}^+]\), with \(c_{ij}^+ = 1\) if \( (t_j, p_i) \in F\) and \(c_{ij}^+=0\) otherwise. \(C^-\) and \(C^+\) are called the pre-incidence and post-incidence matrices, respectively. Thus, \(C = [c_{ij}]\), where \(c_{ij} \in \{-1, 0, 1\}\).

Definition 2

A marking \( M: P \rightarrow \mathbb {N}^{\ge 0}\) assigns a number of tokens to each place, where \(\mathbb {N}^{\ge 0}\) is the set of non-negative integers. A marking M, usually denoted by a vector in \(\mathbb {N}^{|P|}\), describes the current state of the modelled system.

PN dynamics

  • A Petri Net system or Petri Net (PN) is the pair \(N = (G, M_0)\), where G is a PN structure and \(M_0\) is an initial marking.

  • In a PN system, a transition \(t_j\) is enabled at marking \(M_k\) if \(\forall p_i \in P, M_k(p_i) \ge c_{ij}^-\).

  • An enabled transition \(t_j\) can be fired reaching a new marking \(M_{k+1}\). This behaviour is represented as \(M_k \overset{t_j}{\longrightarrow }\ M_{k+1}\). The new marking can be computed as \(M_{k+1} = M_k + Cu_k\) where \(u_k\) is the firing vector; when \(t_j\) is fired \(u_k(j)=1\) whilst \(u_k(i) =0, \forall i \ne j\). This equation is called the PN state equation.

  • The reachability set of a PN is the set of all possible reachable markings from \(M_0\) firing only enabled transitions; this set is denoted by \(R(G,M_0)\).
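The enabling rule and the state equation above can be illustrated in a few lines of code. The following is a minimal sketch; the two-place cyclic net encoded below is an invented toy example, not a net from this paper.

```python
def enabled(C_minus, M, j):
    # t_j is enabled at M iff M(p_i) >= c_ij^- for every place p_i
    return all(M[i] >= C_minus[i][j] for i in range(len(M)))

def fire(C, M, j):
    # State equation M_{k+1} = M_k + C u_k, with u_k the unit firing vector of t_j
    return [M[i] + C[i][j] for i in range(len(M))]

# Toy net: p1 -> t1 -> p2 -> t2 -> p1 (a simple cycle)
C_minus = [[1, 0],   # p1 is the input place of t1
           [0, 1]]   # p2 is the input place of t2
C_plus  = [[0, 1],   # p1 is the output place of t2
           [1, 0]]   # p2 is the output place of t1
C = [[C_plus[i][j] - C_minus[i][j] for j in range(2)] for i in range(2)]
M0 = [1, 0]          # one token in p1
```

Firing \(t_1\) from \(M_0\) moves the token to \(p_2\); firing \(t_2\) afterwards restores \(M_0\).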

Definition 3

A PN system is 1-bounded or safe iff, for any \(M_i \in R(G,M_0)\) and any \(p \in P\), \(M_i(p) \le 1\). A PN system is live iff, for every reachable marking \(M_i \in R(G,M_0)\) and every \(t \in T\), there is an \(M_k \in R(G,M_i)\) such that t is enabled at \(M_k\).

Definition 4

A t-invariant \(Y_i\) of a PN is a non-negative integer solution to the equation \(CY_i=0\). The support of \(Y_i\) (t-support), denoted as \(<Y_i>\), is the set of transitions whose corresponding elements in \(Y_i\) are positive. \(Y_i\) is said to be minimal if its support is not contained in the support of any other t-invariant. A t-component \(G(Y_i)\) is the subnet of the PN induced by \(<Y_i>\): \(G(Y_i)=(P_i,T_i,F_i)\), where \(P_i={}^{\varvec{\cdot }}\!<Y_i> \cup <Y_i>^{\varvec{\cdot }}\), \(T_i = <Y_i>\), and \(F_i = ((P_i \times T_i) \cup (T_i \times P_i)) \cap F\). If an initial marking \((M_0)\) enables a \(t_i \in <Y_i>\), then, once \(t_i\) is fired, \(M_0\) can be reached again by firing only transitions in \(<Y_i>\).
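The condition \(CY_i=0\) of Definition 4 can be checked mechanically; the sketch below uses the same invented two-transition cycle net as before, purely for illustration.

```python
def is_t_invariant(C, Y):
    # Y is a t-invariant iff C . Y = 0, i.e. the token balance of every place is zero
    return all(sum(row[j] * Y[j] for j in range(len(Y))) == 0 for row in C)

# Toy net: p1 -> t1 -> p2 -> t2 -> p1; firing t1 then t2 restores the marking
C = [[-1, 1],
     [1, -1]]
```

Here \(Y = (1, 1)^T\) (and any positive multiple of it) is a t-invariant, while \(Y = (1, 0)^T\) is not.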

Definition 5

A WorkFlow net (WFN) N is a subclass of PN owning the following properties [14]: (1) it has two special places: i and o. Place i is a source place: \({}^{\varvec{\cdot }}i = \emptyset \), and place o is a sink place: \(o^{\varvec{\cdot }} = \emptyset \). (2) If a transition \(t_e\) is added to PN connecting place o to i, then the resulting PN called extended WFN is strongly connected.

Definition 6

A WFN N is said to be sound iff \((N, M_0)\) is safe and for any marking \(M_i \in R(N, M_0)\), \(o \in M_i \rightarrow M_i = [o]\) and \([o] \in R(N, M_i)\) and \((N, M_0)\) contains no dead transitions. The extended WFN of a sound WFN is live and 1-bounded.

3 The discovery problem

In this section, the problem of WFN discovery in the context of workflow management systems is formulated and then, the proposed method is outlined.

3.1 Problem statement

Definition 7

Let \(T =\{t_1, t_2,\ldots , t_n\}\) be a finite set of tasks; a workflow log \(\lambda \) is a multiset of task traces \(\sigma _i \in T^{*}\), \(|\sigma _i| < \infty \). Given a workflow log \(\lambda = \{\sigma _1, \sigma _2,\ldots, \sigma _m\}\), the PN discovery problem consists of building a sound WFN, using only transitions in T, that reproduces the observed log. The number of places is unknown.

Example 1

Consider the log \(\lambda =\{ \sigma _1, \sigma _2, \sigma _3, \sigma _4, \sigma _5, \sigma _6, \sigma _7 \}\) composed of the following task traces resulting from the execution of some process: \(\sigma _1 = t_1 t_6 t_3 t_4 t_7\); \(\sigma _2 = t_1 t_3 t_6 t_4 t_5 t_3 t_4 t_7\); \(\sigma _3 = t_2 t_3 t_4 t_5 t_3 t_4 t_8\); \(\sigma _4 = t_2 t_3 t_4 t_8\); \(\sigma _5 = t_1 t_3 t_4 t_5 t_6 t_3 t_4 t_7\); \(\sigma _6 = t_1 t_3 t_4 t_5 t_3 t_4 t_6 t_7\); \(\sigma _7 = t_1 t_3 t_4 t_9\). A suitable discovery technique should be able to build from the previous traces a model such as the one depicted in Fig. 1.

Fig. 1
figure 1

Sound workflow net

3.2 Assumptions

It is assumed that the event log is complete and generated by a process that behaves as an unknown sound WFN, which has no duplicate task labels or silent transitions. The soundness requirement implies that the process behaviour captured in the event log corresponds to a “well-behaved” process, which does not exhibit deadlocks or anomalies such as buffer overflows.

3.3 Outline of the method

The proposed discovery method obtains a model of an unknown or ill-known process; the model reproduces the observed behaviour and exhibits the causality and concurrency relationships between the tasks.

The method builds a 1-bounded PN (a WFN including a transition \(t_e\)) from which all the task traces \(\sigma _i\) in the log \(\lambda \) can be fired. It focuses on the computation of the causal and concurrent relations between the tasks. This is accomplished by computing the t-invariants Y from \(\lambda \), which must exist in a strongly connected PN exhibiting iterative behaviour; also, t-invariants are used to find causal relations between tasks that do not appear consecutively (called implicit causal relations).

In the first stage, the method determines from \(\lambda \) several binary relations between transitions; based on these relations, the t-invariants are discovered. Then, the causal and concurrent relations are determined, and together with the computed t-invariants, a first structure N of a WFN is built. Finally, the t-invariants are used again for adjusting the language of N by determining implicit causality between tasks.

4 Basic concepts and relations

First, several relations obtained directly from \(\lambda \) are introduced. Some definitions have been taken and adapted from [29, 31].

4.1 Structuring the observed behaviour

Definition 8

(Event precedence relation) The precedence relationship between transitions observed consecutively is stated by the relation \(R_{<} \subseteq T \times T\), defined as \(R_{<} = \{ (t_a, t_b) \mid \exists \sigma _i \in \lambda \). \(\sigma _i(j) = t_a\) and \(\sigma _i(j+1) = t_b \); \(1\le j \le |\sigma _i|-1\}\), where \(\sigma _i(j)\) denotes the symbol in position j of the trace \(\sigma _i\). Thus, \(t_a R_{<} t_b\) (also denoted as \(t_a< t_b\)) expresses that \(t_a\) has been observed immediately before \(t_b\) in at least one trace \(\sigma _i\). When \(t_a\) is related by \(R_<\) to more than one task, this is denoted as \(t_{a} < t_1,t_2,\ldots ,t_n\). The relationship between transitions that never occur consecutively in the traces of \(\lambda \) is given by \(T\times T \backslash R_<\); a pair in this relation is denoted as \(t_a >< t_b\).
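Definition 8 amounts to collecting all adjacent pairs of tasks. A minimal sketch, assuming traces are represented as tuples of task names (the two traces below are taken from Example 1):

```python
def precedence_relation(log):
    # R_< : ordered pairs of tasks observed consecutively in some trace
    return {(s[j], s[j + 1]) for s in log for j in range(len(s) - 1)}

log = [("t1", "t6", "t3", "t4", "t7"),
       ("t1", "t3", "t6", "t4", "t5", "t3", "t4", "t7")]
R = precedence_relation(log)
```

For instance, \((t_1, t_6)\) belongs to \(R_<\) because of the first trace, while \((t_7, t_1)\) does not, so \(t_7 >< t_1\).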

4.1.1 Causal and concurrency relationships

The aim of existing PN discovery methods is to determine from observed precedence between transitions, actual causal or concurrency relationships between transitions, which will be useful to build a PN structure. Below, notions and properties for determining causality and concurrency relationships are stated.

Definition 9

(Two-length cycles) Two transitions \(t_a, t_b\) are in a two-length cycle relation (Tc) if the tasks traces in the log \(\lambda \) contain the sub-sequences \(t_a t_b t_a\) or \(t_b t_a t_b\). Tc is the set of transition pairs \((t_a, t_b)\) fulfilling this condition.

It is clear that simple PN substructures can be determined straightforwardly from Tc.

Definition 10

(Causal and concurrent relations) Every pair of consecutive transitions \((t_a, t_b)\in R_<\) may be classified into one of the following relationships:

  • Causal relationship, denoted as \([t_a, t_b]\), expresses that the occurrence of \(t_a\) enables \(t_b\); in a PN structure this implies that there must be at least one place from \(t_a\) to \(t_b\). The set of transition pairs in a causal relation in \(\lambda \), named CausalR, is defined as follows: \(CausalR = \{(t_a, t_b) \mid ( t_a < t_b \wedge \lnot (t_a || t_b)) \vee (t_a, t_b) \in Tc\}\).

  • Concurrent relationship, denoted as \(t_a || t_b\). It means that when both \(t_a\) and \(t_b\) are simultaneously enabled, if \(t_a\) fires first, \(t_b\) is not disabled, and vice versa; in a PN structure, \(t_a || t_b\) implies that there are no places connecting \(t_a\) to \(t_b\) or \(t_b\) to \(t_a\). \(t_a || t_b\) is determined if \((t_a, t_b), (t_b, t_a) \in R_<\), i.e., \(t_a, t_b\) have been observed consecutively in both orders in the task traces of the log \(\lambda \), and \(t_a, t_b\) do not form a Tc. Then, the set of concurrent transition pairs derived from \(\lambda \) is \(ConcR= \{ (t_a, t_b) \mid t_a< t_b \wedge t_b < t_a \wedge (t_a, t_b) \notin Tc \}\)
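Under a literal, adjacency-based reading of Definitions 8–10, the three relations can be sketched as follows; the log below is the one of Example 1, encoded as tuples of task names.

```python
def relations(log):
    # R_<, Tc, ConcR and CausalR (Definitions 8-10), read literally
    R = {(s[j], s[j + 1]) for s in log for j in range(len(s) - 1)}
    # Tc: pairs occurring as the sub-sequence t_a t_b t_a in some trace
    Tc = {(a, b) for (a, b) in R
          if any(s[k:k + 3] == (a, b, a) for s in log for k in range(len(s) - 2))}
    # ConcR: observed consecutively in both orders and not a two-length cycle
    conc = {(a, b) for (a, b) in R
            if (b, a) in R and (a, b) not in Tc and (b, a) not in Tc}
    # CausalR: consecutive pairs that are not concurrent (Tc pairs stay causal)
    causal = {(a, b) for (a, b) in R if (a, b) not in conc}
    return R, Tc, conc, causal

log = [("t1", "t6", "t3", "t4", "t7"),
       ("t1", "t3", "t6", "t4", "t5", "t3", "t4", "t7"),
       ("t2", "t3", "t4", "t5", "t3", "t4", "t8"),
       ("t2", "t3", "t4", "t8"),
       ("t1", "t3", "t4", "t5", "t6", "t3", "t4", "t7"),
       ("t1", "t3", "t4", "t5", "t3", "t4", "t6", "t7"),
       ("t1", "t3", "t4", "t9")]
R, Tc, ConcR, CausalR = relations(log)
```

On this log, Tc is empty, \((t_3, t_6)\) and \((t_4, t_6)\) come out concurrent, and pairs such as \((t_1, t_3)\) and \((t_3, t_4)\) come out causal.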

Example 2

From the tasks traces of Example 1, the following relations among tasks were found \((t_1< t_3, t_6), (t_2< t_3), (t_3< t_4, t_6), (t_4< t_5, t_6, t_7, t_8, t_9 ), (t_5< t_3, t_6), (t_6 < t_7, t_3, t_4, t_5), Tc =\emptyset , ConcR =\{(t_3, t_6), (t_4, t_6), (t_5, t_6)\}\), and \(CausalR = \{(t_1, t_3), (t_1, t_6), (t_2, t_3), (t_3, t_4), (t_4, t_5), (t_4, t_8), (t_4, t_7), (t_4, t_9), (t_5, t_3), (t_6, t_7)\}.\)

5 Discovering t-invariants

During the execution of workflow processes, the tasks occur sequentially as cases; this is captured as task traces. If, in every case, each task appears only once, there is no iterative behaviour captured in the traces; then the alphabet of every trace is the support of a t-invariant. However, processes often include repetitive subprocesses, such as those modelled by the WFN in Fig. 1; for such processes, extracting the minimal supports of t-invariants is not trivial.

This section describes a novel algorithm to derive the minimal supports of t-invariants from an event log that includes traces involving repetitive behaviour. We will refer in the presentation to the t-invariants of the extended WFN.

The t-invariants computed from \(\lambda \) are those that the structure of the WFN to be built must fulfil. Thus, the method presented herein determines the t-invariants of the unknown WFN that generates the log. Several notions used for defining the t-invariants computation technique are introduced below.

A trace \(\sigma \) that contains transitions \(t_j\) such that \(\#(t_j, \sigma ) > 1\), where \(\#(t_j, \sigma )\) is the number of occurrences of \(t_j\) in \(\sigma \), includes sub-sequences representing a repetitive behaviour.

Definition 11

(Cyclic sub-sequences) We call cyc a sub-sequence of \(\sigma \) starting with a task \(t_j\) and ending just before the next occurrence of \(t_j\) in \(\sigma \). If \(\forall t_j \in cyc, \#(t_j, cyc) = 1\), then it is called an elementary cyc, denoted as \(cyc_e\). A cyc may contain other cycs.

Example 3

Several traces of Example 1 have a cyc within; it is the case of \(\sigma _2 = t_1 t_3 t_6 t_4 t_5 t_3 t_4 t_7\), where \(cyc = t_3t_6t_4t_5\); furthermore, it is elementary \((cyc_e)\) because \(\forall t_j \in cyc, \#(t_j, cyc) = 1\); instead, \(\sigma _1\) and \(\sigma _4\) do not include a cyc.

Proposition 1

The tasks in a trace \(\sigma _x \in \lambda \) form the support of a t-invariant of the extended WFN. The t-invariant is minimal iff \(\forall t_j \in \sigma _x, \#(t_j, \sigma _x) = 1\) and there is no \(\sigma _y \in \lambda \) such that \(\sigma _x \subset \sigma _y\).

Proof

(Direct) As stated in Definition 5 (2), the extended WFN is strongly connected. Thus, the transitions in \(\sigma _i\), whose first and last tasks belong to \(i^{\varvec{\cdot }}\) and \({}^{\varvec{\cdot }}o\), respectively, together with \(t_e\), can be fired repeatedly. When every \(t_j \in \sigma _i\) satisfies \(\#(t_j, \sigma _i) =1\), \(\sigma _i\) does not contain cycs; therefore, the transitions in \(\sigma _i\) are the support of a minimal t-invariant.

In Example 1, it is easy to see that the tasks in \(\sigma _1\) and \(\sigma _4\) (together with \(t_e\) of the extended WFN) are the supports of minimal t-invariants. \(\square \)

Proposition 2

A trace \(\sigma _i \in \lambda \) includes cycs if it contains tasks belonging to two or more t-invariants.

Proof

(Contraposition) Suppose that all the tasks in \(\sigma _i\) belong to a single t-invariant. Then, all the tasks \(t_j\) in \(\sigma _i\) occur once, i.e. \(\#(t_j, \sigma _i) = 1\); thus, there is no iterative behaviour, that is, \(\sigma _i\) does not contain nested cycles (cycs). \(\square \)

The algorithm prunes interleaved tasks in the traces in order to separate them into supports of minimal t-invariants. The procedure for determining the t-invariants processes every trace \(\sigma _i\) recursively, from the outermost cyc of \(\sigma _i\) down to the shorter nested cycs.

Definition 12

(Causality graph) The causality graph \(G_r\) of an elementary cyc describes the relations between the tasks in a \(cyc_e\): \(G_r(cyc_e)=(V, E)\), where \(V=\{t_k \mid t_k \in cyc_e\}\) and \(E=\{(t_k,t_l) \in V \times V \mid (t_k,t_l) \in CausalR \}\).

The \(G_r\) that can be formed with the cycs of Example 3 and the CausalR set of Example 2 is shown in Fig. 2.

Fig. 2
figure 2

\(G_r\) graph of the cycs in Example 1

Definition 13

(Strongly connected subgraphs) The function \(Scc(G_r)\) returns the set of strongly connected components \(\{ G_{sc}^1(V_1, E_1), G_{sc}^2(V_2, E_2),\ldots,G_{sc}^n(V_n, E_n) \}\) of \(G_r\).

Proposition 3

Let \( G_{sc}^i(V_i, E_i)\) be a strongly connected component of a \(G_r\) such that \(|V_i|> 1\); then, the transitions in \(V_i\) form the support of a minimal t-invariant of the WFN to discover.

Proof

(Contraposition) Suppose that the transitions in \(V_i\) are not the support of a t-invariant; then, there exists at least one \(t_k \notin V_i\) that must occur to allow the repetitive firing of the transitions in \(V_i\). Thus, the transitions of \(V_i\) do not form a cycle in \(G_r\), and the component is not strongly connected. \(\square \)

Next, several simple operators for handling task traces are outlined.

  • Sym. The operator \(sym(\sigma )\) returns the set of different transitions (the alphabet) used in a sequence \(\sigma \).

  • Pos. The operator \(pos(t, \sigma )\) returns the set of positions where the transition t appears in a trace \(\sigma \).

  • Clear. The operator clear(s, A), where \(s \in T^*\) is a sequence and \(A\in 2^T\) is a set of tasks, returns the sequence obtained by deleting from s every occurrence of each \(t_i \in A\); if \(sym(s) \cap A = \emptyset \), then \(clear(s, A) = s\); if \(sym(s) \subseteq A\), then \(clear(s, A) = \epsilon \).

  • Replace. The operator replace(r, s, t), where r, s, and t are sequences \((|r| \ge |s| \ge |t| \ge 0)\), returns the sequence in which the first occurrence of s in r is replaced by t; replace(r, s, t) returns r if s is not a sub-sequence of r, and returns t if \(r = s\); when \(t = \epsilon \), the first occurrence of the sub-sequence s is deleted from r.
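The four operators above are elementary sequence manipulations; a minimal sketch, representing sequences as tuples of task names (positions are 1-based, as in the paper):

```python
def sym(s):
    # alphabet of a sequence
    return set(s)

def pos(t, s):
    # 1-based positions where task t appears in sequence s
    return {i + 1 for i, x in enumerate(s) if x == t}

def clear(s, A):
    # delete every occurrence of the tasks in A from s
    return tuple(x for x in s if x not in A)

def replace(r, s, t):
    # substitute the first occurrence of sub-sequence s in r by t
    for k in range(len(r) - len(s) + 1):
        if r[k:k + len(s)] == s:
            return r[:k] + t + r[k + len(s):]
    return r

sigma2 = ("t1", "t3", "t6", "t4", "t5", "t3", "t4", "t7")
cyc_e = ("t3", "t6", "t4", "t5")
tau = ("t3", "t4", "t5")
```

These definitions reproduce the values of Example 4 below.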

Example 4

Consider \(\sigma _2 = t_1 t_3 t_6 t_4 t_5 t_3 t_4 t_7\), \(cyc_e = t_3 t_6 t_4 t_5\), and \(\tau = t_3 t_4 t_5\). The result of applying the above operators is \(sym(\tau ) = \{t_3, t_4, t_5\}\); \(pos(t_3, \sigma _2)=\{2, 6\}\); \(clear(cyc_e, sym(\tau )) = t_6\); \(replace(\sigma _2, cyc_e, t_6) = t_1 t_6 t_3 t_4 t_7\).

Below, a procedure for extracting elementary cycles from traces is presented (Algorithm 1). It explores nested cycs and returns one elementary cyc \((cyc_e)\) if one exists; otherwise, it returns the empty set.

Algorithm 1
figure a

e-\(cycle(\sigma )\)

In line 5 of Algorithm 1, every task \(t_x\) that appears more than once in \(\sigma \) is analysed. The sub-sequence between its first and second occurrences is analysed to verify whether it is an elementary cycle; if not, the sub-sequence is analysed recursively to extract the inner cycle.
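The recursion just described can be sketched compactly; the following is an illustrative reconstruction of e-\(cycle(\sigma )\), not the authors' implementation.

```python
def e_cycle(sigma):
    # Return one elementary cyclic sub-sequence of sigma, or () if none exists.
    # Take the sub-sequence between the first two occurrences of a repeated
    # task; if some task repeats inside it, descend into that inner cycle.
    for i, t in enumerate(sigma):
        rest = sigma[i + 1:]
        if t in rest:
            cyc = sigma[i:i + 1 + rest.index(t)]
            if len(set(cyc)) == len(cyc):   # elementary: no task repeats inside
                return cyc
            return e_cycle(cyc)             # recurse into the nested cycle
    return ()

sigma1 = ("t1", "t6", "t3", "t4", "t7")
sigma2 = ("t1", "t3", "t6", "t4", "t5", "t3", "t4", "t7")
```

Applied to the traces of Example 1, the sketch extracts \(cyc_e = t_3t_6t_4t_5\) from \(\sigma _2\), while \(\sigma _1\) yields no cycle.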

Algorithm 2 presents the procedure for obtaining the minimal t-invariant supports. Consider that each trace in the event log ends with the task \(t_e\).

Algorithm 2
figure b

Getting minimal t-invariants supports

Each trace in \(\lambda \) is analysed (line 2 of Algorithm 2); if a trace has no repeated tasks, then its symbols are added as a t-invariant support; otherwise, the elementary cycles are extracted from \(\sigma _i\) to obtain the corresponding graphs; consequently, the supports of nested t-invariants are found.
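The overall extraction can be sketched end to end by combining the relations of Definition 10, the cycle extraction of Algorithm 1, and the SCC-based pruning of Proposition 3. The sketch below is an illustrative reconstruction, not the authors' implementation, and it is exercised on a subset of the log of Example 1 (traces \(\sigma _1\)–\(\sigma _4\) and \(\sigma _7\)), with \(t_e\) appended to each trace as the algorithm assumes.

```python
def causal_relation(log):
    # CausalR (Definition 10): consecutive pairs that are not concurrent
    R = {(s[j], s[j + 1]) for s in log for j in range(len(s) - 1)}
    Tc = {(a, b) for (a, b) in R
          if any(s[k:k + 3] == (a, b, a) for s in log for k in range(len(s) - 2))}
    return {(a, b) for (a, b) in R
            if (b, a) not in R or (a, b) in Tc or (b, a) in Tc}

def e_cycle(sigma):
    # Algorithm 1: one elementary cyclic sub-sequence of sigma, or () if none
    for i, t in enumerate(sigma):
        rest = sigma[i + 1:]
        if t in rest:
            cyc = sigma[i:i + 1 + rest.index(t)]
            return cyc if len(set(cyc)) == len(cyc) else e_cycle(cyc)
    return ()

def sccs(V, E):
    # Strongly connected components by mutual reachability (fine for small graphs)
    adj = {v: [b for (a, b) in E if a == v] for v in V}
    def reach(v):
        seen, stack = {v}, [v]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen
    r = {v: reach(v) for v in V}
    comps, done = [], set()
    for v in V:
        if v not in done:
            comp = {u for u in V if v in r[u] and u in r[v]}
            comps.append(comp)
            done |= comp
    return comps

def replace_first(r, s, t):
    # replace(r, s, t): substitute the first occurrence of s in r by t
    for k in range(len(r) - len(s) + 1):
        if r[k:k + len(s)] == s:
            return r[:k] + t + r[k + len(s):]
    return r

def t_invariant_supports(log):
    causal = causal_relation(log)
    supports = []
    def add(S):
        if S not in supports:
            supports.append(S)
    for sigma in log:
        s = tuple(sigma) + ("te",)          # each trace ends with t_e
        while True:
            cyc = e_cycle(s)
            if not cyc:
                break
            V = set(cyc)
            E = {(a, b) for (a, b) in causal if a in V and b in V}
            pruned = set()
            for comp in sccs(V, E):
                if len(comp) > 1:           # Proposition 3: |V_i| > 1
                    add(frozenset(comp))
                    pruned |= comp
            if not pruned:
                break                       # safeguard for this sketch
            s = replace_first(s, cyc, tuple(x for x in cyc if x not in pruned))
        add(frozenset(s))                   # remaining trace has no repeated tasks
    return supports

log = [("t1", "t6", "t3", "t4", "t7"),
       ("t1", "t3", "t6", "t4", "t5", "t3", "t4", "t7"),
       ("t2", "t3", "t4", "t5", "t3", "t4", "t8"),
       ("t2", "t3", "t4", "t8"),
       ("t1", "t3", "t4", "t9")]
supports = t_invariant_supports(log)
```

On this subset, the sketch yields exactly the four supports reported below for Example 1.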

Property 1. Algorithm 2 determines all the t-invariant supports of the extended WFN to be built from \(\lambda \).

Proof

It is easy to observe that in the repeat loop, the procedure extracts the evident cycles including \(t_e\), and the nested iterations of traces.\(\square \)

The supports of t-invariants obtained by applying the above algorithm to the task traces in Example 1 are \(<Y_1> = \{t_1, t_3, t_4, t_6, t_7, t_e\}\), \(<Y_2> = \{t_2, t_3, t_4, t_8, t_e\}\), \(<Y_3> = \{t_3, t_4, t_5\}\), and \(<Y_4> = \{t_1, t_3, t_4, t_9, t_e\}\). Notice that \(<Y_1>, <Y_2>\), and \(<Y_4>\) include the transition \(t_e\) of the extended WFN, since these invariants involve transitions in \(i^{\varvec{\cdot }}\) and \({}^{\varvec{\cdot }}o\), whilst \(t_e\) is not included in \(<Y_3>\) because it is the support of a nested t-invariant.

6 Building the PN model

Causal relations \([t_i, t_j]\) imply the existence of a place between the related transitions. For the source and sink places (i, o), causal relations are denoted as \([-, t_j]\) and \([t_i, -]\), respectively. Using this basic structure, named a dependency, together with the computed t-invariants, a technique for building a PN is now presented.

Definition 14

(First and last tasks) \(T_I = \bigcup _{\sigma _k \in \lambda } first(\sigma _k)\), and \(T_O = \bigcup _{\sigma _k \in \lambda } last(\sigma _k)\), where \(first(\sigma _k)\) and \(last(\sigma _k)\) provide the first and last tasks in \(\sigma _k \), respectively.

6.1 Composing substructures of dependencies

The substructures corresponding to causal dependencies must be composed by merging all the transitions that have the same label \(t_i\) into a single one. The merging of transitions may also lead to merging the places in the involved dependencies; the merging strategy is simple and is performed using two construction operators [31].

Operator 1. The composition of two dependencies in the form \([t_i, t_j]\) and \([t_j, t_k]\) yields a sequential substructure including two places, allowing firing the sequence \(t_i t_j t_k\); this is illustrated in Fig. 3a.

Operator 2. The composition of two dependencies whose first transitions are the same \(([t_i, t_j]\) and \([t_i, t_k])\) yields two possible substructures:

  • (a) The places of each dependency are merged into a single one iff each of the transitions \(t_j\) and \(t_k\) belong to different t-invariants. This substructure is called \(Or-split\); it is denoted as \([t_i, t_j+t_k]\).

  • (b) The places of the dependencies are not merged iff both transitions \(t_j\) and \(t_k\) belong to a same t-invariant. This substructure is called And-split; it is denoted as \([t_i, t_j || t_k]\).

Similarly, for dependencies having the same second transition \(([t_i, t_k]\) and \([t_j, t_k])\), the substructure yielded will be either \([t_i+t_j, t_k]\) (Or-join) or \([t_i || t_j, t_k]\) (And-join). In both cases, the observations \((t_i, t_k), (t_j, t_k)\in R_<\), which induced the dependencies, are preserved. This merging operator is illustrated in Fig. 3b. In general, a set of dependencies of the form \([t_i, t_j], [t_i, t_k],\ldots, [t_i, t_r]\) may produce either \([t_i, t_j+t_k+\cdots+t_r]\) or \([t_i, t_j || t_k || \cdots || t_r]\), according to whether \(t_j, t_k,\ldots, t_r\) belong to different t-invariants or to the same t-invariant, respectively.

Fig. 3
figure 3

Operators for merging dependencies

The merging of transitions can be applied iteratively to composed dependencies that exactly match an expression of transitions of the form \(t_i + t_j\) or \(t_i || t_j\). For example, the composition of the dependencies \([t_i + t_j, t_k]\) and \([t_i+t_j, t_r]\) produces \([t_i+t_j, t_k+t_r]\) if \(t_k\) and \(t_r\) do not belong to the same invariant.

All transitions \(t_j\) in dependencies of the form \([-, t_j]\) share the input place i. Similarly, all transitions \(t_i\) in dependencies of the form \([t_i, -]\) share the output place o.

Property 2. The application of the merging operators Operator 1 and Operator 2 to the dependencies derived from the pairs in CausalR, together with the knowledge of the t-invariant supports, leads to a WFN structure \(N_1\), which includes all the transitions.

Proof

Operator 1 forms paths of places and transitions, whilst Operator 2 determines when split and join substructures are created according to the computed t-invariants. \(\square \)

In Example 2, the application of the merging operators to the relations in CausalR yields the set of composed dependencies \([t_1, t_6 || t_3]\), \([t_2, t_3]\), \([t_3, t_4]\), \([t_4, t_5+t_7+t_8+t_9]\), \([t_5, t_3]\), \([t_6, t_7]\), \([t_1+t_2+t_5, t_3]\). Afterwards, the dependencies obtained by applying Operator 1 and Operator 2 are \({{\textbf {i}}}\): \([-, t \in T_I]\); \(p_1\): \([t_1+t_2+t_5, t_3]\); \(p_1\)-\(p_2\): \([t_1, t_3 || t_6]\); \(p_3\): \([t_3, t_4]\); \(p_5\): \([t_4, t_5+t_7+t_8+t_9]\); \(p_4\)-\(p_5\): \([t_6 || t_4, t_7]\); o: \([t \in T_O, -]\). The subsequent merging of transitions in the dependency substructures yields the WFN \(N_1\) shown in Fig. 4.

Fig. 4
figure 4

\(N_1\) built from \(\lambda \)

6.2 Model adjustment

The discovered model \(N_1\) replays all the traces in \(\lambda \); however, it may also execute some additional traces (surplus language). It is also possible that \(N_1\) cannot replay some traces in \(\lambda \). The WFN in Fig. 4 reproduces the \(\lambda \) of Example 1, but also other traces; in particular, the traces \(t_2t_3t_4t_9\) and \(t_1t_3t_4t_8\), which do not belong to \(\lambda \), can be fired in \(N_1\). This behaviour arises because the computed model \(N_1\) does not include PN elements (places and arcs) that enforce dependencies not exhibited explicitly by the traces in \(\lambda \), named implicit dependencies; therefore, \(N_1\) must be adjusted.

6.2.1 Implicit dependencies

In a PN, implicit dependencies represent the memory of the occurrence of a \(t_i\), which is used as a precondition to enable a non-immediate subsequent transition \(t_j\). In general, an implicit dependency \([t_i, t_j]\) represents a constraint on the flow of tokens in the PN, ensuring that \(t_j\) can be fired only when \(t_i\) has occurred before; thus, the absence of such an implicit dependency allows the occurrence of more sequences in the net.

Definition 15

(Implicit dependency) In a 1-bounded PN, \([t_i, t_j]\) is called an implicit dependency if, although there exists a place between the transitions, the occurrence of \(t_i\) does not produce a marking that immediately enables \(t_j\); i.e., the occurrence of at least one transition \(t_k\) is necessary before \(t_j\).

After building the first model, implicit dependencies may be deduced and included in \(N_1\) in two ways. Type 1: adding a new place between two transitions; or Type 2: using a place already included in \(N_1\). These situations are illustrated in Fig. 5, where the dependency \([t_x, t_w]\) is represented by a new place \(p_i\) in Fig. 5a, and \([t_x, t_y]\) is represented using a previously computed place \(p_j\) in Fig. 5b. Similarly, for the dependency \([t_x, t_z]\), the place of \([t_y, t_z]\) is used (Fig. 5c).

Fig. 5
figure 5

\(N_1\) Implicit dependencies

The following notions and conditions are useful to find both kinds of implicit dependencies in the traces of \(\lambda \), which will be added to \(N_1\).

Definition 16

(Implicit precedence) Let \(t_i, t_j \in T\) be tasks. \(t_i\) has an implicit precedence over \(t_j\), denoted as \(t_i \ll t_j\), if \(t_i >< t_j\) and, in every trace \(\sigma _k \in \lambda \) in which both occur, \(t_i\) always appears before \(t_j\).

Implicit precedence between two transitions suggests an implicit dependency, but it is necessary to analyse other underlying properties to ensure the existence of such a dependency.

Definition 17

(Support-dependent tasks) The set of support-dependent tasks of a \(Y_i \in Y(\lambda )\), denoted as \(Sd(Y_i)\), contains the tasks \(t_x \in T\) that appear only in the support of \(Y_i\): \(Sd(Y_i) = \{ t_x\in<Y_i> \mid \not \exists Y_j, j \ne i, t_x \in <Y_j> \}\)

For the t-invariant supports of Example 1 \((<Y_1> = \{t_1, t_3, t_4, t_6, t_7, t_e\}\), \(<Y_2> = \{t_2, t_3, t_4, t_8, t_e\}\), \(<Y_3> = \{t_3, t_4, t_5\}\), \(<Y_4> = \{t_1, t_3, t_4, t_9, t_e\})\), the support-dependent sets are \(Sd(Y_1) = \{t_6, t_7\}\), \(Sd(Y_2) = \{t_2, t_8\}\), \(Sd(Y_3) = \{t_5\}\), \(Sd(Y_4) = \{t_9\}\).

6.2.2 Implicit dependencies of Type 1

Now we can state the conditions in which a place must be added to relate two transitions that are not observed consecutively.

Proposition 4

Let \(t_i\) and \(t_j\) be transitions in \(N_1\). If (i) \(t_i\) and \(t_j\) are related by an implicit precedence \((t_i \ll t_j)\), and (ii) there exists a support-dependent set \(Sd(Y_k)\) that contains both transitions, then \(t_i\) and \(t_j\) are related by an implicit dependency \([t_i, t_j]\), which must be added to the structure of \(N_1\). The set of all the implicit dependencies of \(N_1\) is \(IDep = \{ [t_i, t_j] \mid (t_i \ll t_j) \wedge \exists Y_k\) such that \(\{t_i, t_j \} \subseteq Sd(Y_k) \}\)

Proof

(Contraposition) Suppose that the dependency \([t_i, t_j]\) must not be added to the structure of \(N_1\); this is because either:

  • (i) a place \(p_i\) of the dependency \([t_i, t_j]\) already exists as the result of applying Operator 1 or Operator 2; therefore, such transitions are not related by an implicit precedence, i.e. \(\lnot (t_i \ll t_j)\), or

  • (ii) \(t_i\) does not need to always occur before \(t_j\); then, both transitions may fire independently since they belong to different t-invariants; thus, there is no support-dependent set that contains both transitions.\(\square \)

Corollary 1

Let \([t_i, t_j]\) be an implicit dependency where \(t_i, t_j \in <Y_r>\) and \(Y_r \in Y(\lambda )\). If \(CY_r=0\), then, a new place \(p_k \notin P_2\) must be added to \(N_2\) to ensure \([t_i, t_j]\).

Proof

(Contraposition) Suppose that \(p_k \in P_2\); since it is linked to either \(t_i\) or \(t_j\), then \(CY_r \ne 0\) (this is the case of dependencies of Type 2).\(\square \)

The conditions of Proposition 4 determine the existence of places that do not represent causal relationships. This is valuable because implicit dependencies are not exhibited in \(\lambda \); the absence of such places would cause an exceeding language in the PN. In Example 1, the transitions \(t_2, t_8\) meet the conditions of the proposition because \(t_2\ll t_8\) and \(t_2, t_8 \in Sd(Y_2)\); therefore, \([t_2, t_8]\) must be added to \(N_1\), yielding the model \(N_2\) shown in Fig. 6.

Fig. 6
figure 6

\(N_2\) built by adding \(p_6:[t_2, t_8]\) to \(N_1\)

6.2.3 Implicit dependencies of Type 2

Now, the supports of minimal t-invariants of \(N_2\) in Fig. 6 are \(J(N_2):<J_1> = \{t_1, t_3, t_4, t_6, t_7, t_e\},<J_2> = \{t_2, t_3, t_4, t_8, t_e\}, <J_3> = \{t_3, t_4, t_5\}\); these invariants differ from \(Y(\lambda )\) computed in Subsection 6.1. The discrepancy between \(Y(\lambda )\) and \(J(N_2)\) arises because the computed PN does not include the arcs (implicit dependencies of Type 2, Fig. 5b, c) which ensure the behaviours due to implicit dependencies not exhibited in \(\lambda \).

\(N_2\) must be adjusted by determining the suitable implicit dependencies that transform \(N_2\) into \(N_3\), whose t-invariants match \(Y(\lambda )\). The mismatching is detected when \(\exists Y_r \in Y(\lambda )\) such that \(CY_r \ne 0\), where \(C\) is the incidence matrix of \(N_2\). To amend \(N_2\), the following strategy must be applied.

Consider a \(Y_r\in Y(\lambda )\). Let \(p_k\) be the place that corresponds to the row in which \(CY_r \ne 0\); more precisely, \(C(p_k)Y_r \ne 0\). To determine the dependency \([t_i, t_j]\), another transition of \(N_2\) must be linked through \(p_k\) to one of the transitions in \({\varvec{\cdot }}p_k\) (Fig. 5c) or \(p_k{\varvec{\cdot }}\) (Fig. 5b), following the construction procedure derived from the proof of the proposition stated below.

Proposition 5

Let \([t_i, t_j]\) be an implicit dependency where \(t_i, t_j \in <Y_r>\) and \(Y_r \in Y(\lambda )\). \([t_i, t_j]\) must be added to \(N_2\) through a place \(p_k\) of \(N_2\) if \(C(p_k)Y_r \ne 0\), to ensure \(C(p_k)Y_r=0\) (\([t_i, t_j]\) is of Type 2).

Proof

(Direct) To ensure \(C(p_k)Y_r=0\), two cases are considered:

  • (i) \(C(p_k)Y_r=1\): this requires that \(C(p_k, t_j) = -1\) to get \(C(p_k)Y_r=0\); thus \(t_i \in {\varvec{\cdot }}p_k\) (the arc \((p_k, t_j)\) must be added to get \([t_i, t_j]\)).

  • (ii) \(C(p_k)Y_r=-1\): this requires that \(C(p_k, t_j) = 1\) to get \(C(p_k)Y_r=0\); thus \(t_i \in p_k{\varvec{\cdot }}\) (the arc \((t_j, p_k)\) must be added to get \([t_i, t_j]\)).

Since \(t_j\in Sd(Y_r)\), the added arc \((p_k, t_j)\) only affects \(Y_r\); similarly, the new arc \((t_j, p_k)\) does not alter the other t-invariants.\(\square \)

Proposition 6

If all the implicit dependencies are added to \(N_2\) through the places \(p_k\) such that \(C(p_k)Y_r \ne 0\), for every \(Y_r \in Y(\lambda )\) with \(CY_r \ne 0\), then the amended net \(N_3\) fulfils \(CY = 0.\)

Proof

(Direct) When all the amendments to \(N_2\) are performed through the procedure derived from the proof of Proposition 5, the amended net \(N_3\) fulfils \(CY(\lambda ) = 0\) and then \(Y(\lambda )=J(N_3)\). \(\square \)

Algorithm 3 summarises the procedure derived from the previous result to obtain the implicit dependencies.

Algorithm 3
figure c

Determining implicit dependencies
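Since the body of Algorithm 3 is given only as a figure, the following is a hedged Python sketch of its amendment loop, not the authors' exact procedure. It assumes a NumPy incidence matrix, unit entries of the t-invariant at the chosen task, and that the support-dependent task \(t_j\) selected for each mismatching \(Y_r\) is supplied by the caller; the names `amend_type2` and `invariants` are illustrative.

```python
import numpy as np

def amend_type2(C, invariants):
    """Hedged sketch of the amendment loop (one reading of Algorithm 3):
    for each computed t-invariant Y_r with C @ Y_r != 0, cancel every
    nonzero row C(p_k) Y_r by adjusting column t_j of row p_k, which
    models adding the arc (p_k, t_j) (case i of Proposition 5) or the
    arc (t_j, p_k) (case ii). Assumes Y_r has a unit entry at t_j."""
    C = C.copy()
    arcs = []
    for Yr, tj in invariants:          # tj: index of the task chosen in Sd(Y_r)
        residual = C @ Yr
        for k in np.flatnonzero(residual):
            if residual[k] > 0:        # case (i): add arc (p_k, t_j)
                C[k, tj] -= 1
                arcs.append(('p->t', int(k), tj))
            else:                      # case (ii): add arc (t_j, p_k)
                C[k, tj] += 1
                arcs.append(('t->p', int(k), tj))
    return C, arcs
```

With the incidence matrix of Eq. 1 and \(t_9\) as the chosen task for \(Y_4\), this sketch returns the single arc \((p_2, t_9)\), consistent with the worked example below.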

Consider \(N_2\) in Fig. 6, obtained from the event log in Example 5. First, \(J(N_2):<J_1> = \{t_1, t_3, t_4, t_6, t_7, t_e\},<J_2> = \{t_2, t_3, t_4, t_8, t_e\}, <J_3> = \{t_3, t_4, t_5\}\) is computed. There exists a mismatching between both sets since \(Y(\lambda ) \not \subset J(N_2)\). It can be noticed that \(Y_4 \notin J(N_2)\), whilst \(Y_1 = J_1, Y_2 = J_2\) and \(Y_3=J_3\). In the analysis of \(Y_4\), \(p_k = p_2\) because it fulfils the condition \(C(p_2)Y_4 \ne 0\), as shown in Eq. 1.

$$\begin{aligned} \begin{bmatrix} -1&{}-1&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1\\ 1&{}1&{}-1&{}0&{}1&{}0&{}0&{}0&{}0&{}0\\ 1&{}0&{}0&{}0&{}0&{}-1&{}0&{}0&{}0&{}0\\ 0&{}0&{}1&{}-1&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}1&{}-1&{}0&{}0&{}0\\ 0&{}0&{}0&{}1&{}-1&{}0&{}-1&{}-1&{}-1&{}0\\ 0&{}1&{}0&{}0&{}0&{}0&{}0&{}-1&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}1&{}1&{}1&{}-1 \end{bmatrix} \cdot \begin{bmatrix} 1\\ 0\\ 1\\ 1\\ 0\\ 0\\ 0\\ 0\\ 1\\ 1 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 1\\ 0\\ 0\\ 0\\ 0\\ 0 \end{bmatrix}\end{aligned}$$
(1)

The support-dependent set computed for \(Y_4\) (the only \(Y_i \in Y(\lambda )\) such that \(Y_i \notin J(N_2)\)) yields the candidate transitions \(t_1\) and \(t_9\). The transition \(t_1 \in {\varvec{\cdot }}p_2\) is selected to find the implicit dependency \([t_1, t_j]\) because \(t_1 \in <Y_4>\). The transition that fulfils the condition \(t_1 \ll t_j\) is \(t_j = t_9\); therefore, the implicit dependency \([t_1, t_9]\) is added to \(N_2\) by the arc \((p_2, t_9)\). Finally, the amended PN \(N_3\), which replays \(\lambda \), is shown in Fig. 7.
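The computation of Eq. 1 and the effect of the added arc can be reproduced numerically. The snippet below only checks the worked example, assuming places are indexed \(p_0\) to \(p_7\) and columns are ordered \(t_1, \ldots, t_9, t_e\), consistent with Eq. 1.

```python
import numpy as np

# Incidence matrix C of N_2 (rows p_0..p_7, columns t_1..t_9, t_e)
# and the mismatching invariant Y_4, transcribed from Eq. 1.
C = np.array([
    [-1, -1,  0,  0,  0,  0,  0,  0,  0,  1],
    [ 1,  1, -1,  0,  1,  0,  0,  0,  0,  0],
    [ 1,  0,  0,  0,  0, -1,  0,  0,  0,  0],
    [ 0,  0,  1, -1,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  1, -1,  0,  0,  0],
    [ 0,  0,  0,  1, -1,  0, -1, -1, -1,  0],
    [ 0,  1,  0,  0,  0,  0,  0, -1,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  1,  1,  1, -1],
])
Y4 = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 1])

residual = C @ Y4
# residual is nonzero only in the row of p_2: [0, 0, 1, 0, 0, 0, 0, 0]

# Adding the arc (p_2, t_9), i.e. setting C(p_2, t_9) = -1, cancels it:
C[2, 8] = -1
# C @ Y4 is now the zero vector, so Y_4 is a t-invariant of the amended net
```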

Fig. 7
figure 7

Resulting net after model adjustment

Remark 1

The procedure of Algorithm 3 does not need to compute the t-invariants of \(N_2\); it only operates on the computed invariants \(Y_r \in Y(\lambda )\) that do not agree with the computed net \(N_2\), i.e. those such that \(CY_r \ne 0\).

Property 3. Given an event log of task traces \(\lambda \in T^*\), a safe PN model N (\(N_3\)) that reproduces \(\lambda \) can be obtained by computing the invariants from \(\lambda \), applying Operators 1 and 2, and performing the amendments of Algorithm 3.

Proof

The causality between transitions, stated by the pairs in \(CausalR \cup R_{<}'\), represents the precedence relationship between consecutive transitions in the traces of \(\lambda \) that are not in \(ConcR\). Then, the t-invariants can be determined from \(\lambda \) (Property 1). Besides, the substructure associated with a dependency \([t_i, t_j]\) ensures the consecutive occurrence of these transitions; then, based on the t-invariants, the application of Operator 1 and Operator 2 to all the dependencies \([t_i, t_j]\) leads to a PN structure that ensures the flow determined by the dependencies (Property 2). Finally, the adjustments to \(N_1\) provided by Algorithm 3 allow matching the t-invariants determined from \(\lambda \) with those of the discovered model.\(\square \)

6.3 Complexity of the method

The method is based on the notions introduced in Section 4, whose determination procedures have a computational complexity of \(O(|\lambda |)\). The complexity of computing the t-invariants \(Y(\lambda )\) is \(O((|V|+|E|)*||\lambda ||)\), which is the time for determining the strongly connected components in a graph with \(V\) nodes and \(E\) edges, multiplied by the size of the log; notice that this is the worst case, in which each trace has a different e-cycle. Finally, the complexity of the procedure to compute each implicit dependency is \(O(|P|*|T|)\), corresponding to the matrix-vector product. Thus, the complexity of the algorithm is polynomial in \(|\lambda |\).

7 Implementation issues

7.1 Testing scheme

Algorithms and auxiliary procedures derived from the proposed discovery method have been implemented; the software has been tested on numerous WFNs of diverse structural complexity. The tests were performed on artificial logs following the scheme shown in Fig. 8. First, a WFN \(N\) including a transition \(t_e\) is proposed; then, with the help of the PN editor/simulator PIPE [33], a workflow log \(\lambda \) is produced. The discovery method module then processes \(\lambda \), yielding a model coded in XML, which is displayed using PIPE again. The obtained model \(N'\) is then compared to \(N\). This scheme allows testing the method in a controlled manner by rediscovering WFNs with diverse structures, which include cycles nested into t-components, concurrency, and implicit dependencies.


Fig. 8
figure 8

Test procedure of the discovery method

7.2 Illustrative experiments

Artificial logs were produced using PN models that include diverse substructures, which exhibit complex repetitive behaviour: overlapped cycles (Fig. 9) and cycles in parallel threads (Fig. 10). Logs used in these experiments, named \(\lambda _1\) and \(\lambda _2\), are given below.

$$\begin{aligned} \lambda _1&= \{t_0 t_1 t_2 t_4 t_5 t_6 t_2 t_3 t_1 t_2 t_4 t_5 t_7 t_8 t_2 t_4 t_5 t_7\}, \{t_0 t_1 t_2 t_3 t_1 t_2 t_4 t_5 t_6 t_2 t_4 t_5 t_7 t_8 t_2 t_4 t_5 t_7 t_9 \}\\ \lambda _2&= \{ t_0 t_1 t_4 t_5 t_6 t_2 t_3 t_1 t_2 t_4 t_5 t_7\}, \{t_0 t_1 t_4 t_2 t_5 t_3 t_6 t_4 t_1 t_5 t_2 t_6 t_3 t_4 t_5 t_1 t_2 t_3 t_1 t_6 t_2\},\\ &\quad \{t_4 t_3 t_1 t_2 t_3 t_5 t_6 t_1 t_4 t_5 t_2 t_7\}, \{ t_0 t_4 t_1 t_5 t_2 t_7 \}, \{t_0 t_1 t_4 t_2 t_5 t_7\} \end{aligned}$$
Fig. 9
figure 9

Detecting overlapped cycles

Fig. 10
figure 10

Detecting cycles in parallel threads

Other experiments have been performed using several WFNs reported in the literature. Figure 11 shows five models obtained by applying the method to logs taken from [34], which present implicit dependencies. The dashed places and their respective input/output arcs correspond to implicit dependencies of type 1, whereas the dashed arcs joining existing places correspond to implicit dependencies of type 2.

A complete log for Fig. 11a is ACD, BCFE, BFCE; from this log, the implicit dependency of Type 1 [A, D] is detected. For the WFN shown in Fig. 11b, the corresponding complete log is ACD, BCE, AFCE, ACFE; in this case, implicit dependencies of Type 2 [A, D] and [B, E] are found. The corresponding log for the WFN in Fig. 11c is ACFBGE, AFCBGE, AFBCGE, AFBGCE, AFDGE; from this log, the method determined that [A, D], [D, E] are implicit dependencies of Type 2. The processing of the complete log ACDEGH, ACDGEH, ACGDEH, BCDFH yields the workflow net in Fig. 11d; for this log, the method first found the implicit dependencies [A, E], [A, G], [B, F]; nevertheless, the net still does not match the t-invariants of the log; hence, a Type 2 implicit dependency [C, F] is derived. Finally, the corresponding log of the WFN in Fig. 11e is FBG, ABC, FDBEG, FBDEG, FDEBG, ADEDEBG, ABDEC; in this case, implicit dependencies of Type 1 [A, C] and [F, G] are found, and the arcs assuring implicit dependencies of Type 2 are \((A, p_k), (F, p_k), (p_k, C)\), and \((p_k, G)\). Similar to the procedure for obtaining the WFN of Fig. 11b, \(p_k\) is the result of merging two places.
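These detections can be sanity-checked on the published logs. The sketch below tests the occurrence-order condition behind the Type 1 dependency [A, D] of Fig. 11a; the helper name is illustrative, and this per-trace reading of the precedence condition is an assumption.

```python
def always_precedes(traces, a, b):
    """In every trace containing b, a also occurs and its first
    occurrence precedes the first occurrence of b (a hedged,
    per-trace reading of implicit precedence)."""
    return all(a in t and t.index(a) < t.index(b)
               for t in traces if b in t)

log_11a = ['ACD', 'BCFE', 'BFCE']   # complete log of Fig. 11a
# always_precedes(log_11a, 'A', 'D') holds, supporting the
# Type 1 implicit dependency [A, D]
```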

Fig. 11
figure 11

WFN discovered from logs in [21]

8 Discussion

8.1 Main features

The proposed discovery method includes strategies alternative to those found in the literature, namely the search for invariants and the discovery of concurrent cyclic behaviours. The discovered WF-net is a qualitative model that allows reproducing the logs obtained from the execution of WF processes that behave as sound WF-nets, as specified in Sect. 3.1. This feature, called fitness in [35], is assured to be 1.0, since all the precedences declared by pairs in \(R_<\) (issued from \(\lambda \)) are represented in the discovered model, as stated in Property 3. Furthermore, the procedures that implement the method are based on polynomial-time algorithms on the size of the log, which is a welcome feature for dealing with large logs. Compared with an outstanding published method, the alpha++ algorithm [32], our approach can discover the reported models; besides, its computational complexity is lower.

8.2 Limitations and challenges

The first limitation we can point out arises from the assumptions stated in the problem formulation, which require that the obtained model uses each transition in T only once. This constraint, issued from the standard problem formulation, can be relaxed when task symbols may be associated with more than one transition or when non-observable (silent) transitions are allowed. Another assumption held in this paper is that the traces of the observed behaviour are recorded correctly; in particular, no task is missing from a trace. Although the method can build WF-nets that reproduce the input logs, the discovered model could represent exceeding behaviour due to cycles in the synthesised PN. For example, the trace abcbcbd includes a repetition of the sub-trace cb; then, the model will represent \(ab(cb)^+d\). The language overrepresentation (computed as a measure of precision in [35]) is due in part to this feature; this analysis is out of the scope of the paper and is currently a research matter of the authors. The relationships between the actual, observed, and computed behaviours are depicted in Fig. 12.
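The exceeding behaviour of the example can be made concrete with a regular expression; this check is illustrative only and assumes the discovered net accepts exactly the language \(ab(cb)^+d\).

```python
import re

# The net discovered from the single trace 'abcbcbd' represents the
# language ab(cb)+d, which strictly contains the observed behaviour.
pattern = re.compile(r'ab(cb)+d')

assert pattern.fullmatch('abcbcbd')   # the observed trace is replayed
assert pattern.fullmatch('abcbd')     # exceeding trace, not in the log
assert not pattern.fullmatch('abd')   # zero iterations are not accepted
```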

Fig. 12
figure 12

Actual, observed, and computed behaviours

During the tests of the method using artificial logs obtained through known WFNs, we detected some particular sound WF-nets for which the method fails to rediscover all the dependencies between tasks. Since the method is based on representing the repetitive behaviour exhibited by the log through inferring the t-invariants, it cannot distinguish supports of t-invariants in which one or several tasks need to occur a given number of times to reproduce the repetitive behaviour. In other words, when a t-component has a cycle in its execution, the algorithm may find more than one t-invariant; the outcome is a WF-net that can reproduce the observed log and other traces involving such a cycle, which are not in the log. Consider the two WFNs of Fig. 13; the net in Fig. 13a is executed to generate the complete workflow log ABCFGECDH, ABCECDFGH, ABFCGECDH, ABFCEGCDH, ABCEFCDGH, ABCFEGCDH; notice that this WF-net (more precisely, the extended WF-net) has only one t-invariant. During the application of the discovery method, two t-invariant supports \(<Y_1> = \{A, B, C, F, D, G, H\}\) and \(<Y_2> = \{C, E\}\) are computed; then the WF-net built is that shown in Fig. 13b, which in fact (as the extended WF-net) has two t-invariants. The implicit dependencies of Type 1 [B, E] and [E, D] are missed and should be computed to rediscover the WFN used to generate the logs. In particular, for this example, a subsequent analysis must determine that the cycle of transitions C and E in \(<Y_2>\) occurs once every time \(<Y_1>\) is executed. The dependency between the executions of the t-invariants is still under research.

Fig. 13
figure 13

WF-Net with a nested cycle in the t-invariant

9 Conclusion

The discovery method proposed in this paper is based on determining the supports of t-invariants from the log \(\lambda \); it allows building an initial model, which can be adjusted later, if needed, with the help of the computed t-invariants; the final model includes implicit causal relationships between transitions that have not been observed consecutively in the traces of \(\lambda \). The discovered WFN replays all the traces in \(\lambda \) from \(M_0\) and may eventually accept exceeding iterative sub-sequences, which correspond to the behaviour inherent to PNs with repetitive components. Based on polynomial-time algorithms, the method allows processing large event logs. The implemented software has been tested on artificial logs corresponding to WFNs with diverse structures; the tests demonstrated the accuracy and efficiency of the method when complex PN structures are addressed. Further work regards the application of the method to event logs issued from actual processes. Current research addresses the problem of PN discovery from incomplete observed sequences and quality measures to assess the obtained model regarding the event log.