1 Introduction

The execution of business processes within a company generates traces of event data in its supporting information system. The goal of process mining [28] is to turn this data, recorded in event logs, into actionable knowledge. Three core branches form the basis of process mining: process discovery, conformance checking and process enhancement. In process discovery, this paper’s focus, the goal is to construct a process model based on an event log. In conformance checking the goal is to assess whether a given process model and event log conform to each other in terms of described behaviour. In process enhancement the goal is to improve process models, primarily, though not exhaustively, using the two aforementioned fields.

Several different process models exist that (largely) describe the behaviour in an event log. Hence, we need means to rank and compare these different process models. In process mining we typically judge the quality of process models based on four essential quality dimensions: replay-fitness, precision, generalization and simplicity [7, 28, 33]. Replay-fitness describes the fraction of behaviour in the event log that is also described by the model. Precision describes the fraction of behaviour described by the model that is also present in the event log. Generalization indicates a model’s ability to account for behaviour not part of the event log, e.g. in case of parallelism, it is often impossible to observe all possible behaviour in the event log. Simplicity refers to a model’s interpretability by a human analyst. A process discovery result ideally strikes an adequate balance between these four quality dimensions.

State-of-the-art process discovery algorithms guarantee certain structural and behavioural properties of their discovered process models [6, 16]. These guarantees have a positive impact on the aforementioned quality dimensions, but they come at a price: the techniques are unable to find complex, non-local control-flow patterns [29]. In [34] an integer linear programming (ILP) [24] based process discovery algorithm is proposed that is able to find such patterns. However, the algorithm does not provide the same guarantees as most state-of-the-art process discovery algorithms. Moreover, the algorithm only works well under the assumption that the event log only holds frequent behaviour that fits nicely into some underlying process model.

Real event logs typically include low-frequent exceptional behaviour, e.g. caused by people deviating from the normative process or cases that require special treatment. Because of this, applying ILP-based process discovery as-is on real data often yields, despite its potential, unsatisfactory results. In this paper we present a revised ILP-based process discovery algorithm that solves the inherent shortcomings of current ILP-based approaches. Our contribution is summarized as follows: (1) we show that our approach is able to discover relaxed sound workflow nets, and (2) we present an effective, integrated filtering algorithm that results in process models that abstract from infrequent and/or exceptional behaviour.

The proposed algorithm is implemented in the process mining framework ProM [39] (HybridILPMiner package) and available in RapidProM [5, 26]. We have compared our technique with two state-of-the-art filtering techniques [9, 15]. We additionally validated the applicability of our approach on two real life event logs [11, 19]. Our experiments confirm the effectiveness of the proposed approach, both in terms of resulting model quality and computational complexity.

The remainder of this paper is organized as follows. In Sect. 2, we discuss related work. In Sect. 3, we present background concepts. In Sect. 4, we show how to discover relaxed sound workflow nets. In Sect. 5, we present an integrated filtering algorithm that eliminates infrequent behaviour. In Sect. 6, we evaluate the proposed approach. Sect. 7 concludes the paper.

2 Related work

In this section we predominantly focus on the application of region theory, i.e. the theoretical foundation of ILP-based process discovery, in process discovery. We additionally discuss filtering techniques designed for process discovery.

2.1 Process discovery

The state-of-the-art process discovery algorithm, i.e. the Inductive Miner [16], discovers process models by applying a divide-and-conquer approach. The algorithm splits the event log into smaller parts and, recursively, finds models for these sub-logs which are later combined. The resulting models are hierarchically structured sound workflow nets [27]. A limitation of the approach is its inability to discover complex non-local control flow patterns. The discovery approach presented in [6], i.e. the Evolutionary Tree Miner, is able to find process models similar to those of the Inductive Miner. The algorithm adopts an evolutionary computation approach and is therefore non-deterministic and does not guarantee termination. Like the Inductive Miner, it is not able to discover complex non-local control flow patterns. For an overview of other process discovery algorithms we refer to [12, 28, 35].

Several process discovery techniques have been proposed based on region theory, i.e. a solution to the Petri net synthesis problem [23]. Region theory comes in two forms: state-based region theory [4, 13, 14], using transition systems as an input, and language-based region theory [1, 2, 10, 17, 18], using languages as an input. The main difference between the synthesis problem and process discovery relates to the generalization of the discovered models. Process models found by classical region theory approaches have perfect replay-fitness and maximal precision. Process discovery, on the other hand, aims at extracting a generalizing process model, i.e. precision, and in some cases replay-fitness, need not be maximized.

In [31] a process discovery approach is presented that transforms an event log into a transition system, after which state-based region theory is applied. Constructing the transition system is strongly parametrized, i.e. using different parameters yields different process discovery results. In [25] a similar approach is presented; its main contribution is a complexity reduction w.r.t. conventional region-based techniques. In [3] a process discovery approach is presented based on language-based region theory. The method finds a minimal linear basis of a polyhedral cone of integer points, based on the event log. It guarantees perfect replay-fitness, whereas it does not maximize precision. The worst-case time complexity of the approach is exponential in the size of the event log. In [8] a process discovery algorithm is proposed based on the concept of numerical abstract domains. Based on the event log’s prefix-closure, a convex polyhedron is approximated by means of calculating a convex hull. The convex hull is used to compute causalities in the input event log by deducing a set of linear inequalities which represent places. In [34] a first design of a process discovery ILP-formulation is presented. An objective function is presented, generalized later in [37], that allows for expressing a preference for finding certain Petri net places. The work also presents means to formulate ILP constraints that help find more advanced Petri net types, e.g. Petri nets with reset- and inhibitor arcs.

All aforementioned techniques alleviate, to some extent, the strict implications of region theory w.r.t. process discovery, i.e. precision maximization, poor generalization and poor simplicity. However, the techniques still perform suboptimally. Since the techniques guarantee perfect replay-fitness, they tend to fail if exceptional behaviour is present in the event log, i.e. they produce models that incorporate infrequent behaviour (outliers).

2.2 Filtering infrequent behaviour

Little work has been done regarding the filtering of infrequent behaviour in the context of process mining. The majority of work concerns unpublished/undocumented ad-hoc filtering implementations in the ProM framework [39].

In [9] an event log filtering technique is presented that filters on the event level. Events within the event log are removed in case they do not fit an automaton that is constructed on the basis of the event log. The technique can be used as a pre-processing step prior to invoking any discovery algorithm. In [15] the Inductive Miner [16] is extended with filtering capabilities to handle infrequent behaviour. The technique is tailored towards the internal workings of the Inductive Miner algorithm and considers three different types of filters. Moreover, the technique exploits the inductive nature of the underlying algorithm, i.e. filters are applied on multiple levels.

3 Background

In this section we present basic notational conventions, event logs and workflow nets. Moreover, we present a process discovery ILP-formulation based on [34, 37].

3.1 Bags, sequences and vectors

\(X= \{e_1,e_2, \ldots , e_n\}\) denotes a set. \(\mathscr {P}(X)\) denotes the power set of \(X\). \(\mathbb {N}\) denotes the set of non-negative integers, i.e. including 0, whereas \(\mathbb {N}^+\) excludes 0. \(\mathbb {R}\) denotes the set of real numbers. A bag (multiset) over \(X\) is a function \(B: X\rightarrow \mathbb {N}\) which we write as \([e_1^{v_1}, e_2^{v_2}, \ldots , e_n^{v_n}]\), where for \(1 \le i \le n\) we have \(e_i \in X, v_i \in \mathbb {N}^+\) and \(B(e_i) = v_i\). If for some element \(e\) we have \(B(e) = 1\), we omit its superscript. An empty bag is denoted as \(\emptyset \). Element inclusion applies to bags: if \(e \in X\) and \(B(e) > 0\) then also \(e \in B\). Set operations, i.e. \(\uplus , {\setminus }, \cap \), extend to bags. The set of all bags over \(X\) is denoted \(\mathscr {B}(X)\).

A sequence \(\sigma \) of length k relates positions to elements \(e\in X\), i.e. \(\sigma : \{1,2,\ldots ,k\} \rightarrow X\). An empty sequence is denoted as \(\epsilon \). We write every non-empty sequence as \(\langle e_1, e_2, \ldots , e_k \rangle \). The set of all possible sequences over a set \(X\) is denoted as \(X^*\). We write concatenation of sequences \(\sigma _1\) and \(\sigma _2\) as \(\sigma _1 \cdot \sigma _2\), e.g. \(\langle a , b \rangle \cdot \langle c, d \rangle = \langle a,b,c,d \rangle \). Let \(Y \subseteq X\), we define \(\downarrow _{Y} :X^* \rightarrow Y^*\) recursively with \(\downarrow _{Y}(\epsilon ) = \epsilon \) and \(\downarrow _{Y}(\langle x \rangle \cdot \sigma ) = \langle x \rangle \cdot \downarrow _{Y}(\sigma )\) if \(x \in Y\) and \(\downarrow _{Y}(\sigma )\) otherwise. We write \(\sigma _{\downarrow _{Y}}\) for \(\downarrow _{Y}(\sigma )\).

Given \(Y \subseteq X^*\), the prefix-closure of \(Y\) is: \(\overline{Y} = \{\sigma _1 \in X^* \mid \exists \sigma _2 \in X^* (\sigma _1 \cdot \sigma _2 \in Y)\}\). We extend the notion of a prefix-closure to bags of sequences. Let \(Y \subseteq X^*\) and \(B_{Y}: Y \rightarrow \mathbb {N}\); we define \(\overline{B}_{Y} : \overline{Y} \rightarrow \mathbb {N}\), where \(B_{Y}(\sigma ) = 0\) for \(\sigma \in \overline{Y} {\setminus } Y\), such that: \( \overline{B}_{Y}(\sigma ) = B_{Y}(\sigma ) + \sum _{\sigma \cdot \langle e \rangle \in \overline{Y}} \overline{B}_{Y}(\sigma \cdot \langle e \rangle ) \). For example, \(B_2 = [\langle a,b \rangle ^5, \langle a,c \rangle ^3]\) yields \(\overline{B}_2 = [\epsilon ^{8}, \langle a \rangle ^{8},\langle a,b \rangle ^5, \langle a,c \rangle ^3]\).
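Computationally, \(\overline{B}_{Y}(\sigma )\) equals the summed frequency of all traces that have \(\sigma \) as a prefix. The following minimal Python sketch illustrates this; the representation (traces as tuples, bags as collections.Counter) is our own choice and not part of any process mining library.

```python
from collections import Counter

def prefix_closure(bag):
    """Prefix-closure of a bag of sequences: every prefix of a trace,
    including the empty sequence (), receives the summed frequency of
    all traces extending it."""
    closure = Counter()
    for trace, freq in bag.items():
        for k in range(len(trace) + 1):  # all prefixes, incl. () and trace itself
            closure[trace[:k]] += freq
    return closure

# B_2 = [<a,b>^5, <a,c>^3] yields [eps^8, <a>^8, <a,b>^5, <a,c>^3]
B2 = Counter({("a", "b"): 5, ("a", "c"): 3})
print(prefix_closure(B2))
# Counter({(): 8, ('a',): 8, ('a', 'b'): 5, ('a', 'c'): 3})
```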

Given a set \(X\) and a range of values \(R \subseteq \mathbb {R}\), vectors are denoted as \(\mathbf {z} \in R^{|X|}\), where \(\mathbf {z}(e) \in R\) for \(e \in X\). We assume vectors to be column vectors. For vector multiplication we assume that vectors agree on their indices. Throughout the paper we assume a total ordering on sets of the same domain. Given \(X= \{e_1, e_2, \ldots , e_n\}\) and \(\mathbf {z}_1, \mathbf {z}_2 \in R^{|X|}\) we have \(\mathbf {z}_1^{\intercal } \mathbf {z}_2 = \sum _{i=1}^{n} \mathbf {z}_1(e_i) \mathbf {z}_2(e_i)\). A Parikh vector \(\mathbf {p}\) represents the number of occurrences of each element within a sequence, i.e. \(\mathbf {p} : X^* \rightarrow \mathbb {N}^{|X|}\) with \(\mathbf {p}(\sigma ) = (\#_{e_1}(\sigma ), \#_{e_2}(\sigma ), \ldots , \#_{e_n}(\sigma ))\) where \(\#_{e_i}(\sigma )= |\{i' \in \{1,2, \ldots , |\sigma |\} \mid \sigma (i') = e_i\}|\).
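As a small illustration, a Parikh vector is obtained by counting activity occurrences against the assumed total ordering. A minimal Python sketch, in our own representation:

```python
from collections import Counter

def parikh(trace, activities):
    """Parikh vector of a trace w.r.t. a totally ordered list of activities."""
    counts = Counter(trace)
    return [counts[a] for a in activities]

print(parikh(("a", "b", "a", "d"), ["a", "b", "c", "d"]))  # [2, 1, 0, 1]
```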

3.2 Event logs and workflow nets

In process discovery, event logs, which describe the actual execution of activities in the context of a business process, are the main source of input. An example event log is presented in Table 1. Consider all activities related to Case-id 1: John registers a request, after which Lucy examines it thoroughly; Pete checks the ticket, after which Rob decides to reject the request. The execution of an activity in the context of a business process is referred to as an event. A sequence of events, e.g. the sequence of events related to case 1, is referred to as a trace.

Table 1 Fragment of a fictional event log [28] (a row corresponds to an event)

Let \(\mathscr {A}\) denote the universe of all possible activities. An event log \(L\) is a bag of sequences over \(\mathscr {A}\), i.e., \(L\in \mathscr {B}(\mathscr {A}^*)\). Typically, there exists a set \(A_{L} \subset \mathscr {A}\) of activities that are actually present in \(L\). In some cases we refer to an event log as \(L\in \mathscr {B}(A_{L}^*)\). A sequence \(\sigma \in L\) represents a trace. We write case 1 as trace \(\langle \) “register request”, “examine thoroughly”, “check ticket”, “decide”, “reject request”\(\rangle \). In the remainder of the paper, we use simple characters for activity names, e.g. we write case 1 as \(\langle a,b,d,e,h \rangle \).

The goal within process discovery is to discover a process model based on an event log. In this paper we consider workflow nets (WF-nets) [27], based on Petri nets [22], to describe process models. We first introduce Petri nets and their execution semantics, after which we define workflow nets.

A Petri net is a bipartite graph consisting of a set of vertices called places and a set of vertices called transitions. Arcs connect places with transitions and vice versa. Additionally, transitions have a (possibly unobservable) label which describes the activity that the transition represents. A Petri net is a quadruple \(N= (P,T,F, \lambda )\), where \(P\) is a set of places and \(T\) is a set of transitions with \(P\cap T= \emptyset \). \(F\) denotes the flow relation of \(N\), i.e., \(F\subseteq (P\times T) \cup (T\times P)\). \(\lambda \) denotes the label function, i.e. given a set of labels \(\Lambda \) and a symbol \(\tau \notin \Lambda \), it is defined as \(\lambda :T\rightarrow \Lambda \cup \{ \tau \}\). For a node \(x \in P\cup T\), the pre-set of x in \(N\) is defined as \(\bullet x = \{y \mid (y,x) \in F\}\) and \(x \bullet = \{y \mid (x,y) \in F\}\) denotes the post-set of x. Graphically we represent places as circles and transitions as boxes. For every \((x,y) \in F\) we draw an arc from x to y. An example Petri net (which is also a WF-net) is depicted in Fig. 1. Observe that we have \(\bullet d = \{c_2\}, d \bullet = \{c_4\}\) and \(\lambda (d) =\) “check ticket”. The Petri net does not contain any silent transition.

Fig. 1 Example WF-net \(W_1\), adopted from [28]

The execution semantics of Petri nets are based on the concept of markings. A marking \(M\) is a bag of tokens, i.e. \(M\in \mathscr {B}(P)\). Graphically, a place \(p\)’s marking is visualized by drawing \(M(p)\) dots inside place \(p\), e.g. place “start” in Fig. 1. A marked Petri net is a 2-tuple \((N,M)\), where \(M\) represents \(N\)’s marking. We let \(M_i\) denote \(N\)’s initial marking. Transition \(t\in T\) is enabled in marking \(M\) if \(\forall p\in \bullet t(M(p) > 0)\). An enabled transition \(t\) in marking \(M\) may fire, which results in a new marking \(M'\). If \(t\) fires, denoted as \((N,M)\xrightarrow {t}(N,M')\), then for each \(p\in P\) we have \(M'(p) = M(p) - 1\) if \(p\in \bullet t{\setminus } t\bullet \), \(M'(p) = M(p) + 1\) if \(p\in t\bullet {\setminus } \bullet t\), and \(M'(p) = M(p)\) otherwise, e.g. in Fig. 1 we have \((W_1,[start])\xrightarrow {a}(W_1,[c_1, c_2])\). Given a sequence \(\sigma = \langle t_1, t_2, \ldots , t_n \rangle \in T^*\), \(\sigma \) is a firing sequence of \((N,M)\), written as \((N,M) \xrightarrow {\sigma } (N,M')\), if and only if for \(n = |\sigma |\) there exist markings \(M_1,M_2,\ldots ,M_{n-1}\) such that \((N,M)\xrightarrow {t_1}(N,M_1), (N,M_1)\xrightarrow {t_2}(N,M_2),\ldots ,(N,M_{n-1})\xrightarrow {t_n}(N,M')\). We write \((N, M) \xrightarrow {\sigma }\) if there exists a marking \(M'\) s.t. \((N, M) \xrightarrow {\sigma } (N, M')\). We write \((N, M) \rightsquigarrow (N, M')\) if there exists \(\sigma \in T^*\) s.t. \((N, M) \xrightarrow {\sigma } (N,M')\).
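The firing rule is easily operationalized. Below is a small Python sketch of enabledness and firing for (weight-1) Petri nets; the data structures are our own illustrative choice, not those of any Petri net library.

```python
from collections import Counter

def enabled(marking, pre):
    """A transition is enabled iff every place in its pre-set holds a token."""
    return all(marking[p] > 0 for p in pre)

def fire(marking, pre, post):
    """Fire a transition: consume one token per input place and produce one
    token per output place; places in both pre- and post-set are unchanged."""
    if not enabled(marking, pre):
        raise ValueError("transition not enabled")
    m = Counter(marking)
    for p in pre:
        m[p] -= 1
    for p in post:
        m[p] += 1
    return +m  # unary + drops zero-count entries

# (W_1, [start]) --a--> (W_1, [c1, c2])
print(fire(Counter({"start": 1}), pre={"start"}, post={"c1", "c2"}))
```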

WF-nets extend Petri nets and require the existence of a unique source place and a unique sink place, which describe the start and end of a case, respectively. Moreover, each element within the WF-net needs to be on a path from the source to the sink place.

Definition 1

(Workflow net [27]) Let \(N = (P,T,F, \lambda )\) be a Petri net. Let \(p_i,p_o \in P\) with \(p_i \ne p_o\). Let \(\Lambda \subset \mathscr {A}\) be a set of activities, let \(\tau \notin \Lambda \) and let \(\lambda :T\rightarrow \Lambda \cup \{\tau \}\). Tuple \(W= (P, T, F, p_i, p_o, \lambda )\) is a workflow net (WF-net) if and only if:

  1. \(\bullet p_i = \emptyset \)

  2. \(p_o \bullet = \emptyset \)

  3. Each element \(x \in P\cup T\) is on a path from \(p_i\) to \(p_o\).

The execution semantics defined for Petri nets can directly be applied on the elements \(P, T\) and \(F\) of \(W= (P, T, F, p_i, p_o, \lambda )\). Notation-wise we substitute \(W\) for its underlying net structure \(N= (P, T, F)\), e.g. \((W, M) \rightsquigarrow (W, M')\). In context of WF-nets, we assume \(M_i = [p_i]\) and \(M_f = [p_o]\) unless mentioned otherwise.

Several behavioural quality criteria that do not need any form of domain knowledge exist for WF-nets. In particular, several notions of soundness of WF-nets are defined [32]. For example, classical sound WF-nets are guaranteed to be free of livelocks, deadlocks and other anomalies that can be detected automatically. In this paper we consider the weaker notion of relaxed soundness. Relaxed soundness requires that each transition can be enabled at some point and that, after firing such a transition, we are able to eventually reach the final marking.

Definition 2

(Relaxed soundness [32]) Let \(W= (P, T, F, p_i, p_o, \lambda )\) be a WF-net. \(W\) is relaxed sound if and only if: \(\forall t\in T(\exists M, M' \in \mathscr {B}(P)((W,[p_i]) \rightsquigarrow (W,M) \wedge (W,M) \xrightarrow {t} (W, M') \wedge (W,M') \rightsquigarrow (W, [p_o])))\).

Reconsider \(W_1\) (Fig. 1) and assume we are given an event log consisting of the single trace \(\langle a,b,d,e,h \rangle \). It is easy to see that \(W_1\) is relaxed sound. Moreover, replay-fitness is perfect, i.e. \(\langle a,b,d,e,h \rangle \) is in the WF-net’s labelled execution language. Precision is not perfect, as the WF-net can produce many more traces than just \(\langle a,b,d,e,h \rangle \).

3.3 Discovering Petri net places using integer linear programming

In [34] an integer linear programming (ILP)-formulation [24] is presented which allows for finding places of a Petri net. A solution to the ILP-formulation corresponds to a region, which in turn corresponds to a Petri net place. The premise of a region is the fact that its corresponding place, given the prefix-closure of an event log, does not block the execution of any sequence within the prefix-closure. We represent a region as an assignment of binary decision variables describing the incoming and outgoing arcs of its corresponding place, as well as its marking.

Given an event log \(L\) over set of activities \(A_{L}\), a region is a triple \(r= (m,\mathbf {x},\mathbf {y})\), with \(m \in \{0,1\}\) and \(\mathbf {x},\mathbf {y} \in \{0,1\}^{|A_L|}\), that adheres to:

$$\begin{aligned} \forall \sigma = \sigma ' \cdot \langle a \rangle \in \overline{L} (m + \mathbf {p}(\sigma ')^{\intercal }\mathbf {x} - \mathbf {p}(\sigma )^{\intercal }\mathbf {y} \ge 0 ) \end{aligned}$$
(3.1)

A region \(r\) is translated to a Petri net place \(p\) as follows. Assume a Petri net structure that contains, for each activity \(a\), a unique transition \(t_{a} \in T\) with \(\lambda (t_{a})=a\). If \(\mathbf {x}(a) = 1\), we add \(t_{a}\) to \(\bullet p\). Symmetrically, if \(\mathbf {y}(a) = 1\), we add \(t_{a}\) to \(p\bullet \). Finally, if \(m = 1\), place \(p\) is initially marked. Since translating a region to a place is deterministic, we are also able to translate a place back to a region, e.g. place \(c_2\) in Fig. 1 corresponds to the region with \(\mathbf {x}(a) = 1, \mathbf {x}(f) = 1, \mathbf {y}(d) = 1\) and all other variables set to zero.
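Checking whether a candidate triple \((m, \mathbf {x}, \mathbf {y})\) is a region amounts to evaluating inequality (3.1) over the prefix-closure. A minimal Python sketch, reusing the representation of the earlier sketches (traces as tuples; \(\mathbf {x}, \mathbf {y}\) as dicts over activities):

```python
from collections import Counter

def is_region(m, x, y, closure):
    """Check m + p(sigma')^T x - p(sigma)^T y >= 0 for every non-empty
    sequence sigma = sigma' . <a> in the given prefix-closure."""
    for sigma in closure:
        if not sigma:
            continue  # the empty sequence imposes no constraint
        cp, cs = Counter(sigma[:-1]), Counter(sigma)
        if m + sum(cp[a] * x[a] for a in x) - sum(cs[a] * y[a] for a in y) < 0:
            return False
    return True

# toy log [<a,d>]: the place "a -> d" (x(a)=1, y(d)=1, m=0) is a region
closure = {(), ("a",), ("a", "d")}
print(is_region(0, {"a": 1, "d": 0}, {"a": 0, "d": 1}, closure))  # True
```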

Prior to presenting the basic ILP-formulation for finding regions, we formulate regions in terms of matrices, which we use in the ILP-formulation.

Definition 3

(Region (matrix form)) Given an event log \(L\) over a set of activities \(A_{L}\), let \(m \in \{0,1\}\) and let \(\mathbf {x},\mathbf {y} \in \{0,1\}^{|A_{L}|}\). Let \(\mathbf {M}\) and \(\mathbf {M}'\) be two \(|\overline{L} {\setminus } \{\epsilon \} | \times |A_{L}|\) matrices with \(\mathbf {M}(\sigma ,a) = \mathbf {p}(\sigma )(a)\) and \(\mathbf {M}'(\sigma ,a) = \mathbf {p}(\sigma ')(a)\) (where \(\sigma = \sigma ' \cdot \langle a' \rangle \in \overline{L}\)). Tuple \(r = (m, \mathbf {x}, \mathbf {y})\) is a region if and only if:

$$\begin{aligned} m \mathbf {1} + \mathbf {M}'\mathbf {x} - \mathbf {M} \mathbf {y} \ge \mathbf {0} \end{aligned}$$
(3.2)

We additionally define matrix \(\mathbf {M}_{L}\) which is an \(|L| \times |A_{L}|\) matrix with \(\mathbf {M}_{L}(\sigma , a) = \mathbf {p}(\sigma )(a)\) for \(\sigma \in L\), i.e., \(\mathbf {M}_L\) is the equivalent of \(\mathbf {M}\) for all traces in the event log. We define a general process discovery ILP-formulation that guarantees to find a non-trivial region, i.e. regions unequal to \((0, \mathbf {0}, \mathbf {0})\) and \((1, \mathbf {1}, \mathbf {1})\), with the property that its corresponding place is always empty after replaying each trace within the event log.

Definition 4

(Process discovery ILP-formulation [34]) Given an event log \(L\) over a set of activities \(A_{L}\) and corresponding matrices \(\mathbf {M}, \mathbf {M}'\) and \(\mathbf {M}_L\). Let \(c_m \in \mathbb {R}\) and \(\mathbf {c_x}, \mathbf {c_y} \in \mathbb {R}^{|A_L|}\). The process discovery ILP-formulation, \(ILP_{L}\), is defined as:

$$\begin{aligned} \begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} \mathbf{minimize} &{} &{} z = c_m m + \mathbf {c_x}^{\intercal } \mathbf {x} +\mathbf {c_y}^{\intercal } \mathbf {y} &{} &{} \hbox {objective function}\\ \mathbf{such that} &{} &{} m \mathbf {1} + \mathbf {M}' \mathbf {x} - \mathbf {M} \mathbf {y} \ge \mathbf {0} &{} &{} \hbox {theory of regions}\\ {{\varvec{and}}} &{} &{} m \mathbf {1} + \mathbf {M}_L(\mathbf {x} - \mathbf {y}) = \mathbf {0} &{} &{} \hbox {corresp. place is empty after each trace} \\ &{} &{} \mathbf {1}^{\intercal } \mathbf {x} + \mathbf {1}^{\intercal }\mathbf {y} \ge 1 &{} &{} \hbox {at least one arc connected}\\ &{} &{} \mathbf {0} \le \mathbf {x} \le \mathbf {1} &{} &{} \hbox {i.e. } \mathbf {x} \in \{0,1\}^{|A|}\\ &{} &{} \mathbf {0} \le \mathbf {y} \le \mathbf {1} &{} &{} \hbox {i.e. } \mathbf {y} \in \{0,1\}^{|A|}\\ &{} &{} 0 \le m \le 1 &{} &{} \hbox {i.e. } m \in \{0,1\}\\ \end{array} \end{aligned}$$

Definition 4 allows us to find a region that minimizes objective function \(z = c_m m + \mathbf {c_x}^{\intercal } \mathbf {x} +\mathbf {c_y}^{\intercal } \mathbf {y}\). Multiple instantiations of z, i.e. in terms of objective coefficients \(c_m, \mathbf {c_x}\) and \(\mathbf {c_y}\), are possible. In [34] an objective function is proposed that minimizes 1-values in \(\mathbf {x}\) and maximizes 1-values in \(\mathbf {y}\), i.e. in the region’s corresponding place the number of incoming arcs is minimized whereas the number of outgoing arcs is maximized. In [37] the aforementioned objective function is extended such that it minimizes the time a token resides in the corresponding place. Both objective functions are expressible as a more general function which favours minimal regions [37], i.e. regions that are not expressible as a non-negative linear combination of two other regions. This is interesting since non-minimal regions correspond to implicit places [34]. In this paper we simply assume that one uses such an objective function.
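To make Definition 4 concrete, the following Python sketch builds and solves \(ILP_{L}\), assuming the open-source PuLP library (with a solver such as CBC) is available; the objective coefficients below merely illustrate the “few input arcs, many output arcs” preference and are not the exact objective of [34, 37].

```python
from collections import Counter
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum, value

def discover_place(log, activities):
    """Sketch of ILP_L (Definition 4). `log` is a Counter of trace tuples;
    returns an assignment (m, x, y) corresponding to a Petri net place."""
    prob = LpProblem("region", LpMinimize)
    m = LpVariable("m", cat=LpBinary)
    x = {a: LpVariable(f"x_{a}", cat=LpBinary) for a in activities}
    y = {a: LpVariable(f"y_{a}", cat=LpBinary) for a in activities}

    # illustrative objective: minimize incoming arcs, maximize outgoing arcs
    prob += m + lpSum(x.values()) - lpSum(y.values())

    # theory of regions: one constraint per non-empty sequence in the closure
    closure = {t[:k] for t in log for k in range(len(t) + 1)}
    for sigma in closure:
        if sigma:
            cp, cs = Counter(sigma[:-1]), Counter(sigma)
            prob += (m + lpSum(cp[a] * x[a] for a in activities)
                       - lpSum(cs[a] * y[a] for a in activities)) >= 0

    # the corresponding place is empty after replaying each full trace
    for trace in log:
        c = Counter(trace)
        prob += m + lpSum(c[a] * (x[a] - y[a]) for a in activities) == 0

    prob += lpSum(x.values()) + lpSum(y.values()) >= 1  # at least one arc
    prob.solve()
    return (value(m),
            {a: value(v) for a, v in x.items()},
            {a: value(v) for a, v in y.items()})
```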

4 Discovering relaxed sound workflow nets

Using the basic formulation with some instantiated objective function yields only a single, optimal, result. However, we are interested in finding multiple places that together form a workflow net. In [34] multiple approaches are presented to find multiple, different Petri net places. Here we adopt, and generalize, the causal approach.

4.1 Discovering multiple places based on causal relations

One of the most suitable techniques to find multiple regions in a controlled, structured manner is to exploit causal relations present within an event log. A causal relation between activities a and b implies that activity a causes b, i.e. b is likely to follow (somewhere) after activity a. Several approaches exist to compute causalities [35]. For example, in [30] a causal relation \(a \rightarrow _{L} b\) from activity a to activity b is defined to hold if, within some event log \(L\), we find traces of the form \(\langle \ldots , a,b, \ldots \rangle \) though we do not find traces of the form \(\langle \ldots , b,a, \ldots \rangle \). In [40, 41] this relation was further developed to take frequencies into account as well. Given these multiple definitions, we assume the existence of a causal relation oracle which, given an event log, produces a set of pairs \((a,b)\) indicating that activity a has a causal relation with (to) activity b.

Definition 5

(Causal relation oracle) A causal relation oracle \(\gamma _c\) maps a bag of traces to a set of activity pairs, i.e. \(\gamma _c: \mathscr {B}(\mathscr {A}^*) \rightarrow \mathscr {P}(\mathscr {A}\times \mathscr {A})\).

A causal oracle only considers activities present in an event log, i.e. \(\gamma _c(L) \in \mathscr {P}(A_L\times A_L)\). It defines a directed graph with \(A_L\) as vertices and each pair \((a,b) \in \gamma _c(L)\) as an arc between a and b. Later we exploit the graph-based view, for now we refer to \(\gamma _c(L)\) as a collection of pairs. When adopting a causal ILP process discovery strategy, we try to find net places that represent a causality found in the event log. Given an event log \(L\), for each pair \((a,b) \in \gamma _c(L)\) we enrich the constraint body with three constraints: (1) \(m = 0\), (2) \(\mathbf {x}(a) = 1\) and (3) \(\mathbf {y}(b) = 1\). The three constraints ensure that if we find a solution to the ILP, it corresponds to a place which is not marked and connects transition a to transition b. Given pair \((a,b) \in \gamma _c(L)\) we denote the corresponding extended causality based ILP-formulation as \(ILP_{(L,a \rightarrow b)}\).
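Continuing the PuLP-based sketch of Definition 4, the causal variant \(ILP_{(L, a \rightarrow b)}\) merely adds three constraints pinning the decision variables (the helper name and variable layout are our own):

```python
def add_causal_constraints(prob, m, x, y, a, b):
    """Extend ILP_L to ILP_(L, a -> b): the sought place is initially
    unmarked and connects transition a to transition b."""
    prob += m == 0     # (1) place is not marked
    prob += x[a] == 1  # (2) transition a is an input of the place
    prob += y[b] == 1  # (3) transition b is an output of the place
```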

After solving \(ILP_{(L,a \rightarrow b)}\) for each \((a,b) \in \gamma _c(L)\), we end up with a set of regions that we are able to transform into places of a resulting Petri net. Since we enforce \(m=0\) for each causality, none of these places is initially marked. Moreover, due to the constraints based on \(m \mathbf {1} + \mathbf {M}_L(\mathbf {x} - \mathbf {y}) = \mathbf {0}\), each resulting place is empty after replaying any trace of the input event log within the net. Since we additionally enforce \(\mathbf {x}(a) = 1\) and \(\mathbf {y}(b) = 1\), if we find a solution to the ILP, the corresponding place has both input and output arcs and is thus not eligible to serve as a source or sink place. Hence, the approach as-is does not allow us to find WF-nets. In the next section we show that a simple pre-processing step performed on the event log, together with specific instances of \(\gamma _c(L)\), allows us to discover WF-nets which are relaxed sound.

4.2 Discovering workflow nets

Consider example event log \(L_1 = [\langle a,b,d,e,g \rangle ^{10}, \langle a,c,d,e,f,d,b,e,g \rangle ^{12}, \langle a,d,c,e,h \rangle ^{9}, \langle a,b,d,e,f,c,d,e,g \rangle ^{11}, \langle a,d,c,e,f,b,d,e,h \rangle ^{13}]\). Observe that for each trace \(\sigma \) in \(L_1\) we have \((W_1,[start]) \xrightarrow {\sigma } (W_1,[end])\). Let \(A_f \subseteq A_L\) denote the set of final activities, i.e. activities \(a_f\) s.t. there exists a trace of the form \(\langle \ldots , a_f \rangle \) in the event log. For example, for \(L_1\) we have \(A_f = \{g,h\}\). After solving each \(ILP_{(L, a \rightarrow b)}\) instance based on \(\gamma _c(L)\) and adding the corresponding places, we know that when we exactly replay any trace from \(L_1\), the net is empty after firing g or h. Since g and h never co-occur in a trace, it is trivial to add a sink place \(p_o\) s.t. after replaying each trace in \(L_1\), \(p_o\) is the only place marked, i.e. \(\bullet p_o = \{g,h\}\) and \(p_o \bullet = \emptyset \) (place “end” in Fig. 1). In general, such a decision is not trivial. However, a trivial case for adding a sink \(p_o\) is when there is only one final activity that occurs exactly once, at the end of each trace, i.e. \(A_f = \{a_f\}\) and there exists no trace of the form \(\langle \ldots ,a_f, \ldots , a_f \rangle \). In such a case we have \(\bullet p_o = \{a_f\}, p_o \bullet = \emptyset \).

A similar rationale holds for adding a source place. We define a set \(A_s\) that denotes the set of start activities, i.e. activities \(a_s\) s.t. there exists a trace of the form \(\langle a_s, \ldots \rangle \) in the event log. For each activity \(a_s\) in \(A_s\) we know that, for some traces in the event log, it is the first one to be executed. Thus, we know that the source place \(p_i\) must connect, in some way, to the elements of \(A_s\). As in the case of final activities, creating a source place is trivial when \(A_s = \{a_s\}\) and there exists no trace of the form \(\langle a_s, \ldots , a_s, \ldots \rangle \), i.e. the start activity occurs exactly once, at the beginning of each trace. In such a case we create place \(p_i\) with \(\bullet p_i = \emptyset , p_i \bullet = \{a_s\}\).

In order to be able to find a source and a sink place, it suffices to guarantee that sets \(A_s\) and \(A_f\) are of size one and their elements always occur uniquely at the start, respectively, end of a trace. We formalize this idea through the notion of unique start/end event logs, after which we show that transforming an arbitrary event log to such unique start/end event log is trivial.

Definition 6

(Unique start/end event log) Let \(L\) be an event log over a set of activities \(A_L\). \(L\) is a Unique Start/End event Log (USE-Log) if there exist \(a_s,a_f \in A_{L}\) s.t. \(a_s \ne a_f, \forall \sigma \in L(\sigma (1) = a_s \wedge \forall i \in \{2,3,\ldots ,|\sigma |\}(\sigma (i) \ne a_s))\) and \(\forall \sigma \in L(\sigma (|\sigma |) = a_f \wedge \forall i \in \{1,2,\ldots ,|\sigma |-1\}(\sigma (i) \ne a_f))\).

Since the set of activities \(A_L\) is finite, it is trivial to transform any event log to a USE-log. Assume we have an event log \(L\) over \(A_L\) that is not a USE-log. We generate two “fresh” activities \(a_s,a_f \in \mathscr {A}\) s.t. \(a_s,a_f \notin A_{L}\) and create a new event log \(L'\) over \(A_{L} \cup \{a_s, a_f\}\), by adding \(\langle a_s \rangle \cdot \sigma \cdot \langle a_f \rangle \) to \(L'\) for each \(\sigma \in L\). We let \(\pi : \mathscr {B}(\mathscr {A}^*) \rightarrow \mathscr {B}(\mathscr {A}^*)\) denote such USE-transformation. We omit \(a_s\) and \(a_f\) from the domain of \(\pi \) and assume that given some USE-transformation the two symbols are known.
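The USE-transformation itself is a one-line rewrite of each trace. A Python sketch follows; the fresh activity names are our own placeholders:

```python
from collections import Counter

def use_transform(log, start="__start__", end="__end__"):
    """USE-transformation pi: prepend a fresh start activity and append a
    fresh end activity to every trace in the bag."""
    return Counter({(start,) + t + (end,): f for t, f in log.items()})

L = Counter({("a", "b"): 2, ("b", "a"): 1})
print(use_transform(L))
# Counter({('__start__', 'a', 'b', '__end__'): 2,
#          ('__start__', 'b', 'a', '__end__'): 1})
```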

Clearly, after applying a USE-transformation, finding a unique source and sink place is trivial. The transformation provides an additional advantage w.r.t. the ability to find WF-nets: an ILP instance \(ILP_{(L, a \rightarrow b)}\) always has a solution if \(L\) is a USE-log. We prove this property in Lemma 1, after which we present an algorithm that, given specific instantiations of \(\gamma _c\), discovers WF-nets.

Lemma 1

(A USE-Log based causality has a solution) Let \(L\) be an event log over a set of activities \(A_{L}\). Let \(\pi : \mathscr {B}(\mathscr {A}^*) \rightarrow \mathscr {B}(\mathscr {A}^*)\) denote a USE-transformation function and let \(a_s, a_f\) denote the start and end activities. For every \((a,b) \in \gamma _c(\pi (L))\) with \(a \ne a_f\) and \(b \ne a_s, ILP_{(\pi (L), a \rightarrow b)}\) has a solution.

Proof

See [38]. \(\square \)

In Algorithm 1 we present an ILP-Based process discovery approach that uses a USE-log internally in order to find multiple Petri net places. For every \((a,b) \in \gamma _c(\pi (L))\) with \(a \ne a_f\) and \(b \ne a_s\) it solves \(ILP_{(\pi (L), a \rightarrow b)}\). Moreover, it finds a unique source and sink place.

Algorithm 1 ILP-Based Process Discovery

The algorithm constructs an initially empty Petri net \(N = (P,T,F)\). Subsequently for each \(a \in A_{L} \cup \{a_s, a_f\}\) a transition \(t_a\) is added to \(T\). For each causal pair in the USE-variant of input event log \(L\), a place \(p_{(a,b)}\) is discovered by solving \(ILP_{(\pi (L), a \rightarrow b)}\) after which \(P\) and \(F\) are updated accordingly. The algorithm adds an initial place \(p_i\) and connects it to \(t_{a_s}\) and similarly creates sink place \(p_o\) which is connected to \(t_{a_f}\). For transition \(t_a\) related to \(a\in A_{L}\), we have \(\lambda (t_a) = a\), whereas \(\lambda (t_{a_s}) = \lambda (t_{a_f})= \tau \).
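A condensed Python sketch of Algorithm 1, reusing the use_transform and causal-ILP sketches above (solve_causal_ilp is a hypothetical helper wrapping \(ILP_{(\pi (L), a \rightarrow b)}\); the returned net is a plain (places, transitions, arcs) triple rather than a ProM object):

```python
def ilp_based_discovery(log, causal_oracle, start="__start__", end="__end__"):
    """Sketch of Algorithm 1: one place per causal pair, plus p_i and p_o."""
    use_log = use_transform(log, start, end)
    activities = {a for trace in use_log for a in trace}
    places, arcs = set(), set()
    for (a, b) in causal_oracle(use_log):
        if a == end or b == start:
            continue  # Lemma 1 guarantees a solution for the remaining pairs
        m, x, y = solve_causal_ilp(use_log, activities, a, b)
        p = ("p", a, b)
        places.add(p)
        arcs |= {(t, p) for t in activities if x[t]}  # x(t) = 1: t in pre-set
        arcs |= {(p, t) for t in activities if y[t]}  # y(t) = 1: t in post-set
    places |= {"p_i", "p_o"}                # unique source and sink place
    arcs |= {("p_i", start), (end, "p_o")}  # t_as and t_af are tau-labelled
    return places, activities, arcs
```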

The algorithm is guaranteed to find a solution to every \(ILP_{(\pi (L), a \rightarrow b)}\); hence, for each causal relation a place is found. Additionally, a unique source and sink place are constructed. However, the algorithm does not guarantee that every element of the net is on a path from \(p_i\) to \(p_o\), i.e. requirement 3 of Definition 1 may be violated. In fact, the nature of \(\gamma _c\) determines whether or not we discover a WF-net. In Theorem 1 we characterize this nature and prove, by exploiting Lemma 1, that we are able to discover WF-nets.

Theorem 1

(There exist sufficient conditions for finding WF-nets) Let \(L\) be an event log over a set of activities \(A_L\). Let \(\pi : \mathscr {B}(\mathscr {A}^*) \rightarrow \mathscr {B}(\mathscr {A}^*)\) denote a USE-transformation function. Let \(a_s, a_f\) denote the unique start- and end activity of \(\pi (L)\). Let \(\gamma _c: \mathscr {B}(\mathscr {A}^*) \rightarrow \mathscr {P}(\mathscr {A}\times \mathscr {A})\) be a causal oracle and consider \(\gamma _c(\pi (L))\) as a directed graph. If each \(a\in A_L\) is on a path from \(a_s\) to \(a_f\) in \(\gamma _c(\pi (L))\), and there is no path from \(a_s\) to itself, nor a path from \(a_f\) to itself, then \(\texttt {ILP-Based Process Discovery}(L, \gamma _c)\) returns a WF-net.

Proof

See [38]. \(\square \)

Theorem 1 proves that if we use a causal structure that, when interpreting it as a graph, has the property that each \(a\in A_L\) is on a path from \(a_s\) to \(a_f\), the result of \(\texttt {ILP-Based Process Discovery}(L, \gamma _c)\) is a WF-net. Although this seems a rather strict property of the causal structure, there exists a specific causal graph definition that guarantees this property [40]. Hence we are able to use this definition as an instantiation for \(\gamma _c\).

Theorem 1 does not provide any behavioural guarantees, i.e. being a WF-net is a purely graph-theoretical property. Recall that the premise of a region is that it does not block the execution of any sequence within the prefix-closure of an event log. Intuitively we deduce that we are therefore able to fire each transition in the WF-net at least once. Moreover, since we know that \(a_f\) is the final transition of each sequence in \(\pi (L)\), and that after firing this transition each place based on any \(ILP_{(\pi (L), a\rightarrow b)}\) is empty, we know that we are able to mark \(p_o\). These two observations hint at the fact that the WF-net is relaxed sound, which we prove in Theorem 2.

Theorem 2

Let \(L\) be an event log over a set of activities \(A_L\). Let \(\pi : \mathscr {B}(\mathscr {A}^*) \rightarrow \mathscr {B}(\mathscr {A}^*)\) denote a USE-transformation function and let \(a_s, a_f\) denote the unique start- and end activity of \(\pi (L)\). Let \(\gamma _c: \mathscr {B}(\mathscr {A}^*) \rightarrow \mathscr {P}(\mathscr {A}\times \mathscr {A})\) be a causal oracle. Let \(W= (P, T, F, p_i, p_o, \lambda ) = \texttt {ILP-Based Process Discovery}(L, \gamma _c)\). If \(W\) is a WF-net, then \(W\) is relaxed sound.

Proof

See [38]. \(\square \)

We have shown that with a few pre- and post-processing steps and a specific class of causal structures we are guaranteed to find WF-nets that are relaxed sound. These results are interesting since several process mining techniques require WF-nets as an input. However, the ILP problems solved still require their solutions to allow for all behaviour present in the event log. As a result, the algorithm incorporates all infrequent, exceptional behaviour and still yields overly complex, overfitting WF-nets. Hence, in the upcoming section we show how to efficiently prune the ILP constraint body in order to identify and eliminate infrequent, exceptional behaviour.

5 Dealing with infrequent behaviour

In this section we present an efficient pruning technique that identifies and eliminates constraints related to infrequent exceptional behaviour. We first present the impact of infrequent exceptional behaviour after which we present the pruning technique.

5.1 The impact of infrequent exceptional behaviour

In this section we highlight the main cause of ILP-based discovery’s inability to handle infrequent behaviour and we devise a filtering mechanism that exploits the nature of the underlying body of constraints.

Let us again consider example event log \(L_1\), i.e., \(L_1 = [\langle a,b,d,e,g \rangle ^{10}, \langle a,c,d,e,f,d,b,e,g \rangle ^{12}, \langle a,d,c,e,h \rangle ^{9}, \langle a,b,d,e,f,c,d,e,g \rangle ^{11}, \langle a,d,c,e,f,b,d,e,h \rangle ^{13}]\). Using an implementation of Algorithm 1 in ProM [39], with a suitable causal structure \(\gamma _c\), we find the WF-net depicted in Fig. 2a. The WF-net describes the same behaviour as the model presented in Fig. 1 and has perfect replay-fitness w.r.t. \(L_1\). However, if we create event log \(L_1'\) by simply adding one instance of the trace \(\langle a,b,c,d,e,g \rangle \), we obtain the result depicted in Fig. 2b. Due to this one exceptional trace, the model allows us, after executing a or f, to execute an arbitrary number of b- and c-labelled transitions. This is undesirable: the addition of a single exceptional trace yields a less comprehensible WF-net and significantly reduces its precision.

When analysing the two models we observe that they share several places, e.g. both models have a place \(p_{(\{a,f\},\{d\})}\) with \(\bullet p_{(\{a,f\},\{d\})} = \{a,f\}\) and \(p_{(\{a,f\},\{d\})} \bullet = \{d\}\). However, the two places \(p_{(\{a,f\},\{b,c\})}\) with \(\bullet p_{(\{a,f\},\{b,c\})} = \{a,f\}\) and \(p_{(\{a,f\},\{b,c\})} \bullet = \{b,c\}\), and \(p_{(\{b,c\},\{e\})}\) with \(\bullet p_{(\{b,c\},\{e\})} = \{b,c\}\) and \(p_{(\{b,c\},\{e\})} \bullet = \{e\}\), present in Fig. 2a, are not present in Fig. 2b. They are “replaced” by the less desirable places containing self-loops in Fig. 2b. This is caused by the fact that the constraint bodies of the ILPs based on event log \(L'_1\) contain all constraints present in the ones related to \(L_1\), combined with the additional constraints depicted in Table 2.

For place \(p_{(\{a,f\},\{b,c\})}\) in Fig. 2a we define a corresponding tuple \(r = (m, \mathbf {x}, \mathbf {y})\) with \(\mathbf {x}(a) = 1, \mathbf {x}(f) = 1, \mathbf {y}(b)= 1\) and \(\mathbf {y}(c) = 1\) (all other variables 0). The additional constraints in Table 2 all evaluate to \(-1\) for \(r\), e.g. constraint \(m + \mathbf {x}(a_s) + \mathbf {x}(a) + \mathbf {x}(b) - \mathbf {y}(a_s) - \mathbf {y}(a) - \mathbf {y}(b) - \mathbf {y}(c)\) evaluates to \(0 + 0 + 1 + 0 - 0 - 0 - 1 - 1 = -1\). In case of place \(p_{(\{b,c\},\{e\})}\) we observe that the corresponding tuple \(r = (m, \mathbf {x}, \mathbf {y})\) with \(\mathbf {x}(b) = 1, \mathbf {x}(c) = 1\) and \(\mathbf {y}(e) = 1\) yields a value of 1 for the constraint generated by the full trace \(\langle a,b,c,d,e,g \rangle \). For the constraints having a “\(\ge 0\)” right-hand side this would be valid; however, for constraint \(m + \mathbf {x}(a_s) + \mathbf {x}(a) + \mathbf {x}(b) + \mathbf {x}(c) + \mathbf {x}(d) + \mathbf {x}(e) + \mathbf {x}(g) + \mathbf {x}(a_f) - \mathbf {y}(a_s) - \mathbf {y}(a) - \mathbf {y}(b) - \mathbf {y}(c) - \mathbf {y}(d) - \mathbf {y}(e) - \mathbf {y}(g) - \mathbf {y}(a_f) = 0\) this is not valid.

Fig. 2 Results of applying Algorithm 1 (HybridILPMiner package in the ProM Framework [39]) based on \(L_1\) and \(L'_1\). a Result based on event log \(L_1\). b Result based on event log \(L'_1\)

Table 2 Some of the newly added constraints based on trace \(\langle a,b,c,d,e,g \rangle \) in event log \(L'_1\), starting from prefix \(\langle a,b,c \rangle \) which is not present in \(\overline{L_1}\)

The example shows that the addition of \(\langle a,b,c,d,e,g \rangle \) yields constraints that invalidate places \(p_{(\{a,f\},\{b,c\})}\) and \(p_{(\{b,c\},\{e\})}\). As a result, the WF-net based on event log \(L'_1\) contains places with self-loops on both b and c, which greatly reduces its precision and simplicity. Given the relative infrequency of trace \(\langle a,b,c,d,e,g \rangle \), it is arguably acceptable to trade off the perfect replay-fitness guarantee of ILP-based process discovery and return the WF-net of Fig. 2a given \(L'_1\). Hence, we need filtering and/or trace clustering techniques in order to remove exceptional behaviour. However, rather than relying on simple pre-processing, we aim at adapting the ILP-based process discovery approach itself to be able to cope with infrequent behaviour.

By manipulating the constraint body such that it no longer allows for all behaviour present in the input event log, we are able to deal with infrequent behaviour within event logs. Given the problems that arise because of the presence of exceptional traces, a natural next step is to leave out the constraints related to the problematic traces. An advantage of filtering the constraint body is the fact that the constraints are based on the prefix-closure of the event log. Thus, even if all traces are unique, we are still able to filter as long as they share prefixes. Additionally, leaving out constraints decreases the size of the ILP’s constraint body, which has a potential positive effect on the time needed to solve an ILP. We devise a graph-based filtering technique, i.e. sequence encoding filtering, that allows us to prune constraints based on trace frequency information.

5.2 Sequence encoding graphs

As a first step towards sequence encoding filtering we define the relationship between sequences and constraints. We do this in terms of sequence encodings. A sequence encoding is a vector-based representation of a sequence in terms of region theory, i.e. representing the sequence’s corresponding constraint.

Definition 7

(Sequence encoding) Given a set of activities \(A = \{a_1, a_2, \ldots , a_n\}\), \(\mathbf {\phi } : A^* \rightarrow \mathbb {Z}^{2|A| + 1}\) denotes the sequence encoding function mapping every \(\sigma \in A ^*\) to a \(2 \cdot |A| + 1\)-sized vector. We define \(\mathbf {\phi }\) as:

$$\begin{aligned} \begin{array}{cc} \mathbf {\phi }(\sigma ' \cdot \langle a \rangle ) = \begin{pmatrix} 1\\ \mathbf {p}(\sigma ')\\ -\mathbf {p}(\sigma ' \cdot \langle a \rangle ) \end{pmatrix} &{} \mathbf {\phi }(\epsilon ) = \begin{pmatrix} 1\\ 0\\ \vdots \\ 0 \end{pmatrix} \end{array} \end{aligned}$$

As an example of a sequence encoding vector consider sequence \(\langle a_s, a,b \rangle \) originating from \(\overline{\pi (L'_1)}\), for which we have \(\mathbf {\phi }(\langle a_s, a,b \rangle )^{\intercal } = (1,1,1,0,0,0,0,0,0,0,0,-\,1,-\,1,-\,1,0,0,0,0,0,0,0)\). Sequence encoding vectors directly correspond to region theory based constraints, e.g. if we are given \(m \in \{0,1\}\) and \(\mathbf {x}, \mathbf {y} \in \{0,1\}^{|A|}\) and create a vector \(\mathbf {r}\) where \(\mathbf {r}(1) = m, \mathbf {r}(2) = \mathbf {x}(a_s), \mathbf {r}(3) = \mathbf {x}(a)\), ..., \(\mathbf {r}(10) = \mathbf {x}(h), \mathbf {r}(11) = \mathbf {x}(a_f), \mathbf {r}(12) = \mathbf {y}(a_s)\), ..., \(\mathbf {r}(21) = \mathbf {y}(a_f)\), then \(\mathbf {\phi }(\langle a_s, a,b \rangle )^{\intercal }\mathbf {r} = m + \mathbf {x}(a_s) + \mathbf {x}(a) - \mathbf {y}(a_s) - \mathbf {y}(a) - \mathbf {y}(b)\). As a compact notation for \(\sigma = \sigma ' \cdot \langle a \rangle \) we write \(\mathbf {\phi }(\sigma )\) as a pair of the bag representation of the Parikh vector of \(\sigma '\) and a, i.e. \(\mathbf {\phi }(\langle a_s, a, b \rangle )\) is written as \(([a_s,a], b)\) whereas \(\mathbf {\phi }(\langle a_s, a,b,c \rangle )\) is written as \(([a_s,a,b],c)\). For \(\mathbf {\phi }(\epsilon )\) we write \(([], \bot )\).
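Computing \(\mathbf {\phi }\) is straightforward; a minimal Python sketch, consistent with the representation of the earlier sketches:

```python
from collections import Counter

def phi(trace, activities):
    """Sequence encoding (Definition 7): the flat vector (1, p(sigma'),
    -p(sigma)) of size 2|A| + 1, where sigma = sigma' . <a>."""
    cp, cs = Counter(trace[:-1]), Counter(trace)
    return [1] + [cp[a] for a in activities] + [-cs[a] for a in activities]

A = ["as", "a", "b", "c", "d", "e", "f", "g", "h", "af"]
print(phi(("as", "a", "b"), A))
# [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0]
```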

Table 3 Schematic overview of sequence encodings based on \(\overline{\pi (L'_1)}\)

Consider the prefix-closure of \(\pi (L'_1)\), which generates the linear inequalities presented in Table 3. The table shows each sequence present in \(\overline{\pi (L'_1)}\) accompanied by its \(\mathbf {\phi }\)-value and the number of occurrences of the sequence in \(\overline{\pi (L'_1)}\), e.g. \(\overline{\pi (L'_1)}(\langle a_s,a \rangle ) = 56\). Observe that there is a relation between the occurrence of a sequence and that of its one-step extensions, i.e. after the 56 times that sequence \(\langle a_s,a \rangle \) occurred, \(\langle a_s,a,b \rangle \) occurred 22 times, \(\langle a_s,a,c \rangle \) occurred 12 times and \(\langle a_s,a,d \rangle \) occurred 22 times (note: \(56 = 22 + 12 + 22\)). Due to the coupling of sequences to constraints, i.e. by means of sequence encoding, we can apply the same reasoning to constraints as well. The frequencies in \(\overline{\pi (L'_1)}\) allow us to decide whether the presence of a certain constraint is in line with predominant behaviour in the event log. For example, in Table 3, \(\mathbf {\phi }(\langle a_s,a,b,c \rangle )\) relates to infrequent behaviour as it appears only once.

To apply filtering, we construct a weighted directed graph in which each sequence encoding acts as a vertex. We connect two vertices by means of an arc if the source vertex corresponds to a sequence that, extended by a single activity, yields a sequence corresponding to the target vertex, e.g. we connect \(\mathbf {\phi }(\langle a_s, a \rangle )\) to \(\mathbf {\phi }(\langle a_s, a,b \rangle )\). The arc weight corresponds to trace frequency in the input event log.

Fig. 3 An example sequence encoding graph \(G'_1\), based on example event log \(L'_1\)

Definition 8

(Sequence encoding graph) Given event log \(L\) over set of activities \(A_{L}\). A sequence encoding graph is a directed graph \(G = (V,E, \psi )\) where \(V = \{\mathbf {\phi }(\sigma ) \mid \sigma \in \overline{L} \}, E \subseteq V \times V\) s.t. \((\mathbf {\phi }(\sigma '),\mathbf {\phi }(\sigma )) \in E \Leftrightarrow \exists a \in A (\sigma ' \cdot \langle a \rangle = \sigma )\) and \(\psi : E \rightarrow \mathbb {N}\) where:

$$\begin{aligned} \psi (v_1,v_2) = \sum _{\tiny \begin{matrix} \sigma \in \overline{L}\\ \mathbf {\phi }(\sigma ) = v_2 \end{matrix}}\overline{L}(\sigma ) - \sum _{\tiny \begin{matrix}\sigma ' \in \overline{L}\\ \sigma ' \cdot \langle a \rangle \in \overline{L} \\ \mathbf {\phi }(\sigma ' \cdot \langle a \rangle ) = v_2 \\ \mathbf {\phi }(\sigma ') \ne v_1 \end{matrix}} \overline{L}(\sigma ') \end{aligned}$$

Consider the sequence encoding graph in Fig. 3, based on \(\pi (L'_1)\), as an example. By definition, \(([], \bot )\) is the root node of the graph and connects to all one-sized sequences. Within the graph we observe the relation among different constraints, combined with their absolute frequencies based on \(L'_1\).
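A literal Python sketch of Definition 8, reusing the prefix_closure and phi sketches above (hypothetical names; encodings are converted to tuples so they can serve as vertices). Applied to the USE-transformed log \(\pi (L'_1)\), it produces a graph such as the one in Fig. 3.

```python
def sequence_encoding_graph(log, activities):
    """Build (V, E, psi): one vertex per sequence encoding, an arc from the
    encoding of a sequence to the encoding of each one-step extension."""
    closure = prefix_closure(log)
    enc = {s: tuple(phi(s, activities)) for s in closure}
    V = set(enc.values())
    E = {(enc[s[:-1]], enc[s]) for s in closure if s}
    psi = {}
    for (v1, v2) in E:
        total = sum(closure[s] for s in closure if enc[s] == v2)
        other = sum(closure[s[:-1]] for s in closure
                    if s and enc[s] == v2 and enc[s[:-1]] != v1)
        psi[(v1, v2)] = total - other
    return V, E, psi
```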

5.3 Filtering

Given a sequence encoding graph we are able to filter out constraints. In Algorithm 2 we devise a simple breadth-first traversal algorithm, i.e. Sequence Encoding Filtering by Breadth-First Search (SEF-BFS), that traverses the sequence encoding graph and concurrently constructs a set of ILP constraints. The algorithm needs as input a function that determines, given a vertex in the sequence encoding graph, which of the adjacent vertices remain in the graph and which are removed.

Definition 9

(Sequence encoding filter) Given event log \(L\) over set of activities \(A_{L}\) and a corresponding sequence encoding graph \(G = (V,E, \psi )\). A sequence encoding filter is a function \(\kappa : V \rightarrow \mathscr {P}(V)\).

Note that \(\kappa \) is an abstract function and might be parametrized. As an example consider \(\kappa ^{\alpha }_{\max }\) which we define as:

$$\begin{aligned} \kappa ^{\alpha }_{\max }(v) = \left\{ v' \mid (v,v') \in E \wedge \psi (v,v') \ge (1-\alpha ) \cdot \max _{v'' \in V} \psi (v,v'') \right\} ,\ \alpha \in [0,1] \end{aligned}$$

Other instantiations of \(\kappa \) are possible as well; hence, \(\kappa \) is a parameter of the general approach. It is, however, desirable that \(\kappa (v) \subseteq \{v' \mid (v,v') \in E\}\), i.e. \(\kappa \) only considers vertices reachable from v by means of an arc. Given an instantiation of \(\kappa \), it is straightforward to construct a filtering algorithm based on breadth-first graph traversal, i.e. SEF-BFS.
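A direct Python sketch of \(\kappa ^{\alpha }_{\max }\), over a graph triple (V, E, psi) as returned by the sketch above:

```python
def kappa_max(G, v, alpha):
    """Keep the children of v whose arc weight is at least (1 - alpha)
    times the maximum weight among v's outgoing arcs."""
    V, E, psi = G
    children = [w for (u, w) in E if u == v]
    if not children:
        return set()
    bound = (1 - alpha) * max(psi[(v, w)] for w in children)
    return {w for w in children if psi[(v, w)] >= bound}
```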

Algorithm 2 SEF-BFS

The algorithm inherits the worst-case complexity of breadth-first search, multiplied by the worst-case complexity of \(\kappa \). Thus, in case \(\kappa \)’s worst-case complexity is \(O(1)\), we have \(O(|V| + |E|)\) for the SEF-BFS algorithm. It is trivial to prove, by means of induction on the length of a sequence encoding’s corresponding sequence, that a sequence encoding graph is acyclic. Hence, termination is guaranteed.
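A Python sketch of SEF-BFS itself, parametrized by the filter function (here matching the kappa_max sketch above); the returned set corresponds to the constraint collection C:

```python
from collections import deque

def sef_bfs(G, root, kappa):
    """Breadth-first traversal from the root encoding ([], bot); every
    visited vertex keeps its constraint, children are pruned by kappa."""
    kept, seen, queue = set(), {root}, deque([root])
    while queue:
        v = queue.popleft()
        kept.add(v)  # v's sequence encoding constraint stays in the body
        for child in kappa(G, v):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return kept

# e.g. sef_bfs(G, root, lambda G, v: kappa_max(G, v, alpha=0.75))
```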

As an example of executing the SEF-BFS algorithm, reconsider Fig. 3 and assume we use \(\kappa ^{0.75}_{\max }\). Vertex \(([],\bot )\) is initially present in Q and is analysed first. Since \(([],a_s)\) is the only child of \(([],\bot )\), it is added to Q. Vertex \(([],\bot )\) is removed from the queue and is never inserted in the queue again due to the acyclicity of the graph. Similarly, since \(([a_s],a)\) is the only child of \(([], a_s)\), it is added to Q. All children of \(([a_s],a)\), i.e. \(([a_s,a],b), ([a_s,a],c)\) and \(([a_s,a],d)\), are added to the queue, since the maximum corresponding arc value is 22 and \((1 - 0.75) \cdot 22 = 5.5\), which is smaller than the lowest arc value 12. When analysing \(([a_s,a],b)\) we observe a maximum outgoing arc with value 21 to vertex \(([a_s,a,b],d)\), which is enqueued in Q. Since \((1-0.75) \cdot 21=5.25\), the algorithm does not enqueue \(([a_s,a,b],c)\), whose incoming arc value 1 is below this bound. Note that the whole path of vertices from \(([a_s,a,b],c)\) to \(([a_s,a,b,c,d,e,g], a_f)\) is never analysed and is stripped from the constraint body, i.e. these vertices are never inserted in C.

When applying ILP-based process discovery to event log \(L'_1\) with sequence encoding filtering and \(\kappa ^{0.75}_{\max }\), we obtain the WF-net depicted in Fig. 2a. As explained, the filter leaves out all constraints related to vertices on the path from \(([a_s,a,b],c)\) to \(([a_s,a,b,c,d,e,g], a_f)\). Hence, we find the same model as the one found on event log \(L_1\) and are thus able to filter out infrequent exceptional behaviour.

6 Evaluation

Algorithm 1 and sequence encoding filtering are implemented in the HybridILPMiner package (http://svn.win.tue.nl/repos/prom/Packages/HybridILPMiner/), which is available in the ProM framework [39] (http://www.promtools.org) and the RapidProM framework [26]. Using this implementation we validated the approach. In an artificial setting we evaluated the quality of the models discovered and the efficiency of applying sequence encoding filtering. We also compare sequence encoding to the IMi [15] algorithm and automaton-based filtering [9]. Finally, we assess the performance of sequence encoding filtering on two real-life event logs [11, 19].

6.1 Model quality

The event logs used in the empirical evaluation of model quality are artificially generated and originate from a study related to the impact of exceptional behaviour on rule-based approaches in process discovery [20]. Three event logs were generated from three different process models, i.e. the ground truth event logs. These event logs do not contain any exceptional behaviour, i.e. every trace fits the originating model. The ground truth event logs are called a12f0n00, a22f0n00 and a32f0n00. The two digits after the a character indicate the number of activities present in the event log, i.e. a12f0n00 contains 12 different activities. From each ground truth event log, four other event logs that do contain exceptional behaviour were created by means of trace manipulation. Manipulation concerns removal of the head or tail of a trace, removal of a random part of the trace body, and interchanging two randomly chosen events [20]. The percentages of trace manipulation are 5, 10, 20 and 50%. The manipulation percentage is incorporated in the last two digits of the event log’s name, i.e. the 5% manipulation version of the a22f0n00 event log is called a22f0n05. In this section we only study the results of the experiments using the a22f0nXX event logs; in [38] we report on a12f0nXX and a32f0nXX as well.

The existence of ground truth event logs, free of exceptional behaviour, is of utmost importance for evaluation: we need to be able to distinguish normal from exceptional behaviour in an unambiguous manner. These event logs, combined with the quality dimension precision, allow us to judge how well a technique is able to filter out exceptional behaviour. Recall that precision is defined as the fraction of behaviour producible by the process model that is also present in the event log. Thus, if all traces producible by a process model are present in an event log, precision is maximal, i.e. the precision value is 1. If the model allows for traces that are not present in the event log, precision is lower than 1.

If exceptional behaviour is present in an event log, the conventional ILP-based process discovery algorithm produces a WF-net that allows for all exceptional behaviour. As a result, the algorithm is unable to find any meaningful patterns within the event log. This typically leads to places with a lot of self-loops. The acceptance of exceptional behaviour by the WF-net, combined with the inability to find meaningful patterns yields a low level of precision, when using the ground truth log as a basis for precision computation. On the other hand, if we discover models using an algorithm that is more able to handle the presence of exceptional behaviour, we expect the algorithm to allow for less exceptional behaviour and find more meaningful patterns. Thus, w.r.t. the ground truth model, we expect higher precision values.

To evaluate the sequence encoding filtering approach, we have applied the ILP-based process discovery algorithm with sequence encoding filtering using \(\kappa ^{\alpha }_{max}\) and \(\alpha = 0,0.05,0.1,\ldots ,0.95,1\). Moreover, we performed similar experiments for IMi [15] (http://svn.win.tue.nl/repos/prom/Packages/InductiveMiner/) and automaton based event log filtering [9] (http://svn.win.tue.nl/repos/prom/Packages/NoiseFiltering/) combined with ILP-based discovery. We measured precision [21] and replay-fitness [33] based on the ground truth event logs. The replay-fitness results of the experiments are presented in Fig. 4. Precision results are presented in Fig. 5. In the charts we plot replay-fitness/precision against the noise level and filter threshold. We additionally use a colour scheme to highlight the differences in value.

Fig. 4 Replay-fitness measurements based on a22f0nXX. a Sequence encoding. b IMi [15]. c ILP with Automaton Filter [9]

Fig. 5 Precision measurements based on a22f0nXX. a Sequence encoding. b IMi [15]. c ILP with Automaton Filter [9]

For the sequence encoding filter (Figs. 4a, 5a) we observe that replay-fitness is often 1, except for very rigorous levels of filtering, i.e. \(\alpha =0\) and \(\alpha =0.05\). When filtering as rigorously as possible, i.e. \(\alpha =0\), we observe relatively stable replay-fitness values of around 0.6 for different levels of noise. The discovered model at the 0% noise level has a precision value of 1. Combined with the replay-fitness of around 0.6, this implies that the filter, even in the case of 0% noise, removes behaviour that is present in the ground truth event log. Precision drops to around 0.7 for increasing levels of noise. The relatively stable levels of replay-fitness and precision for increasing levels of noise when using \(\alpha = 0\) suggest that the filter only incorporates a few branches of the most frequent behaviour, which are the same throughout different levels of noise. Since the precision values are lower than 1, combined with the fact that parallelism exists in the original model, it seems that the most frequent branches incorporate some form of parallelism that generates behaviour not observed in the event log.

For the 5 and 10% noise levels we observe that threshold values between 0 and 0.6 achieve acceptable levels of precision. These values are slightly lower than the precision values for 0% noise, which implies that the filter in these cases is not able to remove all noise. The rapid trend towards precision values close to 0 for threshold levels above 0.6 suggests that there the filter removes little or no noise. For larger levels of noise we observe a steeper drop in precision; only very low threshold levels (up to 0.2) achieve precision values around 0.3. These results suggest that such levels of noise introduce a degree of variety in the data that no longer allows the sequence encoding filter to distinguish frequent from infrequent behaviour. Hence, even for low threshold values the filter still incorporates noise into the resulting process models.

For IMi (Figs. 4b, 5b) we observe similar behaviour (note that the filter threshold works inverted w.r.t. sequence encoding filtering, i.e. a value of 1 implies the most rigorous filtering). However, replay-fitness drops somewhat earlier than with sequence encoding filtering. The drop in precision of sequence encoding filtering is also smoother than that of IMi, i.e. the IMi graph contains some spikes. Hence, filtering within IMi appears to behave less deterministically.

Finally, replay-fitness for automaton-based filtering (Figs. 4c, 5c) rapidly drops to 0. Upon inspection it turns out that the filter returns empty event logs for the corresponding threshold and noise levels. Hence, the filter seems to be very sensitive around threshold values between 0 and 0.2. The precision results for the automaton-based filter (Fig. 5c) are as expected: with a low threshold value precision is very low, except at the 0% noise level. Towards a threshold level of 0.2, precision increases, after which it reaches its maximal value of 1. This is in line with the replay-fitness measurements.

We conclude that the sequence encoding filter and IMi lead to comparable results, although the sequence encoding filter behaves more predictably, i.e. IMi is somewhat less deterministic. The automaton-based filter also provides good results; however, its sensitivity to the filter threshold is much higher than that of sequence encoding filtering and IMi.

6.2 Computation time

Using sequence encoding filtering, we leave out constraints that refer to exceptional behaviour. Hence, we reduce the size of the core ILP constraint body and thus expect a decrease in computation time when applying rigorous filtering, i.e. \(\kappa ^{\alpha }_{max}\) with \(\alpha \) towards 0. Using RapidMiner we repeated experiments similar to those performed for model quality and measured CPU execution time for the three techniques, using only the threshold values 0, 0.25, 0.75 and 1.
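
The sketch below illustrates how such a frequency-based cut shrinks the constraint body. It reflects our simplified reading of \(\kappa ^{\alpha }_{max}\), assuming sequence-encoding constraints are grouped by the prefix they extend and that a constraint survives only if its frequency is at least \((1-\alpha )\) times the maximum frequency within its group; the formal definition given earlier in the paper is the authoritative one.

```python
from collections import defaultdict

def kappa_alpha_max(weighted_constraints, alpha):
    """Sketch of the kappa^alpha_max filter under our simplified reading.
    weighted_constraints maps a (prefix, extension) pair of a sequence
    encoding to its frequency. A constraint is retained if its frequency
    is at least (1 - alpha) times the maximum frequency among constraints
    sharing its prefix. With alpha = 0 only the most frequent branches
    survive (most rigorous); with alpha = 1 nothing is removed."""
    by_prefix = defaultdict(dict)
    for (prefix, extension), freq in weighted_constraints.items():
        by_prefix[prefix][extension] = freq
    kept = {}
    for prefix, extensions in by_prefix.items():
        cutoff = (1 - alpha) * max(extensions.values())
        for extension, freq in extensions.items():
            if freq >= cutoff:
                kept[(prefix, extension)] = freq
    return kept
```

Every pair removed this way corresponds to a constraint row that is dropped from the ILP body, which is where the computation-time gain for low \(\alpha \) values originates.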

Fig. 6 CPU execution time (ms) for the a22f0nXX event logs (logarithmic scale) for different levels of noise. The percentage of noise is depicted on top of each bar chart

In Fig. 6 we present the average CPU execution time, based on 50 repetitions of each experiment, needed to obtain a process model from an event log. For each level of noise we depict the computation time for different filter threshold settings. For IMi, we measured the Inductive Miner algorithm with integrated filtering. For sequence encoding and automaton-based filtering, we measured the time needed to filter, discover a causal graph and solve the underlying ILP problems. Observe that for IMi and the automaton-based filter the most rigorous filtering is performed at a threshold level of 1, as opposed to sequence encoding filtering, which filters most rigorously at threshold 0.
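
A measurement harness of this kind is straightforward to sketch; the helper below is our own illustration and not the RapidProM operator used in the experiments.

```python
import time

def average_cpu_time_ms(run, repetitions=50):
    """Average CPU time (ms) of a zero-argument discovery pipeline over
    repeated executions, mirroring the 50-repetition setup of Fig. 6.
    time.process_time() measures CPU time; substituting
    time.perf_counter() would measure wall-clock time instead."""
    start = time.process_time()
    for _ in range(repetitions):
        run()
    return (time.process_time() - start) / repetitions * 1000.0
```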

We observe that IMi is fastest in most cases; its computation time increases slightly when the amount of noise in the event logs increases. For sequence encoding filtering we observe that lower threshold values lead to faster computation, as expected, since a low threshold value removes more constraints from the ILP constraint body than a high one. The automaton-based filter is slowest in all cases. The amount of noise seems to have little impact on its computation time, which appears to depend predominantly on the filter threshold. From Fig. 6 we conclude that IMi in general outperforms sequence encoding in terms of computation time; sequence encoding, in turn, outperforms automaton-based filtering.

6.3 Application to real-life event logs

We tested the applicability of sequence encoding filtering using real-life event logs. We used an event log related to a road fines administration process [11] and one regarding the treatment of patients suspected to have sepsis [19].

Fig. 7 Replay-fitness, precision and complexity based on the Road Fines log [11] (a, b) and the Sepsis log [19] (c, d). a Fitness and precision. b Number of arcs. c Fitness and precision. d Number of arcs

The results are presented in Fig. 7. In case of the Road Fines event log (Fig. 7a, b) we observe that replay-fitness is around 0.46 whereas precision is around 0.4 for \(\alpha \)-values from 0 to 0.5. The number of arcs for the models at these \(\alpha \)-values remains constant (as do the numbers of places and transitions), suggesting that the discovered models are the same. After this, replay-fitness increases to around 0.8 and reaches 1 at an \(\alpha \)-level of 1. Interestingly, precision shows a slight increase at \(\alpha \)-levels between 0.5 and 0.75, after which it drops to slightly below its initial value. In this case, an \(\alpha \)-level between 0.5 and 0.75 seems most appropriate in terms of replay-fitness, precision and simplicity.

In case of the Sepsis event log (Fig. 7c, d) we observe that replay-fitness and precision roughly behave as each other's inverses, i.e. replay-fitness increases whereas precision decreases for increasing \(\alpha \)-levels. We moreover observe that the number of arcs within the process models steadily increases for increasing \(\alpha \)-levels. In this case, an \(\alpha \)-level between 0.1 and 0.4 seems most appropriate in terms of replay-fitness, precision and simplicity.

Finally, for each experiment we measured the computation time associated with solving all ILP problems. In case of the Road Fines event log, solving all ILP problems takes roughly 5 s; in case of the Sepsis event log, it takes less than 1 s.

As our experiments show, there is no single threshold that is most suitable for sequence encoding filtering, i.e. the best choice greatly depends on the event log. We do, however, observe that lower threshold values, e.g. 0–0.4, lead to less complex models. In practical settings we therefore advise starting with a low threshold value, which also reduces computation time due to a smaller constraint body, and increasing or decreasing the threshold based on the obtained result.
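
This advice translates into a simple search loop, sketched below. The discovery, fitness and complexity callables are hypothetical stand-ins for the actual pipeline, and the acceptance criterion (fitness of at least 0.8 and a bounded arc count) is an illustrative assumption, not a prescribed procedure.

```python
def pick_threshold(log, discover, fitness, arc_count,
                   alphas=(0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0),
                   min_fitness=0.8, max_arcs=100):
    """Try cheap, rigorous thresholds first and stop at the first model
    meeting an (illustrative) acceptance criterion on replay-fitness and
    complexity. Low alpha values are tried first because they yield
    smaller constraint bodies and hence faster ILP solving."""
    model = None
    for alpha in alphas:
        model = discover(log, alpha)
        if fitness(model, log) >= min_fitness and arc_count(model) <= max_arcs:
            return alpha, model
    return alphas[-1], model  # fall back to the least rigorous model
```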

7 Conclusion

The work presented in this paper is motivated by the observation that existing region-based process discovery techniques are useful, as they are able to find non-local, complex control flow patterns. However, these techniques do not provide structural guarantees w.r.t. the resulting process models, and they are unable to cope with infrequent, exceptional behaviour in event logs.

The approach presented in this paper extends the techniques presented in [34, 36, 37]. We have proven that our approach is able to discover relaxed sound workflow nets, i.e. we are now able to guarantee structural properties of the resulting process model. Additionally, we presented the sequence encoding filtering technique, which enables us to filter exceptional behaviour within the ILP-based process discovery algorithm. Our experiments confirm that the technique enables us to find meaningful Petri net structures in data containing exceptional behaviour, using ILP-based process discovery as an underlying technique. Sequence encoding filtering proves to be comparable to the IMi approach [15], i.e. the integrated filter of the Inductive Miner [16], in terms of filtering behaviour. Moreover, it is considerably faster than the general purpose filtering approach of [9] and less sensitive to variations in the filter threshold.

8 Future work

An interesting direction for future work concerns combining ILP-based process discovery with other process discovery techniques. The Inductive Miner discovers sound workflow nets, but these models lack the ability to express complex control flow patterns such as the milestone pattern. Some of these patterns are, however, reconstructible using ILP-based process discovery. Hence, it is interesting to combine these approaches, with possibly synergistic effects w.r.t. the process mining quality dimensions.

Another interesting direction is the development of more advanced general purpose filtering techniques. Most discovery algorithms assume their input event logs to be free of noise and infrequent and/or exceptional behaviour, whereas real-life event logs typically contain a lot of such behaviour. Surprisingly little research has been devoted to filtering techniques that enhance process discovery results independently of the discovery algorithm used.