Discovering Relaxed Sound Workflow Nets using Integer Linear Programming

Process mining is concerned with the analysis, understanding and improvement of business processes. Process discovery, i.e. discovering a process model based on an event log, is considered the most challenging process mining task. State-of-the-art process discovery algorithms only discover local control-flow patterns and are unable to discover complex, non-local patterns. Region theory based techniques, i.e. an established class of process discovery techniques, do allow for discovering such patterns. However, applying region theory directly results in complex, over-fitting models, which is undesirable. Moreover, region theory does not provide the guarantees offered by state-of-the-art process discovery algorithms, both w.r.t. structural and behavioural properties of the discovered process models. In this paper we present an ILP-based process discovery approach, based on region theory, that guarantees to discover relaxed sound workflow nets. Moreover, we devise a filtering algorithm, based on the internal working of the ILP-formulation, that is able to cope with the presence of infrequent behaviour. We have extensively evaluated the technique using different event logs with different levels of exceptional behaviour. Our experiments show that the presented approach allows us to overcome the inherent shortcomings of existing region-based approaches. The techniques presented are implemented and readily available in the HybridILPMiner package in the open-source process mining toolkits ProM and RapidProM.


Introduction
The execution of business processes within a company generates traces of event data in its supporting information system. The goal of process mining [2] is to turn this data, recorded in event logs, into actionable knowledge. Three core branches form the basis of process mining: process discovery, conformance checking and process enhancement. In process discovery, this paper's focus, the goal is to construct a process model based on an event log. In conformance checking the goal is to assess whether a given process model and event log conform with respect to each other in terms of described behaviour. In process enhancement the goal is to improve process models, primarily, though not exclusively, using the two aforementioned fields.
Several different process models exist that (largely) describe the behaviour in an event log. Hence, we need means to rank and compare these different process models. In process mining we typically judge the quality of process models based on four essential quality dimensions: replay-fitness, precision, generalization and simplicity [2,3,18]. Replay-fitness describes the fraction of behaviour in the event log that is also described by the model. Precision describes the fraction of behaviour described by the model that is also present in the event log. Generalization indicates a model's ability to account for behaviour that is not part of the event log; e.g. in the case of parallelism it is often impossible to observe all possible behaviour in the event log. Simplicity refers to a model's interpretability by a human analyst. A process discovery result ideally strikes an adequate balance between these four quality dimensions.
A field closely related to process discovery is Petri net synthesis [11]. Here the problem is to decide, given a behavioural system description, whether there exists a Petri net [34] that allows for all behaviour of the system description while minimizing additional behaviour. Most Petri net synthesis approaches use region theory [12], which comes in two forms: state-based region theory [15,23,24] (using transition systems) and language-based region theory [10,21] (using languages). Applying classical region theory using an event log as a system description results in Petri nets with maximal replay-fitness. Moreover, precision is maximized. An implicit consequence is poor generalization and poor simplicity. Using these techniques directly on real event logs therefore results in process models that are not an adequate representation of the event log and do not allow us to reach the global goal of process mining, i.e. turning data into actionable knowledge.
In [44] a process discovery algorithm is proposed on top of language-based region theory. The core of the algorithm is an Integer Linear Programming (ILP)-formulation that is solved multiple times using slight variations. The main contribution is a relaxation of the precision maximization property of language-based region theory. The algorithm still guarantees that the resulting process model is able to replay all behaviour in the event log. As opposed to state-of-the-art process discovery algorithms, the algorithm provides limited guarantees w.r.t. structural and behavioural properties of the resulting process models. Moreover, the algorithm only works well under the assumption that the event log only holds frequent behaviour that fits nicely into some underlying process model.
Real event logs typically include infrequent exceptional behaviour, e.g. caused by people deviating from the normative process, cases that require special treatment, employees solving unexpected issues in an ad-hoc fashion, etc. Considering all irregularities together with "normal behaviour" yields incomprehensible models, both in classical region-based synthesis and in region-based process discovery techniques. In this paper we tackle these problems by extending and improving existing region theory based algorithms [44][45][46]. This paper's contributions are summarized as follows:
1. We show that our approach is able to discover relaxed sound workflow nets.
2. We present an effective, integrated, filtering algorithm that results in process models that abstract from infrequent and/or exceptional behaviour.
The proposed algorithm is implemented in the process mining framework ProM [38] (HybridILPMiner package) and is available in RapidProM [4,16]. We have compared our technique with two state-of-the-art filtering techniques [20,26]. We additionally validated the applicability of our approach on two real-life event logs [27,30]. Our experiments confirm the effectiveness of the proposed approach, both in terms of resulting model quality and computational complexity.
The remainder of this paper is organized as follows. In Section 2 we motivate the need to further develop ILP-based process discovery. In Section 3 we discuss related work. In Section 4 we present background related to event logs, Petri nets and region theory. In Section 5 we show how to incorporate regions within process discovery. In Section 6 we show that we are able to guarantee the discovery of relaxed sound workflow nets. In Section 7 we present an integrated, effective algorithm to eliminate infrequent exceptional behaviour. In Section 8 we present an evaluation of the proposed approach. Section 9 concludes the paper.
Consider the following set of sequences of executed business process activities: $\langle a, c, d, e, f\rangle$, $\langle a, c, b, d, f\rangle$, $\langle a, c, e, d, f\rangle$ and $\langle a, e, c, d, f\rangle$. If we apply ILP-based process discovery [44], i.e. using region theory, we obtain the process model in Figure 1.
The model describes that activity a is always executed first. After activity a we are able to execute activities c and e in any order, i.e. they are in a parallel construct. However, after executing activity c we are also able to perform activity b instead of e, but only as long as we have not executed activity d, which becomes possible once c has been executed. Finally, we always execute activity f. In the model, the choice of executing activity b instead of e is influenced by the global state of the system. Such a pattern is called a milestone pattern [6].
If we apply the aforementioned state-of-the-art discovery algorithms to the same data, we obtain the models depicted in Figure 2. None of the models adequately describes the milestone pattern. Some models do not even guarantee perfect replay-fitness, e.g. Figure 2a. Other models, such as the model in Figure 2c, have very low precision. The only model that actually describes the same behaviour is the model in Figure 2d. However, this model does not explicitly capture the milestone pattern, i.e. we need to analyse the behaviour of the model to conclude that it describes the behaviour of a milestone pattern.
As the example shows, there is a clear incentive for process discovery algorithms based on (language-based) region theory. However, the state-of-the-art technique based on language-based region theory [44] has a number of deficiencies. It is not able to guarantee that the resulting Petri net is a workflow net, i.e. a Petri net with favourable graph-theoretical properties. For example, the Petri net in Figure 1 does not have a unique sink place. The Inductive Miner, in contrast, is guaranteed to return (sound) workflow nets. Moreover, ILP-based process discovery greatly suffers from the presence of infrequent and/or exceptional behaviour. Assume we have an event log containing thousands of repetitions of the aforementioned sequences $\langle a, c, d, e, f\rangle$, $\langle a, c, b, d, f\rangle$, $\langle a, c, e, d, f\rangle$ and $\langle a, e, c, d, f\rangle$. If we inject just one exceptional sequence, e.g. $\langle a, b, c, d, f\rangle$, and apply the current state-of-the-art region-based discovery algorithm, we obtain the model depicted in Figure 3. Within the model, activity b is now able to occur in parallel with activity c. Thus, by adding one infrequent, exceptional sequence, the algorithm is no longer able to detect the milestone pattern.
In this paper we solve the two aforementioned issues. Firstly, we present an approach that guarantees to find relaxed sound workflow nets. Secondly, we present an effective integrated filtering technique that identifies and ignores infrequent and/or exceptional behaviour.
Figure 1: … [44], containing a milestone pattern.
Figure 2: Results of several state-of-the-art discovery algorithms in the ProM Framework [38], when given sequences containing a milestone pattern. Panel (e): Heuristics Miner [40,41].

Related Work
We predominantly focus on related work in the area of region theory and its application to process discovery. We also focus on filtering techniques for process discovery. For a detailed overview of process discovery algorithms we refer to [2,22,39].
Region Theory and Petri Net Synthesis. Region theory is a solution to the Petri net synthesis problem [35]. The two terms are therefore often used interchangeably. The synthesis problem is to decide, given a behavioural system description, whether there exists a Petri net that allows for all behaviour described by the system description and, at the same time, minimizes additional behaviour. Initial work focused on solving the synthesis problem using transition systems as a system description [15,23,24], i.e. state-based region theory. A set of states within the transition system forms a region, which defines a place in the resulting Petri net. Region theory has also been applied using a prefix-closed language as a system description [10,21], i.e. language-based region theory. Here, a region is an assignment of decision variables over the language's alphabet, again defining a place in the resulting Petri net. Finally, language-based region theory has also been extended to labelled partial orders [14,28,29].
Process Discovery. In contrast to Petri net synthesis, process discovery aims at extracting a generalizing process model from an incomplete behavioural system description, i.e. an event log. Additionally, we typically need to abstract from infrequent behaviour in order to focus on the mainstream behaviour in the event log.
In [7] a process discovery approach is presented that transforms an event log into a transition system, after which state-based region theory is applied. Constructing the transition system is strongly parametrized, i.e. using different parameters yields different process discovery results. In [37] a similar approach is presented. The main contribution is a complexity reduction w.r.t. conventional region-based techniques.
In [13] a process discovery approach is presented based on language-based region theory. The method finds a minimal linear basis of a polyhedral cone of integer points, based on the event log. It guarantees perfect replay-fitness, whereas it does not maximize precision. The worst-case time complexity of the approach is exponential in the size of the event log. In [19] a process discovery algorithm is proposed based on the concept of numerical abstract domains. Based on the event log's prefix-closure, a convex polyhedron is approximated by means of calculating a convex hull. The convex hull is used to compute causalities within the input log by deducing a set of linear inequalities which represent places. In [44] a first design of a process discovery ILP-formulation is presented. The work presents an objective function, generalized in [46], that allows for expressing a preference for finding certain Petri net places. It also presents means to formulate ILP-constraints that help in finding more advanced Petri net types, e.g. Petri nets with reset and inhibitor arcs.
All aforementioned techniques alleviate the strict implications of region theory w.r.t. process discovery, i.e. precision maximization, poor generalization and poor simplicity, to some extent. However, the techniques still perform suboptimally. Since the techniques guarantee perfect replay-fitness, they tend to fail if exceptional behaviour is present in the event log, i.e. they produce models that incorporate infrequent behaviour (outliers).
Filtering Infrequent Behaviour. Little work has been done regarding filtering of infrequent behaviour in the context of process mining. The majority of work concerns unpublished/undocumented ad-hoc filtering implementations in the ProM framework [38].
In [20] an event log filtering technique is presented that filters at the event level. Events within the event log are removed in case they do not fit an underlying, event log based, automaton. The technique can be used as a pre-processing step prior to invoking a discovery algorithm.
In [26] Leemans et al. show how to extend the Inductive Miner [25] with filtering capabilities to handle infrequent behaviour. The technique is tailored towards the internal working of the Inductive Miner algorithm and considers three different types of filters. Moreover, the technique exploits the inductive nature of the underlying algorithm, i.e. filters are applied on multiple levels.

Background
In this section we present basic notational conventions, event logs and workflow nets.

Bags, Sequences and Vectors
$X = \{e_1, e_2, \ldots, e_n\}$ denotes a set. $\mathcal{P}(X)$ denotes the power set of $X$. $\mathbb{N}$ denotes the set of natural numbers including 0, whereas $\mathbb{N}^+$ excludes 0. $\mathbb{R}$ denotes the set of real numbers. A bag (multiset) over $X$ is a function $B \colon X \to \mathbb{N}$, which we write as $[e_1^{v_1}, e_2^{v_2}, \ldots, e_n^{v_n}]$, where for $1 \leq i \leq n$ we have $e_i \in X$, $v_i \in \mathbb{N}^+$ and $e_i^{v_i} \equiv B(e_i) = v_i$. If for some element $e$, $B(e) = 1$, we omit its superscript. An empty bag is denoted as $\emptyset$. Element inclusion applies to bags: if $e \in X$ and $B(e) > 0$ then also $e \in B$. Set operations, e.g. $\uplus$, $\setminus$, $\cap$, extend to bags. The set of all bags over $X$ is denoted $\mathcal{B}(X)$.
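If you want to experiment with the bag notation, Python's `collections.Counter` behaves like a bag over a finite set. `+` adds multiplicities. `-` subtracts them and drops elements whose count is no longer positive. `&` takes the minimum. Mapping ⊎, \ and ∩ onto these operators is our own illustration and is not part of the formalism.

```python
from collections import Counter

# A bag B over X is modelled as a Counter mapping each element e to B(e).
B1 = Counter({"a": 2, "b": 1})  # the bag [a^2, b]
B2 = Counter({"b": 3, "c": 1})  # the bag [b^3, c]

bag_sum = B1 + B2    # sum (⊎): multiplicities are added, giving [a^2, b^4, c]
bag_diff = B1 - B2   # difference (\): non-positive counts are dropped, giving [a^2]
bag_meet = B1 & B2   # intersection (∩): minimum multiplicity, giving [b]

# Element inclusion: e ∈ B iff B(e) > 0; absent elements have multiplicity 0.
assert "a" in B1 and B1["c"] == 0
```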

Event Logs and Workflow Nets
In process discovery an event log acts as the main source of input and describes the actual execution of activities in the context of a business process. An example event log, adopted from [2], is presented in Table 1. Consider all activities related to Case-id 1. John registers a request, after which Lucy examines it thoroughly. Pete checks the ticket, after which Rob decides to reject the request. The execution of an activity in the context of a business process is referred to as an event. A sequence of events, e.g. the sequence of events related to case 1, is referred to as a trace.
Let $A$ denote the universe of all possible activities. An event log $L$ is a bag of sequences over $A$, i.e. $L \in \mathcal{B}(A^*)$. Typically, there exists a set $A_L \subset A$ of activities that are actually present in $L$. In some cases we refer to an event log as $L \in \mathcal{B}(A_L^*)$. A sequence $\sigma \in L$ represents a trace. We write case 1 as trace ⟨"register request", "examine thoroughly", "check ticket", "decide", "reject request"⟩. In the remainder of the paper we use simple characters for activity names, e.g. we write case 1 as $\langle a, b, d, e, h\rangle$.
The goal within process discovery is to discover a process model based on an event log. In this paper we consider workflow nets (WF-nets) [1], based on Petri nets [34], to describe process models. We first introduce Petri nets and their execution semantics, after which we define workflow nets. A Petri net is a bipartite graph consisting of a set of vertices called places and a set of vertices called transitions. Arcs connect places with transitions and vice versa. Additionally, transitions have a (possibly unobservable) label which describes the activity that the transition represents. A Petri net is a quadruple $N = (P, T, F, \lambda)$, where $P$ is a set of places and $T$ is a set of transitions with $P \cap T = \emptyset$. $F$ denotes the flow relation of $N$, i.e. $F \subseteq (P \times T) \cup (T \times P)$. $\lambda$ denotes the label function, i.e. given a set of activities $\Lambda \subset A$ and an unobservable activity $\tau \notin \Lambda$, it is defined as $\lambda \colon T \to \Lambda \cup \{\tau\}$. For a node $x \in P \cup T$, the pre-set of $x$ in $N$ is defined as $\bullet x = \{y \mid (y, x) \in F\}$ and $x\bullet = \{y \mid (x, y) \in F\}$ denotes the post-set of $x$. Graphically we represent places as circles and transitions as boxes. For every $(x, y) \in F$ we draw an arc from $x$ to $y$. An example Petri net (which is also a WF-net) is depicted in Figure 4. Observe that we have $\bullet d = \{c_2\}$, $d\bullet = \{c_4\}$ and $\lambda(d) =$ "check ticket". The Petri net does not contain any silent transitions.
Figure 4: Example WF-net $W_1$, adopted from [2].
The execution semantics of Petri nets are based on the concept of markings. A marking $M$ is a bag of tokens, i.e. $M \in \mathcal{B}(P)$. Graphically, a place $p$'s marking is visualized by drawing $M(p)$ dots inside place $p$, e.g. place "start" in … $\ldots, |\sigma'|\}\,(\lambda(\sigma'(i)) = \sigma''(i)) \wedge \sigma''{\downarrow_{\Lambda}} = \sigma\}$.
WF-nets extend Petri nets and require the existence of a unique source and sink place, which describe the start and the end of a case, respectively. Moreover, each element within the WF-net needs to be on a path from the source to the sink place.
Definition 1 (Workflow net [1]). Let $N = (P, T, F, \lambda)$ be a Petri net. Let …
The execution semantics defined for Petri nets can directly be applied to the elements $P$, $T$ and $F$ of $W = (P, T, F, p_i, p_o, \lambda)$. Notation-wise, we substitute $W$ for its underlying net structure $N = (P, T, F)$, e.g. …
We compute metrics such as replay-fitness and precision, as introduced in Section 1, based on an event log. Several behavioural quality metrics that do not need any form of domain knowledge exist for WF-nets. Several notions of soundness of WF-nets are defined [5]. For example, classical sound WF-nets are guaranteed to be free of livelocks, deadlocks, and other anomalies that can be detected automatically. In this paper we consider the weaker notion of relaxed soundness. Relaxed soundness requires that each transition is enabled in some reachable marking such that, after firing the transition, the final marking can still be reached.
Reconsider $W_1$ (Figure 4) and assume we are given an event log with one trace: $\langle a, b, d, e, h\rangle$. It is easy to see that $W_1$ is relaxed sound. Moreover, replay-fitness is perfect, i.e. $\langle a, b, d, e, h\rangle$ is in the WF-net's labelled execution language. Precision is not perfect, as the WF-net can produce many more traces than just $\langle a, b, d, e, h\rangle$.
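The sketch below makes the firing rule and the relaxed-soundness property concrete. The arc structure of $W_1$ is an assumption. It is taken from the running example of [2] and is consistent with $\bullet d = \{c_2\}$ and $d\bullet = \{c_4\}$. Transition names double as their labels. The state-space search assumes the net is bounded and gives up after a fixed number of markings. All function names are hypothetical.

```python
from collections import Counter, deque

# Assumed structure of W1 (running example of [2]): transition -> (pre-set, post-set).
W1 = {
    "a": ({"start"}, {"c1", "c2"}),
    "b": ({"c1"}, {"c3"}),
    "c": ({"c1"}, {"c3"}),
    "d": ({"c2"}, {"c4"}),
    "e": ({"c3", "c4"}, {"c5"}),
    "f": ({"c5"}, {"c1", "c2"}),
    "g": ({"c5"}, {"end"}),
    "h": ({"c5"}, {"end"}),
}


def enabled(net, marking, t):
    """t is enabled if every place in its pre-set holds at least one token."""
    return all(marking[p] >= 1 for p in net[t][0])


def fire(net, marking, t):
    """Consume one token per input place, produce one token per output place."""
    if not enabled(net, marking, t):
        raise ValueError(f"transition {t} is not enabled")
    pre, post = net[t]
    result = Counter(marking)
    result.subtract(pre)
    result.update(post)
    return +result  # drop places holding zero tokens


def replay(net, trace, initial):
    marking = Counter(initial)
    for t in trace:
        marking = fire(net, marking, t)
    return marking


def freeze(marking):
    return frozenset(marking.items())


def relaxed_sound(net, initial, final, limit=100_000):
    """Every transition fires in some reachable marking after which `final` stays reachable."""
    start = freeze(+Counter(initial))
    seen, queue, edges = {start}, deque([start]), []
    while queue:  # forward exploration of the reachability graph
        current = queue.popleft()
        marking = Counter(dict(current))
        for t in net:
            if enabled(net, marking, t):
                nxt = freeze(fire(net, marking, t))
                edges.append((current, t, nxt))
                if nxt not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("state space too large; net may be unbounded")
                    seen.add(nxt)
                    queue.append(nxt)
    good = {freeze(+Counter(final))}  # markings from which `final` is reachable
    changed = True
    while changed:  # backward fixpoint over the recorded edges
        changed = False
        for src, _, dst in edges:
            if dst in good and src not in good:
                good.add(src)
                changed = True
    return all(any(u == t and dst in good for _, u, dst in edges) for t in net)
```

Replaying $\langle a, b, d, e, h\rangle$ from $[start]$ ends in $[end]$, and the check reports $W_1$ as relaxed sound. Adding a transition that moves the token from $c_5$ into a dead-end place breaks relaxed soundness.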

Discovering Petri Net Places using Integer Linear Programming
In this section we show how to discover, given an event log as input, multiple places of a Petri net using language-based regions.

Regions
Conceptually, a region represents a place in a Petri net that, given the prefix-closure of an event log, does not block the execution of any sequence within the prefix-closure. We represent a region as an assignment of binary decision variables describing the incoming and outgoing arcs of its corresponding place, as well as its marking.
Variable $m$ indicates whether or not the region's corresponding place contains a token, $\vec{x}$ denotes incoming arcs and $\vec{y}$ denotes outgoing arcs. Consider event log $L_1 = [\langle a, b, d, e, g\rangle^{10}, \langle a, c, d, e, f, d, b, e, g\rangle^{12}, \langle a, d, c, e, h\rangle^{9}, \langle a, b, d, e, f, c, d, e, g\rangle^{11}, \langle a, d, c, e, f, b, d, e, h\rangle^{13}]$. In Table 2 we depict a part of the corresponding set of linear inequalities based on Definition 3. For every non-empty sequence in the prefix-closure of $L_1$, i.e. $\langle a\rangle, \langle a, b\rangle, \ldots, \langle a, d, c, e, f, b, d, e, h\rangle$, there is an associated linear inequality in terms of the variables $m$, $\vec{x}$ and $\vec{y}$. For example, $\langle a\rangle$ leads to $m - y(a) \geq 0$, $\langle a, b\rangle$ leads to $m + x(a) - y(a) - y(b) \geq 0$, etc. Note that the inequalities abstract from the ordering of activities in traces, e.g. $\langle a, c, d, e\rangle$ and $\langle a, d, c, e\rangle$ both map to $m + x(a) + x(c) + x(d) - y(a) - y(c) - y(d) - y(e) \geq 0$.
A region $r$ is translated to a Petri net place $p$ as follows. Assume a Petri net that has a unique transition $t_a$ for each $a \in A_L$ such that $\lambda(t_a) = a$. If $x(a) = 1$ for $a \in A_L$, we add $t_a$ to $\bullet p$. Symmetrically, if $y(a) = 1$, we add $t_a$ to $p\bullet$. Finally, if $m = 1$, place $p$ is initially marked. Since this translation is a one-to-one correspondence, we are also able to translate a place back to a region, e.g. place $c_2$ in Figure 4 corresponds to a region with $x(a) = 1$, $x(f) = 1$, $y(d) = 1$ and all other variables set to zero. The triples $r_0 = (0, \vec{0}, \vec{0})$ and $r_1 = (1, \vec{1}, \vec{1})$ are always regions and hence are trivial regions. We let $R(L)$ denote the set of non-trivial regions based on event log $L$.
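The region condition can be checked mechanically. The sketch assumes that Definition 3 is the usual language-based region condition, i.e. $m + \vec{p}(\sigma')^{\intercal}\vec{x} - \vec{p}(\sigma)^{\intercal}\vec{y} \geq 0$ for every non-empty prefix $\sigma = \sigma' \cdot a$. This matches the example inequalities above. $\vec{x}$ and $\vec{y}$ are passed as the sets of activities whose variable is 1, and the function names are hypothetical.

```python
from collections import Counter

# Event log L1 from the text (trace -> frequency); frequencies do not affect the inequalities.
L1 = Counter({
    ("a", "b", "d", "e", "g"): 10,
    ("a", "c", "d", "e", "f", "d", "b", "e", "g"): 12,
    ("a", "d", "c", "e", "h"): 9,
    ("a", "b", "d", "e", "f", "c", "d", "e", "g"): 11,
    ("a", "d", "c", "e", "f", "b", "d", "e", "h"): 13,
})


def prefix_parikh_pairs(log):
    """Yield (p(sigma'), p(sigma)) for each distinct non-empty prefix sigma = sigma' . a."""
    seen = set()
    for trace in log:
        for k in range(1, len(trace) + 1):
            prefix = trace[:k]
            if prefix not in seen:
                seen.add(prefix)
                yield Counter(prefix[:-1]), Counter(prefix)


def is_region(log, m, x, y):
    """Check m + p(sigma')·x - p(sigma)·y >= 0 for every prefix (x, y: activities set to 1)."""
    return all(
        m + sum(before[a] for a in x) - sum(after[a] for a in y) >= 0
        for before, after in prefix_parikh_pairs(log)
    )
```

For example, `is_region(L1, 0, {"a", "f"}, {"d"})` confirms that place $c_2$ corresponds to a region. A place with only an outgoing arc to b (`is_region(L1, 0, set(), {"b"})`) is rejected, because it would block $\langle a, b\rangle$.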

A Basic ILP Formulation
The set $R(L)$ is typically very large. However, when using $L_1$ as an input for process discovery, our goal is to find the WF-net in Figure 4, or a WF-net very similar to it. Several regions exist in $R(L_1)$ that do not correspond to a place in Figure 4. For example, a variable assignment with x(a) = 1 and y(e) = 1 (all other variables zero), i.e. representing a place connecting transitions a and e, is a region. We therefore need means to search through the solution space to find regions that are of interest.
Firstly, we are only interested in minimal regions, i.e. regions that are not expressible as a non-negative linear combination of two other regions, because non-minimal regions correspond to implicit places [44]. Hence, when applying region-based techniques in terms of process discovery, we only search for minimal regions. However, finding all minimal regions does not suffice, e.g. the aforementioned region with x(a) = 1 and y(e) = 1 is minimal yet implicit. To this end we define an Integer Linear Programming (ILP) [36] formulation using the region definition as a constraint body [44]. ILP is a mathematical optimization problem defined over a set of integer variables, in which both the objective and the constraints are linear expressions in these variables. Before introducing the basic ILP-formulation for the purpose of process discovery, we reformulate regions in terms of matrices.
We additionally define matrix $M_L$, which is an $|L| \times |A_L|$ matrix with $M_L(\sigma, a) = \vec{p}(\sigma)(a)$ for $\sigma \in L$, i.e. $M_L$ is the equivalent of $M$ for all traces in the event log. We define a general process discovery ILP-formulation that is guaranteed to find a non-trivial region with the additional property that the corresponding place is always empty after replaying each trace within the event log.
Definition 5 (Process Discovery ILP-formulation). Let $L$ be an event log over a set of activities $A_L$ with corresponding matrices $M$, $M'$ and $M_L$. Let $c_m \in \mathbb{R}$ and $\vec{c}_x, \vec{c}_y \in \mathbb{R}^{|A_L|}$. The process discovery ILP-formulation, $ILP_L$, is defined as: …
Definition 5 acts as a basic formulation for process discovery using ILP. To actually use the formulation we need to instantiate $c_m$, $\vec{c}_x$ and $\vec{c}_y$, i.e. the objective coefficients, with meaningful values. By varying the values of the objective coefficients we are able to let the ILP favour different solutions. In [44] an objective function is proposed that minimizes the number of incoming arcs and maximizes the number of outgoing arcs of a place. In [46] this objective function is extended such that it minimizes the time a token resides in the corresponding place. Both objective functions are expressible as a more general function which favours minimal regions [46]. In general, we are able to use any objective function, as long as it favours minimal regions. Hence, in this paper we assume that such an objective function is used.
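The formulation below sketches what Definition 5 amounts to. It assembles the constraints that the surrounding text refers to: the region inequalities of Definition 3 written with $M$ and $M'$, the emptiness constraint $m\vec{1} + M_L(\vec{x} - \vec{y}) = \vec{0}$, and a non-triviality constraint. This assumes $M$ holds the Parikh vectors $\vec{p}(\sigma)$ of the non-empty prefixes and $M'$ those of the corresponding $\sigma'$. The exact form, in particular of the non-triviality constraint, is our assumption and may differ from the original definition.

```latex
\begin{aligned}
\text{minimize}\quad   & z = c_m\, m + \vec{c}_x^{\intercal}\vec{x} + \vec{c}_y^{\intercal}\vec{y} \\
\text{subject to}\quad & m\vec{1} + M'\vec{x} - M\vec{y} \geq \vec{0}   && \text{(region)} \\
                       & m\vec{1} + M_L(\vec{x} - \vec{y}) = \vec{0}    && \text{(place empty after each trace)} \\
                       & \vec{1}^{\intercal}\vec{x} + \vec{1}^{\intercal}\vec{y} \geq 1 && \text{(non-trivial)} \\
                       & m \in \{0, 1\},\quad \vec{x}, \vec{y} \in \{0, 1\}^{|A_L|}
\end{aligned}
```

Under this sketch both trivial regions are excluded. $r_0$ violates the non-triviality constraint. $r_1$ violates the emptiness constraint, because $m = 1$ while $\vec{x} - \vec{y} = \vec{0}$.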
Solving the basic formulation with a given objective function yields only a single, optimal, result. Hence, we need a more structured approach for finding multiple Petri net places, using the presented ILP-formulation as a basis.
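The matrices $M$, $M'$ and $M_L$ used in Definition 5 can be sketched as plain Python lists. The sketch assumes, in line with the inequalities of Table 2, that $M$ and $M'$ have one row per distinct non-empty prefix $\sigma = \sigma' \cdot a$ of the log, holding $\vec{p}(\sigma)$ and $\vec{p}(\sigma')$ respectively. All helper names are hypothetical.

```python
from collections import Counter

ALPHABET = ["a", "b", "c", "d", "e", "f", "g", "h"]
L1_TRACES = [  # distinct traces of L1
    ("a", "b", "d", "e", "g"),
    ("a", "c", "d", "e", "f", "d", "b", "e", "g"),
    ("a", "d", "c", "e", "h"),
    ("a", "b", "d", "e", "f", "c", "d", "e", "g"),
    ("a", "d", "c", "e", "f", "b", "d", "e", "h"),
]


def parikh(seq, alphabet):
    """Parikh vector: number of occurrences of each activity, in alphabet order."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]


def build_matrices(traces, alphabet):
    prefixes = sorted({t[:k] for t in traces for k in range(1, len(t) + 1)})
    M = [parikh(s, alphabet) for s in prefixes]               # rows p(sigma)
    M_prime = [parikh(s[:-1], alphabet) for s in prefixes]    # rows p(sigma')
    M_L = [parikh(t, alphabet) for t in sorted(set(traces))]  # one row per distinct trace
    return M, M_prime, M_L


def indicator(activities, alphabet):
    return [1 if a in activities else 0 for a in alphabet]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def satisfies_region(M, M_prime, m, x, y):
    """m*1 + M' x - M y >= 0, row by row."""
    return all(m + dot(rp, x) - dot(r, y) >= 0 for r, rp in zip(M, M_prime))


def empty_after_each_trace(M_L, m, x, y):
    """m*1 + M_L (x - y) = 0, row by row."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return all(m + dot(row, diff) == 0 for row in M_L)
```

The place $c_2$ ($x(a) = x(f) = 1$, $y(d) = 1$) satisfies both constraint groups. A place from a to g is a region of $L_1$, but it fails the emptiness constraint: in traces ending with h it keeps its token.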

Exploiting Causalities
We need to find multiple regions that together form places of a WF-net, in line with the behaviour present within the event log. One of the most suitable techniques to find multiple regions in a controlled, structured manner, is by exploiting causal relations present within an event log. A causal relation between activities a and b implies that activity a causes b, i.e. b is likely to follow (somewhere) after activity a.
Several approaches exist to compute causal relations [22]. The α-Miner [8] defines a causal relation $a \rightarrow_L b$ from activity a to activity b if, within some event log $L$, we find traces of the form $\langle \ldots, a, b, \ldots\rangle$ but no traces of the form $\langle \ldots, b, a, \ldots\rangle$. Within the Heuristics Miner [40,41] this relation was further developed to take frequencies into account as well. Given these multiple definitions, we assume the existence of a causal relation oracle which, given an event log, produces a set of pairs (a, b) indicating that activity a has a causal relation with (to) activity b. A causal oracle maps an event log to a set of pairs of its activities, i.e. $\gamma_c(L) \in \mathcal{P}(A_L \times A_L)$. It defines a directed graph with $A_L$ as vertices and each pair $(a, b) \in \gamma_c(L)$ as an arc from a to b. Later we exploit this graph-based view; for now we refer to $\gamma_c(L)$ as a collection of pairs.
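As an illustration of one possible oracle $\gamma_c$, the α-Miner relation described above can be computed from the directly-follows pairs of a log. The sketch ignores frequencies, unlike the Heuristics Miner, and treats a log as an iterable of activity tuples. The function names are hypothetical.

```python
def directly_follows(log):
    """All pairs (a, b) such that b directly follows a in some trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}


def alpha_causal(log):
    """alpha-Miner style causality: a -> b iff a is directly followed by b, but never b by a."""
    df = directly_follows(log)
    return {(a, b) for (a, b) in df if (b, a) not in df}
```

For the log $[\langle a, b, c, d\rangle, \langle a, c, b, d\rangle]$ it yields $a \rightarrow b$, $a \rightarrow c$, $b \rightarrow d$ and $c \rightarrow d$. Activities b and c occur in both orders, so they are not causally related.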
When adopting a causal-based ILP process discovery strategy, we try to find net places that represent a causality found in the event log. Given an event log $L$, for each pair $(a, b) \in \gamma_c(L)$ we enrich the constraint body with three constraints: 1) $m = 0$, 2) $x(a) = 1$ and 3) $y(b) = 1$. The three constraints ensure that if we find a solution to the ILP, it corresponds to a place which is not marked and connects transition a to transition b. Given pair $(a, b) \in \gamma_c(L)$ we denote the corresponding extended causality-based ILP-formulation as $ILP_{(L,a \rightarrow b)}$.
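The following sketch makes $ILP_{(L,a \rightarrow b)}$ concrete on $L_1$. It replaces the ILP solver by exhaustive enumeration of all 0/1 assignments, which is only feasible for tiny alphabets. It assumes the constraint body consists of the region inequalities and the emptiness constraint $m\vec{1} + M_L(\vec{x} - \vec{y}) = \vec{0}$. The objective, minimising the number of arcs $|\vec{x}| + |\vec{y}|$, is a hypothetical stand-in for an objective that favours minimal regions. It is not the objective of [44] or [46].

```python
from collections import Counter
from itertools import product

L1_TRACES = [  # distinct traces of L1; frequencies do not influence the constraints
    ("a", "b", "d", "e", "g"),
    ("a", "c", "d", "e", "f", "d", "b", "e", "g"),
    ("a", "d", "c", "e", "h"),
    ("a", "b", "d", "e", "f", "c", "d", "e", "g"),
    ("a", "d", "c", "e", "f", "b", "d", "e", "h"),
]


def solve_causal_ilp(traces, a, b):
    """Exhaustive stand-in for ILP_(L, a->b) with m = 0, x(a) = 1 and y(b) = 1.

    Returns (m, x, y) with x and y as sets of activities, or None if infeasible.
    """
    alphabet = sorted({act for t in traces for act in t})
    prefixes = {t[:k] for t in traces for k in range(1, len(t) + 1)}
    region_rows = [(Counter(s[:-1]), Counter(s)) for s in prefixes]
    trace_rows = [Counter(t) for t in set(traces)]
    free_x = [v for v in alphabet if v != a]
    free_y = [v for v in alphabet if v != b]
    m, best = 0, None
    for bits in product((0, 1), repeat=len(free_x) + len(free_y)):
        x = {a} | {v for v, bit in zip(free_x, bits) if bit}
        y = {b} | {v for v, bit in zip(free_y, bits[len(free_x):]) if bit}
        # Region inequalities: m + p(sigma')·x - p(sigma)·y >= 0 for every prefix.
        if any(m + sum(pre[v] for v in x) - sum(full[v] for v in y) < 0
               for pre, full in region_rows):
            continue
        # Emptiness: m + p(trace)·(x - y) = 0 for every trace.
        if any(m + sum(cnt[v] for v in x) - sum(cnt[v] for v in y) != 0
               for cnt in trace_rows):
            continue
        cost = len(x) + len(y)  # hypothetical objective: fewest arcs
        if best is None or cost < best[0]:
            best = (cost, x, y)
    return None if best is None else (m, best[1], best[2])
```

Under this objective, the pair (a, d) yields $x(a) = x(f) = 1$ and $y(d) = 1$, i.e. the region of place $c_2$ discussed earlier. The pair (a, b) yields $x(a) = x(f) = 1$ and $y(b) = y(c) = 1$.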
After solving $ILP_{(L,a \rightarrow b)}$ for each $(a, b) \in \gamma_c(L)$, we end up with a set of regions that we are able to transform into places in a resulting Petri net. Since we enforce $m = 0$ for each causality, none of these places is initially marked. Moreover, due to the constraints $m\vec{1} + M_L(\vec{x} - \vec{y}) = \vec{0}$, each resulting place is empty after replaying each trace of the input event log within the net. Since we additionally enforce $x(a) = 1$ and $y(b) = 1$, if we find a solution to the ILP, the corresponding place has both input and output arcs and is not eligible for being a source/sink place. Hence, the approach as-is does not allow us to find WF-nets. In the next section we show that a simple pre-processing step performed on the event log, together with specific instances of $\gamma_c(L)$, allows us to discover WF-nets which are relaxed sound.

Discovering Relaxed Sound Workflow Nets
… there exists a trace of the form $\langle \ldots, a_f\rangle$ in the event log. For example, for $L_1$, $A_f = \{g, h\}$. After solving each $ILP_{(L,a \rightarrow b)}$ instance based on $\gamma_c(L)$ and adding the corresponding places, we know that when we exactly replay any trace from $L_1$, after firing g or h, the net is empty. Since g and h never co-occur in a trace, it is trivial to add a sink place $p_o$ s.t. after replaying each trace in $L_1$, $p_o$ is the only place marked, i.e. $\bullet p_o = \{g, h\}$ and $p_o\bullet = \emptyset$ (place "end" in Figure 4). In general, such a decision is not trivial. However, adding a sink $p_o$ is trivial when there is only one end activity and it occurs exactly once, at the end of each trace, i.e. $A_f = \{a_f\}$ and there exists no trace of the form $\langle \ldots, a_f, \ldots, a_f\rangle$. In such a case we have …
A similar rationale holds for adding a source place. We define a set $A_s$ that denotes the set of start activities, i.e. activities $a_s$ s.t. there exists a trace of the form $\langle a_s, \ldots\rangle$ in the event log. Each activity $a_s \in A_s$ is, for some traces in the event log, the first activity to be executed. Thus, we know that the source place $p_i$ must connect, in some way, to the elements of $A_s$. As for final activities, creating a source place is trivial when $A_s = \{a_s\}$ and there exists no trace of the form $\langle a_s, \ldots, a_s, \ldots\rangle$, i.e. the start activity occurs exactly once in each trace. In such a case we create place $p_i$ with …
In order to be able to find a source and a sink place, it suffices to guarantee that sets $A_s$ and $A_f$ are of size one and that their elements always occur exactly once, at the start and at the end of a trace, respectively. We formalize this idea through the notion of unique start/end event logs, after which we show that transforming an arbitrary event log into such a unique start/end event log is trivial.
Definition 7 (Unique start/end event log). Let $L$ be an event log over a set of activities $A_L$. …
Since the set of activities $A_L$ is finite, it is trivial to transform any event log into a USE-log. Assume we have an event log $L$ over $A_L$ that is not a USE-log. We generate two "fresh" activities $a_s, a_f \in A$ s.t. $a_s, a_f \notin A_L$ and create a new event log $L'$ over $A_L \cup \{a_s, a_f\}$ by adding $a_s \cdot \sigma \cdot a_f$ to $L'$ for each $\sigma \in L$. We let $\pi \colon \mathcal{B}(A^*) \to \mathcal{B}(A^*)$ denote such a USE-transformation. We omit $a_s$ and $a_f$ as explicit parameters of $\pi$ and assume that, given some USE-transformation, the two symbols are known.
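A USE-transformation is a one-liner over a bag of traces. The sketch represents an event log as a `collections.Counter` from activity tuples to frequencies. The names `a_s` and `a_f` for the fresh activities are hypothetical. `is_use_log` encodes our reading of the informal description above: one start and one end activity, each occurring exactly once, at the start and at the end of every trace. It may differ in detail from Definition 7.

```python
from collections import Counter

A_S, A_F = "a_s", "a_f"  # hypothetical names for the two fresh activities


def use_transform(log):
    """pi(L): wrap every trace as a_s . sigma . a_f, keeping trace frequencies."""
    alphabet = {act for trace in log for act in trace}
    if A_S in alphabet or A_F in alphabet:
        raise ValueError("a_s and a_f must not occur in the input log")
    result = Counter()
    for trace, freq in log.items():
        result[(A_S, *trace, A_F)] += freq
    return result


def is_use_log(log, start=A_S, end=A_F):
    """Every trace starts with `start`, ends with `end`, and contains each exactly once."""
    return all(
        len(t) >= 2 and t[0] == start and t[-1] == end
        and t.count(start) == 1 and t.count(end) == 1
        for t in log
    )
```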
Clearly, after applying a USE-transformation, finding a unique source and sink place is trivial. It also provides an additional advantage regarding the ability to find WF-nets. In fact, an ILP instance $ILP_{(L,a \rightarrow b)}$ always has a solution if $L$ is a USE-log. We provide a proof of this property in Lemma 1, after which we present an algorithm that, given specific instantiations of $\gamma_c$, discovers WF-nets. … Constructive. We consider the case $a \neq a_s$ and $b \neq a_f$. We show that the variable assignment $x(a_s) = x(a) = x(b) = y(a) = y(b) = y(a_f) = 1$, all other variables 0 (Figure 5a), adheres to all constraints of $ILP_{(\pi(L),a \rightarrow b)}$.
Since $x(a_s) = 1$ and $a_s$ occurs exactly once, at the start of each trace, such a constraint equals 0 if $\sigma' = \epsilon$, and 1 otherwise.
Case III: $x = b$. Similar to Case II. Case IV: $x = a_f$. We again have $\vec{p}(\sigma')(a) = \vec{p}(\sigma)(a)$ and $\vec{p}(\sigma')(b) = \vec{p}(\sigma)(b)$. Since $\vec{p}(\sigma)(a_f)\, y(a_f) = \vec{p}(\sigma')(a_s)\, x(a_s) = 1$, each constraint equals 0.
In Algorithm 1 we present an ILP-based process discovery approach that uses a USE-log internally in order to find multiple Petri net places. For every $(a, b) \in \gamma_c(\pi(L))$ with $a \neq a_f$ and $b \neq a_s$ it solves $ILP_{(\pi(L),a \rightarrow b)}$. Moreover, it finds a unique source and sink place.
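The constructive assignment from the proof of Lemma 1 can be checked on a concrete log. The sketch applies the USE-transformation to $L_1$, with hypothetical names `a_s` and `a_f` for the fresh activities. For every pair of original activities, it verifies that $m = 0$ with $x(a_s) = x(a) = x(b) = y(a) = y(b) = y(a_f) = 1$ satisfies the region inequalities and the emptiness constraint. These two constraint groups are assumed here to be the constraint body of Definition 5.

```python
from collections import Counter

A_S, A_F = "a_s", "a_f"  # hypothetical names for the fresh start/end activities

L1_TRACES = [
    ("a", "b", "d", "e", "g"),
    ("a", "c", "d", "e", "f", "d", "b", "e", "g"),
    ("a", "d", "c", "e", "h"),
    ("a", "b", "d", "e", "f", "c", "d", "e", "g"),
    ("a", "d", "c", "e", "f", "b", "d", "e", "h"),
]
USE_TRACES = [(A_S, *t, A_F) for t in L1_TRACES]  # pi(L1)


def feasible(traces, m, x, y):
    """Region inequalities for every prefix plus emptiness after every trace (x, y as sets)."""
    for t in traces:
        for k in range(1, len(t) + 1):
            before, after = Counter(t[:k - 1]), Counter(t[:k])
            if m + sum(before[v] for v in x) - sum(after[v] for v in y) < 0:
                return False
        full = Counter(t)
        if m + sum(full[v] for v in x) - sum(full[v] for v in y) != 0:
            return False
    return True


def lemma1_assignment(a, b):
    """m = 0 and x(a_s) = x(a) = x(b) = y(a) = y(b) = y(a_f) = 1, all other variables 0."""
    return 0, {A_S, a, b}, {a, b, A_F}


activities = sorted({v for t in L1_TRACES for v in t})
all_pairs_ok = all(feasible(USE_TRACES, *lemma1_assignment(a, b))
                   for a in activities for b in activities)
```

The fresh start activity is what makes the assignment work. On $L_1$ itself, $\vec{x} = \vec{y} = \{a, b\}$ already violates the inequality for the prefix $\langle a\rangle$.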
The algorithm constructs an initially empty Petri net $N = (P, T, F)$. Subsequently, for each $a \in A_L \cup \{a_s, a_f\}$ a transition $t_a$ is added to $T$. For each causal pair in the USE-variant of input event log $L$, a place $p_{(a,b)}$ is discovered by solving $ILP_{(\pi(L),a \rightarrow b)}$, after which $P$ and $F$ are updated accordingly. The algorithm adds an initial place $p_i$ and connects it to $t_{a_s}$, and similarly creates a sink place $p_o$ which is connected to $t_{a_f}$. For transition $t_a$ related to $a \in A_L$, we have $\lambda(t_a) = a$, whereas $\lambda(t_{a_s}) = \lambda(t_{a_f}) = \tau$.
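The following is a compact sketch of the algorithm as described above. It uses an exhaustive 0/1 search in place of an ILP solver, so it only scales to toy logs. It fixes $m = 0$ and minimises the number of arcs as a hypothetical objective. The naming scheme for transitions (`t_<activity>`) and places (`p_(a,b)`), the label `tau` for τ, and passing $\gamma_c$ as a Python callable are our own assumptions.

```python
from collections import Counter
from itertools import product

A_S, A_F, TAU = "a_s", "a_f", "tau"  # hypothetical names


def use_transform(traces):
    """pi(L) on distinct traces; frequencies do not matter for the constraints."""
    return [(A_S, *t, A_F) for t in set(traces)]


def solve_causal_ilp(traces, a, b):
    """Exhaustive stand-in for ILP_(pi(L), a->b) with m = 0; minimises |x| + |y|."""
    alphabet = sorted({v for t in traces for v in t})
    prefixes = {t[:k] for t in traces for k in range(1, len(t) + 1)}
    rows = [(Counter(s[:-1]), Counter(s)) for s in prefixes]
    ends = [Counter(t) for t in traces]
    fx = [v for v in alphabet if v != a]
    fy = [v for v in alphabet if v != b]
    best = None
    for bits in product((0, 1), repeat=len(fx) + len(fy)):
        x = {a} | {v for v, on in zip(fx, bits) if on}
        y = {b} | {v for v, on in zip(fy, bits[len(fx):]) if on}
        if any(sum(p[v] for v in x) - sum(q[v] for v in y) < 0 for p, q in rows):
            continue  # violates a region inequality
        if any(sum(c[v] for v in x) != sum(c[v] for v in y) for c in ends):
            continue  # place not empty after some trace
        if best is None or len(x) + len(y) < len(best[0]) + len(best[1]):
            best = (x, y)
    return best


def ilp_based_discovery(traces, causal_oracle):
    """One place per causal pair of pi(L), plus source p_i and sink p_o."""
    log = use_transform(traces)
    activities = sorted({v for t in log for v in t})
    labels = {f"t_{v}": (TAU if v in (A_S, A_F) else v) for v in activities}
    places = {"p_i", "p_o"}
    arcs = {("p_i", f"t_{A_S}"), (f"t_{A_F}", "p_o")}
    for a, b in sorted(causal_oracle(log)):
        if a == A_F or b == A_S:
            continue  # pairs excluded by the algorithm
        solution = solve_causal_ilp(log, a, b)
        if solution is None:
            continue  # cannot happen for a USE-log (Lemma 1)
        x, y = solution
        place = f"p_({a},{b})"
        places.add(place)
        arcs |= {(f"t_{u}", place) for u in x} | {(place, f"t_{v}") for v in y}
    return places, labels, arcs
```

For the single-trace log $[\langle a, b, c\rangle]$ and the causal chain $a_s \rightarrow a \rightarrow b \rightarrow c \rightarrow a_f$, the sketch yields one place per causal pair. Each of these places has exactly one input and one output arc, and $p_i$ and $p_o$ are added as well.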
The algorithm is guaranteed to always find a solution to $ILP_{(\pi(L),a \rightarrow b)}$, hence for each causal relation a place is found. Additionally, a unique source and sink place are constructed. However, the algorithm does not guarantee that each element of the resulting net lies on a path from $p_i$ to $p_o$, i.e. requirement 3 of Definition 1. In fact, the nature of $\gamma_c$ determines whether or not we discover a WF-net. In Theorem 1 we characterize this nature and prove, by exploiting Lemma 1, that we are able to discover WF-nets.
Theorem 1 (There exist sufficient conditions for finding WF-nets). Let $L$ be an event log over a set of activities $A_L$. Let $\pi \colon \mathcal{B}(A^*) \to \mathcal{B}(A^*)$ denote a USE-transformation function. Let $a_s, a_f$ denote the unique start and end activity of $\pi(L)$. Let $\gamma_c \colon \mathcal{B}(A^*) \to \mathcal{P}(A \times A)$ be a causal oracle and consider $\gamma_c(\pi(L))$ as a directed graph. If each $a \in A_L$ is on a path from $a_s$ to $a_f$ in $\gamma_c(\pi(L))$, and there is no path from $a_s$ to itself, nor a path from $a_f$ to itself, then ILP-Based Process Discovery$(L, \gamma_c)$ returns a WF-net.
On the structure of γ_c(π(L)). By the requirements on γ_c(π(L)) and Lemma 1, we know that for each (a, b) ∈ γ_c(π(L)) a corresponding place will be found that has a transition labelled a as an input and a transition labelled b as an output. Hence, every path in γ_c(π(L)) corresponds to a path in the resulting net and, as a consequence, every transition is on a path from t_{a_s} to t_{a_f}. As every place that is added has an input transition (x(a) = 1) and an output transition (y(b) = 1), every place is also on a path from t_{a_s} to t_{a_f}. By construction, this then also holds from p_i to p_o.
Theorem 1 proves that if we use a causal structure that, when interpreted as a graph, has the property that each a ∈ A_L is on a path from a_s to a_f (and neither a_s nor a_f lies on a cycle), the result of ILP-Based Process Discovery(L, γ_c) is a WF-net. Although this seems a rather strict property of the causal structure, there exists a specific causal graph definition that guarantees this property [41]. Hence, we are able to use this definition as an instantiation of γ_c.
Theorem 1 does not provide any behavioural guarantees, i.e. being a WF-net is a purely graph-theoretical property. Recall that the premise of a region is that it does not block the execution of any sequence within the prefix-closure of an event log. Intuitively, we deduce that we are therefore able to fire each transition in the WF-net at least once. Moreover, since we know that a_f is the final activity of each sequence in π(L), and after firing its transition each place based on any ILP(π(L), a→b) is empty, we know that we are able to mark p_o. These two observations hint at the fact that the WF-net is relaxed sound, which we prove in Theorem 2.
Observe that t_{a_s} is trivially enabled in M_i = [p_i], since •t_{a_s} = {p_i}. Consider an arbitrary t ∈ T \ {t_{a_s}, t_{a_f}}. We know ∃σ ∈ π(L) (σ = a_s · σ′ · λ(t) · σ″ · a_f).
Let t′_1, t′_2, ..., t′_n be such that ⟨λ(t′_1), λ(t′_2), ..., λ(t′_n)⟩ = σ′. The fact that each place p ∈ P \ {p_i, p_o} corresponds to a region yields that we may deduce M′(p) = 0, then p does not correspond to a region). Hence, for any t ∈ T \ {t_{a_s}, t_{a_f}} there exists a marking reachable from [p_i] that enables t. Now let t″_1, t″_2, ..., t″_n be such that ⟨λ(t″_1), λ(t″_2), ..., λ(t″_n)⟩ = σ″. Note that also, again by the fact that each place p ∈ P \ {p_i, p_o} corresponds to a region, we may
We have shown that with a few pre- and post-processing steps and a specific class of causal structures we are able to guarantee that we find relaxed sound WF-nets. These results are interesting since several process mining techniques require WF-nets as an input. However, the solutions of the ILP problems solved are still required to allow for all possible behaviour in the event log. As a result, the algorithm incorporates all infrequent exceptional behaviour and still yields over-fitting, complex WF-nets. Hence, in the upcoming section we show how to efficiently prune the ILP constraint body to identify and eliminate infrequent exceptional behaviour.

Dealing with Infrequent Behaviour
In this section we present an efficient pruning technique that identifies and eliminates constraints related to infrequent exceptional behaviour. We first illustrate the impact of infrequent exceptional behaviour, after which we present the pruning technique.

The Impact of Infrequent Exceptional Behaviour
In Section 2 we already indicated the impact of infrequent behaviour on the results of ILP-based process discovery. In this section we highlight the main cause of the inability of ILP-based discovery to handle infrequent behaviour, and we devise a filtering mechanism that exploits the nature of the underlying body of constraints.
Let us again reconsider example event log L_1, i.e., L_1 = [⟨a, b, d, e, g⟩^10, ⟨a, c, d, e, f, d, b, e, g⟩^12, ⟨a, d, c, e, h⟩^9, ⟨a, b, d, e, f, c, d, e, g⟩^11, ⟨a, d, c, e, f, b, d, e, h⟩^13]. Using an implementation of Algorithm 1 in ProM [38], with a suitable causal structure γ_c, we find the WF-net depicted in Figure 6a. The WF-net describes the same behaviour as the model presented in Figure 4 and has perfect replay-fitness w.r.t. L_1. However, if we create event log L′_1 by simply adding one instance of the trace ⟨a, b, c, d, e, g⟩, we obtain the result depicted in Figure 6b. Due to this one exceptional trace, the model allows us, after executing a or f, to execute an arbitrary number of b- and c-labelled transitions. This is undesirable, since the precision of the resulting process model drops significantly. Thus, the addition of one exceptional trace results in a less comprehensible WF-net and reduces its precision.
When analysing the two models we observe that they share some identical places, e.g. both models have a place p  Figure 6a, are not present in Figure 6b. These are "replaced" by the less desirable places containing self-loops in Figure 6b. This is caused by the fact that L′_1 contains all traces present in L_1; hence, its constraint body contains all constraints of L_1, combined with the additional constraints depicted in Table 3.
For place p_({a,f},{b,c}) in Figure 6a we define a corresponding tuple r = (m, x, y) with x(a) = 1, x(f) = 1, y(b) = 1 and y(c) = 1 (all other variables 0). The tuple r violates at least one of the additional constraints in Table 3.
The example shows that the addition of ⟨a, b, c, d, e, g⟩ yields constraints that invalidate places p_({a,f},{b,c}) and p_({b,c},{e}). As a result, the WF-net based on event log L′_1 contains places with self-loops on both b and c, which greatly reduces its precision and simplicity. Due to the relative infrequency of trace ⟨a, b, c, d, e, g⟩, it is arguably acceptable to trade off the perfect replay-fitness guarantee of ILP-based process discovery and return the WF-net of Figure 6a, given L′_1. Hence, we need filtering and/or trace clustering techniques in order to remove exceptional behaviour. However, apart from simple pre-processing, we aim at adapting the ILP-based process discovery approach itself so that it is able to cope with infrequent behaviour.
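Why the single trace ⟨a, b, c, d, e, g⟩ invalidates both places can be checked mechanically. The sketch below assumes the constraint shape discussed in this paper: for every prefix σ′·a the place must hold at least y(a) tokens after σ′, i.e. m + x·p(σ′) − y·p(σ′·a) ≥ 0, and the place must be empty again after each complete trace of the USE-log. The function name `is_feasible_place` and the dictionary encoding of x and y are hypothetical; this replays traces, it does not solve the ILP.

```python
def is_feasible_place(use_log, m, x, y):
    """Check a candidate place r = (m, x, y) against every trace of a USE-log.

    x[a] == 1: t_a puts a token in the place; y[a] == 1: t_a consumes one.
    For every prefix sigma' . a, the marking after sigma' must cover y[a],
    i.e. m + x.p(sigma') - y.p(sigma' . a) >= 0; after a complete trace the
    place must be empty again.
    """
    for trace in use_log:
        marking = m
        for a in trace:
            if marking - y.get(a, 0) < 0:
                return False   # this prefix would be blocked by the place
            marking += x.get(a, 0) - y.get(a, 0)
        if marking != 0:
            return False       # a token is left behind after a_f
    return True
```

For p_({a,f},{b,c}) the exceptional trace fails at the prefix ⟨a_s, a, b, c⟩ (c finds no token), whereas for p_({b,c},{e}) it fails at the end (one token remains), matching the invalidation described above.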
By manipulating the constraint body such that it no longer allows for all behaviour present in the input event log, we are able to deal with infrequent behaviour within event logs. Given the problems that arise because of the presence of exceptional traces, a natural next step is to leave out the constraints related to the problematic traces. An advantage of filtering the constraint body is that the constraints are based on the prefix-closure of the event log. Thus, even if all traces are unique but share prefixes, we are still able to filter. Additionally, leaving out constraints decreases the size of the ILP's constraint body, which potentially reduces the time needed to solve an ILP. We devise a graph-based filtering technique, i.e., sequence encoding filtering, that allows us to prune constraints based on trace frequency information.

Sequence Encoding Graphs
As a first step towards sequence encoding filtering, we define the relationship between sequences and constraints in terms of sequence encodings. A sequence encoding is a vector-based representation of a sequence in terms of region theory, i.e., it represents the sequence's corresponding constraint.
Consider the prefix-closure of π(L′_1), which generates the linear inequalities presented in Table 4. The table shows each sequence present in the prefix-closure of π(L′_1), accompanied by its φ-value and its number of occurrences, e.g. ⟨a_s, a⟩ occurs 56 times. Observe that there is a relation between the occurrence of a sequence and its one-activity extensions, i.e. of the 56 times that sequence ⟨a_s, a⟩ occurred, ⟨a_s, a, b⟩ occurred 22 times, ⟨a_s, a, c⟩ occurred 12 times and ⟨a_s, a, d⟩ occurred 22 times (note: 56 = 22 + 12 + 22). Due to the coupling of sequences to constraints, i.e. by means of sequence encoding, we can now apply the aforementioned reasoning to constraints as well. The frequencies in π(L′_1) allow us to decide whether the presence of a certain constraint is in line with the predominant behaviour in the event log. For example, in Table 4, φ(⟨a_s, a, b, c⟩) seems to relate to infrequent behaviour as it appears only once.
Table 4: Schematic overview of sequence encodings based on π(L′_1).
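The counts above can be reproduced with a few lines of Python. This is a minimal sketch, assuming (from vertex labels such as ([a_s, a, b], c) used below) that φ maps σ′·a to the pair (Parikh vector of σ′, a) and the empty sequence to ([], ⊥); the Parikh vector is represented as a frozenset of (activity, count) pairs, ⊥ as `None`, and the names `prefix_counts`, `encode`, `use` and `L1_PRIME` are hypothetical.

```python
from collections import Counter


def prefix_counts(log):
    """Multiset of all non-empty prefixes of a log given as trace -> frequency."""
    counts = Counter()
    for trace, freq in log.items():
        for k in range(1, len(trace) + 1):
            counts[trace[:k]] += freq
    return counts


def encode(seq):
    """phi(sigma' . a) = (Parikh vector of sigma', a); the empty sequence maps to (empty, None)."""
    if not seq:
        return (frozenset(), None)
    return (frozenset(Counter(seq[:-1]).items()), seq[-1])


def use(*activities):
    """Wrap a trace in the artificial start and end activities a_s and a_f."""
    return ("a_s",) + activities + ("a_f",)


L1_PRIME = {
    use("a", "b", "d", "e", "g"): 10,
    use("a", "c", "d", "e", "f", "d", "b", "e", "g"): 12,
    use("a", "d", "c", "e", "h"): 9,
    use("a", "b", "d", "e", "f", "c", "d", "e", "g"): 11,
    use("a", "d", "c", "e", "f", "b", "d", "e", "h"): 13,
    use("a", "b", "c", "d", "e", "g"): 1,   # the single exceptional trace
}
```

With these definitions, `prefix_counts(L1_PRIME)` yields 56 for ⟨a_s, a⟩, 22, 12 and 22 for its three extensions, and 1 for ⟨a_s, a, b, c⟩.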
To apply filtering, we construct a weighted directed graph in which each sequence encoding acts as a vertex. We connect two vertices by means of an arc if the source constraint corresponds to a sequence that is an immediate prefix (i.e. one activity shorter) of a sequence corresponding to the target constraint, e.g., we connect φ(⟨a_s, a⟩) to φ(⟨a_s, a, b⟩), as ⟨a_s, a⟩ is a prefix of ⟨a_s, a, b⟩. Arc weight is based on trace frequency in the input event log.
Definition 9 (Sequence encoding graph). Given an event log L over a set of activities A_L, a sequence encoding graph is a directed graph G = (V, E, ψ), where V contains the sequence encodings of the sequences in the prefix-closure of π(L), E connects φ(σ) to φ(σ·a) whenever σ·a is in that prefix-closure, and ψ : E → ℕ assigns each arc a weight based on trace frequency in L.
Consider the sequence encoding graph in Figure 7, based on π(L′_1), as an example. By definition, ([], ⊥) is the root node of the graph and connects to all sequences of length one. Within the graph we observe the relations among different constraints, combined with their absolute frequencies based on L′_1.
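A sequence encoding graph can be built in a single pass over the prefixes of a USE-log. The sketch below is a hedged illustration, not the ProM implementation: it assumes the encoding (Parikh vector of all but the last activity, last activity), uses `None` for ⊥, and assumes that ψ sums the frequencies of all prefixes mapped onto an arc; `encode` and `sequence_encoding_graph` are hypothetical names.

```python
from collections import Counter, defaultdict


def encode(seq):
    """Sequence encoding: (Parikh vector of all but the last activity, last activity)."""
    if not seq:
        return (frozenset(), None)          # root vertex ([], bottom)
    return (frozenset(Counter(seq[:-1]).items()), seq[-1])


def sequence_encoding_graph(use_log):
    """Build (V, psi) for a USE-log given as trace -> frequency.

    Every prefix sigma . a contributes an arc phi(sigma) -> phi(sigma . a); the arc
    weight psi is the summed frequency of all prefixes mapped onto that arc.
    """
    psi = defaultdict(int)
    for trace, freq in use_log.items():
        for k in range(1, len(trace) + 1):
            psi[(encode(trace[:k - 1]), encode(trace[:k]))] += freq
    vertices = {v for arc in psi for v in arc}
    return vertices, dict(psi)
```

On π(L′_1), the arc from ([a_s, a], b) to ([a_s, a, b], d) receives weight 21, while the arc to ([a_s, a, b], c) receives weight 1. Every arc also lengthens the encoded sequence by exactly one activity, which is the induction argument for acyclicity mentioned below.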

Filtering
Given a sequence encoding graph, we are able to filter out constraints. In Algorithm 2 we devise a simple breadth-first traversal algorithm, i.e. Sequence Encoding Filtering - Breadth First Search (SEF-BFS), that traverses the sequence encoding graph and concurrently constructs a set of ILP constraints. The algorithm takes as input a function that determines, given a vertex in the sequence encoding graph, which adjacent vertices remain in the graph and which are removed.
Definition 10 (Sequence encoding filter). Given an event log L over a set of activities A_L and a corresponding sequence encoding graph G = (V, E, ψ), a sequence encoding filter is a function κ : V → P(V).
Figure 7: An example sequence encoding graph G′_1, based on example event log L′_1.

Algorithm 2: SEF-BFS
Note that κ is an abstract function and might be parametrized. As an example, consider κ^α_max, which, for a vertex v and a threshold α ∈ [0, 1], retains only those successors of v whose arc weight is sufficiently large relative to the maximum weight of the arcs leaving v; lower values of α filter more rigorously. Other instantiations of κ are possible as well, and hence κ is a parameter of the general approach. It is, however, desirable that κ(v) only contains vertices reachable from v by means of a single arc. Given an instantiation of κ, it is straightforward to construct a filtering algorithm based on breadth-first graph traversal, i.e. SEF-BFS.
The algorithm inherits the worst-case complexity of breadth-first search, multiplied by the worst-case complexity of κ. Thus, in case κ's worst-case complexity is O(1), we have O(|V| + |E|) for the SEF-BFS algorithm. It is trivial to prove, by means of induction on the length of a sequence encoding's corresponding sequence, that a sequence encoding graph is acyclic. Hence, termination is guaranteed.
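Since the pseudocode of Algorithm 2 is not reproduced here, the following is a minimal Python sketch of SEF-BFS under explicit assumptions. The concrete filter is our hypothetical reading of κ^α_max: keep a successor w of v iff ψ(v, w) ≥ (1 − α) · max ψ(v, ·), which makes α = 0 the most rigorous setting and α = 1 a no-op, as described in the evaluation. The graph is passed as a hypothetical mapping from each vertex to its outgoing arc weights; `kappa_max` and `sef_bfs` are illustrative names.

```python
from collections import deque


def kappa_max(alpha):
    """Hypothetical instantiation of kappa^alpha_max: keep the successors whose arc
    weight is at least (1 - alpha) times the heaviest outgoing arc of v."""
    def kappa(v, out):
        if not out:
            return set()
        bound = (1 - alpha) * max(out.values())
        return {w for w, weight in out.items() if weight >= bound}
    return kappa


def sef_bfs(root, succ, kappa):
    """Breadth-first traversal that only descends into vertices kept by kappa.

    succ maps each vertex to {successor: arc weight}. Returns the visited vertices
    (including the root), i.e. the sequence encodings whose constraints remain.
    """
    kept, queue = {root}, deque([root])
    while queue:
        v = queue.popleft()
        for w in kappa(v, succ.get(v, {})):
            if w not in kept:
                kept.add(w)
                queue.append(w)
    return kept
```

This κ only inspects the arcs leaving v, so the traversal as a whole stays within O(|V| + |E|). On a fragment of the running example with α = 0.75, the arc of weight 1 towards ([a_s, a, b], c) falls below 0.25 · 21, so that vertex and everything only reachable through it are never visited.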
As an example of executing the SEF-BFS algorithm, reconsider ([a_s, a, b], c). Note that the whole path of vertices from ([a_s, a, b], c) to ([a_s, a, b, c, d, e, g], a_f) is never analysed and is thus stripped from the constraint body.
When applying ILP-based process discovery to event log L′_1 with sequence encoding filtering and κ^0.75_max, we obtain the WF-net depicted in Figure 6a.
As explained, the filter leaves out all constraints related to vertices on the path from ([a_s, a, b], c) to ([a_s, a, b, c, d, e, g], a_f). Hence, we find the same model as the one found on event log L_1, i.e. we are able to filter out the infrequent exceptional behaviour.

Evaluation
Algorithm 1 and Algorithm 2 (sequence encoding filtering) are implemented in the HybridILPMiner package (http://svn.win.tue.nl/repos/prom/Packages/HybridILPMiner/) within the ProM framework [38] (http://www.promtools.org) and the RapidProM framework [4]. Using this implementation we validated the approach. In an artificial setting we evaluated the quality of the models discovered and the efficiency of applying sequence encoding filtering. We also compare sequence encoding filtering to the IMi [26] algorithm and to automaton-based filtering [20]. Finally, we assess the performance of sequence encoding filtering on real event data [27,30].

Model Quality
The event logs used in the empirical evaluation of model quality are artificially generated and originate from a study on the impact of exceptional behaviour on rule-based approaches in process discovery [31]. Three event logs were generated from three different process models, i.e. the ground truth event logs. These event logs do not contain any exceptional behaviour, i.e. every trace fits the originating model. The ground truth event logs are called a12f0n00, a22f0n00 and a32f0n00. The two digits following the a character indicate the number of activities present in the event log, i.e. a12f0n00 contains 12 different activities. From each ground truth event log, by means of trace manipulation, four other event logs are created that do contain exceptional behaviour. Manipulation concerns removal of the tail or head of a trace, removal of a random part of the trace body, and interchanging two randomly chosen events [31]. The percentages of trace manipulation are 5%, 10%, 20% and 50%. The manipulation percentage is incorporated in the last two digits of the event log's name, i.e. the 5% manipulation version of the a22f0n00 event log is called a22f0n05.
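The naming scheme can be decoded mechanically, which is convenient when scripting the experiments. A minimal sketch, assuming names of the form a⟨activities⟩f⟨digits⟩n⟨two-digit percentage⟩; the meaning of the f0 part is not explained in this section, so the pattern matches it but ignores it. `parse_log_name` and `ALL_LOGS` are hypothetical names.

```python
import re

# Names such as 'a22f0n05': 'a' + number of activities, 'f' + digits (not explained
# in this section, matched and ignored), 'n' + two-digit manipulation percentage.
LOG_NAME = re.compile(r"a(\d+)f\d+n(\d{2})")


def parse_log_name(name):
    """Return (number of activities, percentage of manipulated traces)."""
    match = LOG_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected event log name: {name!r}")
    return int(match.group(1)), int(match.group(2))


# The fifteen artificial logs: three ground truth logs, each with four noisy variants.
ALL_LOGS = [f"a{n}f0n{p:02d}" for n in (12, 22, 32) for p in (0, 5, 10, 20, 50)]
```

For instance, `parse_log_name("a22f0n05")` gives 22 activities and 5% manipulation.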
The existence of ground truth event logs, free of exceptional behaviour, is of utmost importance for the evaluation, as we need to be able to distinguish normal from exceptional behaviour in an unambiguous manner. Within the evaluation, these event logs, combined with the quality dimension precision, allow us to judge how well a technique is able to filter out exceptional behaviour. Recall that precision relates the traces producible by the process model to the traces present in the event log. Thus, if all traces producible by a process model are present in an event log, precision is maximal, i.e. the precision value is 1. If the model allows for traces that are not present in the event log, precision is lower than 1.
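The intuition behind precision can be illustrated on finite trace sets. The sketch below is deliberately simplified and is not the precision measure [33] used in the experiments: it only applies to models whose language is a finite set of traces (models with loops, such as those discussed below, have infinite languages), and the function name `trace_precision` is hypothetical.

```python
def trace_precision(model_traces, log_traces):
    """Simplified, trace-level reading of precision: the fraction of the (finite)
    set of traces a model can produce that also occurs in the log."""
    model = set(model_traces)
    if not model:
        raise ValueError("model language must be non-empty")
    return len(model & set(log_traces)) / len(model)
```

A model producing ⟨a, b⟩ and ⟨a, c⟩ evaluated against a log containing only ⟨a, b⟩ scores 0.5; once the log contains both traces, the score is 1.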
If exceptional behaviour is present in an event log, the conventional ILP-based process discovery algorithm produces a WF-net that allows for all exceptional behaviour. As a result, the algorithm is unable to find any meaningful patterns within the event log, which typically leads to places with many self-loops. The acceptance of exceptional behaviour by the WF-net, combined with the inability to find meaningful patterns, yields a low level of precision when using the ground truth log as a basis for precision computation. On the other hand, if we discover models using an algorithm that is better able to handle the presence of exceptional behaviour, we expect the algorithm to allow for less exceptional behaviour and to find more meaningful patterns. Thus, w.r.t. the ground truth model, we expect higher precision values.
To evaluate the sequence encoding filtering approach, we applied the ILP-based process discovery algorithm with sequence encoding filtering using κ^α_max and α = 0, 0.05, 0.1, ..., 0.95, 1. Moreover, we performed similar experiments for IMi [26] and for the automaton-based event log filter of [20]. After applying the automaton-based filter, we used ILP-based process discovery as the process discovery algorithm. We measured precision [33] and replay-fitness [3] based on the ground truth event logs.
In Figure 8 we present the replay-fitness results of the experiments with the a12f0nXX event logs. In the charts we plot replay-fitness against the noise level and filter threshold. We additionally use a colour scheme to highlight the differences in value. Sequence encoding filtering has low replay-fitness values for all event logs when using a filter threshold of 0. The replay-fitness value of the models found quickly rises to 1 and remains 1 for all filter thresholds above 0.2. In case of IMi, for a filter value of 1.0 (comparable to 0.0 for sequence encoding) we observe some values of 1 for replay-fitness. Non-perfect replay-fitness seems to be more local, concentrated around noise levels of 5% and 10% with corresponding threshold levels between 0.4 and 0.8. Finally, automaton-based filtering rapidly loses perfect replay-fitness when the filter threshold exceeds 0.2. Only for a noise level of 0 does it seem to retain high replay-fitness values. Upon inspection, it turns out that the filter returns empty event logs for the corresponding threshold and noise levels.
In Figure 9 we present the precision results of the experiments with the a12f0nXX event logs. For sequence encoding the chart shows the expected behaviour, i.e. with high noise levels and high filter thresholds precision is low. There is, however, an unexpected drop in precision for noise level 0 with a filter threshold around 0.2. The IMi filter behaves somewhat unexpectedly, since the drop in precision seems to depend mainly on the noise level rather than on the filter setting. We would expect the precision to be higher when a filter threshold of 1.0 is chosen; however, there is only a slight increase for the 50% noise log when comparing a filter threshold of 0 to a filter threshold of 1. Finally, the precision of the automaton filter behaves as expected, i.e., precision rapidly increases together with the filter threshold.
The replay-fitness results of the experiments with the a22f0nXX event logs are presented in Figure 10. The charts show behaviour very similar to the results reported for the a12f0nXX event logs. The sequence encoding filter in Figure 10a has a replay-fitness value of around 0.6 when applying it as rigorously as possible, i.e. using α = 0. This implies that the filter even removes behaviour that is present in the ground-truth event log. For increasing filter thresholds, the replay-fitness value rapidly reaches a value of 1, i.e., the model is able to reproduce all traces in the event log. For IMi (Figure 10b) we observe similar behaviour (note that the filter threshold works inverted w.r.t. sequence encoding filtering, i.e. a value of 1 implies the most rigorous filtering). However, replay-fitness drops a little earlier compared to sequence encoding filtering. Finally, the replay-fitness of automaton-based filtering, depicted in Figure 10c, rapidly drops to 0. Again, this is due to the fact that the filter tends to return empty logs for high threshold values. Hence, the filter seems to be very sensitive to threshold values between 0 and 0.2.
The precision results of the experiments with the a22f0nXX event logs are presented in Figure 11. For both sequence encoding (Figure 11a) and IMi (Figure 11b) we observe a precision value of around 0.6 for the event logs without any noise. This is due to the fact that the originating model contains a loop, which leads to imprecision. We observe that sequence encoding filtering and IMi follow the same pattern in terms of precision. However, the drop in precision of sequence encoding filtering is smoother than the drop in precision of IMi, which shows some spikes. Hence, applying filtering within IMi seems to be less deterministic. Finally, the precision results for the automaton-based filter are as expected. With a low threshold value we have very low precision, except when we have a 0% noise level. Towards a threshold level of 0.2, precision increases, after which it reaches its maximum value of 1. This is in line with the replay-fitness measurements.
In Figure 12 we present the replay-fitness results of the experiments with the a32f0nXX event logs. Due to excessive computation time, the automaton-based filter [20] is left out of the analysis. We observe that sequence encoding filtering behaves similarly to the experiments performed with the a12f0nXX and a22f0nXX event logs. The replay-fitness again quickly increases to 1 for increasing filter threshold values. We observe that IMi seems to filter out more behaviour related to the underlying system model when the filter threshold increases.
In Figure 13 we present the precision results of the experiments with the a32f0nXX event logs. Observe that, due to loop structures, the precision of a model that equals the originating model is only roughly 0.6. Sequence encoding filtering shows a smooth decrease in precision when both noise and filter thresholds are increased, which is as expected. With low noise levels and a low threshold, sequence encoding seems to be able to filter out the infrequent behaviour; however, if there is too much noise and too little is removed, we start finding WF-nets with self-loop places. IMi seems to result in models with a slightly higher precision compared to sequence encoding filtering. As is the case for the a22f0nXX event logs, we observe spikes in the precision of IMi-based models, hinting at non-deterministic behaviour of the filter. Based on our experiments, we conclude that the sequence encoding filter and IMi give comparable results. However, the sequence encoding filter provides more predictable results, whereas IMi behaves somewhat non-deterministically. The automaton-based filter does provide good results; however, its sensitivity to the filter threshold is much higher than that of sequence encoding filtering and IMi.

Computation time
The core of sequence encoding filtering is leaving out constraints that are likely to refer to exceptional behaviour. Thus, we reduce the size of the core ILP constraint body. Hence, we expect a decrease in computation time when applying rigorous filtering, i.e. κ^α_max with α towards 0. Using RapidMiner, we repeated experiments similar to those performed for model quality and measured the CPU execution time for the three techniques. In these experiments, however, we only use the threshold values 0, 0.25, 0.75 and 1.
In Figure 14 we present the average CPU execution time, based on 50 experiment repetitions, needed to obtain a process model from an event log. For each level of noise we depict the computation time for different filter threshold settings; 0% noise is depicted in the left-most figure, 50% in the right-most figure. For IMi, we measured the Inductive Miner algorithm with integrated filtering. For sequence encoding and automaton filtering, we measured the time needed to filter, discover a causal graph and solve the underlying ILP problems. As we observe in Figure 14, IMi is fastest in all cases except for a threshold of 0, where sequence encoding tends to outperform both IMi and automaton-based filtering. We observe that in all cases computation time increases when the amount of noise within the event logs increases. For sequence encoding filtering we observe that lower threshold values lead to faster computation times. This is as expected, since a low threshold value removes more constraints from the ILP constraint body than a high threshold value. The automaton-based filter is slowest in all cases. The amount of noise seems to have little impact on the computation time of the automaton-based filter; it seems to depend predominantly on the filter threshold. From Figure 14 we conclude that IMi in general outperforms sequence encoding in terms of computation time. However, sequence encoding in turn outperforms automaton-based filtering, specifically for higher threshold settings.
Figure 15: Results for the Road Fines log [27] (Figure 15a and Figure 15b) and the Sepsis log [30] (Figure 15c and Figure 15d).

Application to Real-Life Event Logs
We additionally tested the applicability of sequence encoding filtering using real-life event logs. We used two event logs, one related to the administration process of handling road fines [27] and one regarding the treatment of patients suspected to have sepsis [30]. The results are presented in Figure 15. In case of the Road Fines event log (figures on the left-hand side of Figure 15) we observe that replay-fitness is around 0.46, whereas precision is around 0.4 for α-values from 0 to 0.5. The number of arcs for the models of these α-values remains constant (as do the number of places and the number of transitions), suggesting that the models found are the same. After this, replay-fitness increases to around 0.8 and reaches 1 for an α-level of 1. Interestingly, precision shows a small increase for α-levels between 0.5 and 0.75, after which it drops slightly below its initial value. In this case, an α-level between 0.5 and 0.75 seems most appropriate in terms of replay-fitness, precision and simplicity.
In case of the Sepsis event log (figures on the right-hand side of Figure 15) we observe that replay-fitness and precision roughly behave as each other's inverse, i.e. replay-fitness increases whereas precision decreases for increasing α-levels. Moreover, we observe that the number of arcs within the process models steadily increases for increasing α-levels. In this case, an α-level between 0.1 and 0.4 seems most appropriate in terms of replay-fitness, precision and simplicity.
Finally, for each experiment we measured the computation time associated with solving all ILP problems. In case of the Road Fines event log, solving all ILP problems takes roughly 5 seconds. In case of the Sepsis event log, solving all ILP problems takes less than 1 second.

Conclusion
The work presented in this paper is motivated by the observation that existing region-based process discovery techniques are useful, as they are able to find non-local, complex control-flow patterns. However, these techniques do not provide any structural guarantees w.r.t. the resulting process models, and they are unable to cope with infrequent, exceptional behaviour in event logs.
The approach presented in this paper extends techniques presented in [44][45][46]. We have proven that our approach is able to discover relaxed sound workflow nets, i.e. we are now able to guarantee structural and behavioural properties of the resulting process model. Additionally, we presented the sequence encoding filtering technique, which enables us to filter exceptional behaviour within the ILP-based process discovery algorithm. Our experiments confirm that the technique enables us to find Petri net structures in data containing exceptional behaviour, using ILP-based process discovery as an underlying technique. Sequence encoding filtering proves to be comparable to the IMi [26] approach, i.e. an integrated filter of the Inductive Miner [25], in terms of filtering behaviour. It is considerably faster than the general purpose filtering approach of [20] and less sensitive to variations in the filter threshold.
Future Work. An interesting direction for future work concerns combining ILP-based process discovery techniques with other process discovery techniques. The Inductive Miner discovers sound workflow nets; however, these models lack the ability to express complex control-flow patterns such as the milestone pattern. Some of these patterns can, however, be reconstructed using ILP-based process discovery. Hence, it is interesting to combine these approaches, with possibly synergetic effects w.r.t. the process mining quality dimensions.
Another interesting direction is the development of more advanced general purpose filtering techniques. Most discovery algorithms assume the input event logs to be free of noise, infrequent and/or exceptional behaviour. Real-life event logs, however, typically contain a lot of such behaviour. Surprisingly, little research has been performed on filtering techniques that greatly enhance process discovery results, independent of the discovery algorithm used.