Discovering more precise process models from event logs by filtering out chaotic activities
Abstract
Process Discovery is concerned with the automatic generation of a process model that describes a business process from execution data of that business process. Real-life event logs can contain chaotic activities. These activities are independent of the state of the process and can, therefore, happen at rather arbitrary points in time. We show that the presence of such chaotic activities in an event log heavily impacts the quality of the process models that can be discovered with process discovery techniques. The current modus operandi for filtering activities from event logs is to simply filter out infrequent activities. We show that frequency-based filtering of activities does not solve the problems that are caused by chaotic activities. Moreover, we propose a novel technique to filter out chaotic activities from event logs. We evaluate this technique on a collection of seventeen real-life event logs that originate from both the business process management domain and the smart home environment domain. As demonstrated, the developed activity filtering methods enable the discovery of process models that are more behaviorally specific compared to process models that are discovered using standard frequency-based filtering.
Keywords
Information systems · Business process intelligence · Process mining · Knowledge discovery
1 Introduction
Process Mining (van der Aalst 2016) is a scientific discipline that bridges the gap between process analytics and data analysis and focuses on the analysis of event data logged during the execution of a business process. Events contain information on what was done, by whom, for whom, where, when, etc. Such event data is often readily available from information systems such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), or Business Process Management (BPM) systems. Process discovery, which plays a prominent role in process mining, is the task of automatically generating a process model that accurately describes a business process based on such event data. Many process discovery techniques have been developed over the last decade (e.g. Buijs et al. 2012; Buijs et al. 2009; Günther and van der Aalst 2007; Herbst 2000; Leemans et al. 2013b; Solé and Carmona 2013; van Zelst et al. 2015), producing process models in various forms, such as Petri nets (Murata 1989), process trees (Buijs et al. 2012), and Business Process Model and Notation (BPMN) models (Object Management Group 2011).
In this paper, we show that existing approaches do not solve the problem of chaotic activities and we present a technique to handle the issue. This paper is structured as follows: in Section 2 we introduce basic concepts used throughout the paper. In Section 3 we propose an approach to filter out chaotic activities. In Section 4 we evaluate our technique using synthetic data where we artificially insert chaotic activities and check whether the filtering techniques can filter out the inserted chaotic activities. Additionally, Section 4 proposes a methodology to evaluate activity filtering techniques in a real-life setting where there is no ground truth knowledge on which activities are truly chaotic, and motivates this methodology by showing that its results are consistent with the evaluation on the synthetic datasets. In Section 5 the results on a collection of seventeen real-life event logs are discussed. In Section 6 we discuss how the activity filtering techniques can be used in a toggle-based approach for human-in-the-loop process discovery. In Section 7 we discuss related techniques in the domains of process discovery and the filtering of event logs. Section 8 concludes this paper and discusses several directions for future work.
2 Preliminaries
In this section, we introduce the concepts and notation used throughout this paper.
X = {a_{1}, a_{2},…, a_{n}} denotes a finite set. \(\mathcal {P}(X)\) denotes the power set of X, i.e., the set of all possible subsets of X. X∖Y denotes the set of elements that are in set X but not in set Y, e.g., {a, b, c}∖{a, c}={b}. X^{∗} denotes the set of all sequences over a set X and σ = 〈a_{1}, a_{2},…, a_{n}〉 denotes a sequence of length n, with σ(i) = a_{i} and 〈〉 the empty sequence. \(\sigma {\upharpoonright }_{X}\) is the projection of σ on X, e.g. \(\langle a,b,c,a,b,c\rangle {\upharpoonright }_{\{a,c\}}=\langle a,c,a,c \rangle \). σ_{1} ⋅ σ_{2} denotes the concatenation of sequences σ_{1} and σ_{2}, e.g., 〈a, b, c〉⋅〈d, e〉 = 〈a, b, c, d, e〉.
A multiset (or bag) over X is a function \(B:X{\rightarrow }\mathbb {N}\) which we write as \([a_{1}^{w_{1}},a_{2}^{w_{2}},\dots ,a_{n}^{w_{n}}]\), where for 1≤ i≤ n we have a_{i}∈X and \(w_{i}{\in }\mathbb {N}^{+}\). The set of all multisets over X is denoted \(\mathcal {B}(X)\).
In the context of process mining, we assume the set of all process activities Σ to be given. Event logs consist of sequences of events where each event represents a process activity.
Definition 1 (Event, Trace, and Event Log)
An event e in an event log is the occurrence of an activity e∈Σ. We call a (nonempty) sequence of events σ∈Σ^{+} a trace. An event log \(L{\in }\mathcal {B}({{\Sigma }^{+}})\) is a multiset of traces.
L=[〈a, b, c〉^{2},〈b, a, c〉^{3}] is an example event log over process activities Σ = {a, b, c}, consisting of two occurrences of trace 〈a, b, c〉 and three occurrences of trace 〈b, a, c〉. Activities(L) denotes the set of process activities that occur in L, e.g., Activities(L) = {a, b, c}. #(a, L) denotes the number of occurrences of activity a in log L, e.g., #(a, L) = 5.
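Definition 1 and the derived notions Activities(L) and #(a, L) translate directly into code. Below is a minimal sketch, under the assumption (used in this paper only for illustration) that an event log is encoded as a `Counter` over trace tuples:

```python
from collections import Counter

# The example log L = [<a,b,c>^2, <b,a,c>^3] encoded as a multiset (Counter)
# over traces, each trace being a tuple of activity names.
L = Counter({("a", "b", "c"): 2, ("b", "a", "c"): 3})

def activities(log):
    """Activities(L): the set of process activities occurring in the log."""
    return {a for trace in log for a in trace}

def count(a, log):
    """#(a, L): the number of occurrences of activity a in log L."""
    return sum(trace.count(a) * mult for trace, mult in log.items())
```

For the example log, `activities(L)` yields `{"a", "b", "c"}` and `count("a", L)` yields 5, matching the values in the text.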
A process model notation that is frequently used in the area of process mining is the Petri net. Petri nets can be automatically transformed into process model notations that are commonly used in business environments, such as BPMN and BPEL (Lohmann et al. 2009). A Petri net is a directed bipartite graph consisting of places (depicted as circles) and transitions (depicted as rectangles), connected by arcs. A transition describes an activity, while places represent the enabling conditions of transitions. Labels of transitions indicate the type of activity that they represent. Unlabeled transitions (τ-transitions) represent invisible transitions (depicted as gray rectangles), which are only used for routing purposes and are not recorded in the event log.
Definition 2 (Labeled Petri net)
A labeled Petri net N = 〈P, T, F, ℓ〉 is a tuple where P is a finite set of places, T is a finite set of transitions such that P∩T = ∅, F⊆(P×T)∪(T×P) is a set of directed arcs, called the flow relation, and \(\ell {:}T{\nrightarrow }{\Sigma }\) is a partial labeling function that assigns a label to a transition, or leaves it unlabeled (the τtransitions).
We write • n and n • for the input and output nodes of n ∈ P ∪ T (according to F). A state of a Petri net is defined by its marking \(m{\in } \mathcal {B}(P)\), being a multiset of places. A marking is graphically denoted by putting m(p) tokens on each place p∈P. State changes occur through transition firings. A transition t is enabled (can fire) in a given marking m if each input place p∈•t contains at least one token. Once t fires, one token is removed from each input place p∈•t and one token is added to each output place p^{′}∈t •, leading to a new marking m^{′} = m−• t + t •.
A firing of a transition t leading from marking m to marking m^{′} is denoted as step \(m {\overset {t}{\longrightarrow }} m^{\prime }\). Steps are lifted to sequences of firings of enabled transitions, written \(m {\overset {\gamma }{\longrightarrow }} m^{\prime }\), where γ∈T^{∗} is a firing sequence.
Defining an initial and a set of final markings allows defining the language accepted by a Petri net as a set of finite sequences of activities.
Definition 3 (Accepting Petri Net)
An accepting Petri net is a triplet APN = (N, m_{0}, MF), where N is a labeled Petri net, \(m_{0}{\in }\mathcal {B}(P)\) is its initial marking, and \(\mathit {MF}{\subseteq }\mathcal {B}(P)\) is its set of possible final markings. A sequence σ∈Σ^{∗} is a trace of an accepting Petri net APN if there exists a firing sequence \(m_{0}{\overset {\gamma }{\longrightarrow }}m_{f}\) such that m_{f}∈MF, γ∈T^{∗} and ℓ(γ) = σ.
In the Petri nets that are shown in this paper, places that belong to the initial marking contain a token, and places belonging to a final marking either carry a bottom-right label f_{i}, with i a final marking identifier, or are simply marked as final in case of a single final marking.
The language \(\mathfrak {L}(\mathit {APN})\) is the set of all its traces, i.e., \(\mathfrak {L}(\mathit {APN})=\{\ell (\gamma )\mid \gamma {\in }T^{*}{\land }\exists _{m_{f}{\in }MF}\,m_{0}{\overset {\gamma }{\longrightarrow }}m_{f}\}\), which can be of infinite size when APN contains loops. While we define the language for accepting Petri nets, in theory, \(\mathfrak {L}(M)\) can be defined for any process model M with formal semantics. We denote the universe of process models as \(\mathcal {M}\). For each \(M{\in }\mathcal {M}\), \(\mathfrak {L}(M)\subseteq {\Sigma }^{+}\) is defined.
A process discovery method is a function \(\mathit {PD}:\mathcal {B}({{\Sigma }^{+}})\rightarrow \mathcal {M}\) that provides a process model for a given event log. The goal is to discover a process model that is a good description of the process from which the event log was obtained, i.e., it should allow for all the behavior that was observed in the event log (called fitness) while it should not allow for too much behavior that was not seen in the event log (called precision). For an event log L, \(\tilde {L}{=}\{\sigma {\in }{\Sigma }^{+}\mid L(\sigma ){>}0\}\) is the trace set of L. For example, for log L=[〈a, b, c〉^{2},〈b, a, c〉^{3}], \(\tilde {L}{=}\{\langle a,b,c\rangle ,\langle b,a,c\rangle \}\). For an event log L and a process model M, we say that L is fitting on M if \(\tilde {L}{\subseteq }\mathfrak {L}(M)\). Precision is related to the behavior that is allowed by a model M that was not observed in the event log L, i.e., \(\mathfrak {L}(M){\setminus }\tilde {L}\).
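The trace set and the fitness check can be sketched in a few lines. The sketch below assumes a log encoded as a `Counter` over trace tuples, and represents the model language simply as a finite set of traces, an illustrative simplification that only works for models without loops:

```python
from collections import Counter

def trace_set(log):
    """~L: the set of distinct traces of log L."""
    return {trace for trace, mult in log.items() if mult > 0}

def is_fitting(log, model_language):
    """L is fitting on M iff ~L is a subset of L(M)."""
    return trace_set(log) <= model_language

L = Counter({("a", "b", "c"): 2, ("b", "a", "c"): 3})
# A hypothetical finite language L(M) of some model M.
M = {("a", "b", "c"), ("b", "a", "c"), ("a", "c", "b")}
```

Here L is fitting on M, while the extra trace ⟨a, c, b⟩ in L(M)∖~L is behavior that counts against precision.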
3 Information-theoretic approaches to activity filtering
We consider a chaotic activity to be an activity for which the probability to occur does not change (or changes little) as an effect of occurrences of other activities and, moreover, whose occurrence does not change (or changes little) the probabilities to occur for other activities, i.e., chaotic activities are not part of the process flow. More formally, consider a business process that is described by some process model M, with Σ some set of non-chaotic business activities that are modeled in M. Now consider a set of chaotic activities Σ^{c} with Σ ∩Σ^{c} = ∅, i.e., the probabilities of occurrence of the activities in Σ^{c} neither impact nor are impacted by the occurrence of other activities. Let M^{′} be the process model that consists of M and additionally contains the chaotic activities Σ^{c} without constraints. If M is modeled as a Petri net, then M^{′} contains one additional labeled transition t for each activity in Σ^{c}, with • t = t• = ∅. For example, let M be Fig. 1b and Σ^{c} = {X}, then Fig. 3 shows M^{′}. Let L^{′} be an event log that is obtained by executing the business process while also observing the activities in Σ^{c}, i.e., by playing out model M^{′}. Process discovery algorithms generally make some assumption about the degree of completeness of the event log. For example, both the Inductive Miner (Leemans et al. 2013a) and the α-miner (van der Aalst et al. 2004) assume the log to be directly-follows-complete, i.e., each pair of activities in the process that can possibly directly follow each other is assumed to directly follow each other at least once in the log. When chaotic activities are present in the log, it becomes very hard for such completeness assumptions to be met, as many observed traces from the process are needed to observe all possible occurrences of an activity that is unconstrained in when it can occur.
The directly-follows ratio, denoted dfr(a, b, L), represents the ratio of the events of activity a that are directly followed by an event of activity b in event log L, i.e., \(\mathit {dfr}(a,b,L){=}\frac {\#(\langle a,b\rangle ,L)}{\#(a,L)}\).
Likewise, the directly-precedes ratio, denoted dpr(a, b, L), represents the ratio of the events of activity a that are directly preceded by an event of activity b in event log L, i.e., \(\mathit {dpr}(a,b,L){=}\frac {\#(\langle b,a\rangle ,L)}{\#(a,L)}\).
L^{⌋} contains the traces of event log L appended with an artificial end event that we represent with ⌋. For each σ = 〈e_{1}, e_{2},…, e_{n}〉 in log L, log L^{⌋} contains a trace σ^{⌋} = 〈e_{1}, e_{2},…, e_{n},⌋〉. Likewise, L^{⌊} contains the traces of event log L prepended with an artificial start event ⌊, i.e., for each σ = 〈e_{1}, e_{2},…, e_{n}〉 in log L, log L^{⌊} contains a trace σ^{⌊} = 〈⌊, e_{1},…, e_{n}〉. The artificial start and end events allow us to define the ratio of start and end events of an activity, e.g., dfr(a,⌋, L^{⌋}) and dpr(a,⌊, L^{⌊}) represent the ratio of events of activity a that respectively occur at the end of a trace and at the beginning of a trace.
Assuming an arbitrary but consistent order over the set of process activities Activities(L), dfr(a, L) represents the vector of values dfr(a, b, L^{⌋}) for all b∈Activities(L) ∪{⌋} and dpr(a, L) represents the vector of values dpr(a, b, L^{⌊}) for all b ∈Activities(L) ∪{⌊}. From a probabilistic point of view, we can regard dfr(a, L) and dpr(a, L) as the empirical estimates of the categorical distributions over respectively the activities directly prior to a and directly after a, where the empirical estimates are based on #(a, L) trials.
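The dfr and dpr vectors are straightforward to compute. Below is a minimal sketch, assuming the log is encoded as a `Counter` over trace tuples and using the strings `"start"` and `"end"` as stand-ins for the artificial events ⌊ and ⌋:

```python
from collections import Counter

def dfr_vector(a, log):
    """dfr(a, L): empirical distribution over the activity directly
    following a, computed on L with an artificial end event appended."""
    follows = Counter()
    for trace, mult in log.items():
        extended = list(trace) + ["end"]            # L^⌋
        for x, y in zip(extended, extended[1:]):
            if x == a:
                follows[y] += mult
    total = sum(follows.values())                   # equals #(a, L)
    return {b: n / total for b, n in follows.items()}

def dpr_vector(a, log):
    """dpr(a, L): empirical distribution over the activity directly
    preceding a, computed on L with an artificial start event prepended."""
    precedes = Counter()
    for trace, mult in log.items():
        extended = ["start"] + list(trace)          # L^⌊
        for x, y in zip(extended, extended[1:]):
            if y == a:
                precedes[x] += mult
    total = sum(precedes.values())
    return {b: n / total for b, n in precedes.items()}
```

For the example log L=[⟨a,b,c⟩², ⟨b,a,c⟩³], `dfr_vector("a", L)` gives {b: 0.4, c: 0.6}, and `dfr_vector("c", L)` gives {⌋: 1.0}, i.e., c always ends a trace.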
3.1 Direct entropy-based activity filtering
We define the entropy of an activity in an event log L based on its directly-follows ratio vector and its directly-precedes ratio vector, using the usual definition of the entropy function for a categorical probability distribution: \(H(X)=-{\sum }_{x{\in }X}x\log _{2}(x)\). We define the entropy of activity a ∈Activities(L) in log L as: H(a, L) = H(dfr(a, L)) + H(dpr(a, L)). In case there are zero probability values in the directly-follows or directly-precedes vectors, i.e., 0 ∈dfr(a, L) ∨ 0 ∈dpr(a, L), the value of the corresponding summand \(0\log _{2}(0)\) is taken as 0, which is consistent with the limit \(\lim \limits _{p\to 0+}p\log _{2}(p)= 0\).
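A sketch of H(a, L) = H(dfr(a, L)) + H(dpr(a, L)) in code, again under the illustrative assumptions that the log is a `Counter` over trace tuples and that `"start"`/`"end"` stand in for ⌊ and ⌋:

```python
import math
from collections import Counter

def entropy(probs):
    """Entropy of a categorical distribution; 0*log2(0) is taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def adjacent_ratios(a, log, direction):
    """dfr(a, L) for direction='next' (artificial end event appended) or
    dpr(a, L) for direction='prev' (artificial start event prepended)."""
    counts = Counter()
    for trace, mult in log.items():
        ext = ["start"] + list(trace) + ["end"]
        for x, y in zip(ext, ext[1:]):
            if direction == "next" and x == a:
                counts[y] += mult
            elif direction == "prev" and y == a:
                counts[x] += mult
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}

def activity_entropy(a, log):
    """H(a, L): entropy of a's successor plus predecessor distributions."""
    return (entropy(adjacent_ratios(a, log, "next").values())
            + entropy(adjacent_ratios(a, log, "prev").values()))
```

For the example log L=[⟨a,b,c⟩², ⟨b,a,c⟩³], activity c always ends a trace, so H(dfr(c, L)) = 0 and only its two possible predecessors contribute entropy.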
Algorithm 1 describes a greedy approach to iteratively filter the most randomly behaving (chaotic) activity from the event log. The algorithm takes an event log L as input and produces a list of event logs, such that the first element of the list contains a version of L with one activity filtered out, and each following element of the list has one additional activity filtered out compared to the previous element.
In the example event log L, Algorithm 1 starts by filtering out activity x, followed by activity b or c. The algorithm stops when there are two activities left in the event log. The reason not to filter out any more activities past this point is closely related to the aim of process discovery: uncovering relations between activities. From an event log with fewer than two activities, no relations between activities can be discovered.
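The greedy loop of Algorithm 1 can be sketched compactly. The sketch below is a minimal reading of the algorithm, not the authors' implementation; helper functions follow the definitions of H(a, L) given earlier, with the same illustrative log encoding:

```python
import math
from collections import Counter

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def ratios(a, log, nxt):
    """Successor (nxt=True) or predecessor (nxt=False) probabilities of a,
    with 'start'/'end' standing in for the artificial events."""
    counts = Counter()
    for trace, mult in log.items():
        ext = ["start"] + list(trace) + ["end"]
        for x, y in zip(ext, ext[1:]):
            if nxt and x == a:
                counts[y] += mult
            elif not nxt and y == a:
                counts[x] += mult
    total = sum(counts.values())
    return [n / total for n in counts.values()]

def activity_entropy(a, log):
    return entropy(ratios(a, log, True)) + entropy(ratios(a, log, False))

def project(log, keep):
    """Trace-wise projection on the kept activities; empty traces dropped."""
    out = Counter()
    for trace, mult in log.items():
        kept = tuple(e for e in trace if e in keep)
        if kept:
            out[kept] += mult
    return out

def direct_filter(log):
    """Greedy direct filter: repeatedly remove the highest-entropy (most
    chaotic) activity until two remain; returns the filtered logs."""
    logs, current = [], log
    while True:
        acts = {a for trace in current for a in trace}
        if len(acts) <= 2:
            return logs
        chaotic = max(acts, key=lambda a: activity_entropy(a, current))
        current = project(current, acts - {chaotic})
        logs.append(current)
```

On a log of an a→b→c process with an activity x inserted at arbitrary positions, x has the highest entropy and is removed first.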
3.2 The entropy of infrequent activities and Laplace smoothing
The entropy of the activities in an event log L is based on the directly-follows ratios dfr and the directly-precedes ratios dpr of the activities in L. The empirical estimates of the categorical distributions dfr(a, L) and dpr(a, L) become unreliable for small values of #(a, L). In the extreme case, when #(a, L)= 1, dfr(a, L) assigns an estimate of 1 to the activity that the single occurrence of a happens to be followed by and a probability of 0 to all other activities. Likewise, when #(a, L)= 1, dpr(a, L) assigns value 1 to one activity and value 0 to all others. Therefore, #(a, L)= 1 leads to H(dfr(a, L))= 0 and H(dpr(a, L))= 0. This shows an undesirable consequence of Algorithm 1: infrequent activities are unlikely to be filtered out. In the extreme case, activities that occur only once are the last in line to be filtered out. This effect is undesired, as very infrequent activities should not be the primary focus of the process model discovered from an event log.
We aim to mitigate this effect by applying Laplace smoothing (Zhai and Lafferty 2004) to the empirical estimates of the categorical distributions over the preceding and succeeding activities. Therefore, we define a smoothed version of the directly-follows and directly-precedes ratios, \(\mathit {dfr}^{s}(a,b,L){=}\frac {\alpha ~+~\#(\langle a,b\rangle ,L)}{\alpha ({|\mathit {Activities}(L)|+ 1})+\#(a,L)}\), with smoothing parameter \(\alpha {\in }\mathbb {R}_{\ge 0}\). The value of dfr^{s}(a, b, L) will always be between the empirical estimate dfr(a, b, L) and the uniform probability \(\frac {1}{|\mathit {Activities}(L)|+ 1}\), depending on the value of α. Similar to dfr and dpr, dfr^{s}(a, L) represents the vector of values dfr^{s}(a, b, L^{⌋}) for all b∈Activities(L) ∪{⌋} and dpr^{s}(a, L) represents the vector of values dpr^{s}(a, b, L^{⌊}) for all b ∈Activities(L) ∪{⌊}. From a Bayesian point of view, Laplace smoothing corresponds to the expected value of the posterior distribution that consists of the categorical distribution given by dfr(a, L) and a Dirichlet-distributed prior that assigns equal probability to each of the |Activities(L)| + 1 possible next activities (including ⌋). Parameter α indicates the weight that is assigned to the prior belief w.r.t. the evidence that is found in the data. An alternative definition of the entropy of activity a in log L, based on the smoothed distributions over the preceding and succeeding activities, is as follows: H^{s}(a, L) = H(dfr^{s}(a, L)) + H(dpr^{s}(a, L)). The smoothed direct entropy-based activity filter is similar to Algorithm 1, where function H in line 5 of the algorithm is replaced by H^{s}. Function H(a, L) starts from the assumption that an activity is non-chaotic unless we see sufficient evidence in the data for its chaoticness; function H^{s}(a, L), in contrast, starts from the assumption that an activity is chaotic, unless we see sufficient evidence in the data for its non-chaoticness.
Categorical distribution dfr(a, L) consists of |Activities(L)| + 1 categories; therefore, the maximum entropy of an activity decreases as more activities get filtered out of the event log. To keep the values of H^{s}(a, L) comparable between iterations of the filtering algorithm, we propose to gradually increase the weight of the prior by setting weight parameter α to \(\frac {1}{|\mathit {Activities(L)}|}\).
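The smoothed ratio dfr^s can be sketched as follows, with the same illustrative log encoding and with `"end"` standing in for the artificial end event ⌋:

```python
from collections import Counter

def dfr_smoothed(a, b, log, alpha):
    """dfr^s(a, b, L) = (alpha + #(<a,b>, L^⌋)) /
    (alpha * (|Activities(L)| + 1) + #(a, L))."""
    n_acts = len({x for trace in log for x in trace})
    pair_count = a_count = 0
    for trace, mult in log.items():
        ext = list(trace) + ["end"]                 # L^⌋
        a_count += trace.count(a) * mult
        pair_count += mult * sum(1 for x, y in zip(ext, ext[1:])
                                 if x == a and y == b)
    return (alpha + pair_count) / (alpha * (n_acts + 1) + a_count)
```

With α = 0 this reduces to the unsmoothed dfr; as α grows, the value moves toward the uniform probability 1/(|Activities(L)| + 1).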
3.3 Indirect entropy-based activity filtering
An alternative approach to the method proposed in Algorithm 1 is to filter out activities such that the other activities in the log become less chaotic. We define the total entropy of an event log L as the sum of the entropies of the activities in the log, i.e., \(H(L)={\sum }_{a\in \mathit {Activities}(L)}H(a,L)\).
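One step of this indirect approach can be sketched as removing the activity whose removal minimizes the total entropy H(L′) of the resulting log. This is a minimal reading of the idea, not the authors' implementation; the helpers follow the earlier definitions, with the same illustrative log encoding:

```python
import math
from collections import Counter

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def ratios(a, log, nxt):
    """Successor (nxt=True) or predecessor (nxt=False) probabilities of a."""
    counts = Counter()
    for trace, mult in log.items():
        ext = ["start"] + list(trace) + ["end"]
        for x, y in zip(ext, ext[1:]):
            if nxt and x == a:
                counts[y] += mult
            elif not nxt and y == a:
                counts[x] += mult
    total = sum(counts.values())
    return [n / total for n in counts.values()]

def total_entropy(log):
    """H(L): sum of H(a, L) over all activities in the log."""
    acts = {a for trace in log for a in trace}
    return sum(entropy(ratios(a, log, True)) + entropy(ratios(a, log, False))
               for a in acts)

def project(log, keep):
    out = Counter()
    for trace, mult in log.items():
        kept = tuple(e for e in trace if e in keep)
        if kept:
            out[kept] += mult
    return out

def indirect_step(log):
    """Return the activity whose removal minimizes H of the filtered log."""
    acts = {a for trace in log for a in trace}
    return min(acts, key=lambda a: total_entropy(project(log, acts - {a})))
```

On a log of an a→b→c process polluted with a chaotic activity x, removing x leaves a perfectly deterministic log (H = 0), so the indirect step selects x.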
3.4 An indirect entropy-based activity filter with Laplace smoothing
The algorithm for indirect entropybased activity filtering with Laplace smoothing is identical to Algorithm 2, in which function H in line 5 is replaced by function H^{s}.
4 Evaluation using synthetic data
- Frequent randomly-positioned activities: the number of events inserted for all k randomly-positioned activities is \(\max _{a\in \mathit {Activities}(L)}\#(a,L)\).
- Infrequent randomly-positioned activities: the number of events inserted for all k randomly-positioned activities is \(\min _{a\in \mathit {Activities}(L)}\#(a,L)\).
- Uniform randomly-positioned activities: for each of the k inserted randomly-positioned activities, the frequency is chosen at random from a uniform probability distribution with minimum value \(\min _{a\in \mathit {Activities}(L)}\#(a,L)\) and maximum value \(\max _{a\in \mathit {Activities}(L)}\#(a,L)\).
In step (3) we filter out all the inserted randomly-positioned activities from the event log, by removing activities one-by-one using the activity filtering approaches, until all k artificially inserted activities have been removed again. We then count how many of the activities that were originally in the process model we also removed during this procedure (step (4)). Using this approach, we compare the direct entropy-based activity filtering approach (with and without Laplace smoothing) with the indirect entropy-based activity filtering approach (with and without Laplace smoothing). Furthermore, we compare those activity filtering techniques with activity filtering techniques that are based on the frequency of activities, such as filtering out the activities starting from the least frequent activity (least-frequent-first), or starting from the most frequent activity (most-frequent-first). Frequency-based activity filtering techniques are the current default approach for filtering activities from event logs.
The original process models A12 and A22 can be rediscovered from the generated event logs L_{A12} and L_{A22} with the Inductive Miner (Leemans et al. 2013a) when there are no added randomly-positioned activities. Figure 5b shows the process model discovered by the Inductive Miner (Leemans et al. 2013a) after inserting one uniform randomly-positioned activity, activity X, into L_{A12}. The insertion of activity X causes the Inductive Miner to create a model that overgeneralizes the behavior of the event log, as indicated by the many silent transitions in the process model that allow activities to be skipped. Adding a second uniform randomly-positioned activity Y to L_{A12} results in the Inductive Miner discovering a process model (shown in Fig. 5c) that overgeneralizes even further, allowing for almost all sequences over the set of activities. Figure 6b shows the process model discovered by the Inductive Miner after inserting two uniform randomly-positioned activities (X and Y) into L_{A22}. The addition of X and Y has the effect that activity C is no longer positioned at the correct place in the process model, but is instead put in parallel to the whole process, making the process model overly general, as it wrongly allows for activity C to occur before A and B, or after D, E, F, and G. Figures 5b, c and 6b further motivate the need for filtering out chaotic activities.
Frequent randomly-positioned activities impact the quality of process models discovered with process discovery to a higher degree than infrequent randomly-positioned activities. Each randomly-positioned event that is inserted at a random position in the event log is placed in between two existing events in that log (or at the start or end of the trace). By inserting an event of randomly-positioned activity X in between two events of activities A and C respectively, the directly-follows relation between activities A and C gets weakened. Therefore, the impact of randomly-positioned activity X is proportional to its frequency #(X, L).
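The insertion step of the synthetic setup can be sketched as follows. The log encoding, the activity name, and the use of Python's `random` module are illustrative choices; positions are drawn uniformly, including the start and end of a trace:

```python
import random
from collections import Counter

def insert_chaotic(log, name, n_events, rng):
    """Insert n_events occurrences of a randomly-positioned activity at
    uniformly random positions across the traces of the log."""
    # Expand the multiset into individual (mutable) trace instances.
    traces = [list(trace) for trace, mult in log.items() for _ in range(mult)]
    for _ in range(n_events):
        trace = rng.choice(traces)
        trace.insert(rng.randrange(len(trace) + 1), name)
    return Counter(tuple(trace) for trace in traces)
```

For example, inserting five events of a hypothetical activity "X" into the example log adds exactly five events while weakening the original directly-follows relations at the insertion points.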
4.1 Results
The number of incorrectly filtered activities per filtering approach on L_{A12} and L_{A22} with k added Uniform (U) / Frequent (F) / Infrequent (I) chaotic activities
Approach  1  2  4  8  16  32  64  128  

U  F  I  U  F  I  U  F  I  U  F  I  U  F  I  U  F  I  U  F  I  U  F  I  
Maruster A12 (Number of inserted randomly-positioned activities \(\rightarrow \))  
Direct  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  12  4  0  12  10  1  12 
Direct (\(\alpha {=}\frac {1}{A}\))  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  4  0  6  6  2  12 
Indirect  0  0  0  0  0  0  0  0  0  0  0  1  1  0  1  1  0  1  2  0  1  3  1  6 
Indirect (\(\alpha {=}\frac {1}{A}\))  0  0  0  0  0  0  0  0  0  0  0  1  1  0  1  1  0  1  2  0  1  2  1  10 
Least-frequent-first  9  12  0  11  12  0  6  12  0  11  12  0  11  12  0  12  12  0  12  12  0  12  12  0 
Most-frequent-first  11  0  12  3  0  12  7  0  12  10  0  12  12  0  12  12  0  12  12  0  12  12  0  12 
Maruster A22 (Number of inserted randomly-positioned activities \(\rightarrow \))  
Direct  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  5 
Direct (\(\alpha {=}\frac {1}{A}\))  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  5 
Indirect  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  1  0  1  1  0  1  1  0  1 
Indirect (\(\alpha {=}\frac {1}{A}\))  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  1  0  1  0  0  1  1  0  1 
Least-frequent-first  16  22  0  17  22  0  6  22  0  21  22  0  19  22  0  22  22  0  22  22  0  22  22  0 
Most-frequent-first  7  0  22  8  0  22  19  0  22  17  0  22  19  0  22  22  0  22  22  0  22  22  0  22 
4.2 An evaluation methodology for event data without ground truth information
In the real-life data evaluation that we perform in the following section, there is no ground truth knowledge on which activities of the process are chaotic. This motivates a more indirect evaluation in which we assess the quality of the process model discovered from the event log after filtering out activities with the proposed activity filtering techniques. In this section we propose a methodology for evaluating activity filtering techniques by assessing the quality of discovered process models, we apply this evaluation methodology to the Maruster A12 and Maruster A22 event logs, and we discuss the agreement between the findings of Table 1 and the quality of the discovered process models.
There are several ways to quantify the quality of a process model for an event log. Ideally, a process model M should allow for all behavior that was observed in the event log L, i.e., \(\tilde {L}\setminus \mathfrak {L}(M)\) should be as small as possible, preferably empty. The fitness quality dimension covers this. Furthermore, model M should not allow for too much additional behavior that was not seen in the event log, i.e., \(\mathfrak {L}(M)\setminus \tilde {L}\) should be as small as possible. This aspect is called precision. For each process model that we discovered, we measure fitness and precision with respect to the filtered log. Fitness is measured using the alignment-based fitness measure (Adriansyah et al. 2011) and we measure precision using negative event precision (Vanden Broucke et al. 2013). Based on the fitness and precision results we also calculate the F-score (De Weerdt et al. 2011), i.e., the harmonic mean of fitness and precision.
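The F-score combination is a one-liner; a minimal sketch (the zero guard is a defensive choice for the degenerate case where both measures are 0):

```python
def f_score(fitness, precision):
    """F-score as the harmonic mean of fitness and precision."""
    if fitness + precision == 0.0:
        return 0.0
    return 2 * fitness * precision / (fitness + precision)
```

For instance, a model with fitness 0.8 and precision 0.5 has an F-score of 8/13 ≈ 0.615, which penalizes the imbalance more than the arithmetic mean (0.65) would.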
Precision is likely to increase by filtering out one or more activities from an event log, independently of which activities are removed from the log, as a result of two factors. First, precision measures express \(\mathfrak {L}(M)\setminus \tilde {L}\) in terms of the number of activities that are enabled at certain points in the process, w.r.t. the number of activities that were actually observed at these points in the process. With the log and model containing fewer activities after filtering, the number of enabled activities is likely to decrease as well. Secondly, activity filtering leads to a log L^{′} that contains less behavior than the original log L (i.e., \(\tilde {L^{\prime }}\) is smaller than \(\tilde {L}\)), which makes it easier for process discovery methods to discover a process model with less behavior. These two factors make precision values between event logs with different numbers of activities filtered out incomparable. The degree to which the behavior of filtered log L^{′} decreases w.r.t. an unfiltered log L depends on the activities that are filtered out: when very chaotic activities are filtered from L, the behavior decreases much more than when very structured activities are filtered from L. One effect of this is that excess behavior in a process model affects the precision of that model more for the log from which the non-chaotic activities are filtered out than for the log from which the chaotic activities are filtered out.
A way to measure the behavior allowed by the process model independently of which activities are filtered from the event log is to determine the average number of enabled activities when replaying the traces of the log on the model. To deal with traces of the event log that do not fit the behavior of the process model, we calculate alignments (Adriansyah et al. 2011) between log and model. Alignments provide a function \({\Gamma }^{m}:\mathcal {M}\times {\Sigma }^{+}\rightarrow \mathcal {B}(P)^{+}\) that maps each trace from the event log to a sequence of markings 〈m_{0},…, m_{f}〉 that are reached while replaying that trace on the model, with m_{0} the initial marking and m_{f}∈MF, such that for each two consecutive markings 〈m_{i}, m_{i+ 1}〉 there exists a transition t ∈ T such that m_{i+ 1} = m_{i} −•t + t •. Furthermore, alignments also provide a function \({\Gamma }^{t}:\mathcal {M}\times {\Sigma }^{+}\rightarrow T^{+}\) that provides the sequence of transitions 〈t_{0},…t_{n}〉 that matches the changes in the sequence of markings, i.e., m_{1} = m_{0} −•t_{0} + t_{0} •, etc. For each trace σ ∈Σ^{+} that fits a process model \(N\in \mathcal {M}\), the alignment satisfies ℓ(Γ^{t}(N, σ)) = σ. For unfitting traces σ ∈Σ^{+}, the alignment is such that ℓ(Γ^{t}(N, σ)) is as close as possible to σ according to some cost function. We refer to Adriansyah et al. (2011) for a more exhaustive introduction to alignments. Let \(\overline {{\Gamma }^{t}}\) denote the sequence consisting of only the visible transitions in Γ^{t}, and let \(\overline {{\Gamma }^{m}}\) correspondingly denote the sequence of markings prior to each firing of a visible transition.
Given a marking \(m\in \mathcal {B}(P)\), we define the nondeterminism of that marking as the number of visible transitions that can be fired as the first next visible transition from m, i.e., \(\mathit {nondeterminism}(m)=|\{a{\in }{\Sigma }\mid \exists _{\gamma \in T^{*}}\,m\overset {\gamma }{\longrightarrow }m_{i}\land t{\in }\gamma \land \ell (t)=a \land \forall _{\gamma _{i}{\in }\gamma }\,\gamma _{i}{\in }\mathit {dom}(\ell ){\implies }\gamma _{i}{=}t\}|\). We define the nondeterminism of a model \(N\in \mathcal {M}\) given a trace σ ∈Σ^{+} as the average nondeterminism of the markings \(\overline {{\Gamma }^{m}(N,\sigma )}\), and define the nondeterminism for a model N and a log L as the average nondeterminism over the traces of L.
5 Evaluation using real life data
An overview of the event logs used in the experiments
Name  Category  # traces  # events  # activities 

BPI’12 (Van Dongen 2012)  Business  13087  164506  23 
BPI’12 resource 10939 (Tax et al. 2016)  Business  49  1682  14 
Environmental permit (Buijs 2014)  Business  1434  8577  27 
SEPSIS (Mannhardt 2016)  Business  1050  15214  16 
Traffic Fine (De Leoni and Mannhardt 2015)  Business  150370  561470  11 
Bruno (Bruno et al. 2013)  Human behavior  57  553  14 
CHAD 1600010 (McCurdy et al. 2000)  Human behavior  26  238  10 
MIT A (Tapia et al. 2004)  Human behavior  16  2772  27 
MIT B (Tapia et al. 2004)  Human behavior  17  1962  20 
Ordonez A (Ordónez et al. 2013)  Human behavior  15  409  12 
van Kasteren (van Kasteren et al. 2008)  Human behavior  23  220  7 
Cook hh102 labour (Cook et al. 2013)  Human behavior  18  576  18 
Cook hh102 weekend (Cook et al. 2013)  Human behavior  18  210  18 
Cook hh104 labour (Cook et al. 2013)  Human behavior  43  2100  19 
Cook hh104 weekend (Cook et al. 2013)  Human behavior  18  864  19 
Cook hh110 labour (Cook et al. 2013)  Human behavior  21  695  17 
Cook hh110 weekend (Cook et al. 2013)  Human behavior  6  184  14 
For each event log, we apply seven different activity filtering techniques for comparison: 1) direct entropy filter without Laplace smoothing, 2) direct entropy filter with Laplace smoothing (\(\alpha {=}\frac {1}{|\mathit {Activities(L)}|}\)), 3) indirect entropy filter without Laplace smoothing, 4) indirect entropy filter with Laplace smoothing (\(\alpha {=}\frac {1}{|\mathit {Activities(L)}|}\)), 5) least-frequent-first filtering, 6) most-frequent-first filtering, 7) filtering the activities from the log in a random order. Recall that the activity filtering procedure stops at the point where all but two activities are filtered from the event log, because process models that contain just one activity do not communicate any information regarding the relations between activities. For each event log and for each activity filtering approach we discover a process model after each filtering step (i.e., after each removal of an activity). The process discovery step is performed with two process discovery approaches: the Inductive Miner (Leemans et al. 2013a), and the Inductive Miner infrequent (20%) (Leemans et al. 2013b).
5.1 Results on business process event logs
5.2 Results on human behavior event logs
The order in which activities are filtered from the van Kasteren log using the indirect entropy-based filter with Laplace smoothing (\(\alpha =\frac {1}{Activities(L)}\)) (left) and the least-frequent-first filter (right)
Order  Filtered activity (indirect entropy-based filter with Laplace smoothing)  Filtered activity (least-frequent-first filter) 

1  Use toilet  Prepare dinner 
2  Get drink  Get drink 
3  Leave house  Prepare breakfast 
4  Take shower  Take shower 
5  Go to bed  Go to bed 
6  Prepare breakfast  Leave house 
7  Prepare dinner  Use toilet 
5.3 Aggregated analysis over all event logs
We define \(\overline {W}_{i}^{x}=\frac {{W_{i}^{x}}}{17}\) as the number of other activity filtering techniques that are outperformed by filtering technique i at the point where at least x% of the activities are explained, averaged over the 17 event logs.
Kendall τ_{b} rank correlation between the activity orderings produced by five activity filtering methods, averaged over the 17 event logs
Direct  Direct (\(\alpha {=}\frac {1}{A}\))  Indirect  Indirect (\(\alpha {=}\frac {1}{A}\))  Least-frequent-first  

Direct  1.0  0.2956  0.0829  0.1408  0.0504 
Direct (\(\alpha {=}\frac {1}{A}\))  0.2956  1.0  0.0698  0.0536  0.1454 
Indirect  0.0829  0.0698  1.0  0.6852  − 0.0275 
Indirect (\(\alpha {=}\frac {1}{A}\))  0.1408  0.0536  0.6852  1.0  − 0.0392 
Least-frequent-first  0.0504  0.1454  − 0.0275  − 0.0392  1.0 
Number of event logs for which we can reject the null hypothesis that the orderings of activities returned by activity filters are uncorrelated, according to the tau test
Direct  Direct (\(\alpha {=}\frac {1}{A}\))  Indirect  Indirect (\(\alpha {=}\frac {1}{A}\))  Least-frequent-first  

Direct  17  5  1  2  0 
Direct (\(\alpha {=}\frac {1}{A}\))  5  17  1  1  3 
Indirect  1  1  17  17  3 
Indirect (\(\alpha {=}\frac {1}{A}\))  2  1  17  17  3 
Least-frequent-first  0  3  3  3  17 
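The τ_b statistic can be computed directly from two filtering orders; in the absence of ties it reduces to the plain Kendall τ. The sketch below, with an illustrative helper name, uses the two van Kasteren orderings from the table in the previous subsection as input.

```python
from itertools import combinations

def kendall_tau(rank_x, rank_y):
    """Kendall rank correlation between two activity orderings.

    `rank_x` and `rank_y` map each activity to the step at which a filter
    removes it; without ties, tau_b equals (concordant - discordant) / #pairs.
    """
    concordant = discordant = 0
    for a, b in combinations(rank_x, 2):
        sign = (rank_x[a] - rank_x[b]) * (rank_y[a] - rank_y[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(rank_x) * (len(rank_x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Removal steps on the van Kasteren log: indirect entropy-based filter
# with Laplace smoothing vs. least-frequent-first filter.
indirect = {"Use toilet": 1, "Get drink": 2, "Leave house": 3, "Take shower": 4,
            "Go to bed": 5, "Prepare breakfast": 6, "Prepare dinner": 7}
least_frequent = {"Prepare dinner": 1, "Get drink": 2, "Prepare breakfast": 3,
                  "Take shower": 4, "Go to bed": 5, "Leave house": 6, "Use toilet": 7}
```

The two orderings largely disagree, so the correlation comes out negative. The significance test reported in the second table additionally requires a p-value for τ, e.g. via the normal approximation of its distribution.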
6 Entropy-based toggles for process discovery
In the previous section we have shown that all four configurations of the entropy-based activity filtering technique lead to more deterministic process models than simply filtering out infrequent activities. However, the differences in determinism between the process models discovered with the four configurations are small and depend on the event log to which they are applied. Furthermore, all four configurations merely impose an ordering on the activities; they do not specify at which step the filtering should be stopped. Additionally, the proposed filtering technique ignores the semantics of activities: an activity that is chaotic may still be relevant for the process, and leaving it out of the discovered process model harms the model's usefulness.
7 Related work
Existing work on filtering event logs in process mining focuses either on removing logging mistakes (called noise), with the aim of preventing those mistakes from propagating to a discovered process model, or on removing infrequent behavior in order to discover a process model of the mainstream behavior in the log. With regard to noise, real-life event logs often contain all sorts of data quality issues (Suriadi et al. 2017), including incorrectly logged events, events that are logged in the wrong order, and events that took place without being logged. Many event log filtering techniques have been proposed to address the problem of noise (Conforti et al. 2017; Lu et al. 2015; Fani Sani et al. 2017; Ghionna et al. 2008; Cheng and Kumar 2015). Note that there is a conceptual difference between chaotic activities and noise: where noise finds its origin in mistakes related to logging, events of chaotic activities are in fact correctly logged, but are still undesired because they represent an activity that is logged by the system even though it is not part of the main process flow. Chaotic activity filtering is also clearly distinct from filtering of infrequent behavior, as chaotic activities can be frequent.
Existing filtering techniques in the process mining field can be classified into four categories: 1) event filtering techniques, 2) process discovery techniques with an integrated filtering mechanism built in, 3) trace filtering techniques, and 4) activity filtering techniques. We use these categories to discuss and structure related work.
7.1 Event filtering
Conforti et al. (2017) recently proposed a technique to filter outlier events from an event log. The technique starts by building a prefix automaton of the event log that is minimal in terms of the number of arcs, using an Integer Linear Programming (ILP) solver. Infrequent arcs are removed from this minimal prefix automaton, and finally the events belonging to the removed arcs are filtered out of the event log.
Lu et al. (2015) advocate the use of event mappings (Lu et al. 2014) to distinguish between events that are part of the mainstream behavior of a process and outlier events. Event mappings compute similar and dissimilar behavior between each pair of process executions as a mapping: the similar behavior is formed by all pairs of events that are mapped to each other, whereas events that are not mapped constitute dissimilar behavior. Fani Sani et al. (2017) propose the use of sequential pattern mining techniques to distinguish between events that are part of the mainstream behavior and outlier events.
All three event filtering techniques listed above aim to filter outlier events out of the event log while keeping the mainstream behavior. Event filtering techniques model the frequently occurring contexts of activities and filter out events whose context occurs infrequently in the log. For example, consider an activity B such that 98% of its occurrences are in context 〈…, A, B, C,… 〉, while the remaining 2% of the events of activity B are in context 〈…, D, B, E,… 〉; the B events that occur between D and E will then be filtered out by event filtering techniques. Note that our filtering technique is orthogonal to event filtering: it would consider activity B to be non-chaotic and would not filter out anything. However, when a log L contains a chaotic activity X, event filtering techniques are not able to remove all events of this chaotic activity. One of the contexts of X will by chance be more frequent than the others, i.e., for some activity A it will hold that ∀B ∈ Activities(L) : #(〈A, X〉, L) > #(〈B, X〉, L), even though 〈A, X〉 might be only slightly more frequent. This results in the X events after a B being removed, while the X events after an A remain in the log. Applying a process discovery technique to this filtered log then results in a process model where activity X is misleadingly positioned after activity A, while in fact X can happen anywhere in the process. The activity filtering technique presented in this paper instead detects that activity X is chaotic and removes it from the event log completely, preventing this misleading effect of event filtering.
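The context-frequency idea can be illustrated with a small sketch. This is a hypothetical, much-simplified stand-in for the event filtering techniques discussed above (it is not Conforti et al.'s prefix-automaton algorithm): an event is dropped when its (predecessor, activity) context is rare relative to the other contexts of the same activity.

```python
from collections import Counter

def filter_rare_context_events(log, threshold=0.05):
    """Drop events whose (predecessor, activity) context is rarer than
    `threshold` among all occurrences of that activity."""
    context = Counter()
    for trace in log:
        for i, b in enumerate(trace):
            pred = trace[i - 1] if i > 0 else "START"
            context[(pred, b)] += 1
    totals = Counter()  # total occurrences per activity
    for (pred, b), n in context.items():
        totals[b] += n
    filtered = []
    for trace in log:
        kept = []
        for i, b in enumerate(trace):
            pred = trace[i - 1] if i > 0 else "START"  # predecessor in the original trace
            if context[(pred, b)] / totals[b] >= threshold:
                kept.append(b)
        filtered.append(kept)
    return filtered
```

On the example above (98 traces 〈A, B, C〉 and 2 traces 〈D, B, E〉), the B events between D and E are removed while all other events survive, matching the behavior described in the text.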
7.2 Process discovery techniques with integrated filtering
Several process discovery algorithms offer integrated filtering mechanisms as part of the approach. The α-Miner (van der Aalst et al. 2004) is one of the early foundational techniques for process discovery; it starts by inferring causal, exclusive, and parallel relations between pairs of activities, which are converted into a Petri net in a second step of the algorithm. In later work, Maruster et al. (2006) explored the use of supervised learning techniques to extract these causal, exclusive, and parallel relations from the event log, allowing the α-Miner to disregard "noisy" events that would have heavily impacted those relations in the original approach.
The Inductive Miner (IM) (Leemans et al. 2013a) is a process discovery algorithm that first discovers a directly-follows graph from the event log, in which activities are connected that directly follow each other in the log, and from which a process model is discovered in a second step. The directly-follows relations are affected by the presence of a chaotic activity X: a sequence 〈…, A, X, C,… 〉 leads to false directly-follows relations between A and X and between X and C, while the directly-follows relation between A and C is obfuscated by X. The Inductive Miner infrequent (IMf) (Leemans et al. 2013b) is an extension of the IM in which infrequent directly-follows relations are filtered out of the set of directly-follows relations used to generate the process model. The filtering mechanism of IMf can help to filter out the directly-follows relations between A and X and between X and C, but it does not recover the obfuscated directly-follows relation between A and C. The activity filtering technique presented in this paper instead filters out the chaotic activity X itself, transforming the sequence 〈…, A, X, C,… 〉 into 〈…, A, C,… 〉 and thereby recovering the directly-follows relation between A and C.
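The obfuscation effect can be checked on a toy log. The sketch below (helper names are illustrative) counts directly-follows pairs before and after removing the chaotic activity X; removing X restores the directly-follows count between A and C.

```python
from collections import Counter

def directly_follows(log):
    """Count directly-follows pairs (a, b): b immediately follows a in a trace."""
    df = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df

def remove_activity(log, activity):
    """Project the chaotic activity out of every trace."""
    return [[a for a in trace if a != activity] for trace in log]
```

With log = [〈A, X, C〉, 〈A, C〉, 〈X, A, C〉], the raw log yields false pairs (A, X), (X, C), and (X, A), and only two of the three traces contribute to (A, C); after removing X, all three traces contribute to (A, C) and the false pairs disappear.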
The Heuristics Miner (Weijters and Ribeiro 2011) and the Fodina algorithm (Vanden Broucke and De Weerdt 2017), in addition to the directly-follows relation, define an eventually-follows relation between activities and allow the process analyst to filter out infrequent directly-follows and eventually-follows relations. Two activities A and B are in an eventually-follows relation when A is eventually followed by B, before the next appearance of A or B. The eventually-follows relation, unlike the directly-follows relation, is not impacted by the presence of chaotic activities. The Heuristics Miner and Fodina both include filtering methods for the directly-follows and eventually-follows relations that are similar in nature to the filtering mechanism of the Inductive Miner infrequent (Leemans et al. 2013b). However, the mining of sequential orderings and parallel constructs in the Heuristics Miner (De Weerdt et al. 2011) and Fodina is based on the directly-follows relations only, with the eventually-follows relations being used for mining long-term dependencies. Furthermore, in contrast to the Inductive Miner, the process models discovered with the Heuristics Miner or Fodina can be unsound, i.e., they can contain deadlocks.
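The eventually-follows definition above can be operationalized as a short sketch (the function name is illustrative); note that the A-to-C count comes out the same with and without the chaotic activity X, confirming the claim that the relation is unaffected by chaotic activities.

```python
from collections import Counter

def eventually_follows(log):
    """Count pairs (a, b) where a is eventually followed by b before the
    next occurrence of a or b (counted once per occurrence of a)."""
    ef = Counter()
    for trace in log:
        for i, a in enumerate(trace):
            seen = set()
            for b in trace[i + 1:]:
                if b == a:
                    break  # next occurrence of a closes the window
                if b not in seen:
                    ef[(a, b)] += 1  # first b before the next a or b
                    seen.add(b)
    return ef
```

For the log [〈A, X, C〉, 〈X, A, C〉], the pair (A, C) is counted in both traces regardless of where X occurs, and the count is unchanged after projecting X out.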
The ILP-miner (van der Werf et al. 2009) is a process discovery algorithm in which a set of behavioral constraints over activities is discovered from the prefix-closure of the event log, based on which a process model that satisfies these constraints is discovered using Integer Linear Programming (ILP). van Zelst et al. (2015) proposed a filtering technique for the ILP-miner in which the prefix-closure of the event log is filtered prior to solving the ILP problem by removing infrequently observed prefixes. It is easy to see that a chaotic activity X affects the prefix-closure that is discovered from the event log: given a log consisting of the two traces 〈A, X, C〉 and 〈X, A, C〉, activity X causes the prefix-closures of the two traces to have no overlap in states, while without activity X the two traces are identical. This makes the prefix-closure filtering method proposed by van Zelst et al. (2015) less effective, as frequent prefixes get distributed over several infrequent prefixes when chaotic activities are present. The chaotic activity filtering technique presented in this paper would instead remove chaotic activity X, making traces 〈A, X, C〉 and 〈X, A, C〉 identical after filtering and thus leading to a simpler process model that still describes the behavior of the event log accurately.
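The two-trace example can be checked with a small sketch (the helper name is hypothetical): with X present, the two traces share only the empty prefix; with X projected out, they collapse to the same three prefixes.

```python
def prefixes(log):
    """All prefixes of the traces in a log, i.e. the states of its prefix automaton."""
    return {tuple(trace[:i]) for trace in log for i in range(len(trace) + 1)}
```

For [〈A, X, C〉, 〈X, A, C〉] this yields seven distinct prefixes (four per trace, sharing only the empty one), whereas after removing X both traces are 〈A, C〉 and only three prefixes remain.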
The Fuzzy Miner (Günther and van der Aalst 2007) is a process discovery algorithm that aims at mining models of flexible processes, and it discovers a process model in the form of a graph without formal semantics. The Fuzzy Miner discovers this graph by extracting the eventually-follows relation from the event log, which is not affected by chaotic activities. Similar to the Heuristics Miner (Weijters and Ribeiro 2011) and Fodina (Vanden Broucke and De Weerdt 2017), the Fuzzy Miner allows the analyst to filter out infrequent eventually-follows relations between activities. In practice, the lack of formal semantics of Fuzzy Miner models hinders their usability, as the models are not precise about what behavior is allowed in the process under analysis.
7.3 Trace filtering
Ghionna et al. (2008) proposed a technique to identify outlier traces from the event log that consists of two steps: 1) mining frequent patterns from the event log, and 2) applying MCL clustering (Van Dongen 2008) on the traces, where the similarity measure for traces is defined on the number of patterns that jointly characterize the execution of the traces. Traces that are not assigned to a cluster by the MCL clustering algorithm are considered to be outlier traces and are filtered from the event log.
Cheng and Kumar (2015) propose a supervised approach to filter out noisy traces from an event log. They assume a marked sub-log in which a process worker has manually inspected the traces and labeled them as clean or noisy, in addition to an unmarked sub-log for which it is unknown which traces are noisy. They use the PRISM rule-induction algorithm (Cendrowska 1987) to learn classification rules that differentiate between clean and noisy traces on the marked sub-log, and then apply these rules to identify and filter out noisy traces from the unmarked sub-log.
It is easy to see that trace filtering techniques address a fundamentally different problem than chaotic activity filtering: in the event log shown in Fig. 2b only two traces do not contain an instance of chaotic activity X; therefore, even if a trace filtering technique were able to perfectly filter out all traces that contain a chaotic event, too few traces would remain to mine a fitting and precise process model when the chaotic activity is frequent.
7.4 Activity filtering
The modus operandi for filtering activities is to simply remove infrequent activities from the event log. The plugin 'Filter Log using Simple Heuristics' in the ProM process mining toolkit (Van Dongen et al. 2005) offers tool support for this type of filtering. The Inductive Visual Miner (Leemans et al. 2014) is an interactive process discovery tool that implements the Inductive Miner (Leemans et al. 2013b) process discovery algorithm interactively: the process analyst can filter the event log using sliders and is then shown the process model discovered from the filtered log. One of the available sliders in the Inductive Visual Miner offers the same frequency-based activity filtering functionality. The working assumption behind filtering out infrequent activities is that when there are only a few occurrences of an activity, there is probably not enough evidence to establish its relation to other activities and model its behavior. However, as we have shown in this paper, frequent but chaotic activities are frequent enough to establish their relation to other activities, yet still complicate the process discovery task by lowering the directly-follows counts between the other activities in the event log. The activity filtering technique presented in this paper is able to filter out chaotic activities, thereby reconstructing the directly-follows relations between the non-chaotic activities of the event log, at the expense of losing the chaotic activities.
8 Conclusion & future work
In this paper, we have shown the possible detrimental effect of the presence of chaotic activities in event logs on the quality of process models produced by process discovery techniques. We have shown through synthetic experiments that frequency-based techniques for filtering activities from event logs, currently the modus operandi for activity filtering in the process mining field, do not necessarily handle chaotic activities well, as chaotic activities can be frequent as well as infrequent. We have proposed four novel techniques for filtering chaotic activities from event logs, which find their roots in information theory and Bayesian statistics. Through experiments on seventeen real-life datasets, we have shown that all four proposed activity filtering techniques outperform frequency-based filtering on real data. The indirect entropy-based activity filter was found to perform best overall, averaged over all datasets used in the experiments; however, the performance of the four proposed activity filtering techniques is highly dependent on the characteristics of the event log.
Because the performance of the filtering techniques was found to be log-dependent, we propose using the activity filtering techniques in a slider-based approach in which the user filters activities interactively and directly sees the process model discovered from the filtered event log. Ultimately, only the user can decide which activities to include. In future work, we aim to construct a hybrid activity filtering technique that combines the four techniques proposed in this paper, using supervised learning techniques from the data mining field to predict the effect of removing a particular activity.
References
 Adriansyah, A., van Dongen, B.F., van der Aalst, W.M.P. (2011). Conformance checking using cost-based fitness analysis. In Proceedings of the 15th IEEE international enterprise distributed object computing conference (EDOC) (pp. 55–64). IEEE.
 Bruno, B., Mastrogiovanni, F., Sgorbissa, A., Vernazza, T., Zaccaria, R. (2013). Analysis of human behavior recognition algorithms based on acceleration data. In Proceedings of the IEEE international conference on robotics and automation (pp. 1602–1607). IEEE.
 Buijs, J.C.A.M. (2014). Receipt phase of an environmental permit application process (WABO), CoSeLoG project. https://doi.org/10.4121/uuid:a07386a5-7be3-4367-9535-70bc9e77dbe6.
 Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P. (2012). A genetic algorithm for discovering process trees. In Proceedings of the 2012 IEEE congress on evolutionary computation (pp. 1–8). IEEE.
 Cendrowska, J. (1987). PRISM: an algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27(4), 349–370.
 Cheng, H.J., & Kumar, A. (2015). Process mining on noisy logs – can log sanitization help to improve performance? Decision Support Systems, 79, 138–149.
 Conforti, R., La Rosa, M., ter Hofstede, A.H.M. (2017). Filtering out infrequent behavior from business process event logs. IEEE Transactions on Knowledge and Data Engineering, 29(2), 300–314.
 Cook, D.J., Crandall, A.S., Thomas, B.L., Krishnan, N.C. (2013). CASAS: a smart home in a box. Computer, 46(7), 62–69.
 De Leoni, M., & Mannhardt, F. (2015). Road traffic fine management process. https://doi.org/10.4121/uuid:270fd440-1057-4fb9-89a9-b699b47990f5.
 De Weerdt, J., De Backer, M., Vanthienen, J., Baesens, B. (2011). A robust F-measure for evaluating discovered process models. In Proceedings of the IEEE symposium on computational intelligence and data mining (CIDM) (pp. 148–155). IEEE.
 Dimaggio, M., Leotta, F., Mecella, M., Sora, D. (2016). Process-based habit mining: experiments and techniques. In Proceedings of the international IEEE conference on ubiquitous intelligence & computing (pp. 145–152). IEEE.
 Fani Sani, M., van Zelst, S.J., van der Aalst, W.M.P. (2017). Improving process discovery results by filtering outliers using conditional behavioural probabilities. In Proceedings of the international workshop on business process intelligence. Springer.
 Ghionna, L., Greco, G., Guzzo, A., Pontieri, L. (2008). Outlier detection techniques for process mining applications. In International symposium on methodologies for intelligent systems (pp. 150–159). Springer.
 Goedertier, S., Martens, D., Vanthienen, J., Baesens, B. (2009). Robust process discovery with artificial negative events. Journal of Machine Learning Research, 10, 1305–1340.
 Günther, C.W., & van der Aalst, W.M.P. (2007). Fuzzy mining – adaptive process simplification based on multi-perspective metrics. In International conference on business process management (pp. 328–343). Springer.
 Herbst, J. (2000). A machine learning approach to workflow management. In European conference on machine learning (pp. 183–194). Springer.
 Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P. (2013a). Discovering block-structured process models from event logs – a constructive approach. In International conference on applications and theory of petri nets and concurrency (pp. 311–329). Springer.
 Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P. (2013b). Discovering block-structured process models from event logs containing infrequent behaviour. In International conference on business process management (pp. 66–78). Springer.
 Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P. (2014). Process and deviation exploration with inductive visual miner. In Proceedings of the BPM demo track (Vol. 1295, p. 46). CEUR-WS.org.
 Leotta, F., Mecella, M., Mendling, J. (2015). Applying process mining to smart spaces: perspectives and research challenges. In International conference on advanced information systems engineering (pp. 298–304). Springer.
 Lohmann, N., Verbeek, E., Dijkman, R. (2009). Petri net transformations for business processes – a survey. In Transactions on petri nets and other models of concurrency II (pp. 46–63). Springer.
 Lu, X., Fahland, D., van der Aalst, W.M.P. (2014). Conformance checking based on partially ordered event data. In International conference on business process management (pp. 75–88). Springer.
 Lu, X., Fahland, D., van den Biggelaar, F.J.H.M., van der Aalst, W.M.P. (2015). Detecting deviating behaviors without models. In Proceedings of the international workshop on business process intelligence (pp. 126–139). Springer.
 Mannhardt, F. (2016). Sepsis cases – event log. https://doi.org/10.4121/uuid:915d2bfb-7e84-49ad-a286-dc35f063a460.
 Maruster, L., Weijters, A.J.M.M., van der Aalst, W.M.P., van den Bosch, A. (2006). A rule-based approach for process discovery: dealing with noise and imbalance in process logs. Data Mining and Knowledge Discovery, 13(1), 67–87.
 McCurdy, T., Glen, G., Smith, L., Lakkadi, Y. (2000). The national exposure research laboratory's consolidated human activity database. Journal of Exposure Analysis and Environmental Epidemiology, 10(6), 566–578.
 Murata, T. (1989). Petri nets: properties, analysis and applications. Proceedings of the IEEE, 77(4), 541–580.
 Object Management Group. (2011). Business Process Model and Notation (BPMN) version 2.0. OMG Specification.
 Ordónez, F.J., de Toledo, P., Sanchis, A. (2013). Activity recognition using hybrid generative/discriminative models on home environments using binary sensors. Sensors, 13(5), 5460–5477.
 Qin, T., Liu, T.Y., Xu, J., Li, H. (2010). LETOR: a benchmark collection for research on learning to rank for information retrieval. Information Retrieval, 13(4), 346–374.
 Solé, M., & Carmona, J. (2013). Region-based foldings in process discovery. IEEE Transactions on Knowledge and Data Engineering, 25(1), 192–205.
 Suriadi, S., Andrews, R., ter Hofstede, A.H.M., Wynn, M.T. (2017). Event log imperfection patterns for process mining: towards a systematic approach to cleaning event logs. Information Systems, 64, 132–150.
 Sztyler, T., Völker, J., Carmona Vargas, J., Meier, O., Stuckenschmidt, H. (2015). Discovery of personal processes from labeled sensor data: an application of process mining to personalized health care. In Proceedings of the international workshop on algorithms & theories for the analysis of event data (pp. 31–46). CEUR-WS.org.
 Tapia, E.M., Intille, S.S., Larson, K. (2004). Activity recognition in the home using simple and ubiquitous sensors. In International conference on pervasive computing (pp. 158–175). Springer.
 Tax, N., Bockting, S., Hiemstra, D. (2015). A cross-benchmark comparison of 87 learning to rank methods. Information Processing & Management, 51(6), 757–772.
 Tax, N., Sidorova, N., Haakma, R., van der Aalst, W.M.P. (2016). Mining local process models. Journal of Innovation in Digital Ecosystems, 3(2), 183–196.
 Tax, N., Sidorova, N., Haakma, R., van der Aalst, W.M.P. (2017). Mining process model descriptions of daily life through event abstraction. In Intelligent systems and applications (to appear). Springer.
 Van Dongen, S. (2008). Graph clustering via a discrete uncoupling process. SIAM Journal on Matrix Analysis and Applications, 30(1), 121–141.
 Van Dongen, B. (2012). BPI challenge 2012. https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f.
 Van Dongen, B.F., de Medeiros, A.K.A., Verbeek, H.M.W., Weijters, A.J.M.M., van der Aalst, W.M.P. (2005). The ProM framework: a new era in process mining tool support. In International conference on application and theory of petri nets (pp. 444–454). Springer.
 van der Aalst, W.M.P. (2016). Process mining: data science in action. Springer.
 van der Aalst, W.M.P., Weijters, A.J.M.M., Maruster, L. (2004). Workflow mining: discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering, 16(9), 1128–1142.
 van der Aalst, W.M.P., Bolt, A., van Zelst, S.J. (2017). RapidProM: mine your processes and not just your data. In Hofmann, M., & Klinkenberg, R. (Eds.), RapidMiner: data mining use cases and business analytics applications (to appear). Chapman & Hall/CRC Data Mining and Knowledge Discovery Series.
 van der Werf, J.M.E.M., van Dongen, B.F., Hurkens, C.A.J., Serebrenik, A. (2009). Process discovery using integer linear programming. Fundamenta Informaticae, 94(3), 387–412.
 van Kasteren, T., Noulas, A., Englebienne, G., Kröse, B. (2008). Accurate activity recognition in a home setting. In Proceedings of the 10th international conference on ubiquitous computing (pp. 1–9). ACM.
 van Zelst, S.J., van Dongen, B.F., van der Aalst, W.M.P. (2015). Avoiding overfitting in ILP-based process discovery. In International conference on business process management (pp. 163–171). Springer.
 Vanden Broucke, S.K.L.M., De Weerdt, J., Vanthienen, J., Baesens, B. (2013). Determining process model precision and generalization with weighted artificial negative events. IEEE Transactions on Knowledge and Data Engineering.
 Vanden Broucke, S.K.L.M., & De Weerdt, J. (2017). Fodina: a robust and flexible heuristic process discovery technique. Decision Support Systems.
 Weijters, A.J.M.M., & Ribeiro, J.T.S. (2011). Flexible heuristics miner (FHM). In Proceedings of the IEEE symposium on computational intelligence and data mining (CIDM) (pp. 310–317). IEEE.
 Zhai, C., & Lafferty, J. (2004). A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems, 22(2), 179–214.
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.