Online conformance checking: relating event streams to process models using prefix-alignments

Companies often specify the intended behaviour of their business processes in a process model. Conformance checking techniques allow us to assess to what degree such process models and corresponding process execution data correspond to one another. In recent years, alignments have proven extremely useful for calculating conformance checking statistics. Existing techniques to compute alignments have been developed to be used in an offline, a posteriori setting. However, we are often interested in observing deviations at the moment they occur, rather than days, weeks or even months later. Hence, we need techniques that enable us to perform conformance checking in an online setting. In this paper, we present a novel approach to incrementally compute prefix-alignments, paving the way for real-time online conformance checking. Our experiments show that the reuse of previously computed prefix-alignments enhances memory efficiency, whilst preserving prefix-alignment optimality. Moreover, we show that, in case of computing approximate prefix-alignments, there is a clear trade-off between memory efficiency and approximation error.


Introduction
Today's information systems track, in great detail, the execution of business processes within companies. Often, a company has an idea, or even a formal specification, of how its business process is required to be executed. In other cases, laws, regulations and/or legislation dictate the exact way the process is to be executed. Such a process specification is often recorded in a process model, i.e. a behavioural specification. However, in many cases, the actual execution of the process, as recorded by the information system, is not in line with the behaviour described by the corresponding process model. Conformance checking, a sub-field of process mining [1], aims at assessing to what degree the behaviour described by a process model is in line with the behaviour captured in an event log. In particular, these techniques are able to check conformance w.r.t. process modelling formalisms that allow for describing concurrency, i.e. the possibility to specify order-independent execution of activities.
Early conformance checking techniques, e.g. "token-based replay" [2], often lead to ambiguous and/or unpredictable results. Hence, alignments [3] were developed with the specific goal of explaining and quantifying deviations in a non-ambiguous manner. Alignments have rapidly developed into the de facto standard conformance checking technique. Moreover, alignments serve as a basis for techniques that link event data to process models, e.g. they support performance analysis, decision mining [4], business process model repair [5] and prediction techniques. However, techniques to compute alignments are only defined in an offline setting, whereas early process-oriented deviation detection is critical for many organizations in different domains. For example, within hospitals, deviating process executions often lead to higher costs and/or delays in examination time. Similarly, within highly complex administrative processes, e.g. provision of mortgages, notary processes and unemployment administration, deviant behaviour often leads to excessive process execution time and costs. Upon detection of a deviation, a process owner, or the supporting information system, is able to take adequate actions, such as blocking the current process instance, assigning a case manager for additional specialized intervention or restarting the process instance.
In this paper, we present a new approach to compute alignment-based conformance checking statistics in an online setting. Instead of conventionally used event logs, i.e. a static data source describing past process execution behaviour, we rely on event streams. An event stream is a continuous data stream that describes a potentially unbounded sequence of events. The fundamental difference between event streams and event logs is that the knowledge of executed events for a case changes over time, i.e. new events related to the same case can occur in the future. Hence, at any point in time, we are unaware whether the sequence of events observed for a certain case is complete. As a consequence, we aim at computing prefix-alignments rather than conventional alignments, as they describe the events observed for a case in the best possible way w.r.t. the reference model without requiring explicit termination. In Fig. 1, we present a schematic overview of online conformance checking. We have two main sources of input, i.e. an event stream generated by an information system and a reference process model. Over time, we observe events emitted on the event stream, which tell us which activity has been performed in the context of which case. For each case, we maintain a prefix-alignment. Whenever we receive a new event for a case, we recompute its prefix-alignment. We try to recompute prefix-alignments greedily; however, in some cases, we need to resort to solving a shortest path problem. The focus of this paper is mainly on the efficiency of solving such shortest path problems.
Our proposed approach entails an incremental algorithm that allows for computing both optimal and approximate prefix-alignments. We additionally show that the cost of an optimal prefix-alignment is always a lower bound for the cost of a conventional alignment of the trace extended with any of its possible suffixes. As a consequence, when computing optimal prefix-alignments, our approach underestimates alignment costs for completed cases. This implies that once we detect a deviation from the reference model, we are guaranteed that the behaviour related to the case is not compliant with the reference model. Computing approximate prefix-alignments increases memory efficiency, at the cost of losing prefix-alignment optimality. We experimentally assess the trade-off between memory efficiency and optimality loss using several artificially generated process models. We additionally assess the applicability of our technique using a real data set originating from a hospital information system. Our experiments show that reusing previously computed prefix-alignments positively impacts the efficiency of computing new prefix-alignments. Moreover, in case of approximation, we observe a clear trade-off between memory usage and prefix-alignment optimality loss.
The remainder of this paper is structured as follows. In Sect. 2, we present related work. In Sect. 3, we present background concepts. In Sect. 4, we introduce event streams and motivate the need for computing prefix-alignments. In Sect. 5, we present our incremental algorithm for prefix-alignment computation together with two memory optimization techniques. In Sect. 6, we evaluate the proposed approach in terms of performance and approximation accuracy. In Sect. 7, we provide a discussion of the proposed approach. Section 8 concludes the paper.

Related work
A plethora of different process mining techniques exists, ranging from discovery to prediction. However, given the focus of this paper, we limit related work to the field of alignment computation and online process mining. We refer to [1] for an overview of different process mining techniques.
Early work in conformance checking uses token-based replay [2]. These techniques replay a trace of executed events in a process model (Petri net) and add missing tokens if transitions are not able to fire. After replay, remaining tokens are counted and a conformance statistic is computed based on missing and remaining tokens. Alignments were introduced in [3] and have rapidly developed into the de facto standard for conformance checking. In [6,7], decomposition techniques for computing alignments are proposed. Using decomposition techniques greatly reduces computation time, i.e. the techniques successfully apply the divide-and-conquer paradigm; however, they provide lower bounds on conformance checking statistics rather than computing alignments. More recently, general approximation schemes for alignments, i.e. computation of near-optimal alignments, have been proposed in [8].
A relatively limited amount of work has been done in the area of online process mining. In [9], a first design of an online process discovery algorithm was proposed, based on the Heuristic Miner [10]. The approach was improved in [11] by adopting a different internal storage model. In [12], the work was generalized and converted into an architecture that covers a wide range of process discovery algorithms. In [13], an alternative discovery approach is presented for the purpose of discovering declarative process models.
To the best of our knowledge, this paper is the first work that covers conformance checking/alignments in online/incremental settings.

Background
In this section, we briefly present key process mining concepts such as event logs, workflow nets and alignments. We assume the reader to be familiar with mathematical concepts such as sets, multisets, functions and sequences. We only present notational conventions regarding these concepts, as used in this paper.
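Given set $X$ and $y \notin X$, we write $X_y$ for $X \cup \{y\}$. $\mathbb{N}$ denotes the set of natural numbers, and $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. A multiset generalizes the concept of a set and allows elements to have a multiplicity exceeding one. Let $X = \{e_1, e_2, \ldots, e_n\}$ be a set; a multiset $M$ over $X$ is a function $M : X \to \mathbb{N}_0$. We write a multiset as $M = [e_1^{k_1}, e_2^{k_2}, \ldots, e_m^{k_m}]$ ($m \leq n$), where for each $i \in \{1, \ldots, m\}$ we have $M(e_i) = k_i$. If $M(e_i) = 0$, we omit $e_i$ from multiset notation, and if $M(e_i) = 1$, we omit $e_i$'s superscript. We write a sequence $\sigma$ of length $n$ as $\sigma = \langle \sigma(1), \sigma(2), \ldots, \sigma(n) \rangle$, where, for $1 \leq i \leq n$, $\sigma(i)$ denotes the $i$th element of $\sigma$. The set of all possible sequences over $X$ is written as $X^*$. Concatenation of sequences $\sigma_1$ and $\sigma_2$ is written as $\sigma_1 \cdot \sigma_2$. The empty sequence is written as $\langle\rangle$. Given a sequence $\sigma \in (X_1 \times X_2 \times \cdots \times X_n)^*$ and $1 \leq i \leq n$, we define the projection $\pi_i(\sigma) = \langle \pi_i(\sigma(1)), \ldots, \pi_i(\sigma(|\sigma|)) \rangle$, where $\pi_i((x_1, \ldots, x_n)) = x_i$. Moreover, given $\sigma \in X^*$ and $Y \subseteq X$, $\sigma{\downarrow_Y}$ denotes the subsequence of $\sigma$ that retains only the elements in $Y$, in their original order.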
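To make the notation concrete, the following Python sketch models a multiset with `collections.Counter` and implements the projection $\pi_i$ and the restriction ${\downarrow_Y}$ on sequences of tuples, as used later (e.g. $(\pi_1(\gamma)){\downarrow_A}$ in Theorem 1). The helper names `project` and `restrict`, and the string `">>"` as a stand-in for a skip symbol, are our own illustrative choices.

```python
from collections import Counter

# A multiset over X is a function X -> N0; Counter returns 0 for absent elements.
M = Counter({"e1": 2, "e2": 1})  # the multiset [e1^2, e2]

def project(seq, i):
    """pi_i: map every tuple in seq to its i-th component (1-based)."""
    return [x[i - 1] for x in seq]

def restrict(seq, allowed):
    """seq restricted to a set: drop every element not in it, keeping the order."""
    return [x for x in seq if x in allowed]

# Illustrative sequence of pairs; ">>" is our stand-in for a skip symbol.
pairs = [("x", ">>"), ("a", "t1"), (">>", "t3")]
print(M["e3"])                                  # 0
print(restrict(project(pairs, 1), {"a", "x"}))  # ['x', 'a']
```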

Event logs and process models
The execution of business processes within a company generates traces of event data in its supporting information system. Typically, we are able to extract such data from the company's information system, describing, for specific cases, e.g. an insurance claim, what activities have been performed over time. We often refer to a collection of such data as an event log. From a formal perspective, an event log is considered to be a multiset of sequences of executed process activities, or simply events. Consider Table 1, which depicts a simplified view of an event log. The event log describes the execution of activities related to a fictional compensation request process for concert tickets. For example, consider all events related to case 13 (represented by Case-id 13), i.e. a new request is registered by Luke, Harry subsequently examines the request, Pete checks the corresponding ticket, etc. For each event recorded in the event log, we have information regarding its id, the activity it relates to, the resource that executed the activity and the time at which it was executed. In general, we are able to obtain even more event data, for example the corresponding ticket id, the concert the ticket belongs to, the ticket price, etc. However, for the sake of simplicity, we abstract from such data. In particular, in the context of this paper, we are only interested in the actual activities performed, rather than all possible data/resource aspects involved in the activity execution, i.e. we focus on the control-flow perspective. For example, based on Table 1, we deduce that for case 13, the sequence $\langle$register request, examine, check ticket, decide, pay compensation$\rangle$ was performed. We assume the execution of activities to be atomic; hence, given the universe of activities $A$, an event log $L$ is a multiset over sequences of $A$, i.e. $L : A^* \to \mathbb{N}_0$.
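The control-flow abstraction described above can be sketched in a few lines of Python: group activities per case, in order, and count identical traces to obtain $L : A^* \to \mathbb{N}_0$. The rows below are hypothetical: case 13 follows the sequence given in the text, whereas case 14 and all event ids are invented for illustration, and we assume the rows are already ordered by time.

```python
from collections import Counter, defaultdict

# Hypothetical flattened event table (event id, case id, activity), ordered by time.
# Case 13 follows the text; case 14 and all event ids are made up for illustration.
events = [
    (1, 13, "register request"), (2, 13, "examine"), (3, 13, "check ticket"),
    (4, 13, "decide"), (5, 13, "pay compensation"),
    (6, 14, "register request"), (7, 14, "check ticket"),
    (8, 14, "decide"), (9, 14, "reject request"),
]

def to_event_log(rows):
    """Control-flow view: group activities per case, then count identical traces."""
    traces = defaultdict(list)
    for _event_id, case_id, activity in rows:  # resources/timestamps are ignored
        traces[case_id].append(activity)
    return Counter(tuple(trace) for trace in traces.values())

log = to_event_log(events)  # L : A* -> N0, with traces as tuples
print(log[("register request", "examine", "check ticket", "decide", "pay compensation")])  # 1
```

Traces are stored as tuples because `Counter` keys must be hashable.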
A process model describes the intended behaviour of a process. Although many process modelling formalisms exist, we focus on (labelled) Petri nets [14], which allow us to explicitly model concurrency in a concise and compact manner. In Fig. 2, we depict an example Petri net. The Petri net, like the example event log in Table 1, describes the handling of a compensation request. It dictates that the first activity to be performed should always be register request. Subsequently, the examine and check ticket activities can be performed concurrently. However, we are also allowed to only perform the check ticket activity and subsequently make a decision, i.e. to skip the examination. After a decision is made, we are able to redo the examination and ticket check. However, such a decision, i.e. to redo these activities, is not explicitly captured by the system. Eventually, either the pay compensation or the reject request activity is performed. A Petri net consists of places and transitions. Places are used to represent the state of the described process, whereas transitions represent possible executable activities, subject to the state. The Petri net in Fig. 2 consists of 7 places, i.e. $P = \{p_i, p_1, \ldots, p_5, p_o\}$, visualized as circles. Formally, we represent the state of a Petri net in terms of a marking $M$, which is a multiset of places, i.e. $M : P \to \mathbb{N}_0$. For example, in Fig. 2, place $p_i$ is marked with a token, visualized by a black dot. Thus, the marking of the Petri net in Fig. 2, as visualized, is $[p_i]$. The Petri net furthermore contains 8 transitions, i.e. $T = \{t_1, \ldots, t_8\}$, visualized as boxes. Transitions allow us to manipulate a Petri net's marking. A transition $t \in T$ is enabled if all places $p$ that have an outgoing arc to $t$ contain a token. If transition $t$ is enabled in marking $M$, we write $M[t\rangle$. An enabled transition is able to fire. If we fire a transition $t$, it consumes a token from each place that has an outgoing arc to $t$. Subsequently, a token is produced in each place that has an incoming arc from $t$. For example, in Fig. 2, $t_1$ is the only enabled transition in marking $[p_i]$, and, if it fires, we obtain the new marking $[p_1, p_2]$. In marking $[p_1, p_2]$, we are able to fire both $t_2$ and $t_3$, in any order. We are thus able to generate sequences of fired transitions, e.g. $\langle t_1, t_2, t_3 \rangle$ and $\langle t_1, t_3, t_2 \rangle$, which both yield marking $[p_3, p_4]$. Note that, in marking $[p_1, p_2]$, transition $t_4$ is not (yet) enabled. If we fire $t_3$, leading to marking $[p_1, p_4]$, transition $t_4$ is enabled. When executing a sequence $\sigma \in T^*$ of transitions from marking $M$ results in marking $M'$, we write $M \xrightarrow{\sigma} M'$. We let $\mathcal{M}$ denote the universe of all possible markings.
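The firing rule can be sketched in Python with markings as `Counter` multisets. Fig. 2 is not reproduced in this text, so the arcs of `NET` below are our hypothetical reconstruction, chosen to agree with every statement made about $N_1$ in this section; the label `"f"` for reject request is likewise assumed, and `None` stands for $\tau$.

```python
from collections import Counter

# Hypothetical encoding of N1 (Fig. 2), reconstructed from the text; the exact arcs
# are an assumption. Format: transition -> (label, input places, output places).
NET = {
    "t1": ("a", ["pi"], ["p1", "p2"]),   # register request
    "t2": ("b", ["p1"], ["p3"]),         # examine
    "t3": ("c", ["p2"], ["p4"]),         # check ticket
    "t4": ("d", ["p1", "p4"], ["p5"]),   # decide (examination skipped)
    "t5": ("d", ["p3", "p4"], ["p5"]),   # decide
    "t6": (None, ["p5"], ["p1", "p2"]),  # invisible (tau): redo
    "t7": ("e", ["p5"], ["po"]),         # pay compensation
    "t8": ("f", ["p5"], ["po"]),         # reject request (label assumed)
}

def enabled(marking: Counter, t: str) -> bool:
    """M[t>: every place with an arc to t holds at least one token."""
    return all(marking[p] >= 1 for p in NET[t][1])

def fire(marking: Counter, t: str) -> Counter:
    """Consume a token from each input place of t, produce one in each output place."""
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    _, inputs, outputs = NET[t]
    result = Counter(marking)
    result.subtract(inputs)
    result.update(outputs)
    return +result  # unary plus drops places holding zero tokens

def fire_sequence(marking: Counter, sigma: list) -> Counter:
    """M --sigma--> M': fire the transitions of sigma in order."""
    for t in sigma:
        marking = fire(marking, t)
    return marking

M_I = Counter(["pi"])
print([t for t in NET if enabled(M_I, t)])                        # ['t1']
print(fire_sequence(M_I, ["t1", "t3"]) == Counter(["p1", "p4"]))  # True
```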
All transitions, except transition $t_6$, have a single-character label, e.g. transition $t_1$ has label $a$. Typically, these labels represent actual activities that can be executed in the process described by the Petri net. For convenience, we also added more descriptive names, e.g. register request. Transition $t_6$ is an invisible transition, i.e. it is able to manipulate the marking of the Petri net without being noticed by the outside world. Also, there are two transitions with label $d$, i.e. $t_4$ and $t_5$.
We formally define a Petri net $N$ as a tuple $N = (P, T, F, \lambda)$, where $P$ represents its places, $T$ its transitions and $F \subseteq (P \times T) \cup (T \times P)$ the flow relation, i.e. the arcs in Fig. 2. Observe that a place is only connected to a transition and vice versa, i.e. there is never an arc between two places or between two transitions. Finally, given a set of activity labels $\Lambda$ and $\tau \notin \Lambda$, $\lambda : T \to \Lambda_\tau$ represents the labelling function of $N$. For example, $\lambda(t_1) = a$ and $\lambda(t_6) = \tau$.
We assume that a reference model of a process is designed by a human business process analyst/designer. We therefore assume that a process model has a certain level of quality, e.g. the Petri net is a sound workflow net [15, Definition 7]. We do not introduce the characteristics of such models; however, soundness guarantees that we are always able to reach a designated final marking $M_f$ from any marking $M$ reachable from the designated initial marking $M_i$. For the purpose of this paper, we assume that the Petri nets we consider also have this property, which we refer to as the proper termination assumption.
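For a bounded net, the proper termination assumption can be checked mechanically by exploring the reachability graph. The sketch below does so for our hypothetical reconstruction of $N_1$ (the arcs are inferred from the text, not read from Fig. 2). It only checks the property stated above, not full soundness [15], and the function names are ours.

```python
from collections import Counter, deque

# Hypothetical encoding of N1 (Fig. 2); arcs are inferred from the text and the
# label "f" of t8 is assumed. t -> (label, input places, output places); None = tau.
NET = {
    "t1": ("a", ["pi"], ["p1", "p2"]), "t2": ("b", ["p1"], ["p3"]),
    "t3": ("c", ["p2"], ["p4"]), "t4": ("d", ["p1", "p4"], ["p5"]),
    "t5": ("d", ["p3", "p4"], ["p5"]), "t6": (None, ["p5"], ["p1", "p2"]),
    "t7": ("e", ["p5"], ["po"]), "t8": ("f", ["p5"], ["po"]),
}

def key(m):
    """Canonical, hashable form of a marking (places with zero tokens dropped)."""
    return tuple(sorted((+m).items()))

def successors(m):
    """Markings obtained from m by firing one enabled transition."""
    for _, inputs, outputs in NET.values():
        if all(m[p] >= 1 for p in inputs):
            n = Counter(m)
            n.subtract(inputs)
            n.update(outputs)
            yield +n

def reachable(m0):
    """Breadth-first exploration; terminates because this net is bounded."""
    seen = {key(m0): m0}
    queue = deque([m0])
    while queue:
        for n in successors(queue.popleft()):
            if key(n) not in seen:
                seen[key(n)] = n
                queue.append(n)
    return list(seen.values())

def properly_terminates(m_i, m_f):
    """True iff M_f is reachable from every marking reachable from M_i."""
    target = key(m_f)
    return all(any(key(n) == target for n in reachable(m)) for m in reachable(m_i))

print(len(reachable(Counter(["pi"]))))                        # 7
print(properly_terminates(Counter(["pi"]), Counter(["po"])))  # True
```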

Alignments
When we reconsider the example sequence of activities related to case 13, i.e. written shorthand as $\langle a, b, c, d, e \rangle$, we observe that indeed, by firing transitions $t_1$, $t_2$, $t_3$, $t_5$ and $t_7$, such a sequence of activities is produced by $N_1$ in Fig. 2. If we consider another example trace, i.e. $\langle x, a, d, e, z \rangle$, we observe some problems. For example, activities $x$ and $z$ are not labels of $N_1$. Furthermore, according to $N_1$, at least $c$ must be executed in-between $a$ and $d$.
Alignments allow us to identify and quantify the aforementioned problems and moreover allow us to express these deviations in terms of the reference model. Conceptually, an alignment is a mapping between the execution of transitions in the process model and the activities observed in a trace $\sigma$ in a given event log $L$. Consider Fig. 3, in which we present alignments for traces $\langle a, b, c, d, e \rangle$ and $\langle x, a, d, e, z \rangle$ w.r.t. Petri net $N_1$. Alignments are sequences of pairs, e.g. $\gamma_1 = \langle (a, t_1), (b, t_2), (c, t_3), (d, t_5), (e, t_7) \rangle$. Each pair within an alignment is referred to as a move. The first element of a move refers to an activity of the trace, whereas the second element refers to a transition. The goal is to create pairs of the form $(a, t)$ s.t. $\lambda(t) = a$, e.g. all moves in $\gamma_1$ are of this form. The sequence of activity labels in the alignment needs to equal the input trace. The sequence of transitions in the alignment needs to correspond to a $\sigma \in T^*$ s.t., given a designated initial marking $M_i$ and final marking $M_f$, we have $M_i \xrightarrow{\sigma} M_f$. In some cases, we are not able to construct a move of the form $(a, t)$ s.t. $\lambda(t) = a$. In case of trace $\langle x, a, d, e, z \rangle$, we are not able to map $x$ and $z$ to any transition in $N_1$ with an equally valued label. Furthermore, we at least need to execute transition $t_3$ in order to form a sequence of transitions that generates marking $M_f$ from $M_i$. In some cases, we need to fire a transition $t$ with $\lambda(t) = \tau$, for which we are again not able to construct a label-transition mapping. In such cases, we use the skip symbol $\gg$ in either the activity or the transition part of a move. For example, consider $\gamma_2$ in Fig. 3, which contains three skip symbols. Verify that again, when ignoring skip symbols, the sequence of activity labels equals the input trace, and the sequence of transitions is valid for $M_i$ and $M_f$. If a move is of the form $(a, t)$, we call it a synchronous move; $(a, \gg)$ is an activity move and $(\gg, t)$ is a model move.
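The three conditions described above can be checked directly. Definition 1 is not reproduced in this text, so the sketch below encodes our reading of the prose; `">>"` stands in for $\gg$, `NET` is our hypothetical reconstruction of $N_1$, and the moves of $\gamma_2$ are reconstructed from its description (three skip symbols, $d$ bound to $t_4$).

```python
from collections import Counter

SKIP = ">>"  # our stand-in for the skip symbol
# Hypothetical encoding of N1 (Fig. 2) reconstructed from the text; arcs and the
# label "f" of t8 are assumptions. t -> (label, inputs, outputs); None = tau.
NET = {
    "t1": ("a", ["pi"], ["p1", "p2"]), "t2": ("b", ["p1"], ["p3"]),
    "t3": ("c", ["p2"], ["p4"]), "t4": ("d", ["p1", "p4"], ["p5"]),
    "t5": ("d", ["p3", "p4"], ["p5"]), "t6": (None, ["p5"], ["p1", "p2"]),
    "t7": ("e", ["p5"], ["po"]), "t8": ("f", ["p5"], ["po"]),
}

def fire(m, t):
    """Fire t in marking m; return None if t is not enabled."""
    _, inputs, outputs = NET[t]
    if any(m[p] < 1 for p in inputs):
        return None
    n = Counter(m)
    n.subtract(inputs)
    n.update(outputs)
    return +n

def is_alignment(gamma, trace, m_i, m_f):
    """Check the alignment conditions described in the text (our reading)."""
    for a, t in gamma:
        if a == SKIP and t == SKIP:
            return False                      # (>>, >>) is not a move
        if a != SKIP and t != SKIP and NET[t][0] != a:
            return False                      # synchronous moves need lambda(t) = a
    if [a for a, _ in gamma if a != SKIP] != list(trace):
        return False                          # activity part must equal the trace
    m = m_i
    for _, t in gamma:
        if t != SKIP:
            m = fire(m, t)
            if m is None:
                return False                  # transition part must be firable ...
    return +m == +m_f                         # ... and end exactly in M_f

M_I, M_F = Counter(["pi"]), Counter(["po"])
gamma2 = [("x", SKIP), ("a", "t1"), (SKIP, "t3"), ("d", "t4"), ("e", "t7"), ("z", SKIP)]
print(is_alignment(gamma2, ["x", "a", "d", "e", "z"], M_I, M_F))  # True
```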
We let $\Gamma$ denote the universe of alignments, and let $\Gamma(N, \sigma, M_i, M_f)$ denote all alignments of $\sigma$ and $N$ given $M_i$ and $M_f$. Given the definition of alignments as presented in Definition 1, several alignments, i.e. sequences of moves adhering to Definition 1, exist for a given trace and Petri net. For example, consider alignment $\gamma_3$ depicted in Fig. 4, which, according to Definition 1, is an alignment of $\langle x, a, d, e, z \rangle$ and $N_1$ as well. The main difference between $\gamma_2$ and $\gamma_3$, both aligning trace $\langle x, a, d, e, z \rangle$ with $N_1$, is the fact that $\gamma_2$ binds the execution of $t_4$ to the observed activity $d$, whereas $\gamma_3$ binds the execution of $t_5$ to the observed activity $d$. Clearly, both explanations are possible; however, to bind observed activity $d$ to $t_5$, alignment $\gamma_3$ also requires the explicit execution of transition $t_2$. Since activity $b$ is not observed in the given trace, we observe the presence of move $(\gg, t_2)$ in $\gamma_3$, which is not needed in $\gamma_2$. Both alignments are feasible; however, we prefer alignment $\gamma_2$ over $\gamma_3$, as it minimizes non-synchronous moves, i.e. moves of the form $(a, \gg)$ or $(\gg, t)$.
As exemplified by alignments $\gamma_2$ and $\gamma_3$, we need means to rank and compare alignments and to express our preference of certain alignments w.r.t. others. To this end, we define a cost function over the moves of an alignment. The cost of an alignment is simply the sum of the costs of its individual moves. Typically, synchronous moves are assigned a low, or even 0, cost. The costs of model and activity moves are usually higher than the costs of synchronous moves. Assume we assign cost 0 to synchronous moves and cost 1 to activity/model moves. In this case, the cost of $\gamma_1$ is 0. The cost of alignment $\gamma_2$ is 3, whereas the cost of alignment $\gamma_3$ is 4. Hence, the cost of $\gamma_2$ is lower than the cost of $\gamma_3$, and we prefer it over $\gamma_3$. Formally, we define the cost of a move as a function $\delta : A_{\gg} \times T_{\gg} \to \mathbb{R}_{\geq 0}$. The cost $\kappa^{\delta}$ of a sequence of moves $\gamma$, given move cost function $\delta$, is defined as $\kappa^{\delta}(\gamma) = \sum_{i=1}^{|\gamma|} \delta(\gamma(i))$. In general, we are able to use an arbitrary instantiation of $\delta$; however, in the remainder of the paper, we adopt the unit-cost function, which assigns cost 0 to synchronous moves $(a, t)$ with $\lambda(t) = a$ and to model moves $(\gg, t)$ with $\lambda(t) = \tau$, and cost 1 to all other activity and model moves. Since we assume unit costs throughout the paper, we omit $\delta$ as superscript and simply refer to $\kappa(\gamma)$. We write $\gamma^{opt}$ to refer to an optimal alignment, i.e. $\gamma^{opt} = \arg\min_{\gamma \in \Gamma(N, \sigma, M_i, M_f)} \kappa(\gamma)$. Consequently, computing an optimal alignment is simply defined as a minimization problem. In [3], it is shown that computing an optimal alignment is equivalent to solving a shortest path problem on the state space of the synchronous product net of $N$ and $\sigma$. The exact nature of such a synchronous product net and an equivalence proof of the two problems are outside the scope of this paper; hence, we refer to [3] for these definitions and proofs. In this paper, we use the fact that an algorithm to find an optimal alignment exists, and we use it as a black box. Finally, it is important to note that multiple optimal alignments may exist.
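The unit-cost function and $\kappa$ translate directly into code. The sketch below reproduces the costs 0, 3 and 4 of $\gamma_1$, $\gamma_2$ and $\gamma_3$ stated above; the move sequences are our reconstruction from the prose, the label `"f"` of $t_8$ is assumed, and rejecting $(\gg, \gg)$ and label-mismatched pairs with an exception is our design choice rather than part of the paper.

```python
SKIP = ">>"  # our stand-in for the skip symbol
# Labels of N1's transitions (Fig. 2); None marks the invisible transition t6.
# The label "f" for t8 (reject request) is an assumption.
LABELS = {"t1": "a", "t2": "b", "t3": "c", "t4": "d", "t5": "d",
          "t6": None, "t7": "e", "t8": "f"}

def delta(move):
    """Unit-cost function: 0 for synchronous moves and model moves on tau, 1 otherwise."""
    a, t = move
    if a == SKIP and t == SKIP:
        raise ValueError("(>>, >>) is not a move")
    if a != SKIP and t != SKIP:
        if LABELS[t] != a:
            raise ValueError(f"{t} does not carry label {a}")
        return 0                     # synchronous move
    if a == SKIP and LABELS[t] is None:
        return 0                     # model move on an invisible transition
    return 1                         # activity move or visible model move

def kappa(gamma):
    """kappa(gamma): the sum of the costs of the moves of gamma."""
    return sum(delta(move) for move in gamma)

gamma1 = [("a", "t1"), ("b", "t2"), ("c", "t3"), ("d", "t5"), ("e", "t7")]
gamma2 = [("x", SKIP), ("a", "t1"), (SKIP, "t3"), ("d", "t4"), ("e", "t7"), ("z", SKIP)]
gamma3 = [("x", SKIP), ("a", "t1"), (SKIP, "t2"), (SKIP, "t3"), ("d", "t5"),
          ("e", "t7"), ("z", SKIP)]
print([kappa(g) for g in (gamma1, gamma2, gamma3)])  # [0, 3, 4]
```

Selecting `min((gamma2, gamma3), key=kappa)` then returns `gamma2`, matching the stated preference.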

Computing prefix-alignments on event streams
We aim at computing conformance checking statistics in an online fashion in order to observe deviations at the moment they occur. Hence, we define the notion of event streams. Subsequently, we motivate why computing conventional alignments on event streams overestimates the potential deviation severity, and why we therefore resort to computing prefix-alignments.

Event streams
Formally, an event stream is a, possibly infinite, sequence of events. An event is a pair consisting of a case identifier and an activity. An event describes which activity is performed in the context of which process instance (represented by the case identifier). Event streams differ from event logs in two ways: (1) an event stream is potentially infinite, and (2) the behaviour seen for a case is incomplete, i.e. new events may be observed for a case in the future.
Definition 2 (Event stream) Let $C$ denote the universe of case identifiers and let $A$ denote the universe of possible activities. An event stream $S$ is an infinite sequence over $C \times A$. A pair $(c, a) \in C \times A$ represents an event, i.e. activity $a$ was executed in the context of case $c$. $S(1)$ denotes the first event that we receive, whereas $S(i)$ denotes the $i$th event.
Consider stream $S_1$ in Fig. 5 as an example, where we show the emission of activities based on the process model in Fig. 2 using shorthand activity names. Observe that event $(3, d)$ is emitted first, event $(4, a)$ is emitted second, etc. Our knowledge w.r.t. case 5 after receiving the third event, i.e. $S_1(3) = (5, a)$, is different from our knowledge after receiving the fifth event. After the third event, we have observed $\langle a \rangle$ for case 5, whereas after the fifth event we have observed $\langle a, b, c \rangle$.
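A Python generator is a natural model of a potentially unbounded event stream, because consumers only ever pull the next event. The sketch below replays the first five events of $S_1$ as given in the text (Fig. 5 contains more, which we do not reproduce) and shows how the knowledge about case 5 grows; the names `s1` and `knowledge_after` are ours.

```python
from collections import defaultdict
from itertools import islice

def s1():
    """The first five events of S1 as given in the text; Fig. 5 contains more."""
    yield from [(3, "d"), (4, "a"), (5, "a"), (5, "b"), (5, "c")]

def knowledge_after(stream, i):
    """Activity sequence observed per case after receiving S(1), ..., S(i)."""
    seen = defaultdict(list)
    for case, activity in islice(stream, i):  # consumes only i events
        seen[case].append(activity)
    return dict(seen)

print(knowledge_after(s1(), 3))  # {3: ['d'], 4: ['a'], 5: ['a']}
print(knowledge_after(s1(), 5))  # {3: ['d'], 4: ['a'], 5: ['a', 'b', 'c']}
```

Because `islice` stops after `i` events, the same function also works on an endless generator.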
We aim at computing alignments for the cases emitted onto an event stream, as they allow us to quantify deviations in a clear manner. Assume that we have only seen the first three events of $S_1$, i.e. $(3, d)$, $(4, a)$ and $(5, a)$. The only activity seen for case 5 is activity $a$. An optimal alignment for case 5 is $\langle (a, t_1), (\gg, t_3), (\gg, t_4), (\gg, t_7) \rangle$. After receiving the fourth event, i.e. $(5, b)$, an optimal alignment for case 5 is $\langle (a, t_1), (b, t_2), (\gg, t_3), (\gg, t_5), (\gg, t_7) \rangle$. In both cases, the cost of the alignment is 3. However, after receiving the first nine events on stream $S_1$, we obtain activity sequence $\langle a, b, c, d, e \rangle$ for case 5, with corresponding optimal alignment $\langle (a, t_1), (b, t_2), (c, t_3), (d, t_5), (e, t_7) \rangle$ of cost 0. Thus, since the knowledge we possess about cases changes over time, computing conventional alignments prior to case completion is expected to lead to an overestimation of the true alignment costs. We do not assume explicit knowledge of case termination w.r.t. the events observed on the stream. Moreover, we aim at detecting deviating behaviour during case execution, as opposed to computing such figures after case completion. As indicated, doing so with conventional alignments is expected to lead to cost overestimation and, as a consequence, to false positives from a deviation perspective. Therefore, we aim at computing prefix-alignments, which are specifically designed to incorporate trace incompleteness.
Fig. 6 Two prefix-alignments for $\langle a, c, d \rangle$ and $N_1$

Prefix-alignments
Prefix-alignments are a relaxed alternative to conventional alignments [3, Sect. 4.5]. In essence, they relax requirement two of Definition 1 in such a way that, after executing the transition part of the alignment, i.e. the alignment projected onto $T$, the final marking $M_f$ can still be reached. Formally, we rephrase requirement two as: there exists $\sigma' \in T^*$ such that $M_i \xrightarrow{(\pi_2(\gamma)){\downarrow_T} \cdot \sigma'} M_f$. Consider Fig. 6, in which we depict two example prefix-alignments of incomplete trace $\langle a, c, d \rangle$ and $N_1$. Observe that, for both prefix-alignments, we need to append either $t_7$ or $t_8$ to obtain marking $M_f$, and thus, the relaxed requirement is satisfied. Similar to conventional alignments, several prefix-alignments exist that correctly align a prefix and a Petri net. Hence, we again need means to rank and compare prefix-alignments. For example, in Fig. 6, we prefer $\gamma_1$ over $\gamma_2$, since $\gamma_1$ only contains synchronous moves, whereas $\gamma_2$ contains a model move. Since a prefix-alignment, like a conventional alignment, is a sequence of moves, its cost is defined in exactly the same manner, i.e. it is simply the sum of the costs of its individual moves. Observe that cost function $\kappa$ is defined over arbitrary sequences of moves, and thus, given some prefix-alignment $\gamma$, $\kappa(\gamma)$ is readily defined. As a consequence, we again have the notion of optimality. For example, $\gamma_1$ is an optimal prefix-alignment for $\langle a, c, d \rangle$ and $N_1$. We let $\bar{\Gamma}$ denote the universe of possible prefix-alignments and let $\bar{\Gamma}(N, \sigma, M_i, M_f)$ denote all possible prefix-alignments of $\sigma$ and $N$ given $M_i$ and $M_f$.
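The relaxed requirement can be checked by replaying the transition part and then searching for $M_f$. The sketch below does so with a breadth-first search over the (finite) reachability graph of our hypothetical reconstruction of $N_1$. The function names are ours, and the move-level checks mirror our reading of Definition 1, which is not reproduced in this text.

```python
from collections import Counter, deque

SKIP = ">>"  # our stand-in for the skip symbol
# Hypothetical encoding of N1 (Fig. 2) reconstructed from the text; arcs and the
# label "f" of t8 are assumptions. t -> (label, inputs, outputs); None = tau.
NET = {
    "t1": ("a", ["pi"], ["p1", "p2"]), "t2": ("b", ["p1"], ["p3"]),
    "t3": ("c", ["p2"], ["p4"]), "t4": ("d", ["p1", "p4"], ["p5"]),
    "t5": ("d", ["p3", "p4"], ["p5"]), "t6": (None, ["p5"], ["p1", "p2"]),
    "t7": ("e", ["p5"], ["po"]), "t8": ("f", ["p5"], ["po"]),
}

def key(m):
    """Canonical, hashable form of a marking."""
    return tuple(sorted((+m).items()))

def fire(m, t):
    """Fire t in marking m; return None if t is not enabled."""
    _, inputs, outputs = NET[t]
    if any(m[p] < 1 for p in inputs):
        return None
    n = Counter(m)
    n.subtract(inputs)
    n.update(outputs)
    return +n

def can_reach(m, m_f):
    """Breadth-first search: is m_f reachable from m?"""
    seen, queue = {key(m)}, deque([m])
    while queue:
        cur = queue.popleft()
        if key(cur) == key(m_f):
            return True
        for t in NET:
            n = fire(cur, t)
            if n is not None and key(n) not in seen:
                seen.add(key(n))
                queue.append(n)
    return False

def is_prefix_alignment(gamma, prefix, m_i, m_f):
    """Like an alignment, except that M_f only has to remain reachable."""
    if any(a == SKIP and t == SKIP for a, t in gamma):
        return False
    if any(a != SKIP and t != SKIP and NET[t][0] != a for a, t in gamma):
        return False
    if [a for a, _ in gamma if a != SKIP] != list(prefix):
        return False
    m = m_i
    for _, t in gamma:
        if t != SKIP:
            m = fire(m, t)
            if m is None:
                return False
    return can_reach(m, m_f)                 # the relaxed requirement

M_I, M_F = Counter(["pi"]), Counter(["po"])
g = [("a", "t1"), ("c", "t3"), ("d", "t4")]
print(is_prefix_alignment(g, ["a", "c", "d"], M_I, M_F))  # True
```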
The underestimating property of Proposition 1, i.e. the cost of an optimal prefix-alignment never exceeds the cost of a conventional alignment of the trace extended with any possible suffix, is useful since, in an online setting, once an optimal prefix-alignment has non-zero cost, it guarantees that a deviation from the reference model is present. On the other hand, if a case is not properly terminated and will never terminate, yet the sequence of activities seen so far has a prefix-alignment cost of zero, we do not observe this type of deviation until we compute a corresponding conventional (optimal) alignment.
Any shortest path algorithm to compute conventional alignments, i.e. as briefly discussed in Sect. 3.2, is easily altered to compute prefix-alignments. In fact, in line with the relaxation of requirement two of Definition 1, such an alteration only consists of adding more states to the set of final states of the search problem. Hence, to compute optimal (prefix-)alignments, we are able to use any algorithm designed for finding shortest paths in a graph; in [3], the A* algorithm [16] is proposed and evaluated for this purpose. In this paper, we simply assume that we are able to use an algorithm $\alpha$ that, given $N$, $M_i$, $M_f$ and a trace $\sigma$, returns a prefix-alignment $\alpha(N, M_i, M_f, \sigma) \in \bar{\Gamma}(N, \sigma, M_i, M_f)$ that is optimal. Observe that the proper termination assumption, w.r.t. the process models considered, guarantees that $\alpha$ always finds an optimal prefix-alignment.
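To illustrate that switching from conventional alignments to prefix-alignments only enlarges the set of final states, the sketch below runs Dijkstra's algorithm over states (number of trace events consumed, marking) of our hypothetical reconstruction of $N_1$ under unit costs. This is a simple stand-in, not the synchronous-product A* search of [3]; the flag `prefix` is the only difference between the two modes. For the growing prefixes of case 5, the conventional costs reproduce the overestimation discussed in Sect. 4.1, whereas all prefix-alignment costs are 0.

```python
import heapq
from collections import Counter
from itertools import count

SKIP = ">>"  # our stand-in for the skip symbol
# Hypothetical encoding of N1 (Fig. 2) reconstructed from the text; arcs and the
# label "f" of t8 are assumptions. t -> (label, inputs, outputs); None = tau.
NET = {
    "t1": ("a", ["pi"], ["p1", "p2"]), "t2": ("b", ["p1"], ["p3"]),
    "t3": ("c", ["p2"], ["p4"]), "t4": ("d", ["p1", "p4"], ["p5"]),
    "t5": ("d", ["p3", "p4"], ["p5"]), "t6": (None, ["p5"], ["p1", "p2"]),
    "t7": ("e", ["p5"], ["po"]), "t8": ("f", ["p5"], ["po"]),
}

def key(m):
    """Hashable canonical form of a marking."""
    return tuple(sorted((+m).items()))

def fire(m, t):
    """Fire t in marking m; return None if t is not enabled."""
    _, inputs, outputs = NET[t]
    if any(m[p] < 1 for p in inputs):
        return None
    n = Counter(m)
    n.subtract(inputs)
    n.update(outputs)
    return +n

def align(trace, m_i, m_f, prefix):
    """Dijkstra over states (events consumed, marking) under unit costs.
    Conventional: the only final states are (len(trace), M_f).
    Prefix: every state (len(trace), M) is final; this relies on the proper
    termination assumption, i.e. M_f stays reachable from M."""
    tie = count()  # tie-breaker so heapq never compares markings
    frontier, done = [(0, next(tie), 0, m_i, [])], set()
    while frontier:
        cost, _, i, m, moves = heapq.heappop(frontier)
        if (i, key(m)) in done:
            continue
        done.add((i, key(m)))
        if i == len(trace) and (prefix or key(m) == key(m_f)):
            return cost, moves
        steps = []
        if i < len(trace):
            steps.append((1, i + 1, m, (trace[i], SKIP)))              # activity move
        for t, (label, _, _) in NET.items():
            n = fire(m, t)
            if n is None:
                continue
            if i < len(trace) and label == trace[i]:
                steps.append((0, i + 1, n, (trace[i], t)))             # synchronous move
            steps.append((0 if label is None else 1, i, n, (SKIP, t)))  # model move
        for c, j, n, move in steps:
            if (j, key(n)) not in done:
                heapq.heappush(frontier, (cost + c, next(tie), j, n, moves + [move]))
    return None

M_I, M_F = Counter(["pi"]), Counter(["po"])
case5 = ["a", "b", "c", "d", "e"]
print([align(case5[:k], M_I, M_F, prefix=False)[0] for k in range(1, 6)])  # [3, 3, 2, 1, 0]
print([align(case5[:k], M_I, M_F, prefix=True)[0] for k in range(1, 6)])   # [0, 0, 0, 0, 0]
```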

Computing prefix-alignments incrementally
In this section, we present an incremental algorithm for the purpose of online prefix-alignment computation. Subsequently, we present parametrizations of the algorithm that allow us to reduce memory usage and computation time.

An incremental framework
We aim at computing a prefix-alignment for the sequence of events seen so far for each case $c \in C$. In this paper, we primarily focus on the performance of prefix-alignment computation in an incremental setting; we therefore do not consider (the impact of) storing the information seen on the event stream in great detail. Henceforth, we assume the existence of a case administration $D_C : C \times \mathbb{N}_0 \to \bar{\Gamma}$, where, for $i \geq 1$, $D_C(c, i)$ represents the currently known prefix-alignment related to case $c$ after receiving events $S(1), S(2), \ldots, S(i)$. Initially, we have $D_C(c, 0) = \langle\rangle, \forall c \in C$. For now, we assume that $D_C$ is able to store the most recent prefix-alignments of all cases. Theoretically, such an assumption requires infinite memory; we briefly return to this issue in Sect. 5.3.
To compute prefix-alignments based on the event stream, we conceptually perform the following steps. When we receive an event related to a certain case, we check whether we previously computed a prefix-alignment for that case. In case we are guaranteed that the event refers to an activity move, i.e. because the activity has no corresponding label in the reference model, we append such an activity move to the prefix-alignment. If this is not the case, we fetch the marking in the reference model corresponding to the previous prefix-alignment. For example, given prefix-alignment $\langle (a, t_1) \rangle$ based on $N_1$ (Fig. 2), the corresponding marking is $[p_1, p_2]$. If the event is the first event received for the case, we simply obtain marking $M_i$. In case we are able to directly fire a transition within the obtained marking with the same label as the activity that the event refers to, we append a corresponding synchronous move to the previously computed prefix-alignment. Otherwise, we use a shortest path algorithm, of which we present some parametrizations in Sect. 5.2, to find a new (optimal) prefix-alignment. In Algorithm 1, we present an algorithmic description of the aforementioned rationale.
The algorithm expects a Petri net, an initial and a final marking, an algorithm that computes optimal prefix-alignments and an event stream as input. Note that, after receiving a new event, the case administration for index $i - 1$ is copied into the $i$th version, i.e. line 6. In practice, this operation is $O(1)$. Since optimal prefix-alignments underestimate conventional alignment costs (Proposition 1), we are interested in the extent to which Algorithm 1 guarantees optimality of the prefix-alignments stored in $D_C$.
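Algorithm 1 itself is not reproduced in this text. The Python sketch below follows the steps described above, under stated assumptions: the case administration is a plain dict that keeps only the latest prefix-alignment per case (rather than one version per index $i$), $\alpha$ is replaced by a simple Dijkstra search (the paper uses A* [3]), and `NET` is our hypothetical reconstruction of $N_1$. The names `process_event` and `alpha` are ours.

```python
import heapq
from collections import Counter
from itertools import count

SKIP = ">>"  # our stand-in for the skip symbol
# Hypothetical encoding of N1 (Fig. 2); arcs and the label "f" of t8 are assumed.
NET = {
    "t1": ("a", ["pi"], ["p1", "p2"]), "t2": ("b", ["p1"], ["p3"]),
    "t3": ("c", ["p2"], ["p4"]), "t4": ("d", ["p1", "p4"], ["p5"]),
    "t5": ("d", ["p3", "p4"], ["p5"]), "t6": (None, ["p5"], ["p1", "p2"]),
    "t7": ("e", ["p5"], ["po"]), "t8": ("f", ["p5"], ["po"]),
}
M_I = Counter(["pi"])
VISIBLE = {label for label, _, _ in NET.values() if label is not None}

def key(m):
    return tuple(sorted((+m).items()))

def fire(m, t):
    """Fire t in marking m; return None if t is not enabled."""
    _, inputs, outputs = NET[t]
    if any(m[p] < 1 for p in inputs):
        return None
    n = Counter(m)
    n.subtract(inputs)
    n.update(outputs)
    return +n

def alpha(trace):
    """Stand-in for alpha: Dijkstra returning an optimal prefix-alignment under unit
    costs (relies on proper termination: every fully consumed state is final)."""
    tie = count()
    frontier, done = [(0, next(tie), 0, M_I, [])], set()
    while frontier:
        cost, _, i, m, moves = heapq.heappop(frontier)
        if (i, key(m)) in done:
            continue
        done.add((i, key(m)))
        if i == len(trace):
            return moves
        steps = [(1, i + 1, m, (trace[i], SKIP))]                      # activity move
        for t, (label, _, _) in NET.items():
            n = fire(m, t)
            if n is None:
                continue
            if label == trace[i]:
                steps.append((0, i + 1, n, (trace[i], t)))             # synchronous move
            steps.append((0 if label is None else 1, i, n, (SKIP, t)))  # model move
        for c, j, n, move in steps:
            heapq.heappush(frontier, (cost + c, next(tie), j, n, moves + [move]))
    return None

def process_event(D, case, a):
    """Handle event (case, a); D maps each case to its latest prefix-alignment."""
    gamma = D.get(case, [])                    # empty sequence for a new case
    if a not in VISIBLE:                       # no transition labelled a
        D[case] = gamma + [(a, SKIP)]          # an activity move is guaranteed
        return
    m = M_I
    for _, t in gamma:                         # marking reached by gamma's transitions
        if t != SKIP:
            m = fire(m, t)
    for t, (label, _, _) in NET.items():
        if label == a and fire(m, t) is not None:
            D[case] = gamma + [(a, t)]         # greedily append a synchronous move
            return
    trace = [x for x, _ in gamma if x != SKIP] + [a]
    D[case] = alpha(trace)                     # fall back to a shortest path search

D = {}
for case, activity in [(3, "d"), (4, "a"), (5, "a"), (5, "b"), (5, "c")]:
    process_event(D, case, activity)
print(D[5])  # [('a', 't1'), ('b', 't2'), ('c', 't3')]
```

For case 3, the first event $d$ cannot be replayed in $M_i$, so the search is invoked and returns the single activity move on $d$.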
Theorem 1 (Optimality of Algorithm 1) Let $D_C : C \times \mathbb{N}_0 \to \bar{\Gamma}$, with $D_C(c, 0) = \langle\rangle, \forall c \in C$, and assume $D_C$ is updated according to Algorithm 1. For any $c \in C$, $i \in \mathbb{N}$ and $\gamma = D_C(c, i)$, we have $\gamma \in \bar{\Gamma}$ and $\gamma$ is optimal for $(\pi_1(\gamma)){\downarrow_A}$.

Proof (Induction on i)
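• Base case I: $i = 0$. All prefix-alignments are $\langle\rangle$.
• Base case II: $i = 1$. Let $(c, a) = S(1)$. In case we are able to fire some $t$ with $\lambda(t) = a$ in $M_i$, we obtain prefix-alignment $\langle (a, t) \rangle$, which, under the unit-cost function, is optimal. In case $\nexists t \in T (\lambda(t) = a)$, we obtain $\langle (a, \gg) \rangle$, which is trivially an optimal prefix-alignment for trace $\langle a \rangle$. In any other case, we compute $\alpha(N, M_i, M_f, \langle a \rangle)$, which is optimal by definition.
• Induction hypothesis: Let $i \geq 1$. For any $c \in C$, we assume that for $\gamma = D_C(c, i)$, we have $\gamma \in \bar{\Gamma}$ and $\gamma$ is optimal.
• Inductive step: We prove that, for any $c \in C$ and $\gamma = D_C(c, i+1)$, we have $\gamma \in \bar{\Gamma}$ and $\gamma$ is optimal. Let $(c, a) = S(i+1)$; for any other case $c'$, $D_C(c', i+1) = D_C(c', i)$, which is optimal by the IH. In case $D_C(c, i) = \langle\rangle$, we know that $\gamma$ is optimal (cf. base case II). Otherwise, let $\gamma' = D_C(c, i)$ with $\gamma' \neq \langle\rangle$, let $\sigma' = (\pi_1(\gamma')){\downarrow_A}$ and let $M$ be the marking reached by firing the transitions of $\gamma'$ from $M_i$. In case we are able to fire some $t$ with $\lambda(t) = a$ in $M$, we obtain $\gamma = \gamma' \cdot \langle (a, t) \rangle$. Since, under the unit-cost function, $\delta(a, t) = 0$, if $\gamma$ were non-optimal, then $\gamma'$ would also be non-optimal, which contradicts the IH. A similar rationale holds in case $\nexists t \in T (\lambda(t) = a)$. In any other case, we compute $\alpha(N, M_i, M_f, \sigma' \cdot \langle a \rangle)$, which is optimal by definition.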
Theorem 1 proves that Algorithm 1 always computes optimal prefix-alignments for $(\pi_1(\gamma)){\downarrow_A}$, i.e. the sequence of activities currently stored within $D_C$ for some $c \in C$. Hence, combining this result with Proposition 1, we conclude that whenever the algorithm observes prefix-alignment costs exceeding 0, the corresponding conventional alignment has at least the same cost.

Parametrization
In the previous section, we used $\alpha$ as a black box and always solved a shortest path problem starting from $M_i$. In this section, we show that we are able to exploit the previously calculated alignment for a case $c$ in order to prune the search state space. Moreover, we show how to limit the search by changing its starting point.

Cost upper-bounds
Assume that we receive the $i$th event $(c, a)$ on the stream and let $\gamma' = D_C(c, i - 1)$ and $\gamma = D_C(c, i)$. Let us write the corresponding sequence of activities as $\sigma = \sigma' \cdot \langle a \rangle$, where $\sigma' = (\pi_1(\gamma'))_{\downarrow A}$. By Theorem 1, we know that $\gamma'$ is an optimal prefix-alignment for $\sigma'$. Since $\gamma' \cdot \langle(a, \gg)\rangle$ is itself a prefix-alignment for $\sigma$, the cost of $\gamma'$ together with an activity move on $a$ is an upper bound for the cost of $\gamma$, i.e. $\kappa(\gamma) \leq \kappa(\gamma') + \delta(a, \gg)$. We are able to utilize this knowledge within the shortest path search algorithm $\alpha$: whenever we encounter a path within the search that is guaranteed to exceed $\kappa(\gamma') + \delta(a, \gg)$, we simply ignore it, and all paths extending it.
As indicated, the A* algorithm is often used as an instantiation of $\alpha$ in alignment computation. The A* algorithm traverses the state space in an implicit manner, i.e. it expects each state it visits to tell which states are its neighbours, and at what distance. Moreover, it assumes that each state is able to estimate its distance to the closest final state, i.e. each state has a heuristic distance estimation to the closest final state. For the purpose of computing (prefix-)alignments, two such heuristic distance functions are defined [3, Chapter 4]. The exact characterization of these heuristic functions is out of this paper's scope; it suffices to know that, for each marking in the synchronous product net, we are able to compute the estimated distance (in terms of alignment costs) to final marking $M_f$, and that this estimate never exceeds the true distance. Thus, whenever we encounter a marking $M$ in the state space for which the distance to reach $M$ from $M_i$, combined with the estimated distance to $M_f$, exceeds $\kappa(\gamma') + \delta(a, \gg)$, we ignore it and all of its possible subsequent markings.
Fig. 7 Partially reverting ($k = 2$) the prefix-alignment of $\langle a, b, x, c, d \rangle$ and $N_1$ in case of receiving new activity $b$. The grey coloured moves are not considered when computing the new alignment.
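The cost upper-bound pruning can be sketched as an A* search that refuses to enqueue any state whose $g + h$ exceeds a given bound. In the incremental setting, the bound would be $\kappa(\gamma') + \delta(a, \gg)$, i.e. $\kappa(\gamma') + 1$ under the unit-cost function. The sketch below is generic and hypothetical: the function name `astar`, its parameters and the grid-shaped toy state space are illustrative assumptions, not the hipster4j API, and `h` must never overestimate the remaining distance for the pruning to be safe.

```python
import heapq
from itertools import count

INF = float("inf")


def astar(start, neighbours, is_goal, h, upper_bound=INF):
    """A* search that never enqueues a state whose g + h exceeds upper_bound.

    Returns (cost of the cheapest goal or None, number of states enqueued)."""
    tie = count()
    best_g = {start: 0}
    queue = [(h(start), next(tie), 0, start)]
    enqueued = 1
    while queue:
        _, _, g, state = heapq.heappop(queue)
        if g > best_g[state]:
            continue  # stale entry: a cheaper path to this state was found later
        if is_goal(state):
            return g, enqueued
        for nxt, w in neighbours(state):
            g2 = g + w
            if g2 + h(nxt) > upper_bound:
                continue  # prune: this path and all its extensions exceed the bound
            if g2 < best_g.get(nxt, INF):
                best_g[nxt] = g2
                heapq.heappush(queue, (g2 + h(nxt), next(tie), g2, nxt))
                enqueued += 1
    return None, enqueued


def grid(state):
    """Toy, unbounded 4-neighbour grid with unit step costs (illustration only)."""
    x, y = state
    return [((x + 1, y), 1), ((x - 1, y), 1), ((x, y + 1), 1), ((x, y - 1), 1)]
```

Because `h` never overestimates, every state on a path of cost at most the bound has $g + h$ at most the bound, so the optimal prefix-alignment is never pruned; with `h` returning 0 the search degenerates to Dijkstra's algorithm, yet the bound still cuts off states.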

Limiting the search
Again, assume we receive the $i$th event $(c, a)$ and let $M$ be the marking obtained by executing the transitions of $\gamma' = D_C(c, i - 1)$. In case there exist transitions $t$ with $\lambda(t) = a$, yet none of these transitions is enabled in $M$, the basic algorithm simply utilizes $\alpha(N, \sigma \cdot \langle a \rangle, M_i, M_f)$, where $\sigma = (\pi_1(\gamma'))_{\downarrow A}$. In general, the shortest path algorithm does not need $M_i$ as a start state, i.e. we are able to choose any marking of $N$ as a start state. Hence, we propose to partially revert alignment $\gamma'$ up to a maximal revert distance $k$ and start the shortest path search from the corresponding marking. Doing so, however, no longer guarantees optimality, as we are no longer searching for a global optimum in the state space.
Consider Fig. 7, where we depict a prefix-alignment for $\langle a, b, x, c, d \rangle$ and $N_1$ (Fig. 2). Assume we receive a new event stating that activity $b$ follows $\langle a, b, x, c, d \rangle$, and that we use a revert window size of $k = 2$. Note that the marking related to the alignment is $[p_5]$. In this marking, no transition with label $b$ is enabled, and the algorithm would normally call $\alpha(N_1, \langle a, b, x, c, d, b \rangle, [p_i], [p_o])$. Instead, we revert the alignment by two moves, i.e. we revert $(d, t_5)$ and $(c, t_3)$, and call $\alpha(N_1, \langle c, d, b \rangle, [p_2, p_3], [p_o])$. The resulting prefix-alignment, depicted on the right-hand side of Fig. 7, ends with the moves $(\gg, t_6)$ and $(b, t_2)$. Note that after this call the window shifts, i.e. the call appended two moves, and thus $(c, t_3)$ and $(d, t_5)$ are no longer considered upon receiving new events.
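Partially reverting a prefix-alignment amounts to dropping its last $k$ moves, replaying the remaining model moves to obtain the new start marking, and collecting the activities that the search still has to explain. The sketch below is a hypothetical illustration: the net encoding (a dict from transition name to label, pre-set and post-set) and the names `fire`, `revert`, `NET` and `MOVES` are assumptions, and the small net only reproduces the two markings mentioned in the Fig. 7 example ($[p_2, p_3]$ and $[p_5]$); it is not $N_1$ from Fig. 2.

```python
from collections import Counter

SKIP = ">>"  # the "no move" symbol, written ≫ in the text


def fire(net, marking, t):
    """Return the marking reached by firing t; net maps t -> (label, pre, post)."""
    _, pre, post = net[t]
    m = Counter(marking)
    m.subtract(pre)
    m.update(post)
    return +m


def revert(net, m_init, moves, k, activity):
    """Revert the last k moves of a prefix-alignment before re-aligning.

    Returns (kept moves, start marking for the search, activities to align)."""
    cut = max(len(moves) - k, 0)
    kept, reverted = moves[:cut], moves[cut:]
    marking = Counter(m_init)
    for _, t in kept:
        if t != SKIP:  # log moves do not change the marking
            marking = fire(net, marking, t)
    todo = [a for a, _ in reverted if a != SKIP] + [activity]
    return kept, +marking, todo


# Hypothetical net fragment that only reproduces the markings of the Fig. 7 example.
NET = {
    "t1": ("a", {"pi": 1}, {"p1": 1}),
    "t2": ("b", {"p1": 1}, {"p2": 1, "p3": 1}),
    "t3": ("c", {"p2": 1}, {"p4": 1}),
    "t5": ("d", {"p3": 1, "p4": 1}, {"p5": 1}),
}
MOVES = [("a", "t1"), ("b", "t2"), ("x", SKIP), ("c", "t3"), ("d", "t5")]
```

With `k = 2` and new activity `b`, `revert` keeps the first three moves, yields start marking $[p_2, p_3]$ and leaves $\langle c, d, b \rangle$ to be aligned, which matches the reverted call in the example.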

Administering cases in finite memory
Thus far, we assumed $D_C$ to have infinite memory, an infeasible assumption in practice. In an online setting, case administration $D_C$ needs to deploy some form of memory management that removes entries based on some case characteristic, e.g. age or relative frequency on the stream. Examples of such mechanisms are reservoirs [17,18], time decay-based data structures [19] and frequency approximation algorithms [20]. At some point, these techniques remove the prefix-alignment related to some seemingly inactive case. Note that this no longer guarantees that a prefix-alignment maintained in $D_C$ is always an underestimation with respect to all activities emitted on the stream for case $c$. In fact, Theorem 1 does not prove this: it proves optimality for $(\pi_1(\gamma))_{\downarrow A}$, which is equivalent to the previous statement only under the assumption of infinite memory. Moreover, if we receive activities related to a case that was previously deleted, we erroneously start computing a new prefix-alignment for the case, although there was past behaviour that we no longer possess.
One way to solve this problem is to track which cases are removed from the case administration. If new events appear that relate to such a case, we simply ignore them. In this way, every element of the case administration truly relates to all behaviour emitted on the stream for the case. However, at some point in time we again need to drop cases from this secondary storage component. It is, however, reasonable to assume that the number of distinct cases is orders of magnitude smaller than the number of events emitted onto the event stream.
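The idea of tracking removed cases can be sketched with a bounded store that evicts the least recently updated case. This eviction policy is a deliberately simple stand-in, not one of the reservoir, time-decay or frequency-approximation structures cited above; the class and method names are hypothetical, and the set of removed case ids is left unbounded here, playing the role of the secondary storage component.

```python
from collections import OrderedDict


class CaseAdministration:
    """Bounded store of prefix-alignments per case (hypothetical sketch).

    Evicts the least recently updated case when full and remembers evicted
    case ids so that later events of those cases are ignored."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.alignments = OrderedDict()  # case id -> prefix-alignment (list of moves)
        self.removed = set()             # secondary storage of dropped case ids

    def accepts(self, case):
        return case not in self.removed

    def update(self, case, alignment):
        """Store alignment for case; returns False if the case was dropped earlier."""
        if not self.accepts(case):
            return False  # its past behaviour is lost, so ignore the event
        self.alignments[case] = alignment
        self.alignments.move_to_end(case)  # mark as most recently active
        if len(self.alignments) > self.capacity:
            evicted, _ = self.alignments.popitem(last=False)
            self.removed.add(evicted)
        return True
```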

Evaluation
We have evaluated the proposed algorithm, including its parametrization, using the RapidProM [21] extension for RapidMiner. As a search algorithm, we use the A* algorithm provided by hipster4j [22]. To evaluate the proposed algorithm, we generated several process models with different characteristics, i.e. different degrees of parallelism, choice and loops. Additionally, we evaluated our approach using real event data related to the treatment of hospital patients suspected of having sepsis. In this experiment, we additionally compare computing prefix-alignments with repeatedly computing conventional alignments on an event stream.

Experimental set-up
We used a scientific workflow implemented in RapidProM which, conceptually, performs the following steps:
1. Generate a (block-structured) workflow net with $k$ labelled transitions, where $k$ is drawn from a triangular distribution with parameters {10, 20, 30}, for increasing levels of Parallelism, Choice and Loops (from 0 to 50% in steps of 10%) [23].
2. For each workflow net, generate an event log with 1000 cases.
3. For each event log, add increasing levels (from 0 to 50% in steps of 10%) of one type of noise, i.e. remove activity, add activity or swap activities.
4. For each "noisy" event log, perform incremental conformance checking against the workflow net it was generated from, using all parameter combinations presented in Table 2.
Observe that within the experiments we mimic event streams by visiting each event in a trace, one by one, e.g. if we have event log $L = [\langle a, b, c \rangle, \langle a, c, b \rangle]$, we generate event stream $\langle (1, a), (1, b), (1, c), (2, a), (2, c), (2, b) \rangle$. Moreover, we align every trace variant once, i.e. if $\langle a, b, c \rangle$ occurs multiple times in the event log, we only align it once. In total, we have generated 18 different models, for which we generate 18 different event logs, each containing 1000 traces, yielding 18,000 noise-free traces. After applying the different types of noise, we obtain a total of 324,000 traces. Clearly, the number of events per trace varies greatly depending on the generated model; within our experiments, a total of 44,537,728 events were processed (with varying algorithm parametrization). For these events, 12,151,510 state-space searches were performed.
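The conversion of an event log into a simulated stream, the per-variant deduplication and the experiment size can be reproduced in a few lines. The function names are illustrative, and the decomposition into 3 model characteristics × 6 levels and 3 noise types × 6 levels is an assumption, chosen because it is consistent with the reported totals of 18 models, 18,000 noise-free traces and 324,000 traces.

```python
def log_to_stream(log):
    """Mimic an event stream by emitting each trace's events one by one.

    Cases are numbered from 1 in the order in which the traces appear in the log."""
    return [(case, activity)
            for case, trace in enumerate(log, start=1)
            for activity in trace]


def trace_variants(log):
    """Distinct traces in order of first occurrence; each variant is aligned once."""
    return list(dict.fromkeys(tuple(trace) for trace in log))


# Experiment size, assuming 3 model characteristics and 3 noise types,
# each at 6 levels (0%, 10%, ..., 50%).
LEVELS = range(0, 60, 10)
MODELS = 3 * len(LEVELS)                # parallelism, choice, loops
NOISY_LOGS_PER_MODEL = 3 * len(LEVELS)  # remove, add, swap activities
NOISE_FREE_TRACES = MODELS * 1000
TOTAL_TRACES = NOISE_FREE_TRACES * NOISY_LOGS_PER_MODEL
```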

Results
Here, we present the results of the experiments, in line with the parametrization options as described in Sect. 5.2. We first present results related to using cost upper-bounds; later we present the results related to limited search.

Cost upper-bounds
In this section, we present the results related to the performance of using cost upper bounds. Within these results, we only focus on executions of $\alpha$ with $M_i$ as a start state, i.e. we do not incorporate results related to varying window sizes. In Fig. 8, we present, in terms of the level of introduced noise, the average number of enqueued states, queue size, visited nodes and traversed arcs for each search in the state space.
Clearly, using the upper bound as a cut-off value in state-space traversal greatly enhances memory efficiency. We observe that, when using the upper bound defined in Sect. 5.2, the average number of states enqueued during the search is less than half of that without the upper bound. The average queue size, i.e. the average number of states in the queue throughout the search, is also much lower when using the upper bound. We observe that the search efficiency (Fig. 8c, d) is positively affected by using the upper bound; however, the difference is smaller and in some cases negligible (0% noise level). Thus, using previously computed prefix-alignment values for a case allows effective state-space pruning in the shortest path algorithm.
In Fig. 9, we show the effect of the length of the prefix that needs to be aligned on memory consumption. We only show results for lengths ≤ 100. Both with and without the upper bound, we observe a linear increase in the number of states queued and the average queue size. However, the rate of growth is much lower when using the upper bound. We observe a small region of spikes in both charts around prefix lengths 20-25. After investigating the distribution of prefix length w.r.t. the type of process model, i.e. containing loops vs. not containing loops, we observed that most traces exceeding such length are related to the models containing loops. As the other group of models contains relatively more parallelism and/or choice, the underlying shortest path search is expected to be somewhat more complex for these models, which explains the spikes for relatively short prefix lengths.

Reverting alignments
In this section, we present results related to the performance and approximation quality of using revert windows as described in Sect. 5.2.2. In Fig. 10, we present performance results in terms of memory efficiency and approximation error, plotted against noise level. In Fig. 10a, we show the average number of states enqueued when using different revert window sizes. Clearly, memory usage increases when we increase the window size; interestingly, this increase seems linear. The approximation error (Fig. 10b) shows an inverse pattern; however, the decrease in approximation error seems nonlinear as the window size increases. Moreover, when we set the window size to 5, the approximation error within this experiment is negligible, whereas window sizes of 10 and 20 use considerably more memory while hardly improving the quality of the result.
In Fig. 11, we present performance results in terms of memory efficiency and approximation error, plotted against prefix length. In terms of enqueued nodes (Fig. 11a), we first observe a rapid increase, followed by a steep decline. Stabilization around lengths ≥ 25 is again due to the fact that all traces of such length originate from models with loops. The peak-and-decline behaviour is explained by the fact that the state-space-based search within the models is likely to be most complex in the middle of the trace. Towards the end of a model's behaviour, we expect less state-space complexity, which explains the decline in the chart around prefix lengths 10-20.
In Fig. 11b, we observe results similar to those in Fig. 10b. A window size of 1 is simply too small: it seems that the approximation error increases linearly with the prefix length. However, when using a window ≥ 2, we observe asymptotic behaviour in terms of approximation error. Again, we observe that using a window of at least size 5 leads to negligible approximation errors.

Evaluation using real event data
In this section, we discuss the results of applying incremental alignment calculation to real event data. We focus on the under/overestimation of the true eventual conventional alignment cost, as well as on the method's performance. As a baseline, we compute conventional alignments every time we receive a new event. We use an event log originating from a Dutch hospital, related to the treatment of patients suspected of having sepsis [24]. Since we do not have a reference model, we generated one based on a subset of the data. This generated process model still describes around 90% of the behaviour within the event log (computed using conventional alignments). The data set contains 15,214 events divided over 1,050 cases. Prefix-alignments were computed for 13,775 different events. We plot all results w.r.t. the aligned prefix length, since noise percentages, as used in Figs. 8 and 10, are unknown for real event data. Finally, note that the distribution of trace length within the data is heavily skewed and has a long, infrequent tail. The majority of traces have a length below 30; hence, figures for prefix lengths above this value refer to a relatively limited set of cases. Nonetheless, we plot results for all prefix lengths observed.
In Fig. 12, we present results related to computed alignment costs. We show results for the incremental scheme proposed in this paper with window sizes 5, 10 and 20, and for the baseline ("Conventional"). In Fig. 12a, we show the average absolute alignment costs per prefix length. We observe that using a window size of 5 in general leads to higher alignment costs. This is explained by the fact that the relatively small window size does not allow us to revert choices made in previous alignments, which consequently prevents us from finding an eventual global optimum. Interestingly, window sizes 10 and 20 both lead, on average, to alignment costs comparable to those of simply computing conventional alignments. However, at the beginning of cases, i.e. for small prefixes, computing conventional alignments leads to higher values, as expected. In Fig. 12b, we show the average cost difference w.r.t. the eventual alignment costs, i.e. after case completion. Interestingly, after initially overestimating eventual costs, conventional alignments underestimate the eventual alignment costs quite severely. This can be explained by the fact that partial traces are aligned by a short path of model moves through the model combined with a limited set of activity moves.
Fig. 12 Average cost results per prefix length, with different revert window sizes. a Average (prefix-)alignment cost and b average cost difference w.r.t. eventual optimal conventional alignment.
In order to quantify the potential business impact of applying the (prefix-)alignment approach, we derive several measures of relevance for the three window sizes and the baseline. These figures are presented in Table 3. To obtain the results as presented, for each received event we define:
- If the difference of the current (prefix-)alignment cost with the eventual alignment cost is zero, and the eventual costs exceed zero, we define a […]
We acknowledge that alternative definitions of true/false positives/negatives are possible. Therefore, the results obtained are specific to the definition provided, as well as to the data set used. We observe that computing conventional alignments for every event received leads to better recall, i.e. $\frac{TP}{TP + FN}$. This implies that the ratio of correctly observed deviations w.r.t. neglected deviations is better for the conventional approach. However, using the incremental scheme leads to significantly higher specificity ($\frac{TN}{TN + FP}$) and precision ($\frac{TP}{TP + FP}$). Specifically for window sizes 10 and 20, we observe very high precision values. This is in line with Proposition 1 and, moreover, shows that the results obtained with these window sizes are close to the results for an infinite window size. Finally, we observe that the accuracy of window sizes 10 and 20 is comparable, and higher than that of the alternative approaches, i.e. window size 5 and conventional. However, in terms of F1-score, simply calculating conventional alignments outperforms the incremental scheme as proposed.
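The relevance measures reported in Table 3 follow directly from event-level counts of true/false positives and negatives. A minimal sketch follows; the function name is hypothetical, the counts passed to it would come from the definitions above, accuracy and F1-score are assumed to take their standard forms, and zero denominators are not handled.

```python
def measures(tp, fp, tn, fn):
    """Relevance measures computed from event-level TP/FP/TN/FN counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}
```

Since F1 combines precision with recall, an approach with very high precision but lower recall, such as window sizes 10 and 20 above, can still trail on F1.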
In Fig. 13, we show the performance of the different approaches in terms of enqueued states and visited states. Note that the results presented consider the full incremental scheme, i.e. if we are able to execute a synchronous move directly, the number of queued/visited states equals 0. As expected, using a window size of 5 is most efficient. Window sizes 10 and 20 are less efficient, yet for longer prefix lengths they outperform computing conventional alignments. For window size 20, we do observe a peak in computational complexity for prefix lengths of 10-20. This peak is explained by the relatively inaccurate heuristic used within the A* searches performed for prefix-alignment computation. The drops in the chart relate to purely incremental alignment updates. We observe that the computational complexity of conventional alignment computation generally increases with prefix length. The incremental approach does not seem to suffer from this and shows relatively stable behaviour.
Based on the experiments using real hospital data, we conclude that, for this specific data set, a window size of 10 is appropriate. As opposed to computing conventional alignments, it achieves precise results, i.e. whenever a deviation is detected, it is reasonable to assume that this is indeed the case. Moreover, it outperforms computing conventional alignments in terms of computational complexity and memory usage.

Discussion
The aim of the incremental technique presented in this paper is to compute approximations of alignments on event streams by means of prefix-alignments. In particular, we aim at computing these approximations more efficiently than simply computing conventional alignments, whilst at the same time limiting the loss in result accuracy.
In general, we conclude that the use of the proposed technique is justified in cases where computational resources are limited and/or there is a need for high precision, i.e. we aim at high degrees of certainty when we observe a deviation. In cases where computational complexity is not an urgent issue and/or high recall is preferred, one can resort to computing conventional alignments. However, recall that conventional alignments initially overestimate alignment costs and thus are not able to properly detect deviations in early stages of a case. Hence, when resorting to conventional alignments, a warm-up period is advisable.
In the experiments performed using real data, we observe a certain unpredictability w.r.t. the memory usage/computational efficiency of prefix-alignment computation, cf. the peaks for window size 20 in Fig. 13. In general, although the search algorithm used in prefix-alignment computation is A* [16], its practical search performance is essentially equal to that of Dijkstra's shortest path algorithm [25]. This is mainly related to the fact that, when computing prefix-alignments, we need to resort to a rather inaccurate heuristic function. By partially reverting the alignments, combined with applying the upper-bound pruning, we are able to reduce the search complexity, however at the cost of losing accuracy. When computing conventional alignments, we are able to resort to a more accurate heuristic, which explains the more predictable computational efficiency trends in Fig. 13.

Conclusion
In this paper, we proposed an online, event stream-based conformance checking technique based on the use of prefix-alignments. The algorithm only performs a state-space search to compute a new prefix-alignment if no direct activity move or synchronous move is possible. We presented two techniques to increase the search efficiency of the underlying shortest path problems. The first technique preserves optimality and allows for effective state-space pruning. The second technique uses an approximation scheme that balances optimality and memory usage. In our evaluation, we primarily focussed on the performance of the underlying shortest path problems solved. Our results show that we are able to effectively prune the state space by using previously computed alignment results; these results are particularly promising in terms of memory efficiency. When using our approximation approach, we observe a linear trend in required memory when increasing window sizes. However, the approximation error decreases more rapidly, i.e. in a nonlinear fashion, when increasing window sizes.
Future Work
We aim to extend our work as follows. We plan to extend our experiments, using more levels of choice/parallelism/loops, more models per level and larger data sets. Moreover, we plan to perform more experiments using real event data. We also plan to define alternative accuracy measures regarding under/overestimation of conventional alignments, to more accurately measure the indicative behaviour of prefix-alignments. Finally, the state of a prefix-alignment, in terms of the underlying reference model, carries some predictive value w.r.t. case termination. Thus, in cases where we do not know explicit case termination, it is interesting to study the effect of using prefix-alignments as a case termination predictor.