Keywords: Conformance Checking; Token Replay; Process Model Repair; Artificial Negative Events; Unseen Behavior
Given an event log and a process model from the same process, conformance checking compares the recorded event data with the model to identify commonalities and discrepancies. The conformance between a log and model can be quantified with respect to different quality dimensions: fitness, precision, and generalization.
Conformance checking compares an event log with a process model of the same process (Munoz-Gama 2016). An event log is composed of a series of log traces, where each log trace corresponds to the sequence of observed events of a process instance, i.e., a case. An event relates to a particular activity in the process but can also record other process attributes such as timestamp, resource, and cost. In a real-life context, event logs can be extracted from Process-Aware Information Systems (PAIS) such as workflow management (WFM) systems, business process management (BPM) systems, or typical relational databases such as those of an SAP system. Similarly, process models can often be extracted from the organization’s information systems. These can be normative models that the organization uses to manage its processes, or descriptive models, created by hand or discovered automatically to gain insight into the processes (van der Aalst 2013).
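The structure described above can be illustrated with a minimal, hypothetical event log; the case identifier, activity names, and attribute values below are invented for illustration only:

```python
# Hypothetical event log: each case (process instance) maps to the sequence
# of events recorded for it; each event stores an activity name plus other
# process attributes such as timestamp, resource, and cost.
event_log = {
    "case-001": [
        {"activity": "Start Processing", "timestamp": "2024-01-05T09:00", "resource": "Alice", "cost": 0},
        {"activity": "Evaluate Project", "timestamp": "2024-01-05T10:30", "resource": "Bob", "cost": 25},
        {"activity": "Final Evaluation", "timestamp": "2024-01-06T14:00", "resource": "Carol", "cost": 40},
    ],
}

# The log trace of a case is the sequence of activities it recorded.
def log_trace(events):
    return [e["activity"] for e in events]
```

Conformance checking techniques typically operate on such log traces, while the remaining attributes support further analyses (e.g., performance or resource analysis).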
Depending on the nature of the model, discrepancies between the log and model can have different interpretations (van der Aalst 2016). For a normative model, deviations indicate violations of imposed constraints. For example, a banking process may require the processing and approval of a loan to be done by different employees to avoid the risk of misconduct (four-eyes principle). Clearly, conformance checking between an event log of the handled loan applications and the process model can be applied to assess compliance. On the other hand, for a descriptive model, deviations indicate that the model is not fully capturing all the observed behavior in the log. For example, process analysts might perform conformance checking on the models discovered by different process discovery algorithms before selecting the ones that are of sufficient quality for further analysis.
Dimensions of Conformance
Through conformance checking, commonalities and discrepancies between a log and model are quantified. One simple idea would be to consider that a log and model conform to each other if the behavior observed in the log is captured by the model. This means that a log and model are perfectly conforming if all the log traces can be fitted to the model. However, this can be trivially achieved by a model that allows any behavior, and such models provide the data analyst with little information about the process. This shows the need to consider conformance with respect to different dimensions.
Currently, conformance is generally considered with respect to three dimensions – fitness, precision, and generalization.
Fitness relates to how well a model and log fit each other. A log trace perfectly fits the model if it can be replayed onto the model and corresponds to a complete model trace. For example, 〈Start Processing, Evaluate Project, Evaluate Academic Record, Evaluate Advisor CV, Final Evaluation, Accept, Notify Results〉 perfectly fits the model in Fig. 1 since each of the observed steps can be sequentially replayed at the model, and the trace corresponds to a particular possible way to execute the process model. However, the trace 〈Start Processing, Evaluate Project, Evaluate Academic Record, Final Evaluation, Reject, Notify Results〉 does not fit the model because the advisor’s CV (Evaluate Advisor CV) is never evaluated. This suggests that the corresponding application has been rejected without proper evaluation.
Some authors consider a fourth dimension called simplicity, relating to the model complexity, i.e., simple models should be preferred over complex models if both describe the same behavior. However this quality dimension relates only to the model and therefore is not normally measured by conformance checking techniques. This dimension is covered in the Automated Process Discovery entry of this encyclopedia.
Overall, while the three quality dimensions are orthogonal to each other, in a real-life context one is unlikely to find a pair of log and model in perfect conformance (i.e., perfectly fitting, precise, and generalizing). Different scenarios may also require different conformance levels and a different prioritization of the quality dimensions. For example, to analyze the well-established execution paths of a process, an analyst might prioritize fitness over the other dimensions. On the other hand, if an event log only contains a small number of cases, generalization would likely be prioritized over the other dimensions to account for possible future behavior.
Types of Conformance
Conformance checking techniques can be applied to understand and quantify these relationships between a log and model. There is a large collection of approaches and metrics that are based on different ways to compare a log and model.
Key Research Findings
In this section, two conformance checking approaches and three conformance metrics are presented.
Token replay is a replay-based conformance checking approach that measures the fitness between a log and model by replaying log traces onto process models denoted in the Petri net notation (Rozinat and van der Aalst 2008).
This fitness metric can be extended to the log level by considering the number of produced, consumed, missing, and remaining tokens from the token replay of all log traces.
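A minimal sketch of this counting scheme is shown below, assuming a toy Petri net encoded as a mapping from each transition to its input and output places. The net, place names, and traces are invented for illustration; real token replay implementations additionally handle invisible and duplicate transitions:

```python
from collections import Counter

def token_replay(net, initial, final, trace):
    """Replay a trace, returning (produced, consumed, missing, remaining) token counts."""
    marking = Counter(initial)
    produced = len(initial)            # tokens of the initial marking count as produced
    consumed = missing = 0
    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:           # consume one token per input place
            if marking[place] == 0:
                missing += 1           # token created artificially to enable firing
            else:
                marking[place] -= 1
            consumed += 1
        for place in outputs:          # produce one token per output place
            marking[place] += 1
            produced += 1
    for place in final:                # finally, consume the expected end token(s)
        if marking[place] == 0:
            missing += 1
        else:
            marking[place] -= 1
        consumed += 1
    remaining = sum(marking.values())  # tokens left behind in the net
    return produced, consumed, missing, remaining

def token_fitness(produced, consumed, missing, remaining):
    """Token-based fitness: 1/2 (1 - m/c) + 1/2 (1 - r/p)."""
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

# Toy sequential net: a -> b -> c, with places p0..p3.
net = {"a": (["p0"], ["p1"]), "b": (["p1"], ["p2"]), "c": (["p2"], ["p3"])}
p, c, m, r = token_replay(net, ["p0"], ["p3"], ["a", "b", "c"])
print(token_fitness(p, c, m, r))   # 1.0 for a perfectly fitting trace
```

Replaying the deviating trace 〈a, c〉 on the same net yields one missing and one remaining token, and a fitness of 2/3.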
The token replay approach can easily identify deviating traces in an event log. Moreover, the severity of the deviations can be compared using a fitness metric computed from the number of produced, consumed, missing, and remaining tokens. However, for highly deviating traces the approach is prone to flooding the net with missing tokens, to the point where any behavior becomes enabled; this can lead to an overestimation of the fitness. The approach is also specific to the Petri net notation. More importantly, in the case of a deviating trace, the approach does not explain the log trace in terms of the model. For example, the deviations in trace t2 = 〈a, c, b, e, f, h〉 can be explained if it is considered with respect to the complete model trace 〈a, c, b, d, e, f, h〉: from this mapping, it is clear that the log trace is missing the execution of activity d (Evaluate Advisor CV). Such mappings from log traces to model traces were introduced as alignments to address this limitation (van der Aalst et al. 2012).
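The alignment idea can be sketched with a simplified computation. A real implementation searches over the whole model language (e.g., with an A∗ search); the sketch below, for illustration only, aligns a log trace against a single given model trace using dynamic programming, where synchronous moves are free and log-only or model-only moves cost 1:

```python
def optimal_alignment_cost(log_trace, model_trace):
    """Minimal total cost of aligning log_trace with model_trace.
    Synchronous moves cost 0; log-only and model-only moves cost 1."""
    n, m = len(log_trace), len(model_trace)
    # dp[i][j] = minimal cost of aligning log_trace[:i] with model_trace[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:
                candidates.append(dp[i - 1][j] + 1)    # log move (event without model step)
            if j > 0:
                candidates.append(dp[i][j - 1] + 1)    # model move (model step without event)
            if i > 0 and j > 0 and log_trace[i - 1] == model_trace[j - 1]:
                candidates.append(dp[i - 1][j - 1])    # synchronous move, free
            dp[i][j] = min(candidates)
    return dp[n][m]

t2 = ["a", "c", "b", "e", "f", "h"]
model = ["a", "c", "b", "d", "e", "f", "h"]
cost = optimal_alignment_cost(t2, model)   # one model move for the skipped "d"
```

For t2 the optimal cost is 1, corresponding to the single model move on activity d; cost-based fitness metrics then normalize such a cost against a worst-case alignment.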
Alignments can also be used to compute conformance metrics with respect to the different quality dimensions.
Cost-Based Fitness Metric
Escaping Arc Precision
Artificial Negative Events
Another approach to measure precision is through artificial negative events. Artificial negative events are induced by observing events that did not occur in the event log. These unobserved events (i.e., negative events) give information about things that are not allowed to occur in the process. Assuming that the event log gives a complete view of the process (i.e., a log completeness assumption), the precision of the process model can be computed using artificial negative events and the concepts of precision and recall in data mining.
Artificial negative events can be induced by grouping similar traces and then observing the events that did not occur for every event in the traces. Under the log completeness assumption, this means that these unobserved events are negative events that are not allowed to happen by the process (Goedertier et al. 2009).
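A minimal sketch of this idea is given below, assuming the log is complete and the model is given extensionally as a set of traces. All names are invented, and the sketch only matches exact prefixes, whereas the actual induction strategy of Goedertier et al. also groups similar traces:

```python
from collections import defaultdict

def induce_negative_events(log, alphabet):
    """For each prefix observed in the log, the negative events are the
    activities never observed immediately after that prefix."""
    observed_next = defaultdict(set)
    for trace in log:                       # traces are tuples of activities
        for i in range(len(trace)):
            observed_next[trace[:i]].add(trace[i])
    return {prefix: alphabet - nxt for prefix, nxt in observed_next.items()}

def negative_event_precision(model_traces, log, alphabet):
    """Fraction of model-allowed continuations that are not negative events."""
    negatives = induce_negative_events(log, alphabet)
    allowed_total = allowed_positive = 0
    for prefix, neg in negatives.items():
        k = len(prefix)
        # continuations the model allows after this prefix
        allowed = {t[k] for t in model_traces if len(t) > k and t[:k] == prefix}
        for activity in allowed:
            allowed_total += 1
            if activity not in neg:
                allowed_positive += 1
    return allowed_positive / allowed_total

log = [("a", "b"), ("a", "c")]
alphabet = {"a", "b", "c"}
print(negative_event_precision({("a", "b"), ("a", "c")}, log, alphabet))  # 1.0
```

A model that allows exactly the logged behavior scores 1.0, while a "flower" model allowing any two-activity sequence over the same alphabet scores 0.5, since half of its continuations are negative events.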
The use of artificial negative events can also be extended to quantify generalization and to compute a precision metric that is more robust against less complete event logs. This is achieved by extending the artificial negative event induction strategy to assign weights to the induced negative events (vanden Broucke et al. 2014).
Examples of Application
All of the conformance checking techniques presented in the previous section have been implemented and are applicable to most real-life scenarios. In the following, the A∗ cost-based alignment technique is applied to a real-life dataset to illustrate how conformance checking can be applied to gain insights about a process. This example utilizes the data presented in the “Conformance Checking: What does your process do when you are not watching?” tutorial by de Leoni, van Dongen, and Munoz-Gama at the “15th International Conference on Business Process Management (BPM17).”
As previously presented, a process model and an event log are required to perform conformance checking. The real-life event log is taken from a Dutch financial institute and concerns an application process for a personal loan or overdraft within a global financial organization (van Dongen 2012). Each case in the log thus records the events that occurred for a particular loan application. The log contains some 262,200 events in 13,087 cases. Apart from some anonymization, the data is presented as it is recorded in the financial institute. The log merges three intertwined subprocesses, and the originating subprocess of each event can be identified by the first letter of the activity recorded by the event. In this example, the log is filtered so that it only contains events from two of the subprocesses: the one recording the state of the application (identifiable by “A_”) and the one recording the state of an offer communicated to the applicant (identifiable by “O_”). The model has been created with the help of domain experts and can be assumed to be a realistic representation of the underlying process.
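The filtering step described above can be sketched as follows. The in-memory event-log structure and case name are hypothetical; in practice, such logs are distributed in the XES format and filtered with a process mining tool:

```python
def filter_subprocesses(log, prefixes=("A_", "O_")):
    """Keep only events whose activity name starts with one of the given
    subprocess prefixes, dropping cases that end up empty."""
    filtered = {}
    for case, events in log.items():
        kept = [e for e in events if e["activity"].startswith(prefixes)]
        if kept:
            filtered[case] = kept
    return filtered

log = {
    "application-42": [
        {"activity": "A_SUBMITTED"},
        {"activity": "W_Completeren aanvraag"},  # workflow subprocess, filtered out
        {"activity": "O_SELECTED"},
    ],
}
filtered = filter_subprocesses(log)
```

After filtering, only the application-state ("A_") and offer-state ("O_") events remain, which keeps the conformance analysis focused on the two subprocesses covered by the model.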
Future Directions for Research
While there have been significant advances in the research of conformance checking over the recent years, there are still many open challenges and research opportunities. Some of them include:
The proposed three quality dimensions (fitness, precision, and generalization) have been widely accepted, but there is still a need for a better understanding of how to interpret and quantify them through metrics. Furthermore, conformance can be extended beyond the current three dimensions, e.g., with log completeness to quantify whether the observed event data gives the full picture of the underlying process (Janssenswillen et al. 2017).
Big Data and Real Time
Process mining techniques and tools are getting applied to larger and more complex processes. This means that they have to be scalable to handle the increased size and complexity. In fact, much of the recent research efforts in conformance checking have been focused on this issue. Related research lines include decomposed conformance checking (Munoz-Gama 2016) and online conformance checking for event streams (Burattin 2015).
Conformance Diagnosis and Process Model Repair
It is not enough to just identify conformance issues; good diagnostic and visualization tools are crucial in helping the analyst identify and understand their root causes. While work has been done on this aspect of conformance checking, e.g., Buijs and Reijers (2014) and Munoz-Gama et al. (2014), much more is needed to provide better conformance diagnosis technology, e.g., new techniques and user studies. Finally, once the differences between the model and the log have been diagnosed, the user may wish to repair the model in order to resolve those differences and obtain a model that better describes the process as it is actually executed. This topic is extensively covered in the Process Model Repair entry of this encyclopedia.
- Adriansyah A (2014) Aligning observed and modeled behavior. Ph.D. thesis, Eindhoven University of Technology
- Adriansyah A, Munoz-Gama J, Carmona J, van Dongen BF, van der Aalst WMP (2012) Alignment based precision checking. In: Rosa ML, Soffer P (eds) Business process management workshops – BPM 2012 international workshops, Tallinn, Estonia, 3 Sept 2012. Revised papers. Lecture notes in business information processing, vol 132. Springer, pp 137–149
- Buijs JCAM, Reijers HA (2014) Comparing business process variants using models and event logs. In: Enterprise, business-process and information systems modeling – 15th international conference, BPMDS 2014, 19th international conference, EMMSAD 2014, held at CAiSE 2014, Thessaloniki, Greece, 16–17 June 2014. Proceedings, pp 154–168
- van der Aalst WMP (2013) Mediating between modeled and observed behavior: the quest for the “right” process: keynote. In: RCIS, pp 1–12. IEEE
- van Dongen BF (2012) BPI challenge 2012. 4TU Datacentrum, Eindhoven University of Technology