Towards Confirmatory Process Discovery: Making Assertions About the Underlying System
Abstract
Process mining, and process discovery in particular, has so far focused on exploring and describing event data by means of models. Because these models are typically derived directly from a sample of event data, the question of whether they also apply to the real process usually remains unanswered. Since the underlying process is unknown in practice, unbiased estimators are needed to assess the system-quality of a discovered model and, subsequently, to make assertions about the process. This paper describes and discusses an experiment that analyzes whether existing fitness, precision and generalization metrics can serve as unbiased estimators of system fitness and system precision. The results show that important biases exist, which currently makes it nearly impossible to objectively measure how well a model represents the system.
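To illustrate why a quality score computed on the log can be a biased estimate of system quality, the following minimal sketch may help. It is not the experiment described in this paper: the system is a hypothetical distribution over four trace variants, the "discovered" model simply accepts every variant seen in the sampled log, and fitness is taken as the share of behavior the model can replay. All names, probabilities and sample sizes are assumptions chosen for illustration only.

```python
import random


def simulate_bias(system, log_size, runs, seed=0):
    """Average log-based fitness and system fitness over repeated samples.

    `system` maps each trace variant to its probability in the (normally
    unknown) underlying process. This is a toy stand-in, not a real process.
    """
    rng = random.Random(seed)
    variants = list(system)
    weights = [system[v] for v in variants]
    log_fit_total = 0.0
    sys_fit_total = 0.0
    for _ in range(runs):
        # Draw an event log as a random sample of traces from the system.
        log = rng.choices(variants, weights=weights, k=log_size)
        # Toy "discovery": the model accepts exactly the observed variants.
        model = set(log)
        # Log fitness: fraction of log traces the model can replay.
        log_fit_total += sum(t in model for t in log) / log_size
        # System fitness: probability mass of system behavior the model covers.
        sys_fit_total += sum(system[v] for v in model)
    return log_fit_total / runs, sys_fit_total / runs


# Hypothetical system with two frequent and two rare variants.
system = {"abcd": 0.5, "acbd": 0.3, "abd": 0.15, "acd": 0.05}
log_fit, sys_fit = simulate_bias(system, log_size=5, runs=200)
print(f"mean log fitness:    {log_fit:.3f}")
print(f"mean system fitness: {sys_fit:.3f}")
```

In this toy setting the log-based score is perfect by construction, while the average system fitness stays below it because small logs tend to miss rare variants. The gap between the two averages is the kind of bias an estimator of system quality would need to avoid.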
Keywords
Process mining, Process discovery, Process quality, Fitness, Precision, Generalization, Exploratory data analysis, Confirmatory data analysis
Acknowledgements
The computational resources and services used in this work for both process discovery and process conformance tasks were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government.