Abstract
The problem of automated discovery of process models from event logs has been intensively researched in the past two decades. Despite a rich field of proposals, state-of-the-art automated process discovery methods suffer from two recurrent deficiencies when applied to real-life logs: (i) they produce large and spaghetti-like models; and (ii) they produce models that either poorly fit the event log (low fitness) or over-generalize it (low precision). Striking a trade-off between these quality dimensions in a robust and scalable manner has proved elusive. This paper presents an automated process discovery method, namely Split Miner, which produces simple process models with low branching complexity and consistently high and balanced fitness and precision, while achieving considerably faster execution times than state-of-the-art methods, measured on a benchmark covering twelve real-life event logs. Split Miner combines a novel approach to filter the directly-follows graph induced by an event log, with an approach to identify combinations of split gateways that accurately capture the concurrency, conflict and causal relations between neighbors in the directly-follows graph. Split Miner is also the first automated process discovery method that is guaranteed to produce deadlock-free process models with concurrency, while not being restricted to producing block-structured process models.
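As an illustration of the directly-follows graph mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes an event log is given as a list of traces, each trace being a list of task labels. The function name `directly_follows_graph` and the toy log are hypothetical. The sketch only counts directly-follows pairs and does not perform Split Miner's filtering or split-gateway discovery.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Trace = List[str]          # one process execution: an ordered list of task labels
Edge = Tuple[str, str]     # (a, b) means task b directly follows task a


def directly_follows_graph(log: Iterable[Trace]) -> Dict[Edge, int]:
    """Count how often each task directly follows another across all traces."""
    counts: Counter = Counter()
    for trace in log:
        # Pair every event with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical toy log: tasks b and c occur in both orders.
example_log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "c", "d"],
]
dfg = directly_follows_graph(example_log)
```

In this toy log, both `("b", "c")` and `("c", "b")` appear as edges. A discovery method may read a pattern like this as a hint of concurrency between `b` and `c`; how Split Miner actually filters the graph and decides between concurrency, conflict and causality is described in the paper itself.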
Notes
In the empirical evaluation, we use \(k=3\) because existing measures of fitness and precision are slow to compute, making high k values impractical.
BPMN allows process models to have multiple start and multiple end events, but such process models can be rewritten as process models with a single start and a single end event; hence, we can restrict ourselves to process models with a single start and a single end event without loss of generality.
Each node of the graph represents a task.
We favor self-loops over concurrency.
First-in first-out.
A split-task represents a syntactical error in the BPMN model.
This happens because by construction a task cannot belong to its own future.
An OR-join is said to be trivial when its semantics are equivalent to those of an XOR-join or an AND-join.
By \(\mathbb{N}_0\), we denote the set of all natural numbers including zero.
Available at http://apromore.org/platform/tools.
We did not hyperparameter-optimize ETM due to the prohibitively high execution times of this method.
Acknowledgements
This research is partly funded by the Australian Research Council (Grant DP180102839) and the Estonian Research Council (Grant IUT20-55).
Reproducibility. Links to all tools and datasets required to reproduce the experiments are given in Sections IV.B-IV.C.
Cite this article
Augusto, A., Conforti, R., Dumas, M. et al. Split miner: automated discovery of accurate and simple business process models from event logs. Knowl Inf Syst 59, 251–284 (2019). https://doi.org/10.1007/s10115-018-1214-x
DOI: https://doi.org/10.1007/s10115-018-1214-x
Keywords
- Process mining
- Automated process discovery
- Event log
- BPMN