
Split miner: automated discovery of accurate and simple business process models from event logs


The problem of automated discovery of process models from event logs has been intensively researched over the past two decades. Despite a rich field of proposals, state-of-the-art automated process discovery methods suffer from two recurrent deficiencies when applied to real-life logs: (i) they produce large, spaghetti-like models; and (ii) they produce models that either fit the event log poorly (low fitness) or over-generalize it (low precision). Striking a trade-off between these quality dimensions in a robust and scalable manner has proved elusive. This paper presents an automated process discovery method, called Split Miner, which produces simple process models with low branching complexity and consistently high and balanced fitness and precision, while achieving considerably faster execution times than state-of-the-art methods, measured on a benchmark covering twelve real-life event logs. Split Miner combines a novel approach to filtering the directly-follows graph induced by an event log with an approach to identifying combinations of split gateways that accurately capture the concurrency, conflict and causal relations between neighbors in the directly-follows graph. Split Miner is also the first automated process discovery method that is guaranteed to produce deadlock-free process models with concurrency, while not being restricted to producing block-structured process models.
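For readers unfamiliar with the term, the sketch below shows one common way to build a directly-follows graph from an event log. It is an illustration only, not Split Miner's implementation: the function name `directly_follows_graph`, the representation of a log as a list of activity-label lists, and the toy log are all assumptions made for this example. It does not reproduce the filtering or split-gateway steps the abstract mentions.

```python
from collections import Counter


def directly_follows_graph(log):
    """Count how often activity `a` is immediately followed by `b`.

    `log` is assumed to be a list of traces, each trace a list of
    activity labels (a hypothetical, simplified log format).
    Returns a Counter mapping (a, b) arcs to their frequencies.
    """
    dfg = Counter()
    for trace in log:
        # Pair each event with its direct successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Toy log (made up for illustration): b and c occur in both orders.
log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "c", "d"],
]

dfg = directly_follows_graph(log)
for (a, b), freq in sorted(dfg.items()):
    print(f"{a} -> {b}: {freq}")
```

In this toy log, `b` and `c` are observed in both orders, which is the kind of pattern a discovery method may interpret as concurrency rather than as a loop. How Split Miner itself filters the graph and decides between concurrency, conflict and causality is defined in the body of the paper and is not shown here.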





  2. In the empirical evaluation, we use \(k=3\) because existing measures of fitness and precision are slow to compute, making high k values impractical.

  3. BPMN allows process models to have multiple start and multiple end events, but such process models can be rewritten as process models with a single start and a single end event; hence, we can restrict ourselves to process models with a single start and a single end event without loss of generality.

  4. Each node of the graph represents a task.

  5. We favor self-loops over concurrency.

  6. First-in first-out.

  7. A split-task represents a syntactical error in the BPMN model.

  8. This happens because by construction a task cannot belong to its own future.

  9. An OR-join is said to be trivial when its semantics are equivalent to those of an XOR-join or an AND-join.

  10. By \(\mathbb {N}_0\), we denote the set of all natural numbers including zero.

  11. Available at


  13. We did not hyperparameter-optimize ETM due to the prohibitively high execution times of this method.




This research is partly funded by the Australian Research Council (Grant DP180102839) and the Estonian Research Council (Grant IUT20-55).

Reproducibility. Links to all tools and datasets required to reproduce the experiments are given in Sections IV.B-IV.C.

Author information



Corresponding author

Correspondence to Adriano Augusto.


Cite this article

Augusto, A., Conforti, R., Dumas, M. et al. Split miner: automated discovery of accurate and simple business process models from event logs. Knowl Inf Syst 59, 251–284 (2019).



  • Process mining
  • Automated process discovery
  • Event log
  • BPMN