Knowledge and Information Systems, Volume 59, Issue 2, pp 251–284

Split miner: automated discovery of accurate and simple business process models from event logs

  • Adriano Augusto
  • Raffaele Conforti
  • Marlon Dumas
  • Marcello La Rosa
  • Artem Polyvyanyy
Regular Paper


The problem of automated discovery of process models from event logs has been intensively researched in the past two decades. Despite a rich field of proposals, state-of-the-art automated process discovery methods suffer from two recurrent deficiencies when applied to real-life logs: (i) they produce large and spaghetti-like models; and (ii) they produce models that either poorly fit the event log (low fitness) or over-generalize it (low precision). Striking a trade-off between these quality dimensions in a robust and scalable manner has proved elusive. This paper presents an automated process discovery method, namely Split Miner, which produces simple process models with low branching complexity and consistently high and balanced fitness and precision, while achieving considerably faster execution times than state-of-the-art methods, measured on a benchmark covering twelve real-life event logs. Split Miner combines a novel approach to filter the directly-follows graph induced by an event log, with an approach to identify combinations of split gateways that accurately capture the concurrency, conflict and causal relations between neighbors in the directly-follows graph. Split Miner is also the first automated process discovery method that is guaranteed to produce deadlock-free process models with concurrency, while not being restricted to producing block-structured process models.
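As an illustration of the directly-follows graph mentioned above, the following minimal sketch counts how often one activity is immediately followed by another in an event log. The toy log, the function name `directly_follows_graph` and the variable names are assumptions made for illustration only. The sketch does not reproduce Split Miner's filtering or split-discovery steps; it only shows the input structure on which such steps would operate.

```python
from collections import Counter


def directly_follows_graph(log):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in log:
        # Pair each event with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical toy log: each trace is the ordered list of activities of one case.
toy_log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "c", "d"],
]

dfg = directly_follows_graph(toy_log)

# Pairs observed in both orders are candidates for concurrency rather than
# causality (a simplifying assumption, not the paper's exact criterion).
both_orders = {(a, b) for (a, b) in dfg if (b, a) in dfg and a < b}
```

In this toy log, activities `b` and `c` follow each other in both orders, which is the kind of pattern a discovery method may interpret as concurrency instead of a loop or a choice.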


Keywords: Process mining, Automated process discovery, Event log, BPMN



This research is partly funded by the Australian Research Council (Grant DP180102839) and the Estonian Research Council (Grant IUT20-55).

Reproducibility. Links to all tools and datasets required to reproduce the experiments are given in Sections IV.B-IV.C.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. The University of Melbourne, Melbourne, Australia
  2. University of Tartu, Tartu, Estonia
