How Much Event Data Is Enough? A Statistical Framework for Process Discovery

  • Martin Bauer
  • Arik Senderovich
  • Avigdor Gal
  • Lars Grunske
  • Matthias Weidlich
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10816)


With the increasing availability of business-process-related event logs, the scalability of techniques that discover a process model from such logs becomes a performance bottleneck. In particular, exploratory analysis that investigates manifold parameter settings of discovery algorithms, potentially using a software-as-a-service tool, relies on fast response times. However, common approaches for process model discovery always parse and analyse all available event data, whereas a small fraction of a log could already have led to a high-quality model. In this paper, we therefore present a framework for process discovery that relies on statistical pre-processing of an event log and significantly reduces its size by means of sampling. It thereby reduces the runtime and memory footprint of process discovery algorithms, while providing guarantees on the introduced sampling error. Experiments with two public real-world event logs reveal that our approach speeds up state-of-the-art discovery algorithms by a factor of up to 20.
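The idea of sampling a log until further traces are unlikely to add information can be illustrated with a minimal sketch. It is not the framework's actual procedure: the notion of "new information" (unseen directly-follows pairs), the stopping rule, and all names (`required_run_length`, `sample_log`, `delta`, `alpha`) are assumptions made for illustration. The stopping rule uses a standard argument: if the probability that a random trace is novel were at least `delta`, then `k` consecutive non-novel traces would occur with probability at most `(1 - delta) ** k`, so the sketch stops once that bound drops to `alpha` or below.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Trace = Sequence[str]


def required_run_length(delta: float, alpha: float) -> int:
    """Smallest k with (1 - delta) ** k <= alpha, for delta, alpha in (0, 1)."""
    return math.ceil(math.log(alpha) / math.log(1.0 - delta))


def directly_follows(trace: Trace) -> Set[Tuple[str, str]]:
    """Pairs of activities that occur consecutively in a trace."""
    return set(zip(trace, trace[1:]))


def sample_log(log: List[Trace], delta: float = 0.05,
               alpha: float = 0.01, seed: int = 0) -> List[Trace]:
    """Visit traces in random order and stop after k consecutive traces
    that contribute no unseen directly-follows pair (hypothetical rule)."""
    k = required_run_length(delta, alpha)
    order = list(range(len(log)))
    random.Random(seed).shuffle(order)

    seen: Set[Tuple[str, str]] = set()
    sample: List[Trace] = []
    run = 0  # length of the current streak of non-novel traces
    for i in order:
        trace = log[i]
        sample.append(trace)
        pairs = directly_follows(trace)
        if pairs <= seen:
            run += 1
            if run >= k:
                break
        else:
            seen |= pairs
            run = 0
    return sample


if __name__ == "__main__":
    # Synthetic log with two variants; real logs would be parsed from files.
    log = [("a", "b", "c")] * 1000 + [("a", "c", "b")] * 1000
    sample = sample_log(log)
    print(len(sample), "of", len(log), "traces kept")
```

A discovery algorithm would then be run on the returned sample instead of the full log. Smaller values of `delta` or `alpha` lengthen the required streak and therefore the sample.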


Process discovery · Log pre-processing · Log sampling



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Martin Bauer (1)
  • Arik Senderovich (2)
  • Avigdor Gal (2)
  • Lars Grunske (1)
  • Matthias Weidlich (1, corresponding author)
  1. Humboldt-Universität zu Berlin, Berlin, Germany
  2. Technion – Israel Institute of Technology, Haifa, Israel
