Business & Information Systems Engineering, Volume 61, Issue 6, pp 695–712

Generating Artificial Data for Empirical Analysis of Control-flow Discovery Algorithms

A Process Tree and Log Generator
  • Toon Jouck
  • Benoît Depaire
Research Paper

Abstract

Within the process mining domain, research on comparing control-flow (CF) discovery techniques has gained importance. A crucial building block of empirical analysis of CF discovery techniques is obtaining the appropriate evaluation data. Currently, there is no answer to the question of how to collect such evaluation data. The paper introduces a methodology for generating artificial event data (GED) and an implementation called the Process Tree and Log Generator. The GED methodology and its implementation provide users with full control over the characteristics of the generated event data and an integration within the ProM framework. Unlike existing approaches, there is no tradeoff between including long-term dependencies and soundness of the process. The contributions of the paper provide a solution for a necessary step in the empirical analysis of CF discovery algorithms.

Keywords

Artificial event logs · Process discovery · Empirical analysis

Notes

Acknowledgements

The authors would like to thank Massimiliano de Leoni and Alfredo Bolt for their advice and support in implementing the PTandLogGenerator.

Copyright information

© Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Business Economics, Hasselt University, Diepenbeek, Belgium
