Data Mining and Knowledge Discovery, Volume 14, Issue 2, pp 245–304

Genetic process mining: an experimental evaluation

  • A. K. A. de Medeiros
  • A. J. M. M. Weijters
  • W. M. P. van der Aalst
Open Access
Article

Abstract

One of the aims of process mining is to retrieve a process model from an event log. The discovered models can be used as objective starting points during the deployment of process-aware information systems (Dumas et al., eds., Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, New York, 2005) and/or as a feedback mechanism to check prescribed models against enacted ones. However, current techniques have problems when mining processes that contain non-trivial constructs and/or when dealing with noise in the logs. Most of these problems arise because many current techniques rely on local information in the event log. To overcome these problems, we use genetic algorithms to mine process models. The main motivation is to benefit from the global search performed by this kind of algorithm. The non-trivial constructs are tackled by choosing an internal representation that supports them. The problem of noise is naturally tackled by the genetic algorithm because, by definition, these algorithms are robust to noise. The main challenge in a genetic approach is the definition of a good fitness measure, because it guides the global search performed by the genetic algorithm. This paper explains how the genetic algorithm works. Experiments with synthetic and real-life logs show that the fitness measure indeed leads to the mining of process models that are complete (can reproduce all the behavior in the log) and precise (do not allow for extra behavior that cannot be derived from the event log). The genetic algorithm is implemented as a plug-in in the ProM framework.
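
To make the role of the fitness measure concrete, here is a minimal toy sketch of a genetic search over process models. It is not the paper's internal representation or fitness measure: the model is assumed to be just a set of allowed "directly follows" task pairs, the log is invented, and all names (`LOG`, `fitness`, `kappa`, `evolve`) are hypothetical. It only illustrates the idea of rewarding completeness while penalising extra behavior.

```python
import random

# Toy event log (invented data): each trace is a sequence of task names.
LOG = [("A", "B", "C", "D"), ("A", "C", "B", "D"), ("A", "B", "C", "D")]
TASKS = sorted({task for trace in LOG for task in trace})
ALL_PAIRS = [(a, b) for a in TASKS for b in TASKS if a != b]


def replayable(model, trace):
    """A trace is replayable if every consecutive task pair is allowed by the model."""
    return all((a, b) in model for a, b in zip(trace, trace[1:]))


def fitness(model, log, kappa=0.1):
    """Completeness (share of replayable traces) minus a penalty for extra behavior.

    Assumption: 'extra behavior' is approximated by allowed pairs never seen in the log.
    """
    completeness = sum(replayable(model, t) for t in log) / len(log)
    seen = {(a, b) for t in log for a, b in zip(t, t[1:])}
    extra = len(model - seen) / len(ALL_PAIRS)
    return completeness - kappa * extra


def mutate(model, rng):
    """Flip the membership of one randomly chosen pair."""
    return model ^ {rng.choice(ALL_PAIRS)}


def crossover(m1, m2, rng):
    """Uniform crossover: each pair's membership comes from a randomly chosen parent."""
    return frozenset(p for p in ALL_PAIRS if p in (m1 if rng.random() < 0.5 else m2))


def evolve(log, generations=200, pop_size=30, seed=0):
    """Elitist genetic search; returns the fittest model of the final population."""
    rng = random.Random(seed)
    pop = [frozenset(p for p in ALL_PAIRS if rng.random() < 0.5) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, log), reverse=True)
        elite = pop[: pop_size // 5]  # keep the best fifth unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            child = crossover(rng.choice(elite), rng.choice(elite), rng)
            children.append(frozenset(mutate(child, rng)))
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, log))


if __name__ == "__main__":
    best = evolve(LOG)
    print(sorted(best), fitness(best, LOG))
```

In this sketch the penalty weight `kappa` trades completeness against precision: with `kappa=0` a model that allows every pair is as fit as one that allows only the observed pairs, which is the kind of overly general result a precision term is meant to discourage.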

Keywords

Process mining; Genetic mining; Genetic algorithms; Petri nets; Workflow nets

References

  1. van der Aalst WMP, Alves de Medeiros AK, Weijters AJMM (2005) Genetic process mining. In: Proceedings of the 26th international conference on applications and theory of Petri nets. Lecture notes in computer science, vol 3536. Springer, Miami
  2. van der Aalst WMP, van Dongen BF (2002) Discovering workflow performance models from timed logs. In: Han Y, Tai S, Wikarski D (eds) International conference on engineering and deployment of cooperative information systems (EDCIS 2002). Lecture notes in computer science, vol 2480. Springer, Berlin, pp 45–63
  3. van der Aalst WMP, van Dongen BF, Herbst J, Maruster L, Schimm G, Weijters AJMM (2003) Workflow mining: a survey of issues and approaches. Data Knowl Eng 47(2):237–267
  4. van der Aalst WMP, Song M (2004) Mining social networks: uncovering interaction patterns in business processes. In: Desel J, Pernici B, Weske M (eds) International conference on business process management (BPM 2004). Lecture notes in computer science, vol 3080. Springer, Berlin, pp 244–260
  5. van der Aalst WMP, Weijters AJMM (eds) (2004) Process mining. Special issue of computers in industry, vol 53. Elsevier, Amsterdam
  6. van der Aalst WMP, Weijters AJMM, Maruster L (2004) Workflow mining: discovering process models from event logs. IEEE Trans Knowl Data Eng 16(9):1128–1142
  7. Agrawal R, Gunopulos D, Leymann F (1998) Mining process models from workflow logs. In: Ramos I, Alonso G, Schek H-J, Saltor F (eds) Advances in database technology: EDBT’98, 6th international conference on extending database technology. Lecture notes in computer science, vol 1377. Springer-Verlag, London, UK, pp 469–483 (ISBN: 3-540-64264-1)
  8. Alves de Medeiros AK, van Dongen BF, van der Aalst WMP, Weijters AJMM (2004a) Process mining: extending the α-algorithm to mine short loops. BETA working paper series, WP 113, Eindhoven University of Technology, Eindhoven
  9. Alves de Medeiros AK, Weijters AJMM, van der Aalst WMP (2004b) Using genetic algorithms to mine process models: representation, operators and results. BETA working paper series, WP 124, Eindhoven University of Technology, Eindhoven
  10. Alves de Medeiros AK, Weijters AJMM, van der Aalst WMP (2006) Genetic process mining: a basic approach and its challenges. In: Business process management 2005 workshops. Lecture notes in computer science, vol 3812. Springer, Berlin, pp 203–215
  11. Alves de Medeiros AK, van der Aalst WMP, Weijters AJMM (2003) Workflow mining: current status and future directions. In: Meersman R, Tari Z, Schmidt DC (eds) On the move to meaningful internet systems 2003: CoopIS, DOA, and ODBASE. Lecture notes in computer science, vol 2888. Springer, Berlin, pp 389–406
  12. Angluin D, Smith CH (1983) Inductive inference: theory and methods. Comput Surv 15(3):237–269
  13. Bourdeaud’huy T, Yim P (2002) Petri net controller synthesis using genetic search. In: Proceedings of the 2nd IEEE international conference on systems, man and cybernetics (SMC’02), vol 1. IEEE Computer Society Press, Hammamet, Tunisia, 6–9 October 2002, pp 528–533
  14. Cook JE, Du Z, Liu C, Wolf AL (2004) Discovering models of behavior for concurrent workflows. Comput Ind 53(3):297–319
  15. Cook JE, Wolf AL (1998b) Event-based detection of concurrency. In: Proceedings of the 6th international symposium on the foundations of software engineering (FSE-6). ACM Press, New York, NY, USA, pp 35–45
  16. Cook JE, Wolf AL (1999) Software process validation: quantitatively measuring the correspondence of a process to a model. ACM Trans Softw Eng Methodol 8(2):147–176
  17. Cook JE, Wolf AL (1998a) Discovering models of software processes from event-based data. ACM Trans Softw Eng Methodol 7(3):215–249
  18. Desel J, Esparza J (1995) Free choice Petri nets. Cambridge tracts in theoretical computer science, vol 40. Cambridge University Press, Cambridge, UK
  19. Dumas M, van der Aalst WMP, ter Hofstede AH (eds) (2005) Process-aware information systems: bridging people and software through process technology. Wiley, New York
  20. Eder J, Olivotto GE, Gruber W (2002) A data warehouse for workflow logs. In: Han Y, Tai S, Wikarski D (eds) International conference on engineering and deployment of cooperative information systems (EDCIS 2002). Lecture notes in computer science, vol 2480. Springer, Berlin, pp 1–15
  21. Eiben AE, Smith JE (2003) Introduction to evolutionary computing. Natural computing. Springer, Berlin
  22. van Glabbeek RJ, Weijland WP (1996) Branching time and abstraction in bisimulation semantics. J ACM 43(3):555–600
  23. Gold EM (1978) Complexity of automaton identification from given data. Inform Control 37(3):302–320
  24. Greco G, Guzzo A, Pontieri L (2005) Mining hierarchies of models: from abstract views to concrete specifications. In: van der Aalst WMP, Benatallah B, Casati F, Curbera F (eds) Business process management. Lecture notes in computer science, vol 3649. Springer-Verlag, Berlin, Nancy, France, 5–8 September 2005, pp 32–47
  25. Greco G, Guzzo A, Pontieri L, Saccà D (2004) Mining expressive process models by clustering workflow traces. In: Dai H, Srikant R, Zhang C (eds) PAKDD. Lecture notes in computer science, vol 3056. Springer, Berlin, pp 52–62
  26. Greco G, Guzzo A, Pontieri L, Saccà D (2006) Discovering expressive process models by clustering log traces. IEEE Trans Knowl Data Eng 18(8):1010–1027
  27. Grigori D, Casati F, Dayal U, Shan MC (2001) Improving business process quality through exception understanding, prediction, and prevention. In: Apers P, Atzeni P, Ceri S, Paraboschi S, Ramamohanarao K, Snodgrass R (eds) Proceedings of 27th international conference on very large data bases (VLDB’01). Morgan Kaufmann, Los Altos, CA, pp 159–168
  28. Grunwald PD, Myung IJ, Pitt M (eds) (2005) Advances in minimum description length theory and applications. MIT Press, Cambridge, MA
  29. Herbst J (2000) Dealing with concurrency in workflow induction. In: Baake U, Zobel R, Al-Akaidi M (eds) European concurrent engineering conference. SCS, Europe
  30. Herbst J (2001) Ein induktiver Ansatz zur Akquisition und Adaption von Workflow-Modellen. Ph.D. thesis, Universität Ulm
  31. Herbst J, Karagiannis D (2000) Integrating machine learning and workflow management to support acquisition and adaptation of workflow models. Int J Intell Syst Account Finance Manag 9:67–92
  32. Herbst J, Karagiannis D (2004) Workflow mining with InWoLvE. Comput Ind 53(3):245–264
  33. IDS Scheer (2002) ARIS process performance manager (ARIS PPM). http://www.ids-scheer.com
  34. Malpathak S, Saitou K, Qvam H (2002) Robust design of flexible manufacturing systems using colored Petri net and genetic algorithm. J Int Manufact 13(5):339–351
  35. Maruster L (2003) A machine learning approach to understand business processes. Ph.D. thesis, Eindhoven University of Technology, Eindhoven, The Netherlands
  36. Maruster L, Weijters AJMM, van der Aalst WMP, van den Bosch A (2002) Process mining: discovering direct successors in process logs. In: Proceedings of the 5th international conference on discovery science (discovery science 2002). Lecture notes in artificial intelligence, vol 2534. Springer, Berlin, pp 364–373
  37. Mauch H (2003) Evolving Petri nets with a genetic algorithm. In: Cantú-Paz E, Foster JA, Deb K, Davis L, Roy R, O’Reilly U, Beyer H, Standish RK, Kendall G, Wilson SW, Harman M, Wegener J, Dasgupta D, Potter MA, Schultz AC, Dowsland KA, Jonoska N, Miller JF (eds) Genetic and evolutionary computation: GECCO 2003, genetic and evolutionary computation conference, Chicago, IL, USA, 12–16 July 2003. Proceedings, Part II. Lecture notes in computer science, vol 2724. Springer, Berlin, pp 1810–1811
  38. Maxeiner MK, Küspert K, Leymann F (2001) Data mining von workflow-protokollen zur teilautomatisierten konstruktion von prozeßmodellen. In: Proceedings of datenbanksysteme in Büro, technik und Wissenschaft. Informatik Aktuell. Springer, Berlin, Germany, pp 75–84
  39. Milner R, Parrow J, Walker D (1992) A calculus of mobile processes. Inform Comput 100(1):1–77
  40. Moore JH, Hahn LW (2004) An improved grammatical evolution strategy for hierarchical Petri net modeling of complex genetic systems. In: Raidl GR et al (eds) Applications of evolutionary computing, Evo Workshops 2004. Lecture notes in computer science, vol 3005. Springer, Berlin, pp 63–72
  41. Moore JH, Hahn LW (2003a) Grammatical evolution for the discovery of Petri net models of complex genetic systems. In: Cantú-Paz E, Foster JA, Deb K, Davis L, Roy R, O’Reilly U, Beyer H, Standish RK, Kendall G, Wilson SW, Harman M, Wegener J, Dasgupta D, Potter MA, Schultz AC, Dowsland KA, Jonoska N, Miller JF (eds) Genetic and evolutionary computation: GECCO 2003, genetic and evolutionary computation conference, Chicago, IL, USA, 12–16 July 2003. Proceedings, Part II. Lecture notes in computer science, vol 2724. Springer, Berlin, pp 2412–2413
  42. Moore JH, Hahn LW (2003b) Petri net modeling of high-order genetic systems using grammatical evolution. BioSystems 72(2):177–186
  43. zur Mühlen M (2001) Process-driven management information systems combining data warehouses and workflow technology. In: Gavish B (ed) Proceedings of the international conference on electronic commerce research (ICECR-4). IEEE Computer Society Press, Los Alamitos, CA, pp 550–566
  44. zur Mühlen M, Rosemann M (2000) Workflow-based process monitoring and controlling: technical and organizational issues. In: Sprague R (ed) Proceedings of the 33rd Hawaii international conference on system science (HICSS-33). IEEE Computer Society Press, Los Alamitos, CA, pp 1–10
  45. Murata T (1989) Petri nets: properties, analysis and applications. Proc IEEE 77(4):541–580
  46. Nummela J, Julstrom BA (2005) Evolving Petri nets to represent metabolic pathways. In: Beyer H, O’Reilly U (eds) GECCO. ACM, New York, pp 2133–2139
  47. Pinter SS, Golani M (2004) Discovering workflow models from activities lifespans. Comput Ind 53(3):283–296
  48. Pitt L (1989) Inductive inference, DFAs, and computational complexity. In: Jantke KP (ed) Proceedings of international workshop on analogical and inductive inference (AII). Lecture notes in computer science, vol 397. Springer, Berlin, pp 18–44
  49. Reddy JP, Kumanan S, Chetty OVK (2001) Application of Petri nets and a genetic algorithm to multi-mode multi-resource constrained project scheduling. Int J Adv Manufact Technol 17(4):305–314
  50. Reisig W, Rozenberg G (eds) (1998) Lectures on Petri nets I: basic models. Lecture notes in computer science, vol 1491. Springer, Berlin
  51. Rozinat A, van der Aalst WMP (2005) Conformance testing: measuring the fit and appropriateness of event logs and process models. In: Bussler C, Haller A (eds) Business process management workshops. Lecture notes in computer science, vol 3812. Springer-Verlag, Berlin, pp 163–176
  52. Schimm G. Process mining. http://www.processmining.de/
  53. Schimm G (2002) Process miner: a tool for mining process schemes from event-based data. In: Flesca S, Ianni G (eds) Proceedings of the 8th European conference on artificial intelligence (JELIA). Lecture notes in computer science, vol 2424. Springer, Berlin, pp 525–528
  54. Schimm G (2004) Mining exact models of concurrent workflows. Comput Ind 53(3):265–281
  55. Staffware (2002) Staffware process monitor (SPM). http://www.staffware.com
  56. Tohme H, Nakamura M, Hachiman E, Onaga K (1999) Evolutionary Petri net approach to periodic job-shop-scheduling. In: Proceedings of the IEEE international conference on systems, man, and cybernetics, vol 4, pp 441–446
  57. Weijters AJMM, van der Aalst WMP (2003) Rediscovering workflow models from event-based data using little thumb. Integr Comput Aided Eng 10(2):151–162
  58. Wen L, Wang J, Sun J (2006) Detecting implicit dependencies between tasks from event logs. In: Zhou X, Li J, Shen HT, Kitsuregawa M, Zhang Y (eds) APWeb. Lecture notes in computer science, vol 3841. Springer, Berlin, pp 591–603

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • A. K. A. de Medeiros (1)
  • A. J. M. M. Weijters (1)
  • W. M. P. van der Aalst (1)
  1. Department of Technology Management, Eindhoven University of Technology, Eindhoven, The Netherlands
