Data Mining and Knowledge Discovery, Volume 14, Issue 2, pp 245–304

Genetic process mining: an experimental evaluation

Authors

  • A. K. A. de Medeiros
    • Department of Technology Management, Eindhoven University of Technology
  • A. J. M. M. Weijters
    • Department of Technology Management, Eindhoven University of Technology
  • W. M. P. van der Aalst
    • Department of Technology Management, Eindhoven University of Technology
Open Access Article

DOI: 10.1007/s10618-006-0061-7

Cite this article as:
de Medeiros, A.K.A., Weijters, A.J.M.M. & van der Aalst, W.M.P. Data Min Knowl Disc (2007) 14: 245. doi:10.1007/s10618-006-0061-7

Abstract

One of the aims of process mining is to retrieve a process model from an event log. The discovered models can be used as objective starting points during the deployment of process-aware information systems (Dumas et al., eds., Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, New York, 2005) and/or as a feedback mechanism to check prescribed models against enacted ones. However, current techniques have problems when mining processes that contain non-trivial constructs and/or when the logs contain noise. Most of these problems arise because many current techniques are based on local information in the event log. To overcome these problems, we use genetic algorithms to mine process models. The main motivation is to benefit from the global search performed by this kind of algorithm. Non-trivial constructs are tackled by choosing an internal representation that supports them. Noise is handled naturally by the genetic algorithm because, by definition, these algorithms are robust to noise. The main challenge in a genetic approach is defining a good fitness measure, because the fitness measure guides the global search performed by the genetic algorithm. This paper explains how the genetic algorithm works. Experiments with synthetic and real-life logs show that the fitness measure indeed leads to mining process models that are complete (they can reproduce all the behavior in the log) and precise (they do not allow extra behavior that cannot be derived from the event log). The genetic algorithm is implemented as a plug-in in the ProM framework.
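The central idea above, a fitness measure that rewards completeness while penalizing extra behavior and thereby steers a global search, can be illustrated with a small genetic loop. This is a minimal sketch under simplifying assumptions, not the authors' method: the paper uses its own internal representation and fitness measure, whereas here a candidate model is a hypothetical set of allowed directly-follows edges (framed by artificial START/END markers), completeness is the share of observed transitions the model allows, and imprecision is the share of model edges never seen in the log. The names `transitions`, `fitness`, `mine`, the weight `kappa`, and all GA parameters are illustrative choices.

```python
import random
from itertools import product

# Toy sketch: the edge-set representation below is a hypothetical simplification,
# not the internal representation used in the paper.
START, END = "START", "END"


def transitions(trace):
    """Consecutive activity pairs of a trace, framed by artificial START/END markers."""
    path = [START, *trace, END]
    return list(zip(path, path[1:]))


def fitness(model, log, kappa=0.5):
    """Completeness minus kappa times imprecision (both in [0, 1])."""
    observed = [t for trace in log for t in transitions(trace)]
    seen = set(observed)
    # Completeness: share of observed transition occurrences the model can reproduce.
    completeness = sum(t in model for t in observed) / len(observed)
    # Imprecision: share of model edges never observed, i.e. extra behavior.
    imprecision = len(model - seen) / len(model) if model else 0.0
    return completeness - kappa * imprecision


def mutate(model, candidates, rng):
    """Flip one randomly chosen candidate edge in or out of the model."""
    return model ^ {rng.choice(candidates)}


def crossover(a, b, candidates, rng):
    """Uniform crossover: each candidate edge's presence is copied from a random parent."""
    return frozenset(e for e in candidates if e in (a if rng.random() < 0.5 else b))


def select(scored, rng, k=3):
    """Tournament selection over (fitness, model) pairs."""
    return max(rng.sample(scored, k), key=lambda fm: fm[0])[1]


def mine(log, generations=60, pop_size=40, seed=0):
    """Evolve edge-set models; returns the best model and per-generation best fitness."""
    rng = random.Random(seed)
    activities = sorted({a for trace in log for a in trace})
    candidates = list(product([START, *activities], [*activities, END]))
    population = [frozenset(e for e in candidates if rng.random() < 0.3)
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(m, log), m) for m in population),
                        key=lambda fm: fm[0], reverse=True)
        history.append(scored[0][0])
        children = [scored[0][1]]  # elitism: the current best model always survives
        while len(children) < pop_size:
            child = crossover(select(scored, rng), select(scored, rng), candidates, rng)
            if rng.random() < 0.8:
                child = mutate(child, candidates, rng)
            children.append(child)
        population = children
    best = max(population, key=lambda m: fitness(m, log))
    return best, history


if __name__ == "__main__":
    log = [["A", "B", "C", "D"], ["A", "C", "B", "D"], ["A", "E", "D"]]
    best, history = mine(log)
    print(f"best fitness {fitness(best, log):.3f}")
    print(sorted(best))
```

Because the best model is always carried over (elitism), the best fitness never decreases across generations, and raising `kappa` makes unobserved edges more costly. In this toy representation the optimum is simply the log's set of observed transitions, which is exactly why a richer representation is needed to capture non-trivial constructs such as parallelism.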

Keywords

Process mining, Genetic mining, Genetic algorithms, Petri nets, Workflow nets

Copyright information

© Springer Science+Business Media, LLC 2007