Process mining: a two-step approach to balance between underfitting and overfitting

  • W. M. P. van der AalstEmail author
  • V. Rubin
  • H. M. W. Verbeek
  • B. F. van Dongen
  • E. Kindler
  • C. W. Günther
Open Access
Regular Paper


Process mining includes the automated discovery of processes from event logs. Based on observed events (e.g., activities being executed or messages being exchanged) a process model is constructed. One of the essential problems in process mining is that one cannot assume to have seen all possible behavior. At best, one has seen a representative subset. Therefore, classical synthesis techniques are not suitable as they aim at finding a model that is able to exactly reproduce the log. Existing process mining techniques try to avoid such “overfitting” by generalizing the model to allow for more behavior. This generalization is often driven by the representation language and very crude assumptions about completeness. As a result, parts of the model are “overfitting” (allow only for what has actually been observed) while other parts may be “underfitting” (allow for much more behavior without strong support for it). None of the existing techniques enables the user to control the balance between “overfitting” and “underfitting”. To address this, we propose a two-step approach. First, using a configurable approach, a transition system is constructed. Then, using the “theory of regions”, the model is synthesized. The approach has been implemented in the context of ProM and overcomes many of the limitations of traditional approaches.


Business Process Transition System Process Mining Customer Relationship Management Region Theory 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This research is supported by EIT, NWO-EW, and the Technology Foundation STW. Moreover, we would like to thank the many people involved in the development of ProM.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution,and reproduction in any medium, provided the original author(s) and source are credited.


  1. 1.
    van der Aalst W.M.P.: The application of Petri nets to workflow management. J. Circ. Syst. Comput. 8(1), 21–66 (1998)CrossRefGoogle Scholar
  2. 2.
    van der Aalst, W.M.P., van Dongen, B.F., Günther, C.W., Mans, R.S., Alves de Medeiros, A.K., Rozinat, A., Rubin, V., Song, M., Verbeek, H.M.W., Weijters, A.J.M.M.: ProM 4.0: comprehensive support for real process analysis. In: Kleijn, J., Yakovlev, A., (eds.) Application and Theory of Petri Nets and Other Models of Concurrency (ICATPN 2007). Lecture Notes in Computer Science, vol. 4546, pp. 484–494. Springer, Berlin (2007)Google Scholar
  3. 3.
    van der Aalst W.M.P., Alves de Medeiros A.K., Weijters A.J.M.M.: Genetic process mining. In: Ciardo, G., Darondeau, P. (eds) Applications and Theory of Petri Nets 2005. Lecture Notes in Computer Science, vol. 3536, pp. 48–69. Springer, Berlin (2005)Google Scholar
  4. 4.
    van der Aalst W.M.P., Reijers H.A., Weijters A.J.M.M., van Dongen B.F., Alves de Medeiros A.K., Song M., Verbeek H.M.W.: Business process mining: an industrial application. Inf. Syst. 32(5), 713–732 (2007)CrossRefGoogle Scholar
  5. 5.
    van der Aalst, W.M.P., Rubin, V., van Dongen, B.F., Kindler, E., Günther, C.W.: Process mining: a two-step approach using transition systems and regions. BPM Center Report BPM-06-30, (2006)Google Scholar
  6. 6.
    van der Aalst W.M.P., van Dongen B.F., Herbst J., Maruster L., Schimm G., Weijters A.J.M.M.: Workflow mining: a survey of issues and approaches. Data Knowl. Eng. 47(2), 237–267 (2003)CrossRefGoogle Scholar
  7. 7.
    van der Aalst W.M.P., Weijters A.J.M.M., Maruster L.: Workflow mining: discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9), 1128–1142 (2004)CrossRefGoogle Scholar
  8. 8.
    Agrawal, R., Gunopulos, D., Leymann, F.: Mining process models from workflow logs. In: Proceedings of the Sixth International Conference on Extending Database Technology, pp. 469–483 (1998)Google Scholar
  9. 9.
    Badouel E., Bernardinello L., Darondeau P.: The synthesis problem for elementary net systems is NP-complete. Theor. Comput. Sci. 186(1–2), 107–134 (1997)zbMATHCrossRefMathSciNetGoogle Scholar
  10. 10.
    Badouel E., Darondeau P.: Theory of regions. In: Reisig, W., Rozenberg, G. (eds) Lectures on Petri Nets I: Basic Models. Lecture Notes in Computer Science, vol. 1491, pp. 529–586. Springer, Berlin (1998)Google Scholar
  11. 11.
    Bergenthum R., Desel J., Lorenz R., Mauser S.: Process mining based on regions of languages. In: Alonso, G., Dadam, P., Rosemann, M. (eds) International Conference on Business Process Management (BPM 2007). Lecture Notes in Computer Science, vol. 4714, pp. 375–383. Springer, Berlin (2007)Google Scholar
  12. 12.
    Cook J.E., Wolf A.L.: Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol. 7(3), 215–249 (1998)CrossRefGoogle Scholar
  13. 13.
    Cortadella J., Kishinevsky M., Kondratyev A., Lavagno L., Yakovlev A.: Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers. IEICE Trans. Inf. Syst. E 80-D(3), 315–325 (1997)Google Scholar
  14. 14.
    Cortadella, J., Kishinevsky, M., Lavagno, L., Yakovlev, A.: Synthesizing Petri Nets from state-based models. In: Proceedings of the 1995 IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’95), pp. 164–171. IEEE Computer Society, Los Alamitos (1995)Google Scholar
  15. 15.
    Cortadella J., Kishinevsky M., Lavagno L., Yakovlev A.: Deriving Petri nets from finite transition systems. IEEE Trans. Comput. 47(8), 859–882 (1998)CrossRefMathSciNetGoogle Scholar
  16. 16.
    Datta A.: Automating the discovery of As-Is business process models: probabilistic and algorithmic approaches. Inf. Syst. Res. 9(3), 275–301 (1998)CrossRefGoogle Scholar
  17. 17.
    Desel, J., Esparza, J.: Free choice Petri nets. In: Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, Cambridge (1995)Google Scholar
  18. 18.
    Desel J., Reisig W.: The synthesis problem of Petri nets. Acta Inf. 33(4), 297–315 (1996)CrossRefMathSciNetGoogle Scholar
  19. 19.
    Desel, J., Reisig, W., Rozenberg, G.: (eds) Lectures on Concurrency and Petri Nets. Lecture Notes in Computer Science, vol. 3098, Springer-Verlag, Berlin (2004)Google Scholar
  20. 20.
    van Dongen B.F., van der Aalst W.M.P.: Multi-phase process mining: building instance graphs. In: Atzeni, P., Chu, W., Lu, H., Zhou, S., Ling, T.W. (eds) International Conference on Conceptual Modeling (ER 2004). Lecture Notes in Computer Science, vol. 3288., pp. 362–376. Springer, Berlin (2004)Google Scholar
  21. 21.
    van Dongen, B.F., van der Aalst, W.M.P.: Multi-phase mining: aggregating instances graphs into EPCs and Petri nets. In: Marinescu, D. (ed.) Proceedings of the Second International Workshop on Applications of Petri Nets to Coordination, Workflow and Business Process Management, pp. 35–58. Florida International University, Miami, Florida, USA (2005)Google Scholar
  22. 22.
    Dumas M., van der Aalst W.M.P., ter Hofstede A.H.M.: Process-Aware Information Systems: Bridging People and Software through Process Technology. Wiley, London (2005)Google Scholar
  23. 23.
    Ehrenfeucht A., Rozenberg G.: Partial (Set) 2-structures—part 1 and part 2. Acta Inf. 27(4), 315–368 (1989)CrossRefMathSciNetGoogle Scholar
  24. 24.
    Grigori D., Casati F., Castellanos M., Dayal U., Sayal M., Shan M.C.: Business process intelligence. Comput. Ind. 53(3), 321–343 (2004)CrossRefGoogle Scholar
  25. 25.
    Günther C., van der Aalst W.M.P.: A generic import framework for process event logs. In: Eder, J., Dustdar, S. (eds) Business Process Management Workshops, Workshop on Business Process Intelligence (BPI 2006). Lecture Notes in Computer Science, vol. 4103, pp. 81–92. Springer, Berlin (2006)Google Scholar
  26. 26.
    Herbst, J.: A machine learning approach to workflow management. In: Proceedings 11th European Conference on Machine Learning. Lecture Notes in Computer Science, vol. 1810, pp. 183–194. Springer, Berlin (2000)Google Scholar
  27. 27.
    IDS Scheer, ARIS Process Performance Manager (ARIS PPM): Measure, Analyze and Optimize Your Business Process Performance (whitepaper). IDS Scheer, Saarbruecken, Gemany. (2002)
  28. 28.
    Kindler E., Rubin V., Schäfer W.: Process mining and Petri net synthesis. In: Eder, J., Dustdar, S. (eds) Business Process Management Workshops. Lecture Notes in Computer Science, vol 4103., pp. 105–116. Springer, Berlin (2006)Google Scholar
  29. 29.
    Lorenz R., Bergenthum R., Desel J., Mauser S.: Synthesis of Petri nets from finite partial languages. In: Basten, T., Juhás, G., Shukla, S.K. (eds) International Conference on Application of Concurrency to System Design (ACSD 2007)., pp. 157–166. IEEE Computer Society, Los Alamitos (2007)CrossRefGoogle Scholar
  30. 30.
    Lorenz, R., Juh’as, G.: How to synthesize nets from languages: a survey. In: Henderson, S.G., Biller, B., Hsieh, M., Shortle, J., Tew, J.D., Barton, R.R. (eds.) Proceedings of the Winter simulation Conference (WSC 2007), pp. 637–647. IEEE Computer Society, Los Alamitos (2007)Google Scholar
  31. 31.
    de Medeiros, A.K.A.: Genetic Process Mining. PhD thesis, Eindhoven University of Technology, Eindhoven (2006)Google Scholar
  32. 32.
    de Medeiros A.K.A., van der Aalst W.M.P., Weijters A.J.M.M.: Workflow mining: current status and future directions. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds) On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. Lecture Notes in Computer Science, vol. 2888, pp. 389–406. Springer, Berlin (2003)Google Scholar
  33. 33.
    zur Mühlen, M., Rosemann, M.: Workflow-based process monitoring and controlling—technical and organizational issues. In: Sprague, R. (ed.) Proceedings of the 33rd Hawaii International Conference on System Science (HICSS-33), pp. 1–10. IEEE Computer Society Press, Los Alamitos (2000)Google Scholar
  34. 34.
    Reisig, W., Rozenberg, G. (eds): Lectures on Petri Nets I: Basic Models Lecture Notes in Computer Science, vol. 1491. Springer-Verlag, Berlin (1998)Google Scholar
  35. 35.
    Rozinat A., van der Aalst W.M.P.: Conformance testing: measuring the fit and appropriateness of event logs and process models. In: Bussler, C. et al. (eds) BPM 2005 Workshops (Workshop on Business Process Intelligence). Lecture Notes in Computer Science, vol. 3812, pp. 163–176. Springer, Berlin (2006)Google Scholar
  36. 36.
    Rozinat A., van der Aalst W.M.P.: Decision mining in ProM. In: Dustdar, S., Faideiro, J.L., Sheth, A. (eds) International Conference on Business Process Management (BPM 2006). Lecture Notes in Computer Science, vol. 4102, pp. 420–425. Springer, Berlin (2006)Google Scholar
  37. 37.
    Rozinat A., van der Aalst W.M.P.: Conformance checking of processes based on monitoring real behavior. Inf. Syst. 33(1), 64–95 (2008)CrossRefGoogle Scholar
  38. 38.
    Sayal M., Casati, F., Dayal, U., Shan, M.C.: Business process cockpit. In: Proceedings of 28th International Conference on Very Large Data Bases (VLDBG’02), pp. 880–883. Morgan Kaufmann, USA (2002)Google Scholar
  39. 39.
    TIBCO. TIBCO Staffware Process Monitor (SPM)., 2005
  40. 40.
    Weijters A.J.M.M., van der Aalst W.M.P.: Rediscovering workflow models from event-based data using little thumb. Integr. Comput. Aided Eng. 10(2), 151–162 (2003)Google Scholar
  41. 41.
    Wen L., van der Aalst W.M.P., Wang J., Sun J.: Mining process models with non-free-choice constructs. Data Min. Knowl. Discov. 15(2), 145–180 (2007)CrossRefMathSciNetGoogle Scholar
  42. 42.
    van der Werf, J.M.E.M., van Dongen, B.F., Hurkens, C.A.J., Serebrenik, A.: Process discovery using integer linear programming. In: van Hee, K., Valk, R., (eds.) Proceedings of the 29th International Conference on Applications and Theory of Petri Nets (Petri Nets 2008). Lecture Notes in Computer Science, vol. 5062 , pp. 368–387. Springer, Berlin (2008)Google Scholar

Copyright information

© The Author(s) 2008

Authors and Affiliations

  • W. M. P. van der Aalst
    • 1
    Email author
  • V. Rubin
    • 2
  • H. M. W. Verbeek
    • 1
  • B. F. van Dongen
    • 1
  • E. Kindler
    • 3
  • C. W. Günther
    • 1
  1. 1.Eindhoven University of TechnologyEindhovenThe Netherlands
  2. 2.Software Design and Management (sd&m AG)Offenbach am MainGermany
  3. 3.Technical University of DenmarkInformatics and Mathematical ModellingLyngbyDenmark

Personalised recommendations