Clustering Traces Using Sequence Alignment

  • Joerg Evermann
  • Tom Thaler
  • Peter Fettke
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 256)

Abstract

Process mining discovers process models from event logs. Logs containing heterogeneous sets of traces can lead to complex process models that try to account for very different behaviour in a single model. Trace clustering identifies homogeneous sets of traces within a heterogeneous log and allows for the discovery of multiple, simpler process models. In this paper, we present a trace clustering method based on local alignment of sequences, subsequent multidimensional scaling, and k-means clustering. We describe its implementation and show that its performance compares favourably to state-of-the-art clustering approaches on two evaluation problems.
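The three-step pipeline named in the abstract (local alignment, then multidimensional scaling, then k-means) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all choices made here for illustration: the Smith-Waterman scoring weights (match +2, mismatch -1, linear gap -1), the rule that turns alignment scores into distances, classical (Torgerson) MDS computed with a Jacobi eigen-decomposition, and farthest-first k-means seeding. Every function name is hypothetical. Only the standard library is used, so the block runs without extra packages.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def smith_waterman(a: Sequence[str], b: Sequence[str], match: float = 2.0,
                   mismatch: float = -1.0, gap: float = -1.0) -> float:
    """Best local-alignment score of two traces (linear gap cost; weights are assumptions)."""
    prev = [0.0] * (len(b) + 1)  # row i-1 of the DP matrix H
    best = 0.0
    for i in range(1, len(a) + 1):
        cur = [0.0] * (len(b) + 1)  # row i; column 0 stays 0 (local alignment)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0.0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def distance_matrix(traces: Sequence[Sequence[str]]) -> Matrix:
    """Hypothetical normalisation: d(a, b) = 1 - s(a, b) / max(s(a, a), s(b, b))."""
    self_score = [smith_waterman(t, t) for t in traces]
    n = len(traces)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = smith_waterman(traces[i], traces[j])
            d[i][j] = d[j][i] = 1.0 - s / max(self_score[i], self_score[j], 1e-9)
    return d


def jacobi_eigen(a: Matrix, sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix (eigenvectors as columns)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break  # off-diagonal mass is negligible, so a is numerically diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])  # angle that zeroes a[p][q]
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (rotate columns p, q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A (rotate rows p, q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [a[i][i] for i in range(n)], v


def classical_mds(d: Matrix, dims: int = 2) -> Matrix:
    """Classical (Torgerson) MDS: double-centre -D^2/2, keep top positive eigenpairs."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    mean = [sum(row) / n for row in sq]  # row means equal column means (D is symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    top = sorted(range(n), key=lambda k: vals[k], reverse=True)[:dims]
    keep = [k for k in top if vals[k] > 1e-9]  # drop non-positive eigenvalues
    return [[vecs[i][k] * math.sqrt(vals[k]) for k in keep] for i in range(n)]


def kmeans(points: Matrix, k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-first seeding."""
    def sqdist(p: List[float], q: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(p, q))
    centres = [points[0]]
    while len(centres) < k:  # next seed: the point farthest from all current seeds
        centres.append(max(points, key=lambda p: min(sqdist(p, c) for c in centres)))
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centres[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centres[c])
        if new == centres:
            break  # centroids (and hence assignments) are stable
        centres = new
    return labels


def cluster_traces(traces: Sequence[Sequence[str]], k: int, dims: int = 2) -> List[int]:
    """Alignment distances -> MDS coordinates -> k-means cluster labels."""
    return kmeans(classical_mds(distance_matrix(traces), dims), k)
```

For example, with each letter standing for one activity, `cluster_traces(["abcde", "abcdf", "abqde", "xyzw", "xyzv", "xywv"], k=2)` should place the first three traces in one cluster and the last three in the other. The hand-written Jacobi solver only keeps the sketch dependency-free. A practical implementation would more likely delegate MDS and k-means to an established statistics or numerics library.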

Keywords

Process mining · Process discovery · Trace clustering · Sequence alignment

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Memorial University of Newfoundland, St. John’s, Canada
  2. Deutsches Forschungszentrum für Künstliche Intelligenz, Saarbrücken, Germany
  3. Universität des Saarlandes, Saarbrücken, Germany