Data Mining and Knowledge Discovery, Volume 31, Issue 3, pp 774–808

Explaining clusterings of process instances

  • Pieter De Koninck
  • Jochen De Weerdt
  • Seppe K. L. M. vanden Broucke


This paper presents a technique that aims to increase human understanding of trace clustering solutions. The clustering techniques under scrutiny stem from the process mining domain, where clustering process instances is deemed a useful way to analyse process data that exhibit a large variety of behaviour. Until now, clustering solutions in this domain have mostly been assessed by visually inspecting the clustering results. This paper proposes a more thorough approach based on the post hoc application of supervised learning with support vector machines to the cluster results. Our approach learns concise rules that describe why a specific instance is included in a certain cluster, based on specific control-flow-based feature variables. An extensive experimental evaluation shows that our technique outperforms alternatives. In addition, we identify features that lead to shorter and more accurate explanations.
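The idea of explaining a clustering post hoc can be illustrated with a small sketch. It is not the authors' implementation: the toy traces, the cluster labels, the choice of directly-follows pairs as control-flow features, and all function names are assumptions made for illustration. A pure-Python linear SVM trained by subgradient descent stands in for an SVM library. The sketch trains the classifier on the cluster labels, then greedily removes features from one trace until its predicted cluster changes. The removed features form the instance-level explanation.

```python
def directly_follows(trace):
    """Binary control-flow features: the set of directly-follows activity pairs."""
    return set(zip(trace, trace[1:]))


def score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    return 1 if score(w, b, x) >= 0 else -1


def train_linear_svm(X, y, epochs=500, lam=0.01, lr=0.05):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1:  # instance lies inside the margin
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / len(X)
                grad_b -= yi / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def explain(w, b, x, names):
    """Greedily switch off present features until the predicted cluster changes."""
    x, cluster, removed = list(x), predict(w, b, x), []
    while predict(w, b, x) == cluster:
        # Present features whose weight pulls the trace towards its cluster.
        support = [j for j, xj in enumerate(x) if xj and cluster * w[j] > 0]
        if not support:
            return None  # removing behaviour alone cannot change the assignment
        best = max(support, key=lambda j: cluster * w[j])
        x[best] = 0
        removed.append(names[best])
    return removed


# Hypothetical clustered log: cluster +1 holds the traces with a rework loop.
traces = [list("abcd"), list("abd"), list("abcbcd"), list("abcbd")]
labels = [-1, -1, 1, 1]

names = sorted(set().union(*(directly_follows(t) for t in traces)))
X = [[int(n in directly_follows(t)) for n in names] for t in traces]

w, b = train_linear_svm(X, labels)
print(explain(w, b, X[2], names))
```

On this toy log the explanation for the third trace should reduce to the single pair ('c', 'b'), which reads as a rule: the trace belongs to its cluster because activity c is directly followed by b. Removal-only explanations of this kind return None when a cluster is characterised by the absence of behaviour, which is one reason a real implementation would need a richer search.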


Process discovery; Trace clustering; Human understanding; Instance-level explanations; Support vector machines



Copyright information

© The Author(s) 2016

Authors and Affiliations

  1. KU Leuven - University of Leuven, Research Center for Management Informatics, Faculty of Economics and Business, Louvain, Belgium
