Explaining clusterings of process instances

De Koninck, Pieter; De Weerdt, Jochen; vanden Broucke, Seppe K. L. M.

doi:10.1007/s10618-016-0488-4

Published: 05 December 2016

Volume 31, pages 774–808, (2017)

Data Mining and Knowledge Discovery

Abstract

This paper presents a technique that aims to increase human understanding of trace clustering solutions. The clustering techniques under scrutiny stem from the process mining domain, where the clustering of process instances is deemed a useful technique to analyse process data with a large variety of behaviour. Until now, the most often used method to inspect clustering solutions in this domain is visual inspection of the clustering results. This paper proposes a more thorough approach based on the post hoc application of supervised learning with support vector machines on cluster results. Our approach learns concise rules to describe why a specific instance is included in a certain cluster based on specific control-flow based feature variables. An extensive experimental evaluation is presented showing that our technique outperforms alternatives. Likewise, we are able to identify features that lead to shorter and more accurate explanations.
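
The post hoc set-up described above pairs each process instance with its cluster label and describes the instance through binary control-flow features. The following is a minimal sketch of such an encoding, not the authors' implementation. The feature semantics are assumptions inferred from the feature names used in the Notes, and the toy log, the cluster labels and the helper names `encode_trace` and `column` are hypothetical.

```python
from itertools import product


def encode_trace(trace, activities):
    """Map one trace (a list of activity names) to binary control-flow features."""
    features = {}
    for a in activities:
        # Assumed reading: Exists(a) is 1 when a occurs anywhere in the trace.
        features[f"Exists({a})"] = int(a in trace)
    for a, b in product(activities, repeat=2):
        # Assumed reading: a is immediately followed by b at least once.
        directly = any(x == a and y == b for x, y in zip(trace, trace[1:]))
        # Assumed reading: b occurs somewhere after an occurrence of a.
        weakly = any(x == a and b in trace[i + 1:] for i, x in enumerate(trace))
        features[f"SometimesDirectlyFollows({a},{b})"] = int(directly)
        features[f"SometimesWeaklyFollows({a},{b})"] = int(weakly)
    return features


def column(rows, name):
    """Return the values of one feature across all encoded traces."""
    return [row[name] for row in rows]


# Hypothetical toy log in which every trace starts with activity a.
log = [["a", "b", "c"], ["a", "c"], ["a", "c", "b"], ["a", "d"]]
# Hypothetical cluster labels, as some trace clustering technique might assign them.
clusters = [0, 1, 0, 1]

activities = sorted({activity for trace in log for activity in trace})
rows = [encode_trace(trace, activities) for trace in log]

# (rows, clusters) is the supervised data set a classifier would be trained on.
print(column(rows, "Exists(a)"))
print(column(rows, "Exists(b)") == column(rows, "SometimesWeaklyFollows(a,b)"))
```

Because every trace in this toy log starts with a, the column for Exists(a) is constant and the columns for Exists(b) and SometimesWeaklyFollows(a,b) coincide, which is the kind of redundancy a feature selection step would need to remove before training.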

Notes

Consider \({\textit{Exists}}(a)\) in a case where each trace starts with activity a: this feature then corresponds to a column of ones in the constructed data set. It therefore carries no discriminating information and is redundant. In the same data set, with activity a at the start of each trace, \({\textit{Exists}}(b)\) and \({\textit{SometimesWeaklyFollows}}(a,b)\) are perfectly correlated, making one of these two features redundant.
http://www.promtools.org/prom6/.
The plugin itself, screen captures and further explanation can be retrieved from: http://www.processmining.be/svmexplainer.
The event logs are available for download on http://www.processmining.be/svmexplainer/datasets.
Consider for example the explanation “\({\textit{SometimesDirectlyFollows}}(a,b) = 0\) AND \({\textit{SometimesDirectlyFollows}}(b,d) = 0\) AND \({\textit{SometimesDirectlyFollows}}(f,g) = 0\)”, which denotes that this instance would leave the cluster if the three attributes corresponding to the sometimes directly follows relations listed above were set to zero. The length of this explanation is thus equal to 3.
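
The notion of explanation length in the last note can be illustrated with a small sketch. It assumes an explanation is a smallest set of active features whose zeroing changes the predicted cluster, and it finds one by brute force over subsets of increasing size, which is a stand-in and not necessarily the search procedure used in the paper. The weights, the bias and the names `predict` and `shortest_explanation` are hypothetical; the linear scorer merely plays the role of a trained classifier.

```python
from itertools import combinations


def shortest_explanation(instance, predict, max_length=3):
    """Return a smallest tuple of active features whose zeroing changes the prediction."""
    original = predict(instance)
    active = sorted(name for name, value in instance.items() if value == 1)
    for length in range(1, max_length + 1):
        for subset in combinations(active, length):
            changed = dict(instance)
            for name in subset:
                changed[name] = 0
            if predict(changed) != original:
                return subset
    return None  # no explanation of at most max_length features exists


# Hypothetical weights and bias standing in for a trained linear classifier.
WEIGHTS = {
    "SometimesDirectlyFollows(a,b)": 1.0,
    "SometimesDirectlyFollows(b,d)": 1.0,
    "SometimesDirectlyFollows(f,g)": 1.0,
    "Exists(c)": 0.2,
}
BIAS = -0.5


def predict(instance):
    """Assign the instance to the cluster when its linear score is positive."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in instance.items())
    return "in cluster" if score > 0 else "not in cluster"


# Hypothetical instance in which all four features are active.
instance = {name: 1 for name in WEIGHTS}
explanation = shortest_explanation(instance, predict)
print(explanation, len(explanation))
```

With these illustrative weights, no one or two features suffice to move the instance out of the cluster, so the shortest explanation consists of the three sometimes directly follows features and has length 3, as in the note.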

Author information

Authors and Affiliations

KU Leuven - University of Leuven, Research Center for Management Informatics, Faculty of Economics and Business, Naamsestraat 69, 3000, Louvain, Belgium
Pieter De Koninck, Jochen De Weerdt & Seppe K. L. M. vanden Broucke

Corresponding author

Correspondence to Pieter De Koninck.

Additional information

Responsible editor: Toon Calders.

Appendix

This appendix contains Tables 9, 10 and Figs. 7, 8, 9, 10, 11, 12, 13, 14 and 15.

Table 9 Results of the experimental evaluation comparing SECPI with C4.5 and RIPPER averaged over clustering techniques and datasets, replicated with a cluster number of 6

Table 10 Results of the experimental evaluation comparing SECPI with C4.5 and RIPPER averaged over clustering techniques and datasets, replicated with a cluster number of 8

About this article

Cite this article

De Koninck, P., De Weerdt, J. & vanden Broucke, S.K.L.M. Explaining clusterings of process instances. Data Min Knowl Disc 31, 774–808 (2017). https://doi.org/10.1007/s10618-016-0488-4

Received: 17 February 2016
Accepted: 21 November 2016
Published: 05 December 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s10618-016-0488-4
