Similarity-Based Approaches for Determining the Number of Trace Clusters in Process Discovery

De Koninck, Pieter; De Weerdt, Jochen

doi:10.1007/978-3-662-55862-1_2

Chapter
First Online: 20 September 2017

540 Accesses
5 Citations

Part of the book series: Lecture Notes in Computer Science (TOPNOC, volume 10470)

Abstract

Given the complexity of real-life event logs, several trace clustering techniques have been proposed to partition an event log into subsets with a lower degree of variation. In general, these techniques assume that the number of clusters is known in advance. However, this will rarely be the case in practice. Therefore, this paper presents approaches to determine the appropriate number of clusters in a trace clustering context. To identify the most appropriate number of trace clusters, two similarity-based approaches are proposed: a stability-based and a separation-based method. The stability-based method iteratively calculates the similarity between clustered versions of perturbed and unperturbed event logs. The separation-based method instead relies on between-cluster dissimilarity. For practical validation, both approaches are tested on multiple real-life datasets to investigate the complementarity of the different components. Our results suggest that both methods are successful in identifying an appropriate number of trace clusters.
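The abstract describes both criteria only at a high level, so the following is a minimal sketch of the general idea under loudly stated assumptions, not the authors' implementation (theirs is an experimental ProM plugin). Every concrete choice here is hypothetical: traces are tuples of activity labels, trace distance is the Jaccard distance between activity sets, a tiny deterministic k-medoids (`cluster_traces`) stands in for a real trace clustering technique, a log is perturbed by randomly dropping a fraction of its traces, two clusterings are compared with the Rand index, and separation is the mean distance between traces placed in different clusters.

```python
import random
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two traces based on their activity sets (illustrative choice)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 0.0 if not union else 1.0 - len(sa & sb) / len(union)


def cluster_traces(log, k, dist=jaccard_distance, max_iter=50):
    """Tiny deterministic k-medoids: a hypothetical stand-in for a trace clustering technique."""
    n = len(log)
    d = [[dist(log[i], log[j]) for j in range(n)] for i in range(n)]
    # Farthest-first initialisation: start at trace 0, then repeatedly add the
    # trace whose distance to its nearest chosen medoid is largest.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))
    for _ in range(max_iter):
        # Assign every trace to its nearest medoid (ties go to the lower cluster index).
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        # Move each medoid to the member with the smallest total distance to its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of trace pairs on which two clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(log, k, runs=10, drop=0.2, seed=0):
    """Mean agreement between the clustering of the full log and of perturbed logs.

    Perturbation = randomly dropping a fraction `drop` of the traces (an assumption).
    """
    rng = random.Random(seed)
    full = cluster_traces(log, k)
    size = max(2, round(len(log) * (1 - drop)))
    scores = []
    for _ in range(runs):
        keep = sorted(rng.sample(range(len(log)), size))
        perturbed = cluster_traces([log[i] for i in keep], k)
        # Compare both clusterings only on the traces that survived the perturbation.
        scores.append(rand_index([full[i] for i in keep], perturbed))
    return sum(scores) / runs


def separation(log, labels, dist=jaccard_distance):
    """Mean distance over all pairs of traces assigned to different clusters."""
    between = [dist(log[i], log[j])
               for i, j in combinations(range(len(log)), 2) if labels[i] != labels[j]]
    return sum(between) / len(between) if between else 0.0


if __name__ == "__main__":
    # Toy log with three behavioural variants (hypothetical data).
    log = [("a", "b", "c")] * 4 + [("a", "d", "e")] * 4 + [("x", "y")] * 4
    for k in range(2, 5):
        labels = cluster_traces(log, k)
        print(k, round(stability(log, k), 3), round(separation(log, labels), 3))
```

Raw pairwise agreement tends to favour small numbers of clusters, so a practical stability criterion would normally correct the score for k before picking the best candidate; this sketch omits that step for brevity.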

Notes

1. This approach is implemented as an experimental ProM plugin, available at http://www.processmining.be/clusterstability/.
2. For more information on the XES standard, we refer to http://www.xes-standard.org/.
3. The first two methods are implemented in the ActiTrac plugin of the ProM framework for process mining. The latter five methods are implemented in the GuideTree-Miner plugin.
4. The visual representations of the MCRM and MOA event logs are available at http://www.processmining.be/clusterstability/ToPNoCResults.

Author information

Authors and Affiliations

Research Center for Management Informatics, Faculty of Economics and Business, KU Leuven, Leuven, Belgium
Pieter De Koninck & Jochen De Weerdt

Corresponding author

Correspondence to Pieter De Koninck.

Editor information

Editors and Affiliations

Newcastle University, Newcastle upon Tyne, United Kingdom
Maciej Koutny
LIACS, Leiden University, Leiden, The Netherlands
Jetty Kleijn
Polish Academy of Sciences, Institute of Computer Science, Warsaw, Poland
Wojciech Penczek

About this chapter

Cite this chapter

De Koninck, P., De Weerdt, J. (2017). Similarity-Based Approaches for Determining the Number of Trace Clusters in Process Discovery. In: Koutny, M., Kleijn, J., Penczek, W. (eds) Transactions on Petri Nets and Other Models of Concurrency XII. Lecture Notes in Computer Science, vol 10470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-55862-1_2

DOI: https://doi.org/10.1007/978-3-662-55862-1_2
Published: 20 September 2017
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-55861-4
Online ISBN: 978-3-662-55862-1
eBook Packages: Computer Science, Computer Science (R0)
