Evaluating Trace Encoding Methods in Process Mining

Barbon Junior, Sylvio; Ceravolo, Paolo; Damiani, Ernesto; Marques Tavares, Gabriel

doi:10.1007/978-3-030-70650-0_11

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12611)

Included in the following conference series:

International Symposium: From Data to Models and Back

Abstract

Encoding methods affect the performance of process mining tasks, but little work in the literature has focused on quantifying their impact. In this paper, we compare 10 encoding methods from three families (trace replay and alignment, graph embeddings, and word embeddings). The comparison uses measures of class overlap in the feature space, together with the accuracy achieved and the computational resources (time) consumed in a classification task. Across hundreds of event logs representing four variations of five scenarios and five anomalies, edge2vec proved to be the most accurate method and the most effective at reducing class overlap in the feature space.
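One criterion in the abstract is how much normal and anomalous traces overlap once they are encoded as feature vectors. The following is a minimal sketch of that idea under explicit assumptions. The encoding (`encode_transitions`, a hypothetical helper counting directly-follows pairs) is a simple stand-in and is not one of the ten methods compared. The overlap score (`n1_overlap`) implements N1, a classical complexity measure: the fraction of points joined by a minimum-spanning-tree edge to a point of a different class. The abstract does not state which overlap measures the study used, so treat this as an illustration only.

```python
import math
from collections import Counter


def encode_transitions(traces):
    """Encode each trace as counts of its directly-follows pairs (a, b).

    Illustrative stand-in encoding only; not one of the paper's methods.
    """
    vocab = sorted({pair for t in traces for pair in zip(t, t[1:])})
    vectors = []
    for t in traces:
        counts = Counter(zip(t, t[1:]))
        vectors.append([float(counts[p]) for p in vocab])
    return vectors, vocab


def n1_overlap(vectors, labels):
    """N1 measure: fraction of points connected by a minimum-spanning-tree
    edge to a point of another class. 0.0 means no MST edge crosses classes;
    values near 1.0 indicate heavy class overlap."""
    n = len(vectors)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge into the tree, per vertex
    parent = [-1] * n       # tree vertex that offers that cheapest edge
    best[0] = 0.0
    borderline = set()
    for _ in range(n):
        # Prim's algorithm: add the closest vertex not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        p = parent[u]
        if p >= 0 and labels[p] != labels[u]:
            borderline.update((p, u))  # this MST edge crosses classes
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(vectors[u], vectors[v])  # Euclidean distance
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return len(borderline) / n


# Hypothetical toy log: three normal traces and one anomaly (repeated "b")
traces = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "b", "c", "d"],
]
labels = ["normal", "normal", "normal", "anomalous"]
X, vocab = encode_transitions(traces)
print(n1_overlap(X, labels))
```

In this toy log, the anomalous trace's nearest neighbour is a normal trace, so one MST edge crosses classes and N1 is 2/4. A better encoding for anomaly detection would push that score toward 0.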

This study was financed in part by Coordination for the National Council for Scientific and Technological Development (CNPq) of Brazil - Grant of Project 420562/2018-4 and Fundação Araucária (Paraná, Brazil). It was also partly supported by the program “Piano di sostegno alla ricerca 2019” funded by Università degli Studi di Milano.

Author information

Authors and Affiliations

Londrina State University (UEL), Londrina, Brazil
Sylvio Barbon Junior
Università degli Studi di Milano (UNIMI), Milan, Italy
Paolo Ceravolo & Gabriel Marques Tavares
Khalifa University (KUST), Abu Dhabi, UAE
Ernesto Damiani

Corresponding author

Correspondence to Gabriel Marques Tavares.

Editor information

Editors and Affiliations

University of St Andrews, St Andrews, UK
Juliana Bowles
ISTI-CNR, Pisa, Italy
Giovanna Broccia
ISTI-CNR, Pisa, Italy
Mirco Nanni

About this paper

Cite this paper

Barbon Junior, S., Ceravolo, P., Damiani, E., Marques Tavares, G. (2021). Evaluating Trace Encoding Methods in Process Mining. In: Bowles, J., Broccia, G., Nanni, M. (eds) From Data to Models and Back. DataMod 2020. Lecture Notes in Computer Science, vol 12611. Springer, Cham. https://doi.org/10.1007/978-3-030-70650-0_11

DOI: https://doi.org/10.1007/978-3-030-70650-0_11
Published: 05 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70649-4
Online ISBN: 978-3-030-70650-0
eBook Packages: Computer Science, Computer Science (R0)