Abstract
A great concern for organizations is to detect anomalous process instances within their business processes. For that, conformance checking performs model-aware analysis by comparing process logs to business models for the detection of anomalous process executions. However, in several scenarios, a model is either unavailable or its generation is costly, which requires the employment of alternative methods to allow a confident representation of traces. This work supports the analysis of language inspired process analysis grounded in the word2vec encoding algorithm. We argue that natural language encodings correctly model the behavior of business processes, supporting a proper distinction between common and anomalous behavior. In the experiments, we compared accuracy and time cost among different word2vec setups and classic encoding methods (token-based replay and alignment features), addressing seven different anomaly scenarios. Feature importance values and the impact of different anomalies in seven event logs were also evaluated to bring insights on the trace representation subject. Results show the proposed encoding overcomes representational capability of traditional conformance metrics for the anomaly detection task.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
van der Aalst, W., Adriansyah, A., van Dongen, B.: Replaying history on process models for conformance checking and performance analysis. WIREs Data Min. Knowl. Disc. 2(2), 182–192 (2012)
van der Aalst, W.M.P.: Process Mining: Data Science in Action, 2nd edn. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49851-4
Barbon Junior, S., Tavares, G.M., da Costa, V.G.T., Ceravolo, P., Damiani, E.: A framework for human-in-the-loop monitoring of concept-drift detection in event log stream. In: Companion Proceedings of the The Web Conference 2018, pp. 319–326. WWW 2018, International World Wide Web Conferences Steering Committee (2018)
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(10), 281–305 (2012)
Berti, A., van Zelst, S.J., van der Aalst, W.: Process mining for python (pm4py): Bridging the gap between process- and data science (2019)
Bezerra, F., Wainer, J.: Algorithms for anomaly detection of traces in logs of process aware information systems. Inf. Syst. 38(1), 33–44 (2013)
Böhmer, K., Rinderle-Ma, S.: Multi-perspective anomaly detection in business process execution events. In: Debruyne, C., et al. (eds.) OTM 2016. LNCS, vol. 10033, pp. 80–98. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48472-3_5
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Burattin, A.: Plg2: Multiperspective processes randomization and simulation for online and offline settings (2015)
De Koninck, P., vanden Broucke, S., De Weerdt, J.: act2vec, trace2vec, log2vec, and model2vec: representation learning for business processes. In: Weske, M., Montali, M., Weber, I., vom Brocke, J. (eds.) BPM 2018. LNCS, vol. 11080, pp. 305–321. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98648-7_18
Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on International Conference on Machine Learning, vol. 32. p. II-1188-II-1196. ICML 2014, JMLR.org (2014)
Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Scalable process discovery with guarantees. In: Gaaloul, K., Schmidt, R., Nurcan, S., Guerreiro, S., Ma, Q. (eds.) Enterprise, Business-Process and Information Systems Modeling, pp. 85–101. Springer, Cham (2015)
Mikolov, T., Chen, K., Corrado, G.S., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013)
Nolle, T., Luettgen, S., Seeliger, A., Mühlhäuser, M.: Analyzing business process anomalies using autoencoders. Mach. Learn. 107(11), 1875–1893 (2018)
Nolle, T., Luettgen, S., Seeliger, A., Mühlhäuser, M.: Binet: multi-perspective business process anomaly classification. Inf. Syst. 1, 101458 (2019)
Nolle, T., Seeliger, A., Mühlhäuser, M.: Unsupervised anomaly detection in noisy business process event logs using denoising autoencoders. In: Calders, T., Ceci, M., Malerba, D. (eds.) DS 2016. LNCS (LNAI), vol. 9956, pp. 442–456. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46307-0_28
Nolle, T., Seeliger, A., Mühlhäuser, M.: BINet: multivariate business process anomaly detection using deep learning. In: Weske, M., Montali, M., Weber, I., vom Brocke, J. (eds.) BPM 2018. LNCS, vol. 11080, pp. 271–287. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98648-7_16
Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Rozinat, A., van der Aalst, W.: Conformance checking of processes based on monitoring real behavior. Inf. Syst. 33(1), 64–95 (2008)
Tavares, G.M., Ceravolo, P., Turrisi Da Costa, V.G., Damiani, E., Barbon Junior, S.: Overlapping analytic stages in online process mining. In: 2019 IEEE International Conference on Services Computing (SCC), pp. 167–175, July 2019
Tavares, G.M., Turrisi Da Costa, V.G., Martins, V., Ceravolo, P., Barbon Junior, S.: Leveraging anomaly detection in business process with data stream mining. iSys - Revista Brasileira de Sistemas de Informação 12(1), 54–75 (2019)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Tavares, G.M., Barbon, S. (2020). Analysis of Language Inspired Trace Representation for Anomaly Detection. In: Bellatreche, L., et al. ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium. TPDL ADBIS 2020 2020. Communications in Computer and Information Science, vol 1260. Springer, Cham. https://doi.org/10.1007/978-3-030-55814-7_25
Download citation
DOI: https://doi.org/10.1007/978-3-030-55814-7_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55813-0
Online ISBN: 978-3-030-55814-7
eBook Packages: Computer ScienceComputer Science (R0)