Abstract
Real-life event logs are typically much less structured and more complex than the predefined business activities they refer to. Most existing process mining techniques assume a one-to-one mapping between process model activities and the events recorded during process execution. Unfortunately, event logs and process model activities are typically defined at different levels of granularity. The challenges posed by this discrepancy can be addressed by means of log-lifting, that is, by mapping low-level events to the high-level activities they belong to. In this work we develop a machine-learning-based framework aimed at bridging the abstraction-level gap between logs and process models. The proposed framework operates in two main phases: log segmentation and machine-learning-based classification. The purpose of the segmentation phase is to identify potential segment separators in a flow of low-level events, where each segment corresponds to an unknown high-level activity. For this, we propose a segmentation algorithm based on maximum likelihood with n-gram analysis. In the second phase, event segments are mapped to their corresponding high-level activities using a supervised machine learning technique. Several machine learning classification methods are explored, including ANNs, SVMs, and random forests. We demonstrate the applicability of our framework using a real-life event log provided by SAP. The results show that the approach based on the random forest algorithm outperforms the other methods, with an accuracy of 96.4%. The testing time was found to be around 0.01 s, which makes the algorithm a good candidate for real-time deployment scenarios.
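To make the segmentation phase concrete, the following is a minimal sketch of a bigram (n = 2) boundary detector. It is not the algorithm developed in the paper, which the abstract does not specify. It assumes that some traces with known segment boundaries are available for training, and it places a separator between two consecutive events whenever the maximum-likelihood estimate of a boundary for that event pair exceeds a threshold. The function names, the event names, and the rule for unseen pairs are all hypothetical.

```python
from collections import Counter


def fit_boundary_model(segmented_traces):
    """Count event bigrams inside segments and across segment boundaries.

    segmented_traces: list of traces, each a list of non-empty segments,
    each segment a list of low-level event names.
    """
    within, across = Counter(), Counter()
    for trace in segmented_traces:
        for i, seg in enumerate(trace):
            # Bigrams whose two events belong to the same segment.
            for a, b in zip(seg, seg[1:]):
                within[(a, b)] += 1
            # Bigram spanning the boundary to the next segment, if any.
            if i + 1 < len(trace):
                across[(seg[-1], trace[i + 1][0])] += 1
    return within, across


def boundary_probability(pair, within, across):
    """Maximum-likelihood estimate of P(boundary | bigram)."""
    total = within[pair] + across[pair]
    if total == 0:
        # Assumption of this sketch: an unseen transition is a boundary.
        return 1.0
    return across[pair] / total


def segment(events, within, across, threshold=0.5):
    """Split a flat event sequence at likely segment separators."""
    if not events:
        return []
    segments = [[events[0]]]
    for a, b in zip(events, events[1:]):
        if boundary_probability((a, b), within, across) > threshold:
            segments.append([b])      # start a new segment at b
        else:
            segments[-1].append(b)    # b continues the current segment
    return segments


# Hypothetical training traces with known segment boundaries.
training = [
    [["open", "edit", "save"], ["login", "query"]],
    [["login", "query"], ["open", "edit", "save"]],
]
within, across = fit_boundary_model(training)
result = segment(
    ["login", "query", "open", "edit", "save", "login", "query"],
    within,
    across,
)
print(result)
```

In the second phase, each resulting segment would be encoded as a feature vector and passed to a supervised classifier, such as a random forest, to predict its high-level activity. That step is omitted here.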
Notes
1. SAP dataset: https://doi.org/10.5281/zenodo.2566022
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tello, G., Gianini, G., Mizouni, R., Damiani, E. (2019). Machine Learning-Based Framework for Log-Lifting in Business Process Mining Applications. In: Hildebrandt, T., van Dongen, B., Röglinger, M., Mendling, J. (eds) Business Process Management. BPM 2019. Lecture Notes in Computer Science, vol 11675. Springer, Cham. https://doi.org/10.1007/978-3-030-26619-6_16
DOI: https://doi.org/10.1007/978-3-030-26619-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26618-9
Online ISBN: 978-3-030-26619-6
eBook Packages: Computer Science, Computer Science (R0)