Event Log Reconstruction Using Autoencoders

  • Hoang Thi Cam Nguyen
  • Marco Comuzzi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11434)

Abstract

Poor-quality process event logs prevent high-quality business process analysis and improvement. Event log quality decreases when attribute values are missing, or after incorrect or irrelevant attribute values have been identified and removed. Reconstructing correct values for these missing attributes is likely to increase the quality of event log-based process analyses. Traditional statistical reconstruction methods work poorly with event logs because of the complex interrelations among attributes, events and cases. Machine learning approaches appear more suitable in this context, since they can learn complex models of event logs through training. This paper proposes a method for reconstructing missing attribute values in event logs based on autoencoders. Autoencoders are a class of feed-forward neural networks that reconstruct their own input after having learnt a model of its latent distribution. They suit unsupervised learning problems such as the one considered in this paper: when reconstructing missing attribute values in an event log, one cannot assume that a training set with true labels is available for model training. The proposed method is evaluated on two real event logs against baseline methods commonly used in the literature for imputing missing values in large datasets.
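
As a rough illustration of the general idea (not the architecture or training procedure used in the paper, which the abstract does not specify), the sketch below trains a very small autoencoder on rows that contain missing values and then fills the gaps with the network's own reconstruction. Everything in it is an assumption made for the example: the names `forward`, `train_autoencoder` and `impute`, the toy numeric `log`, the convention of marking a missing value with `None` and feeding it to the network as 0.0, and the choice to compute the reconstruction error only on observed attributes so that no true labels are needed.

```python
import math
import random


def forward(model, x):
    """Encode x into a small tanh hidden layer, then decode it linearly."""
    W1, b1, W2, b2 = model
    h = [math.tanh(sum(w * v for w, v in zip(W1[j], x)) + b1[j])
         for j in range(len(W1))]
    y = [sum(w * v for w, v in zip(W2[i], h)) + b2[i]
         for i in range(len(W2))]
    return h, y


def train_autoencoder(rows, hidden=2, epochs=300, lr=0.05, seed=0):
    """Train with per-row gradient steps; returns the model and per-epoch loss."""
    rng = random.Random(seed)
    n = len(rows[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n)]
    b1, b2 = [0.0] * hidden, [0.0] * n
    model = (W1, b1, W2, b2)
    losses = []
    for _ in range(epochs):
        total = 0.0
        for row in rows:
            # Assumption: a missing value enters the network as 0.0.
            x = [0.0 if v is None else v for v in row]
            h, y = forward(model, x)
            # The error is taken on observed attributes only (masked loss).
            err = [0.0 if row[i] is None else y[i] - x[i] for i in range(n)]
            total += sum(e * e for e in err)
            # Backpropagate through the decoder, then through tanh.
            dh = [sum(err[i] * W2[i][j] for i in range(n)) * (1.0 - h[j] ** 2)
                  for j in range(hidden)]
            for i in range(n):
                b2[i] -= lr * err[i]
                for j in range(hidden):
                    W2[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                b1[j] -= lr * dh[j]
                for i in range(n):
                    W1[j][i] -= lr * dh[j] * x[i]
        losses.append(total)
    return model, losses


def impute(model, row):
    """Keep observed values; replace each None with the reconstruction."""
    x = [0.0 if v is None else v for v in row]
    _, y = forward(model, x)
    return [y[i] if v is None else v for i, v in enumerate(row)]


# Hypothetical toy "log": three numeric attributes per row, scaled to [0, 1].
log = [
    [0.1, 0.9, 0.1], [0.2, 0.8, 0.2], [0.3, 0.7, None], [0.4, 0.6, 0.4],
    [0.5, None, 0.5], [0.6, 0.4, 0.6], [0.7, 0.3, 0.7], [None, 0.2, 0.8],
    [0.9, 0.1, 0.9],
]
model, losses = train_autoencoder(log)
repaired = [impute(model, row) for row in log]
```

Real event logs also contain categorical attributes such as activity names and resources, which would need a numeric encoding (for example one-hot vectors) before a network of this kind could be applied; that step is left out here.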

Keywords

Event log, Business process, Data quality, Neural network

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Trusting Social, Ho Chi Minh City, Vietnam
  2. Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
