
Demystifying Noise and Outliers in Event Logs: Review and Future Directions

Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 436)

Abstract

Various process mining techniques exist, e.g., techniques that automatically discover a descriptive model of the execution of a process from event data. Whereas the premise of process mining is clear, as witnessed by the tremendous growth of the field, data quality issues often hamper the direct applicability of process mining techniques. Several authors have studied data quality issues in process mining, yet these works primarily propose data pre-processing techniques. An overarching study of the nature of data quality issues, the types of available techniques, and the general possibilities of (semi-)automated outlier/noise detection methods is missing. Therefore, in this paper, we make a first attempt to structure and study the field of outlier/noise detection in process mining and to understand to what degree knowledge on noise and outliers from other domains could advance the process mining field. We do so by answering three central research questions covering various aspects of (semi-)automated outlier/noise detection.
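One simple way to make "(semi-)automated outlier detection" in event logs concrete is a frequency-based heuristic that flags rarely occurring trace variants as candidate outliers. The following is an illustrative sketch only, not the method proposed in this paper: the toy log, the function names `variant_frequencies` and `split_infrequent`, and the relative-frequency threshold are hypothetical assumptions. Such a filter cannot tell noise apart from rare but legitimate behavior, which illustrates why the nature of noise and outliers deserves closer study.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A trace is the sequence of activity labels recorded for one case.
Trace = Tuple[str, ...]


def variant_frequencies(log: Sequence[Sequence[str]]) -> Dict[Trace, float]:
    """Relative frequency of each distinct trace (variant) in the log."""
    counts = Counter(tuple(trace) for trace in log)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def split_infrequent(
    log: Sequence[Sequence[str]], threshold: float = 0.05
) -> Tuple[List[Trace], List[Trace]]:
    """Split traces into (kept, flagged). A trace is flagged as a candidate
    outlier when the relative frequency of its variant is below `threshold`.
    The default threshold is an arbitrary, illustrative assumption."""
    freq = variant_frequencies(log)
    kept: List[Trace] = []
    flagged: List[Trace] = []
    for trace in log:
        variant = tuple(trace)
        (flagged if freq[variant] < threshold else kept).append(variant)
    return kept, flagged


if __name__ == "__main__":
    # Hypothetical toy log: 19 regular cases and one case that skips "check".
    log = [["register", "check", "decide"]] * 19 + [["register", "decide"]]
    kept, flagged = split_infrequent(log, threshold=0.1)
    print(len(kept), flagged)
```

Whether a flagged trace should be removed, repaired, or kept still requires domain knowledge; the heuristic only ranks behavior by rarity.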

Chapter length: 13 pages
Figures 1–3 (images not reproduced).

Notes

  1. To generate the event log, we used the Petri Net-based Event Log Generator: http://processmining.be/loggenerator/.

  2. https://www.promtools.org/doku.php.

  3. Available tools resolve neither synonyms nor homonyms. Therefore, we restricted our analysis to attribute noise.

  4. https://fluxicon.com/disco/.
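Note 3 restricts the analysis to attribute noise, i.e., erroneous values of event attributes such as the activity label. To make that notion concrete, here is a minimal, hypothetical sketch of how attribute noise could be injected into a synthetic event log for experimentation. The function name `inject_attribute_noise`, the dict-based event representation, and the uniform replacement strategy are assumptions for illustration; this is not the procedure used by the authors.

```python
import random
from typing import Dict, List

# Hypothetical event representation: one dict of attribute name -> value per event.
Event = Dict[str, str]


def inject_attribute_noise(
    events: List[Event], attribute: str, rate: float, seed: int = 0
) -> List[Event]:
    """Return a copy of `events` in which each event's `attribute` value is,
    with probability `rate`, replaced by a different value observed for that
    attribute in the log. Other attributes and the event order are unchanged."""
    rng = random.Random(seed)  # seeded so noisy logs are reproducible
    domain = sorted({e[attribute] for e in events if attribute in e})
    noisy: List[Event] = []
    for event in events:
        corrupted = dict(event)  # never mutate the input log
        if attribute in corrupted and len(domain) > 1 and rng.random() < rate:
            alternatives = [v for v in domain if v != corrupted[attribute]]
            corrupted[attribute] = rng.choice(alternatives)
        noisy.append(corrupted)
    return noisy


if __name__ == "__main__":
    # Hypothetical toy log with one case of three events.
    log = [
        {"case": "1", "activity": "register"},
        {"case": "1", "activity": "check"},
        {"case": "1", "activity": "decide"},
    ]
    for event in inject_attribute_noise(log, "activity", rate=0.3, seed=7):
        print(event)
```

Drawing replacements only from the observed domain keeps corrupted labels plausible, so a detector cannot trivially spot them as unknown values.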


Author information

Correspondence to Agnes Koschmider.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Koschmider, A., Kaczmarek, K., Krause, M., van Zelst, S.J. (2022). Demystifying Noise and Outliers in Event Logs: Review and Future Directions. In: Marrella, A., Weber, B. (eds) Business Process Management Workshops. BPM 2021. Lecture Notes in Business Information Processing, vol 436. Springer, Cham. https://doi.org/10.1007/978-3-030-94343-1_10


  • DOI: https://doi.org/10.1007/978-3-030-94343-1_10


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-94342-4

  • Online ISBN: 978-3-030-94343-1

  • eBook Packages: Computer Science, Computer Science (R0)