A Distance Measure for Privacy-Preserving Process Mining Based on Feature Learning

Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 436)

Abstract

To enable process analysis based on an event log without compromising the privacy of individuals involved in process execution, a log may be anonymized. Such anonymization strives to transform a log so that it satisfies provable privacy guarantees, while largely maintaining its utility for process analysis. Existing techniques perform anonymization using simple, syntactic measures to identify suitable transformation operations. This way, the semantics of the activities referenced by the events in a trace are neglected, potentially leading to transformations in which events of unrelated activities are merged. To avoid this and incorporate the semantics of activities during anonymization, we propose to instead incorporate a distance measure based on feature learning. Specifically, we show how embeddings of events enable the definition of a distance measure for traces to guide event log anonymization. Our experiments with real-world data indicate that anonymization using this measure, compared to a syntactic one, yields logs that are closer to the original log in various dimensions and, hence, have higher utility for process analysis.
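The abstract does not spell out how event embeddings are turned into a trace distance, so the following is only an illustrative sketch of one plausible construction, not the authors' method. It assumes that each activity already has a learned embedding vector, that a trace is summarized by averaging the vectors of its activities, and that two traces are compared by cosine distance. The activity names and the two-dimensional vectors are invented for the example.

```python
import math

# Hypothetical activity embeddings. In practice these would be learned from
# the event log by a feature-learning model; the values here are invented.
EMBEDDINGS = {
    "register": (1.0, 0.0),
    "record": (0.9, 0.1),   # assumed to be semantically close to "register"
    "pay": (0.0, 1.0),
}


def trace_vector(trace):
    """Summarize a trace as the average of its activity embeddings."""
    dims = len(next(iter(EMBEDDINGS.values())))
    sums = [0.0] * dims
    for activity in trace:
        for i, value in enumerate(EMBEDDINGS[activity]):
            sums[i] += value
    return [s / len(trace) for s in sums]


def trace_distance(trace_a, trace_b):
    """Cosine distance between the averaged vectors of two traces."""
    a, b = trace_vector(trace_a), trace_vector(trace_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


base = ["register", "pay"]
similar = ["record", "pay"]    # swaps in a related activity
unrelated = ["pay", "pay"]     # drops the registration step entirely

print(trace_distance(base, similar) < trace_distance(base, unrelated))
```

Under these assumptions, a trace that replaces an activity with a semantically related one stays closer to the original than a trace that replaces it with an unrelated one, which is the property a purely syntactic measure such as edit distance cannot express.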

Keywords

  • Privacy
  • Anonymization
  • Trace distance
  • Feature learning

Notes

  1. https://github.com/roeselfa/FeatureLearningBasedDistanceMetrics

Author information

Corresponding author

Correspondence to Stephan A. Fahrenkrog-Petersen.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Rösel, F., Fahrenkrog-Petersen, S.A., van der Aa, H., Weidlich, M. (2022). A Distance Measure for Privacy-Preserving Process Mining Based on Feature Learning. In: Marrella, A., Weber, B. (eds) Business Process Management Workshops. BPM 2021. Lecture Notes in Business Information Processing, vol 436. Springer, Cham. https://doi.org/10.1007/978-3-030-94343-1_6

  • DOI: https://doi.org/10.1007/978-3-030-94343-1_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-94342-4

  • Online ISBN: 978-3-030-94343-1

  • eBook Packages: Computer Science, Computer Science (R0)