Abstract
Process mining is rapidly gaining adoption in industry. Consequently, privacy concerns about the sensitive and private information contained in the event data used by process mining algorithms are becoming increasingly relevant. State-of-the-art research mainly focuses on providing privacy guarantees, e.g., differential privacy, for the trace variants used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques for releasing trace variants still do not fulfill all the requirements of industry-scale usage. Moreover, providing privacy guarantees when there is a high ratio of infrequent trace variants remains a challenge. In this paper, we introduce TraVaG, a new approach for releasing differentially private trace variants based on Generative Adversarial Networks (GANs), which provides industry-scale benefits and enhances the level of privacy guarantees when there is a high ratio of infrequent variants. Moreover, TraVaG overcomes shortcomings of conventional privacy preservation techniques, such as bounding the length of variants and introducing fake variants. Experimental results on real-life event data show that our approach outperforms state-of-the-art techniques in terms of privacy guarantees, plain data utility preservation, and result utility preservation.
Notes
- 1.
- 2. Note that other clipping strategies also exist, as highlighted in [22].
- 3. Note that in [25], TraVaS was already compared with SaCoFa [11] and the benchmark [21] and showed better performance. Here, the benchmark method is included for easier comparison. Moreover, Libra [8] does not take \(\epsilon \) as an input parameter but computes it based on \(\alpha \) as an RDP parameter and its sampling strategy. This makes a comparison based on exact \(\epsilon \) and \(\delta \) parameters very difficult. Nevertheless, an important observation in contrast to TraVaG is that Libra returns an empty log for event logs with many infrequent variants, such as Sepsis when \(\delta \le 10^{-3}\).
- 4.
- 5.
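Regarding note 3, the difficulty of comparing on exact \(\epsilon \) and \(\delta \) can be illustrated with the standard conversion from Rényi differential privacy (RDP) to \((\epsilon , \delta )\)-differential privacy due to Mironov (2017). This is general background, not a description of Libra's exact privacy accounting, and \(\epsilon _{\mathrm {RDP}}\) is a symbol introduced here (an assumption, not taken from the paper) for the RDP budget at order \(\alpha > 1\). If a mechanism satisfies \((\alpha , \epsilon _{\mathrm {RDP}})\)-RDP, then for any \(0< \delta < 1\) it also satisfies \((\epsilon , \delta )\)-differential privacy with

\[ \epsilon = \epsilon _{\mathrm {RDP}} + \frac{\ln (1/\delta )}{\alpha - 1}. \]

Under this conversion, \(\epsilon \) is derived from \(\alpha \), \(\epsilon _{\mathrm {RDP}}\), and \(\delta \) rather than chosen directly, which is one plausible reason why matching a method parameterized by \(\alpha \) to a fixed \((\epsilon , \delta )\) pair is not straightforward.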
References
van der Aalst, W.M.P.: Process Mining - Data Science in Action, 2nd edn. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49851-4
Abadi, M., et al.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016, pp. 308–318. ACM (2016)
Ács, G., Melis, L., Castelluccia, C., Cristofaro, E.D.: Differentially private mixture of generative neural networks. IEEE Trans. Knowl. Data Eng. 31(6), 1109–1121 (2019)
Chen, Q., et al.: Differentially private data generative models. CoRR abs/1812.02274 (2018)
Cohen, A., Nissim, K.: Towards formalizing the GDPR’s notion of singling out. Proc. Natl. Acad. Sci. USA 117(15), 8344–8352 (2020)
van Dongen, B.F., Weber, B., Ferreira, D.R., Weerdt, J.D.: BPI challenge 2013. In: Proceedings of the 3rd Business Process Intelligence Challenge (2013)
Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79228-4_1
Elkoumy, G., Dumas, M.: Libra: high-utility anonymization of event logs for process mining via subsampling. In: 4th International Conference on Process Mining, ICPM. IEEE (2022)
Elkoumy, G., Pankova, A., Dumas, M.: Mine me but don’t single me out: differentially private event logs for process mining. In: 3rd International Conference on Process Mining, ICPM 2021, pp. 80–87. IEEE (2021)
EU: EU General Data Protection Regulation. OJ L 119(1) (2016)
Fahrenkrog-Petersen, S.A., Kabierski, M., Rösel, F., van der Aa, H., Weidlich, M.: SaCoFa: semantics-aware control-flow anonymization for process mining. In: 3rd International Conference on Process Mining, ICPM 2021, Eindhoven, The Netherlands, 31 October–4 November 2021, pp. 72–79. IEEE (2021)
Frigerio, L., de Oliveira, A.S., Gomez, L., Duverger, P.: Differentially private generative adversarial networks for time series, continuous, and discrete open data. In: Dhillon, G., Karlsson, F., Hedström, K., Zúquete, A. (eds.) SEC 2019. IAICT, vol. 562, pp. 151–164. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22312-0_11
Goodfellow, I.J., et al.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of Wasserstein GANs. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems (2017)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: 2nd International Conference on Learning Representations, Conference Track Proceedings (2014)
Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured process models from incomplete event logs. In: Ciardo, G., Kindler, E. (eds.) PETRI NETS 2014. LNCS, vol. 8489, pp. 91–110. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07734-5_6
Li, K., Yang, S., Sullivan, T.M., Burd, R.S., Marsic, I.: Generating privacy-preserving process data with deep generative models. CoRR abs/2203.07949 (2022)
Liashchynskyi, P., Liashchynskyi, P.: Grid search, random search, genetic algorithm: a big comparison for NAS. CoRR abs/1912.06059 (2019)
Lu, Y., Chen, Q., Poon, S.K.: A deep learning approach for repairing missing activity labels in event logs for process mining. Information 13(5), 234 (2022)
Mannhardt, F.: Sepsis cases (2016). https://doi.org/10.4121/UUID:915D2BFB-7E84-49AD-A286-DC35F063A460
Mannhardt, F., Koschmider, A., Baracaldo, N., Weidlich, M., Michael, J.: Privacy-preserving process mining - differential privacy for event logs. Bus. Inf. Syst. Eng. 61(5), 595–614 (2019)
McMahan, H.B., Andrew, G.: A general approach to adding differential privacy to iterative training procedures. CoRR abs/1812.06210 (2018)
Mironov, I.: Rényi differential privacy. In: 30th IEEE Computer Security Foundations Symposium, CSF 2017, pp. 263–275. IEEE Computer Society (2017)
Rafiei, M., van der Aalst, W.M.P.: Towards quantifying privacy in process mining. In: Leemans, S., Leopold, H. (eds.) ICPM 2020. LNBIP, vol. 406, pp. 385–397. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72693-5_29
Rafiei, M., Wangelik, F., van der Aalst, W.M.P.: TraVaS: differentially private trace variant selection for process mining. In: Montali, M., Senderovich, A., Weidlich, M. (eds.) ICPM 2022. LNBIP, vol. 468. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-27815-0_9
Tang, J., Korolova, A., Bai, X., Wang, X., Wang, X.: Privacy loss in Apple’s implementation of differential privacy on macOS 10.12. CoRR abs/1709.02753 (2017)
Tantipongpipat, U.T., Waites, C., Boob, D., Siva, A.A., Cummings, R.: Differentially private synthetic mixed-type data generation for unsupervised learning. Intell. Decis. Technol. 15(4), 779–807 (2021)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Rafiei, M., Wangelik, F., Pourbafrani, M., van der Aalst, W.M.P. (2023). TraVaG: Differentially Private Trace Variant Generation Using GANs. In: Nurcan, S., Opdahl, A.L., Mouratidis, H., Tsohou, A. (eds) Research Challenges in Information Science: Information Science and the Connected World. RCIS 2023. Lecture Notes in Business Information Processing, vol 476. Springer, Cham. https://doi.org/10.1007/978-3-031-33080-3_25
DOI: https://doi.org/10.1007/978-3-031-33080-3_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33079-7
Online ISBN: 978-3-031-33080-3
eBook Packages: Computer Science, Computer Science (R0)