Abstract
De-identification is the process of transforming personal identification information so that individuals cannot be identified from the original transaction data. k-anonymization, which processes data so that at least k users share the same records, is one of the representative de-identification methods. One way to achieve k-anonymity is to add dummy records to the data to protect users who have unique histories. For this method, the cost of k-anonymization is the difference in the number of records between the original data and the processed data, and it can be calculated only after the parameter k has been chosen and the data have been processed. However, because processing large datasets for many values of k is very costly, we want to calculate the cost before processing and find the optimal value of k. In this paper, we propose a new model of transaction data that yields a probability distribution and an expected value for the values in the data under the assumption that all values occur independently with uniform probability. By applying our data model, the cost of k-anonymized data can be evaluated even before processing.
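To make the cost definition concrete, the following is a minimal sketch of the after-the-fact calculation that the proposed model aims to avoid. It is an illustration under stated assumptions, not the procedure from the paper: the function name `dummy_cost`, the toy data, and the padding rule (each group of users with identical histories that has fewer than k members is filled up with dummy users who copy that history) are all hypothetical.

```python
from collections import Counter


def dummy_cost(histories, k):
    """Return (dummy_users, dummy_records) needed for a simple k-anonymization.

    histories: dict mapping a user id to the items in that user's history
               (hypothetical input format).
    Assumption: users with identical item sets form one group, and a group
    with fewer than k members is padded with dummy users copying its history.
    """
    # Count how many users share each history, ignoring order and repeats.
    group_sizes = Counter(frozenset(items) for items in histories.values())

    dummy_users = 0
    dummy_records = 0
    for history, size in group_sizes.items():
        shortfall = max(0, k - size)  # dummy users this group still needs
        dummy_users += shortfall
        dummy_records += shortfall * len(history)  # one record per copied item
    return dummy_users, dummy_records


# Toy data: u1 and u2 share a history; u3 and u4 have unique histories.
toy = {
    "u1": ["apple", "bread"],
    "u2": ["bread", "apple"],
    "u3": ["milk"],
    "u4": ["apple", "bread", "milk"],
}

for k in (1, 2, 3):
    print(k, dummy_cost(toy, k))
```

Note that the whole dataset must be grouped again for every candidate k, which is the expense that motivates estimating the cost from a data model before any processing.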
Notes
The Japanese version of de-identified information differs slightly from commonly used anonymized data.
Cite this article
Ito, S., Kikuchi, H. Estimation of cost of k–anonymity in the number of dummy records. J Ambient Intell Human Comput 14, 15885–15894 (2023). https://doi.org/10.1007/s12652-021-03369-5