Estimation of cost of k–anonymity in the number of dummy records

Ito, Satoshi; Kikuchi, Hiroaki

doi:10.1007/s12652-021-03369-5

Original Research
Published: 21 March 2022

Volume 14, pages 15885–15894, (2023)

Journal of Ambient Intelligence and Humanized Computing

Satoshi Ito¹ & Hiroaki Kikuchi¹

Abstract

De-identification processes personal identification information so that individuals cannot be identified from the original transaction data. A representative de-identification method is k-anonymization, which processes data so that at least k users have the same records. One way to achieve k-anonymity is to add dummy records to the data to protect users who have unique histories. For this method, the cost of k-anonymization is the difference in the number of records between the original data and the processed data, and it can be calculated only after the parameter k has been chosen and the data has been processed. However, processing big data for many values of k is very costly, so we want to calculate the cost before processing and use it to find the optimal value of k. In this paper, we propose a new model of transaction data that gives the probability distribution and the expected value of the values in the data, under the assumption that all values occur independently with uniform probability. With this data model, the cost of k-anonymizing data can be evaluated before any processing is done.
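The cost defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model, which is not reproduced on this page. It assumes a simplified reading in which each record is a single value, and every value that occurs fewer than k times is padded with dummy copies until it occurs k times. The function names `dummy_cost` and `expected_dummy_cost`, and the parameters `n` (number of records) and `m` (number of possible values), are hypothetical. Under the abstract's assumption of independent, uniformly distributed values, the count of each value is Binomial(n, 1/m), which gives an expected cost without processing any data.

```python
from collections import Counter
from math import comb


def dummy_cost(records, k):
    """Dummy records needed so that every occurring value appears at least k times.

    Simplifying assumption (not from the paper): one record is one value.
    """
    counts = Counter(records)
    return sum(k - c for c in counts.values() if c < k)


def expected_dummy_cost(n, m, k):
    """Expected dummy count for n records drawn independently and
    uniformly from m possible values.

    Each value's count is Binomial(n, 1/m). A value that occurs j times,
    with 1 <= j < k, needs k - j dummies; a value that never occurs needs none.
    """
    p = 1.0 / m
    per_value = sum(
        (k - j) * comb(n, j) * p**j * (1 - p) ** (n - j)
        for j in range(1, min(k, n + 1))
    )
    return m * per_value


if __name__ == "__main__":
    # Cost measured after the fact on a toy data set.
    print(dummy_cost(["a", "a", "b", "c", "c", "c"], 3))
    # Cost estimated before processing, for several candidate values of k.
    for k in (2, 3, 5):
        print(k, expected_dummy_cost(n=1000, m=200, k=k))
```

Comparing the estimate across candidate values of k, as in the loop above, is the kind of pre-processing evaluation the abstract describes. The uniform-independence assumption is what makes the closed form possible, and real transaction data may deviate from it.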

Notes

The Japanese definition of de-identified information differs slightly from the common notion of anonymized data.

Author information

Authors and Affiliations

Meiji University Graduate School, Tokyo, 164-8525, Japan
Satoshi Ito & Hiroaki Kikuchi

Corresponding author

Correspondence to Satoshi Ito.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ito, S., Kikuchi, H. Estimation of cost of k–anonymity in the number of dummy records. J Ambient Intell Human Comput 14, 15885–15894 (2023). https://doi.org/10.1007/s12652-021-03369-5

Received: 31 August 2020
Accepted: 22 June 2021
Published: 21 March 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s12652-021-03369-5
