
Estimation of cost of k–anonymity in the number of dummy records

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

De-identification is a process that prevents individuals from being identified in original transaction data by processing their personal identification information. k-anonymization, which processes data so that every record is shared by at least k users, is one of the representative methods of de-identification. One way to achieve k-anonymization is to add dummy records to the data in order to protect users who have unique histories. For this method, the cost of k-anonymization is the difference in the number of records between the original data and the processed data, and it can be calculated only after the parameter k has been chosen and the data have been processed. However, because processing big data with many different values of k is very costly, we want to calculate the cost before processing and thereby find the optimal value of k. In this paper, we propose a new model of transaction data that gives the probability distribution and the expected value of the values in the data, under the assumption that all values occur independently with uniform probability. Applying this model, it is possible to estimate the cost of k-anonymization even before the data are processed.
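As a rough illustration of the idea of estimating the dummy-record cost before processing, the following Python sketch is not the authors' model. It assumes a deliberately simplified setting: each of n users' histories is reduced to a single value drawn independently and uniformly from m possible values, and padding adds k − c dummy copies of any value that appears c times with 1 ≤ c < k. Under that assumption the count of one value is Binomial(n, p) with p = 1/m, so by linearity of expectation E[cost] = m · Σ_{c=1}^{k−1} (k − c) · C(n, c) · p^c · (1 − p)^{n−c}. The function names `dummy_cost` and `expected_dummy_cost` are hypothetical.

```python
import random
from collections import Counter
from math import comb


def dummy_cost(records, k):
    """Dummy records needed so that every value that occurs at all occurs
    at least k times (assumed padding rule: add k - c copies of a value
    seen c times, 1 <= c < k)."""
    counts = Counter(records)
    return sum(k - c for c in counts.values() if c < k)


def expected_dummy_cost(n, m, k):
    """Expected dummy_cost when n records are drawn independently and
    uniformly from m possible values (simplified, hypothetical model)."""
    p = 1.0 / m
    # The count of any single value is Binomial(n, p). By linearity of
    # expectation, the expected total padding is m times the expected
    # padding of one value, even though the m counts are not independent.
    per_value = sum(
        (k - c) * comb(n, c) * p**c * (1 - p) ** (n - c)
        for c in range(1, min(k, n + 1))  # c = 1 .. min(k - 1, n)
    )
    return m * per_value


if __name__ == "__main__":
    n, m, k = 1000, 200, 5
    rng = random.Random(0)
    # Monte Carlo check: average the exact cost over simulated datasets.
    trials = [
        dummy_cost([rng.randrange(m) for _ in range(n)], k) for _ in range(200)
    ]
    print("simulated:", sum(trials) / len(trials))
    print("model:    ", expected_dummy_cost(n, m, k))
```

Under these assumptions the simulated average and the closed-form value should agree closely, and the closed form can be evaluated for many values of k without touching the data. Real transaction histories are neither single-valued nor uniform, so this sketch only conveys the flavor of a pre-processing cost estimate.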

Figs. 1–13

Notes

  1. The Japanese version of de-identified information differs slightly from conventionally anonymized data.

Author information

Corresponding author

Correspondence to Satoshi Ito.

About this article

Cite this article

Ito, S., Kikuchi, H. Estimation of cost of k–anonymity in the number of dummy records. J Ambient Intell Human Comput 14, 15885–15894 (2023). https://doi.org/10.1007/s12652-021-03369-5

  • DOI: https://doi.org/10.1007/s12652-021-03369-5
