(\(k,\varepsilon ,\delta \))-Anonymization: privacy-preserving data release based on k-anonymity and differential privacy

Abstract

The General Data Protection Regulation (GDPR) came into effect on May 25, 2018, and has rapidly become a touchstone model for modern privacy law. It gives consumers unprecedented control over the use of their personal information. However, these new guarantees of consumer privacy adversely affect data sharing and data application markets, because service companies (e.g., Apple, Google, Microsoft) cannot provide immediate and optimized services by analyzing collected consumer experiences. Data de-identification technology (e.g., k-anonymity and differential privacy) is therefore a candidate solution for protecting the privacy of shared data. Various workarounds based on existing k-anonymity and differential privacy methods have been proposed. However, they offer limited data utility and struggle with high-dimensional data sets (the so-called curse of dimensionality). In this paper, we propose the (\(k,\varepsilon ,\delta \))-anonymization synthetic data set generation mechanism (called (\(k,\varepsilon ,\delta \))-anonymization for short) to protect data privacy before data sets are released for analysis. Synthetic data sets generated by (\(k,\varepsilon ,\delta \))-anonymization satisfy the definitions of both k-anonymity and differential privacy through the use of KD-tree and random sampling mechanisms. Moreover, (\(k,\varepsilon ,\delta \))-anonymization uses principal component analysis to replace high-dimensional data sets with lower-dimensional ones, which makes computation more efficient. Finally, we confirm the relationships between the parameters k, \(\varepsilon \), and \(\delta \) for k-anonymity and (\(\varepsilon ,\delta \))-differential privacy, and we estimate the utility of (\(k,\varepsilon ,\delta \))-anonymization synthetic data sets. We report a privacy analysis and a series of experiments showing that (\(k,\varepsilon ,\delta \))-anonymization is feasible and efficient.
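To make the pipeline named in the abstract concrete, the following is a minimal sketch of how random sampling, principal component analysis, and a KD-tree style partition could be combined to produce a synthetic data set in which every released record is shared by at least k rows. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`pca_reduce`, `kd_partition`, `synthesize`), the sampling rate `beta`, the median-split rule, and the use of cell centroids as synthetic records are all hypothetical choices. The sketch makes no claim of an (\(\varepsilon ,\delta \))-differential privacy guarantee, and it does not reproduce the relationship between k, \(\varepsilon \), and \(\delta \) derived in the paper.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto their top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kd_partition(indices, Z, k, depth=0):
    """KD-tree style median splits on cycling axes.

    A cell is split only while it holds at least 2k rows, so both halves
    keep at least k rows.
    """
    if len(indices) < 2 * k:
        return [indices]
    axis = depth % Z.shape[1]
    order = indices[np.argsort(Z[indices, axis], kind="stable")]
    mid = len(order) // 2
    return (kd_partition(order[:mid], Z, k, depth + 1)
            + kd_partition(order[mid:], Z, k, depth + 1))

def synthesize(X, k, beta, n_components, seed=0):
    """Hypothetical pipeline: sample, reduce, partition, release centroids."""
    rng = np.random.default_rng(seed)
    # Assumption: independent Bernoulli(beta) sampling of the input rows.
    Xs = X[rng.random(len(X)) < beta]
    if len(Xs) < k:
        return np.empty((0, X.shape[1]))  # too few rows to release anything
    # The partition is computed in the reduced space only.
    Z = pca_reduce(Xs, n_components)
    cells = kd_partition(np.arange(len(Xs)), Z, k)
    # Each row is replaced by its cell centroid in the original space,
    # so every released row appears at least k times.
    rows = [np.repeat(Xs[c].mean(axis=0, keepdims=True), len(c), axis=0)
            for c in cells if len(c) >= k]
    return np.vstack(rows)

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(200, 5))
    synthetic = synthesize(data, k=5, beta=0.5, n_components=2)
    _, counts = np.unique(synthetic, axis=0, return_counts=True)
    print(synthetic.shape, counts.min())
```

Partitioning in the reduced space keeps the number of split axes small, which is the efficiency argument the abstract gives for principal component analysis. Releasing the centroids in the original space is only one possible design choice for turning cells into synthetic records.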

Acknowledgements

This work is supported by the Ministry of Science and Technology, Taiwan, under grants MOST 107-2221-E-035-020-MY3 and MOST 109-2221-E-001-019-MY3; by Academia Sinica (AS-KPQ-109-DSTCP); and by the Research Council (TRC), Sultanate of Oman (Block Fund-Research Grant).

Author information

Corresponding author

Correspondence to Yao-Tung Tsou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tsou, YT., Alraja, M.N., Chen, LS. et al. (\(k,\varepsilon ,\delta \))-Anonymization: privacy-preserving data release based on k-anonymity and differential privacy. SOCA 15, 175–185 (2021). https://doi.org/10.1007/s11761-021-00324-2

Keywords

  • Differential privacy
  • k-anonymity
  • Data privacy
  • Synthetic data set