Privacy-Preserving Clustering for Multi-dimensional Data Randomization Under LDP

  • Conference paper
ICT Systems Security and Privacy Protection (SEC 2023)

Abstract

Randomizing multi-dimensional data under local differential privacy (LDP) is a significant and practical problem in big data applications. Because of dimensionality issues, most existing works suffer from low accuracy when estimating joint probability distributions. In this paper, the set of attributes is divided into smaller clusters, each grouping attributes that are associated through their dependencies. A privacy-preserving algorithm is proposed to estimate the dependencies between attributes without disclosing the private values in the multi-dimensional data. The scheme guarantees local differential privacy. Given the clusters of attributes, the joint probabilities for multi-dimensional data can be estimated efficiently using two building blocks, called the RR-independent and RR-Ind-Joint schemes. Experiments on open datasets demonstrate that the dependencies of attributes can be estimated accurately and that the proposed algorithm outperforms existing state-of-the-art schemes when the dimensionality is high.
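For readers unfamiliar with randomized response under LDP, the sketch below illustrates the general idea on a single categorical attribute. It is a generic k-ary randomized response with an unbiased frequency estimator, not the paper's RR-independent or RR-Ind-Joint scheme, whose details are not given in this abstract. The function names, the privacy budget `epsilon = 1.0`, and the toy data are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def krr_probs(k, epsilon):
    """Return (p, q): the probability of reporting the true value and the
    probability of reporting any one specific other value."""
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    return p, q


def randomize(value, k, epsilon, rng):
    """Perturb one value in {0, ..., k-1} on the user's side."""
    p, _ = krr_probs(k, epsilon)
    if rng.random() < p:
        return value
    # Otherwise pick uniformly among the k-1 values different from `value`.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def estimate_frequencies(reports, k, epsilon):
    """Invert the perturbation: observed = p*pi + q*(1 - pi), solved for pi."""
    p, q = krr_probs(k, epsilon)
    n = len(reports)
    counts = Counter(reports)
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(k)]


# Hypothetical toy data: one attribute with 3 categories.
rng = random.Random(0)
true_values = [0] * 6000 + [1] * 3000 + [2] * 1000
reports = [randomize(v, 3, 1.0, rng) for v in true_values]
estimates = estimate_frequencies(reports, 3, 1.0)
```

Because the ratio p/q equals e^epsilon, each report satisfies epsilon-LDP. The estimates sum to 1 but may individually fall slightly outside [0, 1]. With many attributes the number of joint categories k grows multiplicatively, which lowers p and inflates the estimation variance; this is the dimensionality problem that the clustering approach in the abstract aims to mitigate.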

Acknowledgment

Part of this work was supported by JSPS KAKENHI Grant Numbers JP18H04099 and 23K11110, and by JST CREST Grant Number JPMJCR21M1, Japan. The author thanks Prof. Josep Domingo-Ferrer and Dr. Jordi Soria-Comas for their discussions and suggestions.

Author information

Corresponding author

Correspondence to Hiroaki Kikuchi.

Copyright information

© 2024 IFIP International Federation for Information Processing

About this paper

Cite this paper

Kikuchi, H. (2024). Privacy-Preserving Clustering for Multi-dimensional Data Randomization Under LDP. In: Meyer, N., Grocholewska-Czuryło, A. (eds) ICT Systems Security and Privacy Protection. SEC 2023. IFIP Advances in Information and Communication Technology, vol 679. Springer, Cham. https://doi.org/10.1007/978-3-031-56326-3_2

  • DOI: https://doi.org/10.1007/978-3-031-56326-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56325-6

  • Online ISBN: 978-3-031-56326-3

  • eBook Packages: Computer Science, Computer Science (R0)
