Abstract
Randomizing multi-dimensional data under local differential privacy is an important and practical problem in big data applications. Because of dimensionality issues, most existing schemes suffer from low accuracy when estimating joint probability distributions. In this paper, the set of attributes is divided into smaller clusters in which the attributes are grouped according to their dependencies. A privacy-preserving algorithm is proposed to estimate the dependencies among attributes without disclosing the private values in the multi-dimensional data, and the scheme guarantees local differential privacy. Given the clusters of attributes, the joint probabilities for multi-dimensional data can be estimated efficiently using two building blocks, the RR-independent and RR-Ind-Joint schemes. Experiments on open datasets demonstrate that the dependencies among attributes can be estimated accurately and that the proposed algorithm outperforms existing state-of-the-art schemes when the dimensionality is high.
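For readers unfamiliar with randomized response, the following is a minimal sketch of the generic k-ary randomized response mechanism for a single categorical attribute, together with the usual unbiased frequency estimator. It is not the RR-independent or RR-Ind-Joint scheme named above, whose details are not given here; the function names (`krr_probs`, `krr_perturb`, `krr_estimate`, `demo`), the three-value domain, and the parameter values are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def krr_probs(epsilon, k):
    """Return (p, q): probability of reporting the true value, and of
    reporting each one of the other k - 1 values (requires epsilon > 0)."""
    denom = math.exp(epsilon) + k - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def krr_perturb(value, domain, epsilon, rng):
    """Randomize one attribute value on the user's side."""
    p, _ = krr_probs(epsilon, len(domain))
    if rng.random() < p:
        return value
    # Otherwise report one of the other values uniformly at random.
    others = [v for v in domain if v != value]
    return rng.choice(others)


def krr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimates computed from randomized reports only."""
    p, q = krr_probs(epsilon, len(domain))
    n = len(reports)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


def demo(n=20000, epsilon=1.0, seed=0):
    """Hypothetical example: 3-valued attribute with frequencies 0.6/0.3/0.1."""
    rng = random.Random(seed)
    domain = ["a", "b", "c"]
    true_values = rng.choices(domain, weights=[0.6, 0.3, 0.1], k=n)
    reports = [krr_perturb(v, domain, epsilon, rng) for v in true_values]
    return krr_estimate(reports, domain, epsilon)
```

Because the ratio p/q equals e^ε, each report satisfies ε-local differential privacy for that attribute. Applying such a mechanism to every attribute of a record and estimating a full joint distribution directly is where the dimensionality issues described above arise, since the number of joint cells grows exponentially with the number of attributes.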
Acknowledgment
Part of this work was supported by JSPS KAKENHI Grant Numbers JP18H04099 and 23K11110, and by JST CREST Grant Number JPMJCR21M1, Japan. The author thanks Prof. Josep Domingo-Ferrer and Dr. Jordi Soria-Comas for their discussions and suggestions.
Copyright information
© 2024 IFIP International Federation for Information Processing
About this paper
Cite this paper
Kikuchi, H. (2024). Privacy-Preserving Clustering for Multi-dimensional Data Randomization Under LDP. In: Meyer, N., Grocholewska-Czuryło, A. (eds) ICT Systems Security and Privacy Protection. SEC 2023. IFIP Advances in Information and Communication Technology, vol 679. Springer, Cham. https://doi.org/10.1007/978-3-031-56326-3_2
DOI: https://doi.org/10.1007/978-3-031-56326-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56325-6
Online ISBN: 978-3-031-56326-3
eBook Packages: Computer Science, Computer Science (R0)