Efficient Privacy Preserving Distributed K-Means for Non-IID Data

Brandão, André; Mendes, Ricardo; Vilela, João P.

doi:10.1007/978-3-030-74251-5_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12695)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Privacy is becoming a crucial requirement in many machine learning systems. In this paper, we introduce an efficient and secure distributed K-Means algorithm that is robust to non-IID data. The basic idea of our proposal is that each client runs the K-Means algorithm locally, with a variable number of clusters. The server then applies the K-Means algorithm again to the resulting centroids to discover the global centroids. To preserve the clients’ privacy, homomorphic encryption and secure aggregation are used in the process of learning the global centroids. The algorithm is efficient and reduces transmission costs, since only the local centroids are used to find the global centroids. In our experimental evaluation, we demonstrate that our strategy achieves performance similar to that of the centralized version, even when the data follows an extreme non-IID distribution.
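
The two-stage idea described in the abstract can be illustrated with a minimal plaintext sketch. This is not the authors' implementation: it assumes NumPy, uses a plain Lloyd's K-Means with random initialization, and omits the homomorphic encryption and secure aggregation steps entirely. The function names (`kmeans`, `distributed_kmeans`), the synthetic blobs, and the per-client cluster counts are all hypothetical choices made for illustration.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's K-Means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points chosen at random.
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids


def distributed_kmeans(client_datasets, local_ks, global_k, seed=0):
    """Each client clusters locally; the server clusters the local centroids.

    Assumption: the server sees the local centroids in the clear. The paper
    protects this step with homomorphic encryption and secure aggregation,
    which this sketch does not model.
    """
    local_centroids = [
        kmeans(data, k, seed=seed) for data, k in zip(client_datasets, local_ks)
    ]
    pooled = np.vstack(local_centroids)  # only centroids are transmitted
    return kmeans(pooled, global_k, seed=seed)


# Synthetic, strongly non-IID setup (hypothetical): three well-separated blobs,
# with each client holding only one or two of them.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
blobs = [c + 0.3 * rng.standard_normal((100, 2)) for c in centers]

client_datasets = [
    np.vstack([blobs[0], blobs[1]]),  # client A sees blobs 0 and 1
    blobs[2],                         # client B sees only blob 2
    blobs[1][:50],                    # client C sees part of blob 1
]
local_ks = [2, 1, 1]  # variable number of local clusters per client

global_centroids = distributed_kmeans(client_datasets, local_ks, global_k=3)
print(global_centroids)
```

In this sketch the server handles only four local centroids rather than the 350 raw points, which reflects the reduced transmission cost the abstract points to. Whether the server should weight local centroids by cluster size is a design choice the abstract does not settle, so the sketch leaves them unweighted.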

Notes

1. https://github.com/deric/clustering-benchmark.
2. We assume a constant time complexity for multiplication between the encrypted centroids and the plaintext global centroids, according to [20].

References

Bonawitz, K., et al.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191 (2017)
Celebi, M.E., Kingravi, H.A., Vela, P.A.: A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 40(1), 200–210 (2013)
Clifton, C., Tassa, T.: On syntactic anonymity and differential privacy. In: 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW), pp. 88–93. IEEE (2013)
Farrand, T., Mireshghallah, F., Singh, S., Trask, A.: Neither private nor fair: impact of data imbalance on utility and fairness in differential privacy. In: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, pp. 15–19 (2020)
Graepel, T., Lauter, K., Naehrig, M.: ML confidential: machine learning on encrypted data. In: Kwon, T., Lee, M.-K., Kwon, D. (eds.) ICISC 2012. LNCS, vol. 7839, pp. 1–21. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37682-5_1
Hu, X., et al.: Privacy-preserving K-means clustering upon negative databases. In: Cheng, L., Leung, A.C.S., Ozawa, S. (eds.) ICONIP 2018. LNCS, vol. 11304, pp. 191–204. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04212-7_17
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
Jahangiri, A., Rakha, H.A.: Applying machine learning techniques to transportation mode recognition using mobile phone sensor data. IEEE Trans. Intell. Transp. Syst. 16(5), 2406–2417 (2015)
Januzaj, E., Kriegel, H.P., Pfeifle, M.: Towards effective and efficient distributed clustering. In: Workshop on Clustering Large Data Sets (ICDM2003), Vol. 60 (2003)
Jiang, Z.L., et al.: Efficient two-party privacy-preserving collaborative k-means clustering protocol supporting both storage and computation outsourcing. Inf. Sci. 518, 168–180 (2020)
Liu, B., et al.: Follow my recommendations: a personalized privacy assistant for mobile app permissions. In: Twelfth Symposium on Usable Privacy and Security (SOUPS 2016), pp. 27–41 (2016)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Lu, Z., Shen, H.: A convergent differentially private k-means clustering algorithm. In: Yang, Q., Zhou, Z.-H., Gong, Z., Zhang, M.-L., Huang, S.-J. (eds.) PAKDD 2019. LNCS (LNAI), vol. 11439, pp. 612–624. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-16148-4_47
McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 54, pp. 1273–1282. PMLR (2017)
Navidi, W., Murphy Jr., W.S., Hereman, W.: Statistical methods in surveying by trilateration. Comput. Stat. Data Anal. 27(2), 209–227 (1998)
Palacio-Niño, J., Berzal, F.: Evaluation metrics for unsupervised learning algorithms. arXiv preprint arXiv:1905.05667 (2019)
Sarker, I.H., Hoque, M.M., Uddin, M.K., Alsanoosy, T.: Mobile data science and intelligent apps: concepts, AI-based modeling and research directions. Mob. Netw. Appl. 1–19 (2020). https://doi.org/10.1007/s11036-020-01650-z
Schellekens, V., Chatalic, A., Houssiau, F., De Montjoye, Y.A., Jacques, L., Gribonval, R.: Differentially private compressive k-means. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7933–7937. IEEE (2019)
Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 1177–1178. Association for Computing Machinery (2010)
Microsoft SEAL (release 3.5), Microsoft Research, Redmond, WA (2020)
Soliman, A., Girdzijauskas, S., Bouguelia, M.-R., Pashami, S., Nowaczyk, S.: Decentralized and adaptive K-means clustering for non-IID data using hyperLogLog counters. In: Lauw, H.W., Wong, R.C.-W., Ntoulas, A., Lim, E.-P., Ng, S.-K., Pan, S.J. (eds.) PAKDD 2020. LNCS (LNAI), vol. 12084, pp. 343–355. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-47426-3_27
Su, D., Cao, J., Li, N., Bertino, E., Jin, H.: Differentially private k-means clustering. In: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pp. 26–37. ACM (2016)
Thiagarajan, A., et al.: Vtrack: accurate, energy-aware road traffic delay estimation using mobile phones. In: Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems, SenSys 2009, pp. 85–98. Association for Computing Machinery (2009)
Triebe, O.J., Rajagopal, R.: Federated K-Means: clustering algorithm and proof of concept (2020)
Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206–215. KDD 2003 (2003)
Xing, K., Hu, C., Yu, J., Cheng, X., Zhang, F.: Mutual privacy preserving k-means clustering in social participatory sensing. IEEE Trans. Ind. Inform. 13(4), 2066–2076 (2017)
Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)
Yin, H., Zhang, J., Xiong, Y., Huang, X., Deng, T.: PPK-means: achieving privacy-preserving clustering over encrypted multi-dimensional cloud data. Electronics 7(11), 310 (2018)
Yuan, C., Yang, H.: Research on k-value selection method of k-means clustering algorithm. J. Multi. Sci. J. 2(2), 226–235 (2019)
Yuan, J., Tian, Y.: Practical privacy-preserving mapreduce based k-means clustering over large-scale dataset. IEEE Trans. Cloud Comput. 7(2), 568–579 (2019)
Zhang, W., Li, C., Peng, G., Chen, Y., Zhang, Z.: A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mech. Syst. Signal Process. 100, 439–453 (2018)

Acknowledgements

The work presented in this paper was carried out in the scope of the COP-MODE project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under the NGI_TRUST grant agreement No. 825618, and of the project AIDA: Adaptive, Intelligent and Distributed Assurance Platform (POCI-01-0247-FEDER-045907), co-financed by the European Regional Development Fund through the COMPETE2020 program and by the Portuguese Foundation for Science and Technology (FCT) under the CMU Portugal Program. Ricardo Mendes wishes to acknowledge the Portuguese funding institution FCT - Foundation for Science and Technology for supporting his research under the Ph.D. grant SFRH/BD/128599/2017.

Author information

Authors and Affiliations

CRACS/INESCTEC, CISUC and Department of Computer Science, Faculty of Sciences, University of Porto, Porto, Portugal
André Brandão & João P. Vilela
CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
Ricardo Mendes

Corresponding author

Correspondence to André Brandão.

Editor information

Editors and Affiliations

University of Coimbra, Coimbra, Portugal
Pedro Henriques Abreu
University of Porto, Porto, Portugal
Pedro Pereira Rodrigues
University of Granada, Granada, Spain
Alberto Fernández
University of Porto, Porto, Portugal
João Gama

Cite this paper

Brandão, A., Mendes, R., Vilela, J.P. (2021). Efficient Privacy Preserving Distributed K-Means for Non-IID Data. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds) Advances in Intelligent Data Analysis XIX. IDA 2021. Lecture Notes in Computer Science, vol 12695. Springer, Cham. https://doi.org/10.1007/978-3-030-74251-5_35

DOI: https://doi.org/10.1007/978-3-030-74251-5_35
Published: 13 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74250-8
Online ISBN: 978-3-030-74251-5
eBook Packages: Computer Science, Computer Science (R0)