Privacy-preserving k-means clustering with local synchronization in peer-to-peer networks

Abstract

k-means clustering, which partitions data records into clusters such that records in the same cluster are close to each other, has many important applications, such as image segmentation and gene detection. Although k-means clustering has been studied extensively, most existing schemes are not designed for peer-to-peer (P2P) networks, which pose several efficiency and security challenges for clustering over distributed data. In this paper, we propose a novel privacy-preserving k-means clustering scheme over distributed data in P2P networks that achieves local synchronization and privacy protection. Specifically, we design a secure aggregation protocol and a secure division protocol, both based on homomorphic encryption, to compute clusters securely without revealing the private data of any individual peer. Moreover, we propose a novel message encoding method to improve the performance of our aggregation protocol. We formally prove that the proposed scheme is secure under the semi-honest model and demonstrate its performance.
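As a rough illustration of aggregation over additively homomorphic ciphertexts, the following Python sketch sums per-peer cluster statistics under encryption and then computes one centroid. It is not the protocol proposed in the paper: the tiny hard-coded Paillier primes, the helper names (encrypt, decrypt, add_encrypted, aggregate_centroid), the one-dimensional integer data, and the final division in the clear are all assumptions made for this example. In particular, the secure division protocol and the message encoding method are not reproduced here.

```python
import math
import random

# Toy Paillier parameters: tiny fixed primes, for illustration only (NOT secure).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def encrypt(m):
    """Encrypt an integer 0 <= m < N under the toy public key."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Recover the plaintext with the toy private key (LAM, MU)."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2


def aggregate_centroid(peer_stats):
    """peer_stats: list of (local_sum, local_count) pairs for one cluster."""
    enc_sum, enc_count = encrypt(0), encrypt(0)
    for local_sum, local_count in peer_stats:
        # Each peer contributes only ciphertexts of its local statistics.
        enc_sum = add_encrypted(enc_sum, encrypt(local_sum))
        enc_count = add_encrypted(enc_count, encrypt(local_count))
    # Simplification (assumption): decrypt both totals and divide in the clear.
    return decrypt(enc_sum) / decrypt(enc_count)


# Three hypothetical peers holding (sum, count) for the same cluster.
print(aggregate_centroid([(30, 3), (50, 2), (20, 5)]))  # 10.0
```

Only the aggregated sum and count are ever decrypted in this sketch; the individual peers' values stay inside ciphertexts. A real deployment would need cryptographically sized keys and a way to avoid revealing even the aggregated totals, which is the role the abstract assigns to the secure division protocol.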


Notes

  1. http://archive.ics.uci.edu/ml

  2. http://python-paillier.readthedocs.io/en/stable/index.html


Acknowledgements

This work is partly supported by the National Key Research and Development Program of China (No. 2017YFB0802300), the Natural Science Foundation of China (No. 61602240), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX18_0305), and the Research Fund of Guangxi Key Laboratory of Trusted Software (No. kx201906).

Author information

Corresponding author

Correspondence to Xingxin Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection: Special Issue on Security and Privacy in Machine Learning Assisted P2P Networks

Guest Editors: Hongwei Li, Rongxing Lu and Mohamed Mahmoud

Cite this article

Zhu, Y., Li, X. Privacy-preserving k-means clustering with local synchronization in peer-to-peer networks. Peer-to-Peer Netw. Appl. 13, 2272–2284 (2020). https://doi.org/10.1007/s12083-020-00881-x

Keywords

  • Privacy-preserving
  • k-means clustering
  • Peer-to-peer networks