Knowledge and Information Systems, Volume 25, Issue 2, pp 253–279

Large-scale k-means clustering with user-centric privacy-preservation

  • Jun Sakuma
  • Shigenobu Kobayashi
Regular Paper


A k-means clustering algorithm based on a new privacy-preserving concept, user-centric privacy preservation, is presented. In this framework, users conduct data mining on their private information while keeping it in their own local storage. After the computation, they obtain only the mining result, without disclosing their private information to others. In most cases, conventional privacy-preserving data mining has assumed that only two parties join the protocol. In our framework, we assume that a large number of parties join; therefore, not only scalability but also asynchronism and fault tolerance are important. With this in mind, we propose a k-means algorithm that combines a decentralized cryptographic protocol with a gossip-based protocol. The computational complexity is O(log n) in the number of parties n, and experimental results show that our protocol remains scalable with as many as one million parties.
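The abstract pairs a k-means iteration with gossip-based aggregation. As a rough illustration of the gossip half only, the following Python sketch runs one Lloyd iteration in which each party holds a single private point, assigns it to the nearest centroid locally, and learns the new centroids through push-sum gossip (Kempe et al., 2003). Push-sum is a standard averaging protocol swapped in here for illustration; it is not necessarily the protocol used in the paper. The function names `push_sum_average` and `gossip_kmeans_step`, the one-point-per-party setup, and the fixed round count are hypothetical choices. The sketch deliberately omits the decentralized cryptographic protocol, so partial sums travel in the clear and it provides no privacy.

```python
import numpy as np


def push_sum_average(values, rounds, rng):
    """Push-sum gossip (Kempe et al., 2003). `values` has shape (n, d), one row
    per party; returns an (n, d) array of every party's estimate of the mean row."""
    s = np.asarray(values, dtype=float).copy()  # running sum held by each party
    w = np.ones(len(s))                          # running weight held by each party
    for _ in range(rounds):
        # Each party picks one uniformly random peer (possibly itself) this round.
        targets = rng.integers(0, len(s), size=len(s))
        # Keep half of (s, w) locally and push the other half to the chosen peer.
        new_s, new_w = s / 2, w / 2
        np.add.at(new_s, targets, s / 2)
        np.add.at(new_w, targets, w / 2)
        s, w = new_s, new_w  # total mass sum(s), sum(w) is conserved
    return s / w[:, None]


def gossip_kmeans_step(points, centroids, rounds, rng, observer=0):
    """One Lloyd iteration where (hypothetically) party i privately holds points[i].
    Assignment is local; per-cluster sums and counts are aggregated by gossip.
    WARNING: nothing is encrypted, so this sketch offers no privacy."""
    n, d = points.shape
    k = len(centroids)
    # Local step: every party finds the centroid nearest to its own point.
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    onehot = np.eye(k)[dists.argmin(axis=1)]  # (n, k) cluster indicators
    # Contribution vector: the point in its cluster's d-wide slot, then the indicators.
    contrib = np.hstack([(onehot[:, :, None] * points[:, None, :]).reshape(n, k * d), onehot])
    est = push_sum_average(contrib, rounds, rng)[observer]  # one party's view
    mean_sums, mean_counts = est[: k * d].reshape(k, d), est[k * d:]
    # Ratio of means equals ratio of totals; keep the old centroid for empty clusters.
    new = np.array(centroids, dtype=float)
    nonempty = n * mean_counts > 0.5  # estimated cluster size of at least half a point
    new[nonempty] = mean_sums[nonempty] / mean_counts[nonempty][:, None]
    return new
```

Dividing the averaged per-cluster sums by the averaged counts cancels the unknown factor 1/n, so no party needs to know n or any other party's point to recover the centroids. Kempe et al. show that push-sum reaches relative error ε within O(log n + log(1/ε)) rounds with high probability, which is in line with the logarithmic dependence on n stated in the abstract; any concrete round count passed as `rounds` is an arbitrary illustrative choice.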


Keywords: Privacy · Privacy-preserving data mining · Clustering · k-means · Peer-to-peer




Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  1. Department of Computer Science, University of Tsukuba, Tsukuba, Japan
  2. Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Japan
