A Convergent Differentially Private k-Means Clustering Algorithm

  • Zhigang Lu
  • Hong Shen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11439)

Abstract

Preserving differential privacy (DP) for iterative clustering algorithms has been extensively studied in both the interactive and the non-interactive settings. However, existing interactive differentially private clustering algorithms suffer from a non-convergence problem: they may not terminate unless the number of iterations is fixed in advance. This problem severely impacts both the clustering quality and the efficiency of the algorithm. To resolve it, we propose a novel iterative approach in the interactive setting that controls the orientation of the centroids' movement over the iterations, ensuring convergence by injecting DP noise in a selected area. We prove that, in the expected case, our approach converges to the same centroids as Lloyd’s algorithm in at most twice as many iterations as Lloyd’s algorithm. Experimental evaluations on real-world datasets show that our algorithm outperforms state-of-the-art interactive differentially private clustering algorithms, offering guaranteed convergence and better clustering quality under the same DP requirement.
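
For context, the non-convergence problem can be seen in a minimal sketch of the kind of baseline interactive approach the abstract refers to: Lloyd's update with Laplace noise added to each cluster's sum and count. This is not the algorithm proposed in the paper. The function names, the assumption that every coordinate lies in [-1, 1], and the even split of the privacy budget across iterations are illustrative choices made here.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def nearest(point, centroids):
    # Index of the centroid closest to `point` (squared Euclidean distance).
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )

def noisy_lloyd(points, centroids, epsilon, iterations, rng):
    """Hypothetical baseline: Lloyd's update on noisy sums and counts."""
    dim = len(points[0])
    # Assumption: coordinates lie in [-1, 1], so one record changes a
    # cluster's (sum, count) by at most dim + 1 in L1 norm. Each iteration
    # receives epsilon / iterations of the total budget.
    scale = (dim + 1) * iterations / epsilon
    for _ in range(iterations):  # fixed in advance: no convergence test
        sums = [[0.0] * dim for _ in centroids]
        counts = [0] * len(centroids)
        for p in points:
            j = nearest(p, centroids)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        updated = []
        for j in range(len(centroids)):
            noisy_count = counts[j] + laplace(scale, rng)
            if noisy_count < 1.0:
                # Too little (noisy) mass: keep the previous centroid.
                updated.append(list(centroids[j]))
                continue
            # Noisy mean, clamped back into the assumed data domain.
            updated.append([
                max(-1.0, min(1.0, (sums[j][d] + laplace(scale, rng)) / noisy_count))
                for d in range(dim)
            ])
        centroids = updated
    return centroids
```

Because fresh noise is drawn in every round, the centroids keep moving even after the noise-free updates would have stabilised, so the loop needs a preset iteration count. Spreading the budget over more iterations also increases the per-iteration noise scale, which is the quality and efficiency trade-off the abstract describes.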

Keywords

Differential privacy, Adversarial machine learning, k-means clustering

Notes

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments. This work is supported by an Australian Government Research Training Program Scholarship, Australian Research Council Discovery Project DP150104871, and National Key R&D Program of China Project #2017YFB0203201, with supercomputing resources provided by the Phoenix HPC service at the University of Adelaide. The corresponding author is Hong Shen.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computer Science, The University of Adelaide, Adelaide, Australia
  2. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
