A Convergent Differentially Private k-Means Clustering Algorithm
Preserving differential privacy (DP) for the iterative clustering algorithms has been extensively studied in the interactive and the non-interactive settings. However, existing interactive differentially private clustering algorithms suffer from a non-convergence problem, i.e., these algorithms may not terminate without a predefined number of iterations. This problem severely impacts the clustering quality and the efficiency of the algorithm. To resolve this problem, we propose a novel iterative approach in the interactive settings which controls the orientation of the centroids movement over the iterations to ensure the convergence by injecting DP noise in a selected area. We prove that, in the expected case, our approach converges to the same centroids as Lloyd’s algorithm in at most twice the iterations of Lloyd’s algorithm. We perform experimental evaluations on real-world datasets to show that our algorithm outperforms the state-of-the-art of the interactive differentially private clustering algorithms with a guaranteed convergence and better clustering quality to meet the same DP requirement.
KeywordsDifferential privacy Adversarial machine learning k-means clustering
The authors would like to thank the anonymous reviewers for their valuable comments. This work is supported by Australian Government Research Training Program Scholarship, Australian Research Council Discovery Project DP150104871, National Key R & D Program of China Project #2017YFB0203201, and supported with supercomputing resources provided by the Phoenix HPC service at the University of Adelaide. The corresponding author is Hong Shen.
- 1.Andrés, M.E., Bordenabe, N.E., Chatzikokolakis, K., Palamidessi, C.: Geo-indistinguishability: differential privacy for location-based systems. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pp. 901–914. ACM (2013)Google Scholar
- 2.Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: the SuLQ framework. In: Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 128–138. ACM (2005)Google Scholar
- 3.Dheeru, D., Karra Taniskidou, E.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
- 8.Fränti, P., Virmajoki, O.: Clustering datasets (2018). http://cs.uef.fi/sipu/datasets/
- 9.Gupta, A., Ligett, K., McSherry, F., Roth, A., Talwar, K.: Differentially private combinatorial optimization. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1106–1125 (2010)Google Scholar
- 10.Komarek, P.: Komarix datasets (2018). http://komarix.org/ac/ds/
- 12.McSherry, F.: Privacy integrated queries. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. ACM (2009)Google Scholar
- 13.McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 2007 48th Annual IEEE Symposium on Foundations of Computer Science, pp. 94–103. IEEE (2007)Google Scholar
- 14.Mohan, P., Thakurta, A., Shi, E., Song, D., Culler, D.: GUPT: privacy preserving data analysis made easy. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 349–360. ACM (2012)Google Scholar
- 15.Nissim, K., Raskhodnikova, S., Smith, A.: Smooth sensitivity and sampling in private data analysis. In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pp. 75–84. ACM (2007)Google Scholar
- 16.Park, M., Foulds, J., Choudhary, K., Welling, M.: DP-EM: differentially private expectation maximization. In: Artificial Intelligence and Statistics, pp. 896–904 (2017)Google Scholar
- 18.Zhang, J., Xiao, X., Yang, Y., Zhang, Z., Winslett, M.: PrivGene: differentially private model fitting using genetic algorithms. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 665–676. ACM (2013)Google Scholar