Adaptive Sampling for k-Means Clustering
Conference paper
Abstract
We show that O(k) adaptively sampled centers give a constant-factor bi-criteria approximation for the k-means problem with constant probability. Moreover, these O(k) centers contain a subset of k centers that gives a constant-factor approximation, and this subset can be found using the LP-based techniques of Jain and Vazirani [JV01] and Charikar et al. [CGTS02]. Both algorithms run in effectively O(nkd) time and extend the O(log k)-approximation achieved by the k-means++ algorithm of Arthur and Vassilvitskii [AV07].
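The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one common form of adaptive sampling: the D²-style seeding used by k-means++ [AV07], run for more than k rounds. The function names (`sq_dist`, `adaptive_sample`, `kmeans_cost`) and the `seed` parameter are illustrative assumptions, not taken from the paper.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def adaptive_sample(points, num_centers, seed=0):
    """Pick up to num_centers centers from points by D^2-style sampling.

    Assumption: each new center is drawn with probability proportional to
    its squared distance from the nearest center chosen so far.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < num_centers:
        if sum(d2) == 0:
            break  # every point already coincides with a chosen center
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

To mimic the bi-criteria setting, one would call `adaptive_sample(points, c * k)` for some constant `c > 1`; the value of `c` here is a hypothetical parameter, and the constant the analysis needs is given in the paper itself. Each round costs O(nd), so sampling O(k) centers takes O(nkd) time, matching the running time quoted above.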
Keywords
Adaptive Sampling; Constant Probability; Constant Factor Approximation; Constant Factor Approximation Algorithm; Parallel Axis Theorem
References
- [ADHP09] Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning 75(2), 245–248 (2009)
- [AMR09] Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity (2009), http://arxiv.org/abs/0904.1113
- [AV06] Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Annual Symposium on Computational Geometry (SOCG) (2006)
- [AV07] Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: ACM-SIAM Symposium on Discrete Algorithms (SODA) (2007)
- [CGTS02] Charikar, M., Guha, S., Tardos, M., Shmoys, D.: A constant factor approximation for the k-median problem. Journal of Computer and System Sciences (2002)
- [Che09] Chen, K.: On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. Submitted to SIAM Journal on Computing (SICOMP) (2009)
- [COP03] Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: ACM Symposium on Theory of Computing (STOC), pp. 30–39 (2003)
- [Das08] Dasgupta, S.: The hardness of k-means clustering, Tech. Report CS2008-0916, UC San Diego (2008)
- [dlVKKR03] de la Vega, F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: ACM Symposium on Theory of Computing (STOC), pp. 50–58. ACM Press, New York (2003)
- [GI03] Guruswami, V., Indyk, P.: Embeddings and non-approximability of geometric problems. In: ACM-SIAM Symposium on Discrete Algorithms (SODA) (2003)
- [GMM+03] Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering 15(3), 515–528 (2003)
- [Gon85] Gonzalez, T.: Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, 293–306 (1985)
- [HPM04] Har-Peled, S., Mazumdar, S.: On core-sets for k-means and k-median clustering. In: ACM Symposium on Theory of Computing (STOC), pp. 291–300 (2004)
- [JV01] Jain, K., Vazirani, V.: Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. Journal of ACM 48, 274–296 (2001)
- [KMN+04] Kanungo, T., Mount, D., Netanyahu, N., Piatko, C., Silverman, R., Wu, A.: A local search approximation algorithm for k-means clustering. Computational Geometry 28(2-3), 89–112 (2004)
- [KNV08] Kanade, G., Nimbhorkar, P., Varadarajan, K.: On the NP-hardness of the 2-means problem (unpublished manuscript) (2008)
- [KSS04] Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In: IEEE Symposium on Foundations of Computer Science (FOCS), pp. 454–462 (2004)
- [Llo82] Lloyd, S.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–136 (1982)
- [Mat00] Matoušek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24(1), 61–84 (2000)
- [Mey01] Meyerson, A.: Online facility location. In: IEEE Symposium on Foundations of Computer Science (FOCS) (2001)
- [MNV09] Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is NP-hard. In: Das, S., Uehara, R. (eds.) WALCOM 2009. LNCS, vol. 5431, pp. 274–285. Springer, Heidelberg (2009)
- [MP02] Mettu, R., Plaxton, C.: Optimal time bounds for approximate clustering. Machine Learning, 344–351 (2002)
- [ORSS06] Ostrovsky, R., Rabani, Y., Schulman, L., Swamy, C.: The effectiveness of Lloyd-type methods for the k-means problem. In: IEEE Symposium on Foundations of Computer Science (FOCS), pp. 165–176 (2006)
- [Vat09] Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Annual Symposium on Computational Geometry (SOCG) (2009)
Copyright information
© Springer-Verlag Berlin Heidelberg 2009