
Abstract

We show that adaptively sampled O(k) centers give a constant-factor bi-criteria approximation for the k-means problem with constant probability. Moreover, these O(k) centers contain a subset of k centers that gives a constant-factor approximation, and this subset can be found using the LP-based techniques of Jain and Vazirani [JV01] and Charikar et al. [CGTS02]. Both algorithms run in effectively O(nkd) time, where n is the number of points and d is the dimension. They extend the O(log k)-approximation achieved by the k-means++ algorithm of Arthur and Vassilvitskii [AV07].
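The adaptive sampling mentioned above is the D² seeding used by k-means++ [AV07]. The first center is chosen uniformly at random. Each later center is point x, chosen with probability D(x)²/Σ_y D(y)², where D(x) is the distance from x to the nearest center chosen so far.

The following is a minimal Python sketch under these assumptions:

* Points are tuples of floats.
* The helper names (`sq_dist`, `kmeans_cost`, `adaptive_sample`) are hypothetical.
* The oversampling factor of 4k centers is an illustrative choice, not the constant from the paper.

The LP-based step that extracts k centers from the sample is not shown. Each new center costs one O(nd) distance update, so drawing O(k) centers takes O(nkd) time.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def adaptive_sample(points: Sequence[Point], m: int, rng: random.Random) -> List[Point]:
    """Pick up to m centers by D^2 (adaptive) sampling; sketch only."""
    if m < 1:
        raise ValueError("m must be at least 1")
    centers = [rng.choice(points)]  # first center: uniform
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    for _ in range(m - 1):
        if sum(d2) == 0:  # every point already coincides with a center
            break
        # draw an index with probability proportional to D(x)^2
        i = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[i])
        # incremental update of nearest-center distances: O(n d) per center
        d2 = [min(old, sq_dist(p, points[i])) for old, p in zip(d2, points)]
    return centers


# Usage: three well-separated Gaussian blobs in the plane.
rng = random.Random(0)
pts = [(rng.gauss(cx, 0.1), rng.gauss(cy, 0.1))
       for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
       for _ in range(50)]
k = 3
# O(k) centers; the factor 4 is a hypothetical choice, not the paper's constant
centers = adaptive_sample(pts, m=4 * k, rng=rng)
print(len(centers), kmeans_cost(pts, centers))
```

Oversampling to O(k) centers, instead of exactly k as in k-means++, is what the abstract credits with turning the O(log k) guarantee into a constant-factor bi-criteria guarantee.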



References

  1. [ADHP09]
    Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning 75(2), 245–248 (2009)
  2. [AMR09]
    Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity (2009), http://arxiv.org/abs/0904.1113
  3. [AV06]
    Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Annual Symposium on Computational Geometry (SOCG) (2006)
  4. [AV07]
    Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: ACM-SIAM Symposium on Discrete Algorithms (SODA) (2007)
  5. [CGTS02]
    Charikar, M., Guha, S., Tardos, É., Shmoys, D.: A constant factor approximation for the k-median problem. Journal of Computer and System Sciences (2002)
  6. [Che09]
    Chen, K.: On coresets for k-median and k-means clustering in metric and euclidean spaces and their applications. Submitted to SIAM Journal on Computing (SICOMP) (2009)
  7. [COP03]
    Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: ACM Symposium on Theory of Computing (STOC), pp. 30–39 (2003)
  8. [Das08]
    Dasgupta, S.: The hardness of k-means clustering. Tech. Report CS2008-0916, UC San Diego (2008)
  9. [dlVKKR03]
    de la Vega, F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: ACM Symposium on Theory of Computing (STOC), pp. 50–58. ACM Press, New York (2003)
  10. [GI03]
    Guruswami, V., Indyk, P.: Embeddings and non-approximability of geometric problems. In: ACM-SIAM Symposium on Discrete Algorithms (SODA) (2003)
  11. [GMM+03]
    Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering 15(3), 515–528 (2003)
  12. [Gon85]
    Gonzalez, T.: Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, 293–306 (1985)
  13. [HPM04]
    Har-Peled, S., Mazumdar, S.: On core-sets for k-means and k-median clustering. In: ACM Symposium on Theory of Computing (STOC), pp. 291–300 (2004)
  14. [JV01]
    Jain, K., Vazirani, V.: Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. Journal of the ACM 48, 274–296 (2001)
  15. [KMN+04]
    Kanungo, T., Mount, D., Netanyahu, N., Piatko, C., Silverman, R., Wu, A.: A local search approximation algorithm for k-means clustering. Computational Geometry 28(2-3), 89–112 (2004)
  16. [KNV08]
    Kanade, G., Nimbhorkar, P., Varadarajan, K.: On the NP-hardness of the 2-means problem (unpublished manuscript) (2008)
  17. [KSS04]
    Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In: IEEE Symposium on Foundations of Computer Science (FOCS), pp. 454–462 (2004)
  18. [Llo82]
    Lloyd, S.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–136 (1982)
  19. [Mat00]
    Matoušek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24(1), 61–84 (2000)
  20. [Mey01]
    Meyerson, A.: Online facility location. In: IEEE Symposium on Foundations of Computer Science (FOCS) (2001)
  21. [MNV09]
    Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is NP-hard. In: Das, S., Uehara, R. (eds.) WALCOM 2009. LNCS, vol. 5431, pp. 274–285. Springer, Heidelberg (2009)
  22. [MP02]
    Mettu, R., Plaxton, C.: Optimal time bounds for approximate clustering. Machine Learning, 344–351 (2002)
  23. [ORSS06]
    Ostrovsky, R., Rabani, Y., Schulman, L., Swamy, C.: The effectiveness of Lloyd-type methods for the k-means problem. In: IEEE Symposium on Foundations of Computer Science (FOCS), pp. 165–176 (2006)
  24. [Vat09]
    Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Annual Symposium on Computational Geometry (SOCG) (2009)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ankit Aggarwal (1)
  • Amit Deshpande (2)
  • Ravi Kannan (2)

  1. IIT Delhi, India
  2. Microsoft Research India
