Abstract

We show that O(k) adaptively sampled centers give a constant-factor bi-criteria approximation for the k-means problem with constant probability. Moreover, these O(k) centers contain a subset of k centers that gives a constant-factor approximation, and this subset can be found using the LP-based techniques of Jain and Vazirani [JV01] and Charikar et al. [CGTS02]. Both algorithms run in effectively O(nkd) time and extend the O(log k)-approximation achieved by the k-means++ algorithm of Arthur and Vassilvitskii [AV07].
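
The abstract does not spell out the sampling rule, so the following is only a minimal sketch under one assumption: that "adaptive sampling" means the D^2 rule used by k-means++ [AV07], where each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The function names (`adaptive_sample`, `kmeans_cost`) and the `num_centers` and `seed` parameters are hypothetical, and the LP-based step that selects k of the sampled centers is not shown.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def adaptive_sample(points, num_centers, seed=0):
    """Pick up to num_centers centers by D^2 sampling (assumed rule)."""
    rng = random.Random(seed)
    # First center: uniform at random.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    # Stop early if every point already coincides with a chosen center.
    while len(centers) < num_centers and sum(d2) > 0:
        # Draw the next center with probability proportional to d2.
        new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        # One O(nd) pass per new center, so O(nkd) overall for O(k) centers.
        d2 = [min(old, sq_dist(p, new_center)) for p, old in zip(points, d2)]
    return centers


def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

To mimic the bi-criteria setting, one might call `adaptive_sample(points, c * k)` for some constant `c` chosen by the caller and compare `kmeans_cost` against a k-center baseline; the constant the paper's analysis requires is not stated in the abstract.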

References

  1. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning 75(2), 245–248 (2009)

  2. Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity (2009), http://arxiv.org/abs/0904.1113

  3. Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Annual Symposium on Computational Geometry (SOCG) (2006)

  4. Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: ACM-SIAM Symposium on Discrete Algorithms (SODA) (2007)

  5. Charikar, M., Guha, S., Tardos, É., Shmoys, D.: A constant factor approximation for the k-median problem. Journal of Computer and System Sciences (2002)

  6. Chen, K.: On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. Submitted to SIAM Journal on Computing (SICOMP) (2009)

  7. Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: ACM Symposium on Theory of Computing (STOC), pp. 30–39 (2003)

  8. Dasgupta, S.: The hardness of k-means clustering. Tech. Report CS2008-0916, UC San Diego (2008)

  9. de la Vega, F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: ACM Symposium on Theory of Computing (STOC), pp. 50–58. ACM Press, New York (2003)

  10. Guruswami, V., Indyk, P.: Embeddings and non-approximability of geometric problems. In: ACM-SIAM Symposium on Discrete Algorithms (SODA) (2003)

  11. Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering 15(3), 515–528 (2003)

  12. Gonzalez, T.: Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, 293–306 (1985)

  13. Har-Peled, S., Mazumdar, S.: On core-sets for k-means and k-median clustering. In: ACM Symposium on Theory of Computing (STOC), pp. 291–300 (2004)

  14. Jain, K., Vazirani, V.: Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. Journal of ACM 48, 274–296 (2001)

  15. Kanungo, T., Mount, D., Netanyahu, N., Piatko, C., Silverman, R., Wu, A.: A local search approximation algorithm for k-means clustering. Computational Geometry 28(2-3), 89–112 (2004)

  16. Kanade, G., Nimbhorkar, P., Varadarajan, K.: On the NP-hardness of the 2-means problem (unpublished manuscript) (2008)

  17. Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In: IEEE Symposium on Foundations of Computer Science (FOCS), pp. 454–462 (2004)

  18. Lloyd, S.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–136 (1982)

  19. Matoušek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24(1), 61–84 (2000)

  20. Meyerson, A.: Online facility location. In: IEEE Symposium on Foundations of Computer Science (FOCS) (2001)

  21. Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is NP-hard. In: Das, S., Uehara, R. (eds.) WALCOM 2009. LNCS, vol. 5431, pp. 274–285. Springer, Heidelberg (2009)

  22. Mettu, R., Plaxton, C.: Optimal time bounds for approximate clustering. Machine Learning, 344–351 (2002)

  23. Ostrovsky, R., Rabani, Y., Schulman, L., Swamy, C.: The effectiveness of Lloyd-type methods for the k-means problem. In: IEEE Symposium on Foundations of Computer Science (FOCS), pp. 165–176 (2006)

  24. Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Annual Symposium on Computational Geometry (SOCG) (2009)

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Aggarwal, A., Deshpande, A., Kannan, R. (2009). Adaptive Sampling for k-Means Clustering. In: Dinur, I., Jansen, K., Naor, J., Rolim, J. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. APPROX RANDOM 2009. Lecture Notes in Computer Science, vol 5687. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03685-9_2

  • DOI: https://doi.org/10.1007/978-3-642-03685-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03684-2

  • Online ISBN: 978-3-642-03685-9

  • eBook Packages: Computer Science, Computer Science (R0)
