
A Simple D²-Sampling Based PTAS for k-Means and other Clustering Problems

  • Conference paper
Computing and Combinatorics (COCOON 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7434)


Abstract

Given a set of points $P \subset \mathbb{R}^d$, the k-means clustering problem is to find a set of k centers $C = \{c_1, \ldots, c_k\}$, $c_i \in \mathbb{R}^d$, such that the objective function $\sum_{x \in P} d(x, C)^2$, where $d(x, C)$ denotes the distance between $x$ and the closest center in $C$, is minimized. This is one of the most prominent objective functions that have been studied with respect to clustering.
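
As a concrete reading of the objective above, the following minimal Python sketch evaluates the k-means cost of a given center set $C$. It assumes points are tuples of floats and that $d$ is the Euclidean distance; the names `sq_dist` and `kmeans_cost` are illustrative and do not come from the paper.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over x in P of d(x, C)^2, where d(x, C) is the distance
    from x to its closest center in C."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


# Two well-separated pairs on a line, one center at the midpoint of each pair.
P = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
C = [(0.5, 0.0), (10.5, 0.0)]
print(kmeans_cost(P, C))  # 1.0: each point contributes 0.5^2 = 0.25
```

With a single center at the origin instead, the same four points cost 0 + 1 + 100 + 121 = 222, which shows how strongly the squared distance penalizes far-away points.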

D²-sampling [1] is a simple non-uniform sampling technique for choosing points from a set of points. It works as follows: given a set of points $P \subseteq \mathbb{R}^d$, the first point is chosen uniformly at random from $P$. Subsequently, a point from $P$ is chosen as the next sample with probability proportional to the square of the distance of this point to the nearest previously sampled point; that is, if $S$ is the set of points sampled so far, $x \in P$ is chosen with probability $d(x, S)^2 / \sum_{y \in P} d(y, S)^2$.
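
The sampling procedure described above can be sketched in Python as follows. This is a minimal illustration, assuming points are tuples of floats and Euclidean distance; `d2_sample` is a hypothetical name, and the fallback to a uniform choice when every point coincides with an already-sampled point (so all weights are zero) is our own assumption, since the description leaves that case open. Using the k sampled points directly as centers corresponds to the seeding analyzed in [1].

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def d2_sample(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Choose k points from `points` by D^2-sampling."""
    if not points or k <= 0:
        return []
    # First sample: uniform over P.
    sampled = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest sampled point.
    d2 = [sq_dist(p, sampled[0]) for p in points]
    while len(sampled) < k:
        if sum(d2) == 0.0:
            # Assumption: every point coincides with a sampled point, so the
            # distribution is undefined; fall back to a uniform choice.
            nxt = rng.choice(points)
        else:
            # Next sample: probability proportional to d(x, S)^2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        sampled.append(nxt)
        # Update nearest-sample distances using only the newest point.
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return sampled


rng = random.Random(0)
P = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
print(d2_sample(P, 2, rng))  # usually one point from each well-separated pair
```

Keeping the list `d2` of nearest-sample distances and updating it against only the newest sample makes each round cost O(|P| d) rather than recomputing distances to every previous sample.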

D²-sampling has been shown to have useful properties with respect to the k-means clustering problem. Arthur and Vassilvitskii [1] show that k points chosen as centers from P using D²-sampling give an $O(\log k)$ approximation in expectation. Ailon et al. [2] and Aggarwal et al. [3] extended the results of [1] to show that $O(k)$ points chosen as centers using D²-sampling give an $O(1)$ approximation to the k-means objective function with high probability. In this paper, we further demonstrate the power of D²-sampling by giving a simple randomized $(1 + \varepsilon)$-approximation algorithm that uses D²-sampling at its core.
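
To make the idea of placing D²-sampling at the core of a search more concrete, here is a toy Python sketch. It is not the authors' algorithm and carries none of its guarantees. It assumes a hypothetical branching scheme: for each of the k centers, draw a few points by D²-sampling with respect to the centers chosen so far, try the centroid of every small subset of that sample as the next center, recurse, and keep the cheapest complete set of k centers. The names `search`, `d2_draw`, `n_samples`, and `subset_size` are hypothetical, and the parameter values that would yield a $(1 + \varepsilon)$ guarantee are not stated in this excerpt.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Point, c: Point) -> float:
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    # k-means objective: sum of squared distances to the nearest center.
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


def centroid(pts: Sequence[Point]) -> Point:
    # The mean of pts is the optimal single center for those points.
    n = len(pts)
    return tuple(sum(coord) / n for coord in zip(*pts))


def d2_draw(points, centers, n, rng) -> List[Point]:
    """Draw n points with replacement: uniformly if there are no centers
    yet, otherwise proportional to squared distance to the nearest center."""
    weights = [min((sq_dist(p, c) for c in centers), default=1.0) for p in points]
    if sum(weights) == 0.0:  # every point already sits on a center
        return [rng.choice(points) for _ in range(n)]
    return rng.choices(points, weights=weights, k=n)


def search(points, k, n_samples, subset_size, rng, centers=()):
    """Hypothetical branching search; returns (best centers, best cost)."""
    if not 1 <= subset_size <= n_samples:
        raise ValueError("need 1 <= subset_size <= n_samples")
    if len(centers) == k:
        return list(centers), cost(points, centers)
    sample = d2_draw(points, centers, n_samples, rng)
    best, best_cost = None, float("inf")
    for subset in itertools.combinations(sample, subset_size):
        cand, cand_cost = search(points, k, n_samples, subset_size, rng,
                                 centers + (centroid(subset),))
        if cand_cost < best_cost:
            best, best_cost = cand, cand_cost
    return best, best_cost


P = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, c = search(P, k=2, n_samples=6, subset_size=2, rng=random.Random(0))
print(centers, c)
```

The search tree has $\binom{n\_samples}{subset\_size}^k$ leaves, so this sketch is only practical for tiny inputs; it is meant to show where D²-sampling enters such a search, not to suggest an efficient implementation.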


References

  1. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035 (2007)

  2. Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. In: Advances in Neural Information Processing Systems, vol. 22, pp. 10–18 (2009)

  3. Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive Sampling for k-Means Clustering. In: Dinur, I., Jansen, K., Naor, J., Rolim, J. (eds.) APPROX 2009. LNCS, vol. 5687, pp. 15–28. Springer, Heidelberg (2009)

  4. Broder, A., Glassman, S., Manasse, M., Zweig, G.: Syntactic clustering of the web

  5. Faloutsos, C., Barber, R., Flickner, M., Hafner, J.: Efficient and effective querying by image content. Journal of Intelligent Information Systems (1994)

  6. Deerwester, S., Dumais, S., Landauer, T., Furnas, G., Harshman, A.: Indexing by latent semantic analysis. Journal of the American Society for Information Science (1990)

  7. Swain, M., Ballard, D.: Color indexing. International Journal of Computer Vision (1991)

  8. Dasgupta, S.: The hardness of k-means clustering. Technical Report CS2008-0916, Department of Computer Science and Engineering, University of California San Diego (2008)

  9. Lloyd, S.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137 (1982)

  10. Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Proc. 22nd Annual Symposium on Computational Geometry, pp. 144–153 (2006)

  11. Ostrovsky, R., Rabani, Y., Schulman, L.J., Swamy, C.: The effectiveness of Lloyd-type methods for the k-means problem. In: Proc. 47th IEEE FOCS, pp. 165–176 (2006)

  12. Ackermann, M.R., Blömer, J.: Coresets and approximate clustering for Bregman divergences. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 1088–1097 (2009)

  13. Chen, K.: On k-median clustering in high dimensions. In: SODA, pp. 1177–1185 (2006)

  14. Feldman, D., Monemizadeh, M., Sohler, C.: A PTAS for k-means clustering based on weak coresets. In: Symposium on Computational Geometry, pp. 11–18 (2007)

  15. Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance based k-clustering. In: Proceedings of the Tenth Annual Symposium on Computational Geometry, pp. 332–339 (1994)

  16. Matousek, J.: On approximate geometric k-clustering. In: Discrete and Computational Geometry (2000)

  17. Badoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: STOC, pp. 250–257 (2002)

  18. de la Vega, W.F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: ACM Symposium on Theory of Computing, pp. 50–58 (2003)

  19. Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: ACM Symposium on Theory of Computing, pp. 291–300 (2004)

  20. Kumar, A., Sabharwal, Y., Sen, S.: Linear-time approximation schemes for clustering problems in any dimensions. J. ACM 57(2) (2010)

  21. Awasthi, P., Blum, A., Sheffet, O.: Stability yields a PTAS for k-median and k-means clustering. In: FOCS, pp. 309–318 (2010)

  22. Har-Peled, S., Sadri, B.: How fast is the k-means method? In: ACM-SIAM Symposium on Discrete Algorithms, pp. 877–885 (2005)


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jaiswal, R., Kumar, A., Sen, S. (2012). A Simple D²-Sampling Based PTAS for k-Means and other Clustering Problems. In: Gudmundsson, J., Mestre, J., Viglas, T. (eds) Computing and Combinatorics. COCOON 2012. Lecture Notes in Computer Science, vol 7434. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32241-9_2


  • DOI: https://doi.org/10.1007/978-3-642-32241-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32240-2

  • Online ISBN: 978-3-642-32241-9

  • eBook Packages: Computer Science, Computer Science (R0)
