A Simple D²-Sampling Based PTAS for k-Means and other Clustering Problems

Jaiswal, Ragesh; Kumar, Amit; Sen, Sandeep

doi:10.1007/978-3-642-32241-9_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7434)

Included in the following conference series:

International Computing and Combinatorics Conference

Abstract

Given a set of points P ⊂ ℝ^d, the k-means clustering problem is to find a set of k centers C = {c₁, ..., c_k}, c_i ∈ ℝ^d, such that the objective function ∑_{x ∈ P} d(x, C)², where d(x, C) denotes the distance between x and the closest center in C, is minimized. This is one of the most prominent objective functions that have been studied with respect to clustering.
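To make the objective concrete, the following is a minimal sketch that evaluates ∑_{x ∈ P} d(x, C)² for a given set of centers, assuming Euclidean distance and points stored as tuples of floats. The names `sq_dist` and `kmeans_cost` and the sample data are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all points of the squared distance to the closest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


# Hypothetical toy instance: three points in the plane, two centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centers = [(0.0, 1.0), (10.0, 0.0)]
cost = kmeans_cost(points, centers)
```

In this toy instance the first two points are each at squared distance 1 from the center (0, 1) and the third point coincides with its center, so the cost is 2.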

D²-sampling [1] is a simple non-uniform sampling technique for choosing points from a set of points. It works as follows: given a set of points P ⊆ ℝ^d, the first point is chosen uniformly at random from P. Subsequently, a point from P is chosen as the next sample with probability proportional to the square of its distance to the nearest previously sampled point.
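The sampling procedure described above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the authors' implementation: it uses only the Python standard library, the name `d2_sample` and its parameters are hypothetical, and the uniform fallback for the degenerate case where every point already coincides with a chosen sample is an added assumption that the abstract does not address.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def d2_sample(points: Sequence[Point], k: int,
              rng: Optional[random.Random] = None) -> List[Point]:
    """Pick k points from `points` by D^2-sampling (hypothetical helper)."""
    rng = rng or random.Random()
    # First sample: uniformly at random.
    samples = [rng.choice(points)]
    # nearest[i] is the squared distance from points[i] to its closest sample.
    nearest = [sq_dist(p, samples[0]) for p in points]
    while len(samples) < k:
        if sum(nearest) == 0:
            # Assumption: if all weights are zero, fall back to uniform choice.
            nxt = rng.choice(points)
        else:
            # Probability proportional to squared distance to nearest sample.
            nxt = rng.choices(points, weights=nearest, k=1)[0]
        samples.append(nxt)
        # Only the new sample can shrink a point's nearest distance.
        nearest = [min(d, sq_dist(p, nxt)) for d, p in zip(nearest, points)]
    return samples


# Hypothetical usage: draw 3 samples from 4 points in the plane.
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
chosen = d2_sample(pts, 3, random.Random(0))
```

Keeping the `nearest` list up to date means each new sample costs one pass over the points rather than a comparison against every earlier sample. A point that has already been sampled has weight zero, so it is not drawn again unless the fallback branch is reached.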

D²-sampling has been shown to have nice properties with respect to the k-means clustering problem. Arthur and Vassilvitskii [1] show that k points chosen as centers from P using D²-sampling give an O(log k) approximation in expectation. Ailon et al. [2] and Aggarwal et al. [3] extended the results of [1] to show that O(k) points chosen as centers using D²-sampling give an O(1) approximation to the k-means objective function with high probability. In this paper, we further demonstrate the power of D²-sampling by giving a simple randomized (1 + ε)-approximation algorithm that uses D²-sampling at its core.

References

[1] Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035 (2007)
[2] Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. In: Advances in Neural Information Processing Systems, vol. 22, pp. 10–18 (2009)
[3] Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive Sampling for k-Means Clustering. In: Dinur, I., Jansen, K., Naor, J., Rolim, J. (eds.) APPROX 2009. LNCS, vol. 5687, pp. 15–28. Springer, Heidelberg (2009)
[4] Broder, A., Glassman, S., Manasse, M., Zweig, G.: Syntactic clustering of the web
[5] Faloutsos, C., Barber, R., Flickner, M., Hafner, J.: Efficient and effective querying by image content. Journal of Intelligent Information Systems (1994)
[6] Deerwester, S., Dumais, S., Landauer, T., Furnas, G., Harshman, A.: Indexing by latent semantic analysis. Journal of the American Society for Information Science (1990)
[7] Swain, M., Ballard, D.: Color indexing. International Journal of Computer Vision (1991)
[8] Dasgupta, S.: The hardness of k-means clustering. Technical Report CS2008-0916, Department of Computer Science and Engineering, University of California San Diego (2008)
[9] Lloyd, S.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137 (1982)
[10] Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Proc. 22nd Annual Symposium on Computational Geometry, pp. 144–153 (2006)
[11] Ostrovsky, R., Rabani, Y., Schulman, L.J., Swamy, C.: The effectiveness of Lloyd-type methods for the k-means problem. In: Proc. 47th IEEE FOCS, pp. 165–176 (2006)
[12] Ackermann, M.R., Blömer, J.: Coresets and approximate clustering for Bregman divergences. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 1088–1097 (2009)
[13] Chen, K.: On k-median clustering in high dimensions. In: SODA, pp. 1177–1185 (2006)
[14] Feldman, D., Monemizadeh, M., Sohler, C.: A PTAS for k-means clustering based on weak coresets. In: Symposium on Computational Geometry, pp. 11–18 (2007)
[15] Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance based k-clustering. In: Proceedings of the Tenth Annual Symposium on Computational Geometry, pp. 332–339 (1994)
[16] Matousek, J.: On approximate geometric k-clustering. In: Discrete and Computational Geometry (2000)
[17] Badoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: STOC, pp. 250–257 (2002)
[18] de la Vega, W.F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: ACM Symposium on Theory of Computing, pp. 50–58 (2003)
[19] Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: ACM Symposium on Theory of Computing, pp. 291–300 (2004)
[20] Kumar, A., Sabharwal, Y., Sen, S.: Linear-time approximation schemes for clustering problems in any dimensions. J. ACM 57(2) (2010)
[21] Awasthi, P., Blum, A., Sheffet, O.: Stability yields a PTAS for k-median and k-means clustering. In: FOCS, pp. 309–318 (2010)
[22] Har-Peled, S., Sadri, B.: How fast is the k-means method? In: ACM-SIAM Symposium on Discrete Algorithms, pp. 877–885 (2005)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Delhi, India
Ragesh Jaiswal, Amit Kumar & Sandeep Sen

Editor information

Editors and Affiliations

School of IT, University of Sydney, Building J12, 2006, Sydney, NSW, Australia
Joachim Gudmundsson, Julián Mestre & Taso Viglas

About this paper

Cite this paper

Jaiswal, R., Kumar, A., Sen, S. (2012). A Simple D ²-Sampling Based PTAS for k-Means and other Clustering Problems. In: Gudmundsson, J., Mestre, J., Viglas, T. (eds) Computing and Combinatorics. COCOON 2012. Lecture Notes in Computer Science, vol 7434. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32241-9_2

DOI: https://doi.org/10.1007/978-3-642-32241-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32240-2
Online ISBN: 978-3-642-32241-9
eBook Packages: Computer Science, Computer Science (R0)
