Adaptive Sampling for k-Means Clustering

Aggarwal, Ankit; Deshpande, Amit; Kannan, Ravi

doi:10.1007/978-3-642-03685-9_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5687)

Included in the following conference series:

International Workshop on Approximation Algorithms for Combinatorial Optimization
International Workshop on Randomization and Approximation Techniques in Computer Science

Abstract

We show that O(k) adaptively sampled centers give a constant-factor bi-criteria approximation for the k-means problem with constant probability. Moreover, these O(k) centers contain a subset of k centers that gives a constant-factor approximation, and this subset can be found using the LP-based techniques of Jain and Vazirani [JV01] and Charikar et al. [CGTS02]. Both algorithms run in effectively O(nkd) time and extend the O(log k)-approximation achieved by the k-means++ algorithm of Arthur and Vassilvitskii [AV07].
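
To make the sampling step concrete, here is a minimal Python sketch of adaptive (D²) sampling in the style of k-means++ seeding: each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The function names, the `rng` parameter, and the choice of how many centers to draw are illustrative assumptions, not taken from the paper; the LP-based step that selects k of the sampled centers is not shown.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def adaptive_sample(points, num_centers, rng=None):
    """Draw centers one at a time by D^2 (adaptive) sampling.

    Hypothetical helper: `num_centers` would be O(k) for the bi-criteria
    setting; the constant is left to the caller.
    """
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniform at random
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < num_centers:
        if sum(d2) == 0:  # every point already coincides with a center
            break
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Updating the distances costs O(nd) per new center
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

Each added center costs one O(nd) pass over the data, so drawing O(k) centers this way takes O(nkd) time, consistent with the running time quoted in the abstract. A call such as `adaptive_sample(points, 3 * k)` would oversample by an arbitrary factor of 3, chosen here only for illustration.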

References

Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning 75(2), 245–248 (2009)
Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity (2009), http://arxiv.org/abs/0904.1113
Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Annual Symposium on Computational Geometry (SOCG) (2006)
Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: ACM-SIAM Symposium on Discrete Algorithms (SODA) (2007)
Charikar, M., Guha, S., Tardos, É., Shmoys, D.: A constant factor approximation for the k-median problem. Journal of Computer and System Sciences (2002)
Chen, K.: On coresets for k-median and k-means clustering in metric and euclidean spaces and their applications. Submitted to SIAM Journal on Computing (SICOMP) (2009)
Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: ACM Symposium on Theory of Computing (STOC), pp. 30–39 (2003)
Dasgupta, S.: The hardness of k-means clustering, Tech. Report CS2008-0916, UC San Diego (2008)
de la Vega, F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: ACM Symposium on Theory of Computing (STOC), pp. 50–58. ACM Press, New York (2003)
Guruswami, V., Indyk, P.: Embeddings and non-approximability of geometric problems. In: ACM-SIAM Symposium on Discrete Algorithms (SODA) (2003)
Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering 15(3), 515–528 (2003)
Gonzalez, T.: Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, 293–306 (1985)
Har-Peled, S., Mazumdar, S.: On core-sets for k-means and k-median clustering. In: ACM Symposium on Theory of Computing (STOC), pp. 291–300 (2004)
Jain, K., Vazirani, V.: Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. Journal of the ACM 48, 274–296 (2001)
Kanungo, T., Mount, D., Netanyahu, N., Piatko, C., Silverman, R., Wu, A.: A local search approximation algorithm for k-means clustering. Computational Geometry 28(2-3), 89–112 (2004)
Kanade, G., Nimbhorkar, P., Varadarajan, K.: On the NP-hardness of the 2-means problem (unpublished manuscript) (2008)
Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In: IEEE Symposium on Foundations of Computer Science (FOCS), pp. 454–462 (2004)
Lloyd, S.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–136 (1982)
Matoušek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24(1), 61–84 (2000)
Meyerson, A.: Online facility location. In: IEEE Symposium on Foundations of Computer Science (FOCS) (2001)
Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is NP-hard. In: Das, S., Uehara, R. (eds.) WALCOM 2009. LNCS, vol. 5431, pp. 274–285. Springer, Heidelberg (2009)
Mettu, R., Plaxton, C.: Optimal time bounds for approximate clustering. Machine Learning, 344–351 (2002)
Ostrovsky, R., Rabani, Y., Schulman, L., Swamy, C.: The effectiveness of Lloyd-type methods for the k-means problem. In: IEEE Symposium on Foundations of Computer Science (FOCS), pp. 165–176 (2006)
Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Annual Symposium on Computational Geometry (SOCG) (2009)

Author information

Authors and Affiliations

IIT Delhi, India
Ankit Aggarwal
Microsoft Research, India
Amit Deshpande & Ravi Kannan

Editor information

Editors and Affiliations

Dept. of Applied Math and Computer Science, The Weizmann Institute of Science, 76100, Rehovot, Israel
Irit Dinur
Institute for Computer Science and Applied Mathematics, University of Kiel, Olshausenstr. 40, 24098, Kiel, Germany
Klaus Jansen
Technion, Computer Science Department, 32000, Haifa, Israel
Joseph Naor
University of Geneva, Centre Universitaire d’Informatique, Battelle Bat. A, 7 rte de Drize, 1227, Carouge, Switzerland
José Rolim

About this paper

Cite this paper

Aggarwal, A., Deshpande, A., Kannan, R. (2009). Adaptive Sampling for k-Means Clustering. In: Dinur, I., Jansen, K., Naor, J., Rolim, J. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. APPROX 2009, RANDOM 2009. Lecture Notes in Computer Science, vol 5687. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03685-9_2

DOI: https://doi.org/10.1007/978-3-642-03685-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03684-2
Online ISBN: 978-3-642-03685-9
eBook Packages: Computer Science, Computer Science (R0)
