A Tight Lower Bound Instance for k-means++ in Constant Dimension

Bhattacharya, Anup; Jaiswal, Ragesh; Ailon, Nir

doi:10.1007/978-3-319-06089-7_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8402)

Included in the following conference series:

International Conference on Theory and Applications of Models of Computation

Abstract

The k-means++ seeding algorithm is one of the most popular algorithms for choosing the initial k centers for the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows:

Pick the first center uniformly at random from the given points. For i > 1, pick a point to be the i-th center with probability proportional to the square of the Euclidean distance from this point to the closest of the (i − 1) previously chosen centers.
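The sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names `sq_dist` and `kmeanspp_seed`, the tuple representation of points, and the uniform fallback when every remaining distance is zero are all assumptions made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (hypothetical helper)."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: chosen uniformly at random from the given points.
    centers = [rng.choice(points)]
    # d2[j] = squared distance from points[j] to its closest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: if every point coincides with a center, sample uniformly.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to d2[j].
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Update the closest-center distances with the new center.
        d2 = [min(d, sq_dist(p, points[idx])) for d, p in zip(d2, points)]

    return centers
```

For example, `kmeanspp_seed([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)], 2)` returns two of the four input points. A point that is already a center has weight zero, so on distinct inputs it is not picked again.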

The k-means++ seeding algorithm is not only simple and fast but also gives an O(log k) approximation in expectation, as shown by Arthur and Vassilvitskii [7]. There are datasets [7,3] on which this seeding algorithm gives an approximation factor of Ω(log k) in expectation. However, it is not clear from these results whether the algorithm achieves a good approximation factor with reasonably high probability (say 1/poly(k)). Brunsch and Röglin [9] gave a dataset where the k-means++ seeding algorithm achieves an O(log k) approximation ratio with probability that is exponentially small in k. However, this and all other known lower-bound examples [7,3] are high-dimensional. So, an open problem was to understand the behavior of the algorithm on low-dimensional datasets. In this work, we give a simple two-dimensional dataset on which the seeding algorithm achieves an O(log k) approximation ratio with probability exponentially small in k. This solves open problems posed by Mahajan et al. [13] and by Brunsch and Röglin [9].
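For readers unfamiliar with the terminology, the approximation factors above refer to the standard k-means objective. The notation below (φ, X, C, OPT, α) is assumed here for illustration and is not taken from the abstract: X is the dataset, C is the set of k centers returned by the seeding, and φ_OPT(X) is the cost of an optimal set of k centers. An α-approximation in expectation then means the second inequality holds.

```latex
\phi_C(X) = \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2,
\qquad
\mathbb{E}\left[\phi_C(X)\right] \le \alpha \cdot \phi_{\mathrm{OPT}}(X)
```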

References

1. Ackermann, M.R., Blömer, J.: Bregman clustering for separable instances. In: Kaplan, H. (ed.) SWAT 2010. LNCS, vol. 6139, pp. 212–223. Springer, Heidelberg (2010)
2. Agarwal, M., Jaiswal, R., Pal, A.: k-means++ under approximation stability. In: Chan, T.-H.H., Lau, L.C., Trevisan, L. (eds.) TAMC 2013. LNCS, vol. 7876, pp. 84–95. Springer, Heidelberg (2013)
3. Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k-means clustering. In: Dinur, I., Jansen, K., Naor, J., Rolim, J. (eds.) Approx and Random 2009. LNCS, vol. 5687, pp. 15–28. Springer, Heidelberg (2009)
4. Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. In: NIPS, pp. 10–18 (2009)
5. Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Proceedings of the Twenty-second Annual Symposium on Computational Geometry, SCG 2006, pp. 144–153. ACM, New York (2006)
6. Arthur, D., Vassilvitskii, S.: Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2006, pp. 153–164. IEEE Computer Society, Washington, DC (2006)
7. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, pp. 1027–1035. Society for Industrial and Applied Mathematics, Philadelphia (2007)
8. Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012)
9. Brunsch, T., Röglin, H.: A bad instance for k-means++. Theoretical Computer Science (2012)
10. Dasgupta, S.: The hardness of k-means clustering. Technical report, University of California San Diego
11. Jaiswal, R., Garg, N.: Analysis of k-means++ for separable data. In: Gupta, A., Jansen, K., Rolim, J., Servedio, R. (eds.) APPROX/RANDOM 2012. LNCS, vol. 7408, pp. 591–602. Springer, Heidelberg (2012)
12. Jaiswal, R., Kumar, A., Sen, S.: A simple D²-sampling based PTAS for k-means and other clustering problems. Algorithmica (2013)
13. Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is NP-hard. Theoretical Computer Science 442, 13–21 (2012); Special Issue on the Workshop on Algorithms and Computation (WALCOM 2009)
14. Vattani, A.: The planar k-means problem is NP-hard. Manuscript (2009)

Author information

Authors and Affiliations

IIT Delhi, India
Anup Bhattacharya & Ragesh Jaiswal
Technion, Haifa, Israel
Nir Ailon

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, College of Engineering, Anna University, Chennai, 600 025, Chennai, India
T. V. Gopal
Department of Computer Science and Engineering, Indian Institute of Technology, 208016, Kanpur, India
Manindra Agrawal
State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Zhongguancun, Haidian District, 100190, Beijing, China
Angsheng Li
School of Mathematics, University of Leeds, LS2 9JT, Leeds, UK
S. Barry Cooper

About this paper

Cite this paper

Bhattacharya, A., Jaiswal, R., Ailon, N. (2014). A Tight Lower Bound Instance for k-means++ in Constant Dimension. In: Gopal, T.V., Agrawal, M., Li, A., Cooper, S.B. (eds) Theory and Applications of Models of Computation. TAMC 2014. Lecture Notes in Computer Science, vol 8402. Springer, Cham. https://doi.org/10.1007/978-3-319-06089-7_2

DOI: https://doi.org/10.1007/978-3-319-06089-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06088-0
Online ISBN: 978-3-319-06089-7
eBook Packages: Computer Science, Computer Science (R0)
