A Tight Lower Bound Instance for k-means++ in Constant Dimension

  • Conference paper
Theory and Applications of Models of Computation (TAMC 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8402)

Abstract

The k-means++ seeding algorithm is one of the most popular algorithms for finding the initial k centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows:

Pick the first center uniformly at random from the given points. For i > 1, pick a point to be the i-th center with probability proportional to the square of the Euclidean distance from this point to the closest of the (i − 1) previously chosen centers.
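The sampling procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `kmeanspp_seed`, `sq_dist`, and `kmeans_cost` are hypothetical, points are assumed to be tuples of floats, and a seeded `random.Random` is passed in so runs are reproducible.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ (D^2) seeding; a hypothetical helper, not the paper's code.

    The first center is drawn uniformly from `points`; each later center is
    drawn with probability proportional to its squared distance to the
    closest center chosen so far.
    """
    if not 1 <= k <= len(points):
        raise ValueError("need 1 <= k <= len(points)")
    centers = [rng.choice(points)]
    # d2[j] = squared distance from points[j] to its closest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    for _ in range(1, k):
        if sum(d2) == 0.0:
            # Every point coincides with a chosen center, so all D^2 weights
            # are zero; fall back to a uniform pick (the cost stays 0 anyway).
            idx = rng.randrange(len(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        new_center = points[idx]
        centers.append(new_center)
        # Refresh each point's squared distance to its nearest center.
        d2 = [min(old, sq_dist(p, new_center)) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points: List[Point], centers: List[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    seeds = kmeanspp_seed(pts, k=2, rng=random.Random(42))
    print(seeds, kmeans_cost(pts, seeds))
```

Keeping the array `d2` of current nearest-center distances means each new center costs one pass over the points, so the whole seeding takes O(nkd) time for n points in d dimensions.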

The k-means++ seeding algorithm is not only simple and fast but also gives an O(log k) approximation in expectation, as shown by Arthur and Vassilvitskii [7]. There are datasets [7,3] on which this seeding algorithm gives an approximation factor of Ω(log k) in expectation. However, it is not clear from these results whether the algorithm achieves a good approximation factor with reasonably high probability (say 1/poly(k)). Brunsch and Röglin [9] gave a dataset on which the k-means++ seeding algorithm achieves an approximation ratio better than c·log k, for some constant c > 0, only with probability exponentially small in k. However, this and all other known lower-bound examples [7,3] are high dimensional, so an open problem was to understand the behavior of the algorithm on low-dimensional datasets. In this work, we give a simple two-dimensional dataset on which the seeding algorithm achieves an approximation ratio better than c·log k, for some constant c > 0, only with probability exponentially small in k. This solves open problems posed by Mahajan et al. [13] and by Brunsch and Röglin [9].
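The difference between a guarantee in expectation and a guarantee with reasonably high probability can be made concrete by simulation: run the seeding many times and count how often its cost lands within a factor alpha of the optimum. The sketch below does this on a hypothetical toy instance of k far-apart pairs of points at unit distance; it is not the lower-bound dataset of this paper or of [9]. On this instance the optimal cost is 0.5 per pair, and seeding is within a factor 2 exactly when it picks one point from every pair. The names `d2_seed_cost`, `paired_instance`, and `success_rate`, and the default spacing of 1000, are illustrative assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def d2_seed_cost(points: List[Point], k: int, rng: random.Random) -> float:
    """Run k-means++ (D^2) seeding on 2-D points and return the resulting k-means cost."""
    fx, fy = rng.choice(points)  # first center: uniform over the points
    # d2[j] = squared distance from points[j] to its closest chosen center.
    d2 = [(x - fx) ** 2 + (y - fy) ** 2 for x, y in points]
    for _ in range(1, k):
        # Assumes some weight is positive (true below: 2k distinct points, < k centers so far).
        cx, cy = rng.choices(points, weights=d2, k=1)[0]
        d2 = [min(old, (x - cx) ** 2 + (y - cy) ** 2) for old, (x, y) in zip(d2, points)]
    return sum(d2)  # cost of the k chosen centers


def paired_instance(k: int, spacing: float = 1000.0) -> List[Point]:
    """Hypothetical toy instance: k pairs of points at unit distance, pairs `spacing` apart."""
    return [(spacing * i, dy) for i in range(k) for dy in (0.0, 1.0)]


def success_rate(k: int, alpha: float, trials: int, seed: int = 0) -> float:
    """Fraction of seeding runs whose cost is at most alpha * OPT on the toy instance."""
    rng = random.Random(seed)
    points = paired_instance(k)
    # One center at each pair's midpoint costs 2 * 0.5**2 = 0.5 per pair;
    # this is optimal when the pairs are far apart relative to k.
    opt = 0.5 * k
    hits = sum(d2_seed_cost(points, k, rng) <= alpha * opt for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(success_rate(k=10, alpha=2.0, trials=1000))
```

On this well-separated toy instance the estimated success rate for alpha = 2 should be close to 1. The lower-bound constructions discussed above are designed so that the corresponding probability of beating c·log k is exponentially small in k.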

References

  1. Ackermann, M.R., Blömer, J.: Bregman clustering for separable instances. In: Kaplan, H. (ed.) SWAT 2010. LNCS, vol. 6139, pp. 212–223. Springer, Heidelberg (2010)

  2. Agarwal, M., Jaiswal, R., Pal, A.: k-means++ under approximation stability. In: Chan, T.-H.H., Lau, L.C., Trevisan, L. (eds.) TAMC 2013. LNCS, vol. 7876, pp. 84–95. Springer, Heidelberg (2013)

  3. Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k-means clustering. In: Dinur, I., Jansen, K., Naor, J., Rolim, J. (eds.) APPROX/RANDOM 2009. LNCS, vol. 5687, pp. 15–28. Springer, Heidelberg (2009)

  4. Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. In: NIPS, pp. 10–18 (2009)

  5. Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Proceedings of the Twenty-second Annual Symposium on Computational Geometry, SCG 2006, pp. 144–153. ACM, New York (2006)

  6. Arthur, D., Vassilvitskii, S.: Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2006, pp. 153–164. IEEE Computer Society, Washington, DC (2006)

  7. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, pp. 1027–1035. Society for Industrial and Applied Mathematics, Philadelphia (2007)

  8. Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012)

  9. Brunsch, T., Röglin, H.: A bad instance for k-means++. Theoretical Computer Science (2012)

  10. Dasgupta, S.: The hardness of k-means clustering. Technical report, University of California San Diego

  11. Jaiswal, R., Garg, N.: Analysis of k-means++ for separable data. In: Gupta, A., Jansen, K., Rolim, J., Servedio, R. (eds.) APPROX/RANDOM 2012. LNCS, vol. 7408, pp. 591–602. Springer, Heidelberg (2012)

  12. Jaiswal, R., Kumar, A., Sen, S.: A simple D²-sampling based PTAS for k-means and other clustering problems. Algorithmica (2013)

  13. Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is NP-hard. Theoretical Computer Science 442, 13–21 (2012); Special Issue on the Workshop on Algorithms and Computation (WALCOM 2009)

  14. Vattani, A.: The hardness of k-means clustering in the plane. Manuscript (2009)

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Bhattacharya, A., Jaiswal, R., Ailon, N. (2014). A Tight Lower Bound Instance for k-means++ in Constant Dimension. In: Gopal, T.V., Agrawal, M., Li, A., Cooper, S.B. (eds) Theory and Applications of Models of Computation. TAMC 2014. Lecture Notes in Computer Science, vol 8402. Springer, Cham. https://doi.org/10.1007/978-3-319-06089-7_2

  • DOI: https://doi.org/10.1007/978-3-319-06089-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06088-0

  • Online ISBN: 978-3-319-06089-7

  • eBook Packages: Computer Science, Computer Science (R0)