A Bad Instance for k-Means++

  • Tobias Brunsch
  • Heiko Röglin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6648)


k-means++ is a seeding technique for the k-means method with an expected approximation ratio of O(log k), where k denotes the number of clusters. Examples are known on which the expected approximation ratio of k-means++ is Ω(log k), showing that the upper bound is asymptotically tight. However, it remained open whether k-means++ yields an O(1)-approximation with probability 1/poly(k) or even with constant probability. We settle this question and present instances on which k-means++ achieves an approximation ratio of (2/3 − ε)·log k only with exponentially small probability.
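The abstract does not spell out the seeding procedure. For orientation, the following is a minimal sketch of the standard D² sampling rule usually associated with k-means++ (Arthur and Vassilvitskii, 2007): the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance to the nearest center chosen so far. The function names `squared_distance`, `kmeanspp_seed`, and `kmeans_cost` are hypothetical and chosen only for illustration; this is not code from the paper.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (illustrative sketch)."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: uniform over all input points.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: if every point coincides with a center,
            # fall back to a uniform choice.
            nxt = rng.choice(points)
        else:
            # Sample proportionally to the squared distances.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(old, squared_distance(p, nxt)) for p, old in zip(points, d2)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)
```

The approximation ratio discussed in the abstract compares the k-means cost of a clustering obtained from such a seeding with the cost of an optimal set of k centers.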





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tobias Brunsch (1)
  • Heiko Röglin (1)
  1. Department of Computer Science, University of Bonn, Germany
